Dialog State Tracking Challenge

A research community challenge task for accurately estimating a user's goal in a spoken dialog system


Dialog state tracking challenge 2 is underway.  Information is available at the DSTC2 homepage. 

The page below describes the 2013 Dialog State Tracking Challenge, which has concluded.  Nine teams submitted a total of 27 entries. Thanks again to all of the teams for making the challenge a success!

The materials from the challenge (data, evaluation tools, and tracker output) will remain available to the research community.  The test data now includes labels, which were withheld at the time of the challenge evaluation.  The results of the evaluation and the raw tracker output are available on this page, with teams anonymized.

For users of the data

If you use the data in your work (for dialog state tracking or anything else), we encourage you to cite this overview paper, which describes the data in detail:

  • Jason D. Williams, Antoine Raux, Deepak Ramachandran, and Alan Black.  2013.  The Dialog State Tracking Challenge.  In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), Metz, France.  [PDF]

Background and motivation

In dialog systems, "state tracking" – sometimes also called "belief tracking" – refers to accurately estimating the user’s goal as a dialog progresses. Accurate state tracking is desirable because it provides robustness to errors in speech recognition and helps reduce the ambiguity inherent in language within a temporal process like dialog. Dialog state tracking is an important problem both for traditional uni-modal dialog systems and for speech-enabled multi-modal dialog systems on mobile devices, on tablet computers, and in automobiles.
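To make the idea concrete, the following is a minimal sketch of one very simple tracking rule: for a single slot, remember the highest confidence score seen so far for each candidate value and report the top-scoring value after every turn. The data layout, the function names, and the bus route values are all assumptions made for illustration; they do not reflect the format of the challenge data or any particular entry.

```python
from typing import Dict, List, Tuple

# One turn of (hypothetical) speech-understanding output: candidate values
# for a single slot, each with a confidence score in [0, 1].
NBest = List[Tuple[str, float]]


def track_max_confidence(turns: List[NBest]) -> List[Dict[str, float]]:
    """Return, after each turn, the best score seen so far for every value."""
    best_so_far: Dict[str, float] = {}
    history: List[Dict[str, float]] = []
    for nbest in turns:
        for value, score in nbest:
            if score > best_so_far.get(value, 0.0):
                best_so_far[value] = score
        # Store a copy so later turns do not alter earlier snapshots.
        history.append(dict(best_so_far))
    return history


def top_hypothesis(scores: Dict[str, float]) -> str:
    """Pick the value with the highest tracked score."""
    return max(scores, key=scores.get)


# Hypothetical dialog: the recognizer ranks the wrong route first on turn 2.
dialog = [
    [("61C", 0.7), ("61A", 0.2)],
    [("61A", 0.4), ("61C", 0.3)],
    [("61C", 0.6)],
]
states = track_max_confidence(dialog)
guesses = [top_hypothesis(s) for s in states]
```

In this toy dialog, taking only the top recognition result of each turn would switch to "61A" on the second turn, whereas the tracked state keeps "61C" because its earlier score of 0.7 still dominates. Real trackers typically go further and maintain a probability distribution over goals rather than a running maximum.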

Recently, a host of models have been proposed for dialog state tracking. However, comparisons among models are rare, and different research groups use different data from disparate domains. Moreover, there is currently no common dataset that enables off-line dialog state tracking experiments, so newcomers to the area must first collect dialog data, which is expensive and time-consuming, or resort to simulated dialog data, which can be unreliable. All of these issues hinder progress in the state of the art.

Challenge overview

In this challenge, participants will use a provided set of labeled dialogs to develop a dialog state tracking algorithm. Algorithms will then be evaluated on a common set of held-out dialogs, to enable comparisons [2].

The data for this challenge will be taken from the Spoken Dialog Challenge [1], which consists of human/machine spoken dialogs with real users (not usability subjects). Before the start of the challenge, a draft of the labeling guide and evaluation measures will be published, and comments will be invited from the community. The organizers will then perform the labeling.

At the start of the challenge – the development phase – participants will be provided with a training set of transcribed and labeled dialogs. Participants will also be given code that implements the evaluation measurements. Participants will then have several months to optimize their algorithms.

At the end of the challenge, participants will be given an untranscribed and unlabeled test set, and a short period to run their algorithm against the test set. Participants will submit their algorithms’ output to the organizers, who will then perform the evaluation. After the challenge, the test set transcriptions and labels will be made public.
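As a rough illustration of why a common held-out set enables comparison, the sketch below scores two trackers against the same labels using top-hypothesis accuracy. This metric, the function name, and the example values are assumptions chosen for simplicity; the evaluation measures actually used in the challenge are defined in the handbook and implemented in the provided scripts.

```python
from typing import List


def top1_accuracy(predicted: List[str], labels: List[str]) -> float:
    """Fraction of turns where the tracker's top hypothesis matches the label."""
    if len(predicted) != len(labels):
        raise ValueError("predictions and labels must cover the same turns")
    if not labels:
        raise ValueError("no turns to score")
    correct = sum(p == g for p, g in zip(predicted, labels))
    return correct / len(labels)


# Hypothetical per-turn output of two trackers on the same held-out dialog.
labels = ["61C", "61C", "61C", "28X"]
tracker_a = ["61C", "61A", "61C", "28X"]
tracker_b = ["61C", "61C", "61C", "28X"]

scores = {
    "tracker_a": top1_accuracy(tracker_a, labels),
    "tracker_b": top1_accuracy(tracker_b, labels),
}
```

Because both trackers are scored on identical turns and labels, the difference between their scores reflects the trackers themselves rather than differences in data or domain.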

Results of the evaluation will be reported to the community. In this challenge, the identities of participants will not be made public in any written results by the organizers or participants, except that participants may identify themselves (only) in their own written results.


Schedule

1 July 2012 Beginning of comment period on labeling and evaluation metrics
4-6 July 2012 YRRSDS and SigDial in Korea: announcement of comment period 
17 August 2012 End of comment period on labeling and evaluation metrics 
31 August 2012 Evaluation metrics and labeling guide published; labeling begins 
10 December 2012 Labeling ends; data available; challenge begins (14 weeks) 
3-5 December 2012 IEEE SLT 2012 (Miami): Information session, 5 Dec 3 PM
22 March 2013 Final system due; evaluation begins (1 week) 
29 March 2013 Evaluation output due to organizers

5 April 2013 Results sent to teams
3 May 2013 SigDial paper deadline; write up (4 weeks)
August 2013 SigDial 2013 


Organizers

Jason Williams, Microsoft Research, USA (chair)
Alan Black, Carnegie Mellon University, USA
Deepak Ramachandran, Honda Research Institute, USA
Antoine Raux, Honda Research Institute, USA

Advisory committee

Daniel Boies, Microsoft, Canada
Paul Crook, Microsoft, USA
Maxine Eskenazi, Carnegie Mellon University, USA
Milica Gasic, University of Cambridge, UK
Dilek Hakkani-Tur, Microsoft, USA
Helen Hastie, Heriot Watt University, UK
Kee-Eung Kim, KAIST, Korea
Ian Lane, Carnegie Mellon University, USA
Sungjin Lee, Carnegie Mellon University, USA
Teruhisa Misu, NICT, Japan
Olivier Pietquin, SUPELEC, France
Joelle Pineau, McGill University, Canada
Blaise Thomson, University of Cambridge, UK
David Traum, USC Institute for Creative Technologies, USA
Luke Zettlemoyer, University of Washington, USA


[1] A. W. Black, S. Burger, A. Conkie, H. Hastie, S. Keizer, O. Lemon, N. Merigaud, G. Parent, G. Schubiner, B. Thomson, J. D. Williams, K. Yu, S. Young and M. Eskenazi. (2011) Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results. SIGDial 2011.

[2] J. D. Williams. (2012) A belief tracking challenge task for spoken dialog systems. NAACL Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data. NAACL 2012.

Results from the challenge

Overview paper at SIGDIAL 2013 [pdf]

Evaluation of tracker output [download]

Complete raw tracker output [download]

The entries from team0 are the baselines.

Download challenge materials

Handbook [download]

Training data:

Test data:

Baseline and evaluation scripts [download]

Bus timetable database active summer 2010 [download]

(*) Note: train1b and train1c are larger sets of calls with transcriptions but WITHOUT correctness labels.

Background documents

Jason Williams, Antoine Raux, Deepak Ramachandran, and Alan Black, The Dialog State Tracking Challenge, in Proceedings of the SIGDIAL 2013 Conference, Metz, France, August 2013.

Dialog state tracking challenge handbook (final)

Information for prospective participants (talk at IEEE SLT, December 2012)

Jason D. Williams, A belief tracking challenge task for spoken dialog systems, in NAACL HLT 2012 Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data, Association for Computational Linguistics, June 2012.

Mailing list

To join the mailing list, send an email to:


put “subscribe DSTC” in the body of the message (without the quotes).

To post a message, email: