A research community challenge task for accurately estimating a user's goal in a spoken dialog system
News
- 2013-03-22: Evaluation data has been released - see links on the right. The baseline and evaluation scripts have also been updated to include a tool "bin/validate" for checking tracker output.
- 2013-02-04: labels in train1a and train2 have been updated to fix bugs reported on the mailing list -- links are the same (see right)
- 2013-01-09: train3 data (see right) now includes correctness labels for all SLU hypotheses/acts (not just slot values), in a "system-specific" dictionary.
- 2013-01-08: train1c released (see right). This is a large set of calls (7,545), with transcriptions but WITHOUT correctness labels.
- 2012-12-28: URLs for train1a and train2 updated (see right)
- 2012-12-14: Challenge launched: Training data, scripts, and bus timetable database released (see right)
- 2012-12-06: Information for prospective participants (presentation from IEEE SLT) posted. Also, the DSTC Handbook has been updated to latest version.
- 2012-12-04: Start of challenge moved to 2012-12-10 to allow for community input at the IEEE SLT workshop (on 2012-12-05) and to complete labeling
- 2012-11-26: (Minor) updates to schedule
- 2012-11-20: Further updates to advisory board
- 2012-11-12: Pre-release of a subset of the training data and helper scripts. To obtain the links, please email firstname.lastname@example.org. Comments invited - please note the data and the scripts are subject to change before the official start of DSTC at IEEE SLT in December.
- 2012-11-09: Dialog state tracking challenge handbook published. This replaces the Labeling guide and evaluation procedure document.
- 2012-08-27: Updates to advisory board (listed below)
- 2012-08-20: First advisory board members agreed (listed below)
- 2012-07-17: DSTC announced to the SigDial mailing list
- 2012-07-17: Comment period extended from August 3 to August 17
- 2012-07-12: Labeling guide and evaluation procedure published for comment
- 2012-07-06: DSTC announced at SigDial 2012
- 2012-07-05: DSTC endorsed by SigDial
Background and motivation
In dialog systems, "state tracking" (sometimes also called "belief tracking") refers to accurately estimating the user's goal as a dialog progresses. Accurate state tracking is desirable because it provides robustness to speech recognition errors and helps resolve the ambiguity inherent in language within a temporal process like dialog. Dialog state tracking is an important problem both for traditional uni-modal dialog systems and for speech-enabled multi-modal dialog systems on mobile devices, on tablet computers, and in automobiles.
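To make the idea concrete, the sketch below tracks a state by keeping, for each slot, the value observed with the highest recognition confidence so far, and emits its current goal estimate after every turn. The (slot, value, confidence) input format and the example values are assumptions made purely for illustration; they do not reflect the challenge's actual data format or its baseline tracker.

```python
# A minimal, illustrative dialog state tracker (not the challenge baseline).
# Assumes each turn's SLU output is a list of (slot, value, confidence)
# triples; the challenge's real data format is richer than this.

def track(turns):
    """For each slot, keep the value observed with the highest confidence so far."""
    best = {}  # slot -> (value, confidence)
    for slu_hyps in turns:
        for slot, value, conf in slu_hyps:
            if slot not in best or conf > best[slot][1]:
                best[slot] = (value, conf)
        # Emit the tracker's current estimate of the user's goal.
        yield {slot: value for slot, (value, _) in best.items()}

# Hypothetical three-turn dialog from a bus-timetable domain.
turns = [
    [("route", "61C", 0.7), ("route", "61A", 0.2)],
    [("origin", "CMU", 0.6)],
    [("route", "61A", 0.9)],  # a later, more confident recognition wins
]
for i, state in enumerate(track(turns), start=1):
    print(f"after turn {i}: {state}")
```

A practical tracker would also need to handle goal changes and maintain a distribution over multiple hypotheses, rather than committing to a single value per slot as this sketch does.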
Recently, a host of models have been proposed for dialog state tracking. However, comparisons among models are rare, and different research groups use different data from disparate domains. Moreover, there is currently no common dataset that enables off-line dialog state tracking experiments, so newcomers to the area must either collect dialog data, which is expensive and time-consuming, or resort to simulated dialog data, which can be unreliable. All of these issues hinder advancing the state of the art.
In this challenge, participants will use a provided set of labeled dialogs to develop a dialog state tracking algorithm. Algorithms will then be evaluated on a common set of held-out dialogs, enabling direct comparisons.
The data for this challenge will be taken from the Spoken Dialog Challenge, which consists of human/machine spoken dialogs with real users (not usability subjects). Before the start of the challenge, a draft of the labeling guide and evaluation measures will be published, and comments will be invited from the community. The organizers will then perform the labeling.
At the start of the challenge (the development phase), participants will be provided with a training set of transcribed and labeled dialogs. Participants will also be given code that implements the evaluation measures. Participants will then have several months to optimize their algorithms.
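As a concrete example of an evaluation measure, the sketch below computes per-turn accuracy: the fraction of turns at which a tracker's top goal hypothesis exactly matches the labeled goal. This metric and the dict-per-turn representation are illustrative assumptions only; the released scripts define the challenge's official measures, which are broader than this.

```python
# Sketch of one simple tracking metric: the fraction of turns at which
# the tracker's top goal hypothesis exactly matches the labeled goal.
# Illustrative only; the released scripts implement the official metrics.

def turn_accuracy(hypotheses, labels):
    """hypotheses, labels: per-turn dicts mapping slot -> value."""
    assert len(hypotheses) == len(labels)
    correct = sum(1 for hyp, ref in zip(hypotheses, labels) if hyp == ref)
    return correct / len(labels)

hyps = [{"route": "61C"}, {"route": "61C", "origin": "CMU"}]
refs = [{"route": "61A"}, {"route": "61C", "origin": "CMU"}]
print(turn_accuracy(hyps, refs))  # 0.5
```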
At the end of the challenge, participants will be given an untranscribed and unlabeled test set, and a short period to run their algorithm against the test set. Participants will submit their algorithms’ output to the organizers, who will then perform the evaluation. After the challenge, the test set transcriptions and labels will be made public.
How to participate
To participate, email email@example.com to register. Registered participants will be provided with a link to download the data and evaluation tools, once they are available.
Results of the evaluation will be reported to the community. In this challenge, the identities of participants will not be made public in any written results by the organizers or other participants; participants may identify themselves, and only themselves, in their own written results.
Schedule
|1 July 2012||Beginning of comment period on labeling and evaluation metrics|
|4-6 July 2012||YRRSDS and SigDial in Korea: announcement of comment period|
|17 August 2012||End of comment period on labeling and evaluation metrics|
|31 August 2012||Evaluation metrics and labeling guide published; labeling begins|
|3-5 December 2012||IEEE SLT 2012 (Miami): Information session, 5 Dec 3 PM|
|10 December 2012||Labeling ends; data available; challenge begins (14 weeks)|
|22 March 2013||Final system due; evaluation begins (1 week)|
|29 March 2013||Evaluation output due to organizers|
|5 April 2013||Results sent to teams|
|3 May 2013||SigDial paper deadline; write-up (4 weeks)|
|August 2013||SigDial 2013|
Organizers
Jason Williams, Microsoft Research, USA (chair)
Alan Black, Carnegie Mellon University, USA
Deepak Ramachandran, Honda Research Institute, USA
Antoine Raux, Honda Research Institute, USA
Advisory board
Daniel Boies, Microsoft, Canada
Paul Crook, Microsoft, USA
Maxine Eskenazi, Carnegie Mellon University, USA
Milica Gasic, University of Cambridge, UK
Dilek Hakkani-Tur, Microsoft, USA
Helen Hastie, Heriot Watt University, UK
Kee-Eung Kim, KAIST, Korea
Ian Lane, Carnegie Mellon University, USA
Sungjin Lee, Carnegie Mellon University, USA
Teruhisa Misu, NICT, Japan
Olivier Pietquin, SUPELEC, France
Joelle Pineau, McGill University, Canada
Blaise Thomson, University of Cambridge, UK
David Traum, USC Institute for Creative Technologies, USA
Luke Zettlemoyer, University of Washington, USA
References
A. W. Black, S. Burger, A. Conkie, H. Hastie, S. Keizer, O. Lemon, N. Merigaud, G. Parent, G. Schubiner, B. Thomson, J. D. Williams, K. Yu, S. Young and M. Eskenazi. (2011) Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results. SIGDial 2011.
J. D. Williams. (2012) A belief tracking challenge task for spoken dialog systems. NAACL Workshop on Future Directions and Needs in the Spoken Dialog Community: Tools and Data. NAACL 2012.
Download challenge materials
- train1a [download]
- train1b [download] (*)
- train1c [download] (*)
- train2 [download]
- train3 [download]
Baseline and evaluation scripts [download]
Bus timetable database (active summer 2010) [download]
(*) Note: train1b and train1c are larger sets of calls with transcriptions but WITHOUT correctness labels.
Information for prospective participants (talk at IEEE SLT, December 2012)
Jason D. Williams, A belief tracking challenge task for spoken dialog systems, in NAACL HLT 2012 Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data, Association for Computational Linguistics, June 2012.