Hypotheses Ranking and State Tracking for a Multi-Domain Dialog System using Multiple ASR Alternates

  • Omar Zia Khan
  • Jean-Philippe Robichaud
  • Paul A. Crook
  • Ruhi Sarikaya

Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015)

Published by ISCA - International Speech Communication Association

In this paper, we present an approach to improve the accuracy of a multi-domain, multi-turn spoken dialog system (SDS) by including alternate results from automatic speech recognition (ASR). Often, even when the top-ranked ASR result is incorrect, the correct result is still available in the N-best list or in the word confusion network (WCN). SDS performance can therefore be improved by considering ASR choices beyond the top-ranked one. We employ late binding: multiple ASR choices are propagated through the SDS and knowledge fetch, so that additional context can be used at later stages to select the choice that is best for the overall SDS. We rank alternate domain-dependent semantic frames, with multiple semantic frames per ASR choice, to determine the final SDS output. Using real-world data extracted from the logs of the Cortana personal digital assistant, deployed to millions of users, we show that significant gains can be achieved in domain detection, intent determination, and slot tagging by considering additional results from ASR.
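
The late-binding idea described above can be illustrated with a minimal sketch. It is not the ranker used in the paper: the `Hypothesis` fields, the hand-set linear weights, and the example utterances and scores are all assumptions made for illustration. The point it shows is that each ASR alternate may yield several domain-dependent semantic frames, and that the choice among them is deferred until after knowledge fetch, when more context is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate semantic frame (all fields are hypothetical)."""
    asr_text: str         # one ASR alternate, e.g. an N-best entry
    asr_score: float      # assumed ASR confidence in [0, 1]
    domain: str           # domain that produced this semantic frame
    intent: str           # intent assigned within that domain
    slu_score: float      # assumed language-understanding confidence in [0, 1]
    knowledge_hit: bool   # did knowledge fetch return a matching result?


def rank_hypotheses(hypotheses, weights=(1.0, 1.0, 1.0)):
    """Rank frames with a hand-weighted linear score (an assumption;
    a learned ranking model would normally replace this)."""
    w_asr, w_slu, w_kb = weights

    def score(h):
        kb = 1.0 if h.knowledge_hit else 0.0
        return w_asr * h.asr_score + w_slu * h.slu_score + w_kb * kb

    # Highest combined score first.
    return sorted(hypotheses, key=score, reverse=True)


# Two ASR alternates; the first one yields frames in two domains.
candidates = [
    Hypothesis("play adell", 0.6, "music", "play_music", 0.5, False),
    Hypothesis("play adell", 0.6, "places", "find_place", 0.2, False),
    Hypothesis("play adele", 0.4, "music", "play_music", 0.6, True),
]

ranked = rank_hypotheses(candidates)
best = ranked[0]
```

In this toy input the second ASR alternate wins because knowledge fetch found a match for it, even though its ASR score is lower. An early-binding system that kept only the top ASR result would never have seen that evidence.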