Gokhan Tur, Anoop Deoras, and Dilek Hakkani-Tur
A key challenge in large-vocabulary spoken language understanding (SLU) is robustness to automatic speech recognition (ASR) errors. State-of-the-art approaches to semantic parsing rely on discriminative sequence classification methods such as conditional random fields (CRFs). Most dialog systems employ a cascaded approach in which the best hypotheses from the ASR system are fed into the downstream SLU system. In our previous work, we proposed using lattices for joint recognition and parsing. In this paper, we extend that idea and propose exploiting word confusion networks (WCNs), compiled from ASR lattices, for both CRF modeling and decoding. WCNs provide a compact representation of multiple aligned ASR hypotheses without compromising recognition accuracy. For slot filling, we show significant improvements in semantic parsing performance using WCNs compared with the ASR 1-best output, approaching the performance of the oracle path.
Publisher: Annual Conference of the International Speech Communication Association (Interspeech)
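
The abstract contrasts the ASR 1-best output with the oracle path through a WCN. The sketch below is one way to picture that distinction; it is not the authors' implementation. It assumes a WCN can be modeled as a list of bins, each holding (word, posterior) alternatives, with an empty string standing for a skip arc. The names `EPS`, `one_best`, and `oracle_errors`, as well as the example words and posteriors, are hypothetical and chosen only for illustration.

```python
EPS = ""  # hypothetical marker for an empty (skip) arc in a bin


def one_best(wcn):
    """Pick the highest-posterior word in each bin and drop skip arcs."""
    words = [max(bin_, key=lambda arc: arc[1])[0] for bin_ in wcn]
    return [w for w in words if w != EPS]


def oracle_errors(wcn, ref):
    """Minimum word edit distance between any path through the WCN and ref."""
    # prev[j]: fewest edits aligning some path through the bins seen so far
    # with the first j reference words.
    prev = list(range(len(ref) + 1))
    for bin_ in wcn:
        cur = [float("inf")] * (len(ref) + 1)
        for j in range(len(ref) + 1):
            for word, _posterior in bin_:
                if word == EPS:
                    cand = prev[j]  # skip arc consumes no reference word
                else:
                    cand = prev[j] + 1  # extra hypothesis word
                    if j > 0:
                        # match (cost 0) or substitution (cost 1)
                        cand = min(cand, prev[j - 1] + (word != ref[j - 1]))
                cur[j] = min(cur[j], cand)
            if j > 0:
                cur[j] = min(cur[j], cur[j - 1] + 1)  # reference word missing
        prev = cur
    return prev[len(ref)]


# Made-up example: four aligned bins with illustrative posteriors.
wcn = [
    [("flights", 0.9), ("lights", 0.1)],
    [("to", 0.6), ("two", 0.4)],
    [("boston", 0.55), ("austin", 0.45)],
    [(EPS, 0.7), ("please", 0.3)],
]
ref = ["flights", "to", "austin"]

print(one_best(wcn))            # 1-best path picks "boston" in the third bin
print(oracle_errors(wcn, ref))  # the correct word is still present in the WCN
```

In this toy case the 1-best path contains a wrong word, while the network still contains an error-free path. That gap between the 1-best and the oracle path is the headroom a WCN-based model can try to recover.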