Xiaodong He and Li Deng
31 May 2013
Automatic speech recognition is a central and common component of voice-driven information processing systems in human language technology, including spoken language translation, spoken language understanding, voice search, and spoken document retrieval. Interfacing speech recognition with its downstream text-based processing tasks of translation, understanding, and information retrieval creates both challenges and opportunities in the optimal design of the combined, speech-enabled systems. We present an optimization-oriented statistical framework for the overall system design in which the interactions between the sub-systems in tandem are fully incorporated and design consistency is established between the optimization objectives and the end-to-end system performance metrics. We describe techniques for optimizing such objectives in both the decoding and learning phases of speech-centric information processing system design, in which the uncertainty in the speech recognition sub-system’s outputs is fully considered and marginalized. This paper provides an overview of past and current work in this area and discusses future challenges and new opportunities.
Published in: Proceedings of the IEEE