G. Zweig, D. Bohus, X. Li, and P. Nguyen
Because of speech recognition errors, repetition can be a frequent occurrence in voice-search applications. While a proper treatment of this phenomenon requires jointly modeling two or more utterances, currently deployed systems typically treat each utterance independently. In this paper, we analyze the structure of repetitions and find that in at least one commercial directory assistance application, repetitions follow simple structural transformations more than 70% of the time. We present preliminary results suggesting that significant gains are possible by explicitly modeling this structure in a joint decoding process.
Published in: Proceedings of Interspeech