Kuansan Wang
December 2003
In this paper, we describe our recent effort to combine
speech recognition and understanding into a single-pass
decoding process. The goal is to utilize the semantic
structure not only to better handle disfluencies and
improve the overall understanding accuracy, but also to
shorten the response time and achieve higher interactivity.
Three related techniques are instrumental in our approach.
First, we employ the unified language model (ULM) to
incorporate semantic schema into the recognition
language model. Second, we extend the search process from
word-synchronous to semantic-object-synchronous (SOS)
decoding. Finally, we utilize sequential detection to defer,
reject, or accept semantic hypotheses and execute
consequent dialog actions while the user’s utterance is
ongoing. We incorporated these methods into SALT and
HTML and conducted comparative user studies based on
the MiPad scenarios. The experimental results show that the
system can gracefully cope with spontaneous speech and
that users prefer the highly interactive nature of such
systems, even though there are no significant differences in
task completion rate or understanding accuracy.
However, the interactive interface does allow a more
effective visual prompting strategy that contributes to
significantly fewer out-of-grammar utterances.
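
The defer/reject/accept behavior of sequential detection can be illustrated with Wald's sequential probability ratio test. The following is a minimal sketch under stated assumptions, not the decoder described in the paper: the function names, the per-frame log-likelihood-ratio inputs, and the target error rates are all hypothetical choices made for illustration.

```python
import math
from typing import Iterable, Tuple


def wald_thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the cumulative log-likelihood ratio.

    alpha: tolerated false-accept rate (assumed value, not from the paper).
    beta:  tolerated false-reject rate (assumed value, not from the paper).
    """
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    return lower, upper


def sequential_decision(
    llr_increments: Iterable[float],
    alpha: float = 0.05,
    beta: float = 0.05,
) -> Tuple[str, int]:
    """Decide on a semantic hypothesis while evidence is still arriving.

    llr_increments: hypothetical per-frame log-likelihood ratios in favor of
    the hypothesis. Returns the decision and the number of frames consumed.
    """
    lower, upper = wald_thresholds(alpha, beta)
    total = 0.0
    frames = 0
    for frames, increment in enumerate(llr_increments, start=1):
        total += increment
        if total >= upper:
            return "accept", frames  # enough evidence: act on it now
        if total <= lower:
            return "reject", frames  # enough counter-evidence: drop it
    return "defer", frames  # undecided: wait for more of the utterance
```

In this sketch, a decision can be returned before the input is exhausted, which is the property that lets a dialog action fire while the user is still speaking; tightening `alpha` and `beta` widens the band in which the decision is deferred.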
In: Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding
Type: Inproceedings