A Unified Context-Free Grammar And N-Gram Model For Spoken Language Processing
- Ye-Yi Wang
- Milind Mahajan
- Xuedong Huang
IEEE International Conference on Acoustics, Speech, and Signal Processing
Published by Institute of Electrical and Electronics Engineers, Inc.
While context-free grammars (CFGs) remain one of the most important formalisms for interpreting natural language, word n-gram models are surprisingly powerful for domain-independent applications. We propose to unify these two formalisms for both speech recognition and spoken language understanding (SLU). With portability as the major problem, we incorporated domain-specific CFGs into a domain-independent n-gram model, which can improve the generalizability of the CFG and the specificity of the n-gram. In our experiments, the unified model significantly reduces the test set perplexity from 378 to 90 in comparison with a domain-independent word trigram. The unified model converges well when domain-specific data becomes available: the perplexity can be further reduced from 90 to 65 with a limited amount of domain-specific data. While we have demonstrated excellent portability, the full potential of our approach lies in its unified recognition and understanding, which we are investigating.
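To make the idea of embedding CFGs in an n-gram concrete, the following is a minimal sketch, not the authors' implementation. It assumes a bigram (rather than a trigram) over tokens in which a CFG nonterminal such as `<CITY>` appears as a single token, a greedy longest-match segmentation, and no smoothing. The tables `BIGRAM` and `CFG`, their probabilities, and the function names are all hypothetical toy values chosen for illustration.

```python
import math

# Hypothetical toy bigram over tokens; <CITY> stands in for a CFG nonterminal.
BIGRAM = {
    ("<s>", "fly"): 1.0,
    ("fly", "to"): 1.0,
    ("to", "<CITY>"): 0.5,
    ("to", "work"): 0.5,
    ("<CITY>", "</s>"): 1.0,
    ("work", "</s>"): 1.0,
}

# Hypothetical CFG: each nonterminal maps right-hand sides to rule probabilities.
CFG = {
    "<CITY>": {("boston",): 0.5, ("new", "york"): 0.5},
}


def segment(words):
    """Greedily replace the longest span covered by a CFG rule with its nonterminal.

    Returns (token, probability of the words inside the token) pairs.
    Plain words carry an inside probability of 1.0.
    """
    out, i = [], 0
    while i < len(words):
        best = None
        for nonterminal, rules in CFG.items():
            for rhs, prob in rules.items():
                if tuple(words[i:i + len(rhs)]) == rhs:
                    if best is None or len(rhs) > len(best[1]):
                        best = (nonterminal, rhs, prob)
        if best:
            out.append((best[0], best[2]))
            i += len(best[1])
        else:
            out.append((words[i], 1.0))
            i += 1
    return out


def log_prob(words):
    """Sentence log-probability: bigram over tokens times CFG rule probabilities."""
    prev, total = "<s>", 0.0
    for token, inside in segment(words) + [("</s>", 1.0)]:
        # Unseen bigrams raise KeyError here because this sketch has no smoothing.
        total += math.log(BIGRAM[(prev, token)] * inside)
        prev = token
    return total


def perplexity(sentences):
    """Per-word perplexity, counting one end-of-sentence event per sentence."""
    total = sum(log_prob(s) for s in sentences)
    n_events = sum(len(s) + 1 for s in sentences)
    return math.exp(-total / n_events)
```

In this toy setup, "fly to boston" and "fly to new york" share the same n-gram statistics through the `<CITY>` token, which is one way to picture how a CFG can generalize an n-gram while the n-gram supplies context around the CFG fragment.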
© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.