A Unified Context-Free Grammar And N-Gram Model For Spoken Language Processing

  • Ye-Yi Wang,
  • Milind Mahajan,
  • Xuedong Huang

IEEE International Conference on Acoustics, Speech, and Signal Processing

Published by Institute of Electrical and Electronics Engineers, Inc.

While context-free grammars (CFGs) remain one of the most important grammar formalisms for interpreting natural language, word n-gram models are surprisingly powerful for domain-independent applications. We propose to unify these two formalisms for both speech recognition and spoken language understanding (SLU). With portability as the major concern, we incorporated domain-specific CFGs into a domain-independent n-gram model, which improves the generalizability of the CFG and the specificity of the n-gram. In our experiments, the unified model significantly reduced the test-set perplexity from 378 to 90 in comparison with a domain-independent word trigram. The unified model also converges well as domain-specific data become available: with a limited amount of domain-specific data, the perplexity can be further reduced from 90 to 65. While we have demonstrated excellent portability, the full potential of our approach lies in the unified recognition and understanding that we are investigating.
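To make the idea concrete, here is a minimal sketch (not the authors' implementation) of one common way to embed domain-specific CFG rules in an n-gram model: spans covered by a CFG non-terminal are rewritten to that non-terminal, which then acts as a class token in a class-based n-gram. The toy grammar, corpus, and smoothing scheme below are illustrative assumptions only.

```python
# Illustrative sketch: domain CFG non-terminals as classes in an n-gram model.
# The grammar, corpus, and add-one smoothing are hypothetical choices, not
# the model described in the paper.
import math
from collections import Counter

# Hypothetical toy "CFG": each non-terminal maps to the phrases it derives.
CFG = {
    "<CITY>": {("new", "york"), ("seattle",), ("boston",)},
    "<DATE>": {("tomorrow",), ("next", "monday")},
}

def rewrite(tokens):
    """Greedily replace CFG-covered spans with their non-terminal."""
    out, i = [], 0
    while i < len(tokens):
        for nt, phrases in CFG.items():
            for p in sorted(phrases, key=len, reverse=True):
                if tuple(tokens[i:i + len(p)]) == p:
                    out.append(nt)
                    i += len(p)
                    break
            else:
                continue  # no phrase of this non-terminal matched; try next
            break         # a phrase matched; stop scanning non-terminals
        else:
            out.append(tokens[i])  # no non-terminal matched; keep the word
            i += 1
    return out

# Train a bigram model with add-one smoothing on the rewritten corpus.
corpus = [
    "fly to new york tomorrow".split(),
    "fly to seattle next monday".split(),
]
bigrams, contexts, vocab = Counter(), Counter(), set()
for sent in corpus:
    s = ["<s>"] + rewrite(sent) + ["</s>"]
    vocab.update(s)
    contexts.update(s[:-1])
    bigrams.update(zip(s, s[1:]))

def logprob(sent):
    """Smoothed log-probability of a sentence under the class bigram model."""
    s = ["<s>"] + rewrite(sent) + ["</s>"]
    V = len(vocab)
    return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + V))
               for a, b in zip(s, s[1:]))

# "boston" never occurs in training, but it rewrites to the same <CITY>
# class, so the n-gram generalizes to it.
print(rewrite("fly to boston tomorrow".split()))
```

Because the CFG collapses open-class phrases into a single token, the n-gram needs no domain data for every city name, which is the kind of generalization the unified model exploits.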