A unified context-free grammar and n-gram model for spoken language processing

Ye-Yi Wang, Milind Mahajan, and Xuedong Huang

Abstract

While context-free grammars (CFGs) remain one of the most important formalisms for interpreting natural language, word n-gram models are surprisingly powerful for domain-independent applications. We propose to unify these two formalisms for both speech recognition and spoken language understanding (SLU). With portability as the major problem, we incorporated domain-specific CFGs into a domain-independent n-gram model, which can improve the generalizability of the CFG and the specificity of the n-gram. In our experiments, the unified model significantly reduces the test set perplexity from 378 to 90 in comparison with a domain-independent word trigram. The unified model converges well when domain-specific data becomes available: the perplexity can be further reduced from 90 to 65 with a limited amount of domain-specific data. While we have demonstrated excellent portability, the full potential of our approach lies in its unified recognition and understanding, which we are investigating.
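
The perplexity figures quoted above can be read as an effective branching factor: the lower the value, the less uncertain the model is about each next word. As a minimal sketch of the standard definition (not the authors' evaluation code), perplexity is the exponential of the average negative log-probability the model assigns to each test word. The function name `perplexity` and the sample probabilities are hypothetical, chosen only for illustration.

```python
import math

def perplexity(word_probs):
    """Perplexity of a test set, given the model probability of each word.

    `word_probs` is a hypothetical input: one probability per test word,
    as produced by whatever language model is being evaluated.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Average negative log-probability per word (cross-entropy in nats).
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)

# Illustration only: a model that assigns every word probability 1/90 is as
# uncertain as a uniform choice among 90 words, so its perplexity is 90.
uniform_90 = [1.0 / 90] * 1000
print(perplexity(uniform_90))
```

Under this reading, a drop from 378 to 90 means the model is, on average, choosing among far fewer plausible next words.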

Details

Publication type: Inproceedings
Published in: Proc. of Int. Conf. on Acoustics, Speech, and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers, Inc.