A unified context-free grammar and n-gram model for spoken language processing

Ye-Yi Wang, Milind Mahajan, and Xuedong Huang

Abstract

While context-free grammars (CFGs) remain one of the most important formalisms for interpreting natural language, word n-gram models are surprisingly powerful for domain-independent applications. We propose to unify these two formalisms for both speech recognition and spoken language understanding (SLU). With portability as the major concern, we incorporated domain-specific CFGs into a domain-independent n-gram model, which improves the generalizability of the CFG and the specificity of the n-gram. In our experiments, the unified model significantly reduces test set perplexity from 378 to 90 in comparison with a domain-independent word trigram. The unified model also converges well as domain-specific data becomes available: with a limited amount of domain-specific data, perplexity can be further reduced from 90 to 65. While we have demonstrated excellent portability, the full potential of our approach lies in the unified recognition and understanding that we are investigating.
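The core idea of incorporating a CFG into an n-gram can be sketched in toy form: spans derivable by the domain-specific grammar are replaced by nonterminal tokens, an n-gram is trained over the mixed word/nonterminal stream, and word probabilities are routed through the nonterminal. This is a hedged illustration only; the class vocabulary, the uniform within-class distribution, and all function names below are invented for the sketch and stand in for the paper's actual CFG inside probabilities.

```python
from collections import defaultdict

# Toy CFG "classes": each nonterminal maps to the words it can derive
# (a flattened stand-in for real CFG rules). Invented for illustration.
CFG_CLASSES = {
    "<CITY>": {"seattle", "boston"},
    "<DATE>": {"monday", "tuesday"},
}

def tag(sentence):
    """Replace words derivable by the CFG with their nonterminal token."""
    out = []
    for w in sentence:
        for nt, words in CFG_CLASSES.items():
            if w in words:
                out.append(nt)
                break
        else:
            out.append(w)
    return out

def train_bigram(corpus):
    """Count bigrams over the mixed word/nonterminal token stream."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        toks = ["<s>"] + tag(sent) + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            counts[a][b] += 1
    return counts

def prob(counts, prev, word):
    """P(word | prev). If word belongs to a CFG class, score it through the
    nonterminal and spread mass uniformly over the class -- a crude stand-in
    for the grammar's word-generation probability."""
    total = sum(counts[prev].values())
    if total == 0:
        return 0.0
    for nt, words in CFG_CLASSES.items():
        if word in words:
            return counts[prev][nt] / total / len(words)
    return counts[prev][word] / total

corpus = [["fly", "to", "seattle", "on", "monday"],
          ["fly", "to", "boston", "on", "tuesday"]]
model = train_bigram(corpus)
print(prob(model, "to", "boston"))  # -> 0.5: P(<CITY>|to)=1, split 2 ways
```

Because the n-gram sees only the nonterminal, the model generalizes to class members never observed after a given history, which is the source of the improved generalizability claimed above.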

Details

Publication type: Inproceedings
Published in: Proc. of Int. Conf. on Acoustics, Speech, and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers, Inc.