A unified context-free grammar and n-gram model for spoken language processing

While context-free grammars (CFGs) remain one of the most
important formalisms for interpreting natural language, word n-gram
models are surprisingly powerful for domain-independent
applications. We propose to unify these two formalisms for both
speech recognition and spoken language understanding (SLU).
Because portability is the major problem, we incorporated
domain-specific CFGs into a domain-independent n-gram model, which
improves the generalizability of the CFG and the specificity of the
n-gram. In our experiments, the unified model reduces the test-set
perplexity from 378 to 90 in comparison with a domain-independent
word trigram. The unified model converges well when domain-specific
data becomes available: the perplexity can be further reduced from
90 to 65 with a limited amount of domain-specific data. While we
have demonstrated excellent portability, the full potential of our
approach lies in its unified recognition and understanding, which
we are investigating.
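
To put the reported figures in context, the following is a minimal sketch of how test-set perplexity is conventionally computed (the exponential of the average negative log-probability per word) and of the relative reductions implied by the numbers above. The function names and the toy probabilities are illustrative assumptions, not part of the paper's system.

```python
import math

def perplexity(word_probs):
    """Perplexity = exp of the average negative log-probability per word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)

def relative_reduction(before, after):
    """Fractional perplexity reduction going from `before` to `after`."""
    return (before - after) / before

# Toy example (hypothetical probabilities): a model that assigns
# probability 0.25 to every test word has perplexity of about 4.
toy_ppl = perplexity([0.25] * 4)

# Perplexity figures reported in the abstract.
print(f"378 -> 90: {relative_reduction(378, 90):.1%} relative reduction")
print(f"90 -> 65: {relative_reduction(90, 65):.1%} relative reduction")
```

Under this reading, the move from the domain-independent trigram to the unified model is roughly a 76% relative reduction, and adding limited domain-specific data gives roughly a further 28%.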

In: Proc. of Int. Conf. on Acoustics, Speech, and Signal Processing

Publisher: Institute of Electrical and Electronics Engineers, Inc.
© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.