Lexicon optimization for Chinese language modeling

In this paper, we present an approach to lexicon optimization for Chinese language modeling. The method is an iterative procedure with two phases: lexicon generation and lexicon pruning. In the first phase, we use statistical methods to extract suitable new words from a very large training corpus. In the second phase, we prune the lexicon to fit a preset memory limit, using a perplexity-minimization criterion. Experimental results show a character perplexity reduction of up to 6% compared with the baseline lexicon.
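The two-phase loop can be illustrated with a small Python sketch. It is not the authors' method. The abstract does not name its statistical extraction criteria or its model, so several choices here are assumptions. The model is a unigram word model trained by hard-EM Viterbi segmentation. New-word candidates are frequent adjacent word pairs. The memory limit is approximated as a maximum number of lexicon entries. Character perplexity is computed from the best segmentation rather than summed over all segmentations. All function names are hypothetical, and the demo's "baseline" is simply the set of single characters.

```python
import math
from collections import Counter


def segment(text, logp, max_len):
    """Max-probability (Viterbi) segmentation under a unigram word model."""
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: log-prob of best split of text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            w = text[start:end]
            if w in logp and best[start] + logp[w] > best[end]:
                best[end] = best[start] + logp[w]
                back[end] = start
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1], best[n]


def estimate(corpus, lexicon, iters=2):
    """Assumed training: hard EM, re-counting words from Viterbi segmentations."""
    logp = {w: -math.log(len(lexicon)) for w in lexicon}
    max_len = max(map(len, lexicon))
    for _ in range(iters):
        counts = Counter()
        for line in corpus:
            counts.update(segment(line, logp, max_len)[0])
        total = sum(counts.values()) + len(lexicon)  # add-one smoothing
        logp = {w: math.log((counts[w] + 1) / total) for w in lexicon}
    return logp


def char_perplexity(corpus, logp):
    """Per-character perplexity, using the best segmentation as an approximation."""
    max_len = max(map(len, logp))
    log_prob = sum(segment(line, logp, max_len)[1] for line in corpus)
    n_chars = sum(len(line) for line in corpus)
    return math.exp(-log_prob / n_chars)


def generate(corpus, logp, top_k):
    """Phase 1 (assumed criterion): frequent adjacent word pairs become new words."""
    max_len = max(map(len, logp))
    pairs = Counter()
    for line in corpus:
        words = segment(line, logp, max_len)[0]
        pairs.update(a + b for a, b in zip(words, words[1:]))
    return [w for w, c in pairs.most_common() if c > 1 and w not in logp][:top_k]


def prune(corpus, lexicon, limit):
    """Phase 2: greedily drop the multi-character word whose removal
    yields the lowest character perplexity, until the size limit is met."""
    lexicon = set(lexicon)
    while len(lexicon) > limit:
        candidates = sorted(w for w in lexicon if len(w) > 1)  # sorted: deterministic ties
        if not candidates:
            break  # single characters are kept so any text stays segmentable

        def ppl_without(w):
            reduced = lexicon - {w}
            return char_perplexity(corpus, estimate(corpus, reduced))

        lexicon.remove(min(candidates, key=ppl_without))
    return lexicon


def optimize(corpus, base_lexicon, limit, rounds=3, top_k=5):
    """Iterate generation and pruning for a fixed number of rounds."""
    lexicon = set(base_lexicon)
    for _ in range(rounds):
        logp = estimate(corpus, lexicon)
        lexicon |= set(generate(corpus, logp, top_k))
        lexicon = prune(corpus, lexicon, limit)
    return lexicon


if __name__ == "__main__":
    corpus = ["我们喜欢学习", "我们学习语言", "语言模型很好", "我们喜欢语言模型"]
    chars = {c for line in corpus for c in line}
    final = optimize(corpus, chars, limit=len(chars) + 3)
    print("new words:", sorted(w for w in final if len(w) > 1))
    print("baseline ppl:", char_perplexity(corpus, estimate(corpus, chars)))
    print("optimized ppl:", char_perplexity(corpus, estimate(corpus, final)))
```

The sketch never prunes single characters, so every string stays segmentable. Pruning is greedy and re-estimates the model for each candidate removal. That is simple to follow but expensive, so it is practical only for toy data like the demo corpus.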

Details

Type: Inproceedings