Strategies for Training Large Scale Neural Network Language Models

We describe how to effectively train neural network based language models on large data sets. Faster convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as a part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around a 10% relative reduction of word error rate on an English Broadcast News speech recognition task, against a large 4-gram model trained on 400M tokens.
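
To make the hash-based maximum entropy idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that n-gram features are mapped by a hash function into one fixed-size weight array and trained by stochastic gradient ascent on the log-likelihood. The class name `HashedMaxEntLM`, the hash function, the table size, and the learning rate are all hypothetical choices made for illustration, and the joint training with a neural network mentioned in the abstract is left out.

```python
import math


class HashedMaxEntLM:
    """Toy maximum entropy LM; all n-gram weights share one hashed array.

    Illustrative sketch only: names, hash function and sizes are assumptions.
    """

    def __init__(self, vocab_size, table_size=1 << 16, order=3):
        self.vocab_size = vocab_size
        self.table_size = table_size
        self.order = order  # highest n-gram order used as a feature
        self.weights = [0.0] * table_size

    def _index(self, context, word):
        # Polynomial hash of (context words..., predicted word).
        # Distinct n-grams may collide and then share a weight.
        h = len(context) + 1
        for w in context:
            h = (h * 1000003 + w + 1) & 0xFFFFFFFF
        h = (h * 1000003 + word + 1) & 0xFFFFFFFF
        return h % self.table_size

    def _features(self, history, word):
        # One active feature per n-gram order, from the empty context
        # (unigram) up to the last order-1 words of the history.
        feats = []
        for n in range(self.order):
            if n > len(history):
                break
            context = tuple(history[len(history) - n:])
            feats.append(self._index(context, word))
        return feats

    def probs(self, history):
        # Softmax over the summed weights of the active hashed features.
        scores = [
            sum(self.weights[i] for i in self._features(history, w))
            for w in range(self.vocab_size)
        ]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def update(self, history, target, lr=0.1):
        # One stochastic gradient ascent step on log P(target | history).
        p = self.probs(history)
        for w in range(self.vocab_size):
            grad = (1.0 if w == target else 0.0) - p[w]
            for i in self._features(history, w):
                self.weights[i] += lr * grad


if __name__ == "__main__":
    lm = HashedMaxEntLM(vocab_size=3)
    data = [0, 1, 2] * 20  # toy corpus of word ids
    for _ in range(10):
        for t in range(1, len(data)):
            lm.update(data[:t], data[t], lr=0.5)
    print(lm.probs([0, 1]))
```

In this sketch the memory footprint is fixed by `table_size` rather than by the number of distinct n-grams, at the cost of occasional collisions between unrelated features.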

ASRU-2011.pdf

Publisher: IEEE Automatic Speech Recognition and Understanding Workshop

Details

Type: Inproceedings