Patrice Y. Simard, DPU Group, Microsoft Research
Dave Steinkraus, DPU Group, Microsoft Research
John C. Platt, CCSP Group, Microsoft Research
International Conference on Document Analysis and Recognition, pp. 958-962 (2003).
Neural networks are a powerful technology for classification
of visual inputs arising from documents. However, there is a confusing plethora
of different neural network methods that are used in the literature and in industry.
This paper describes a set of concrete best practices that document analysis
researchers can use to get good results with neural networks. The most important
practice is getting a training set as large as possible: we expand the training
set by adding a new form of distorted data. The next most important practice is
that convolutional neural networks are better suited for visual document tasks
than fully connected networks. We propose that a simple “do-it-yourself”
implementation of convolution with a flexible architecture is suitable for many
visual document problems. This simple convolutional neural network does not
require complex methods, such as momentum, weight decay, structure-dependent learning
rates, averaging layers, tangent prop, or even fine-tuning the architecture.
The end result is a very simple yet general architecture which can yield state-of-the-art
performance for document analysis. We illustrate our claims on the MNIST set of
English digit images.
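
As a rough illustration of the "do-it-yourself" convolution mentioned above, the following minimal sketch computes a single feature map in plain Python. It is not the authors' implementation: the function name `conv2d_valid`, the `stride` parameter, the absence of padding, and the example values are all assumptions made for illustration. The point it shows is weight sharing: the same small kernel is applied at every image position.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Slide `kernel` over `image` with no padding and return the feature map.

    Hypothetical helper for illustration. Like most neural network code,
    it computes a cross-correlation (the kernel is not flipped).
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    if k_h > img_h or k_w > img_w:
        raise ValueError("kernel must fit inside the image")

    # Number of kernel positions that fit along each axis.
    out_h = (img_h - k_h) // stride + 1
    out_w = (img_w - k_w) // stride + 1
    output = [[0.0] * out_w for _ in range(out_h)]

    for out_r in range(out_h):
        for out_c in range(out_w):
            top, left = out_r * stride, out_c * stride
            total = 0.0
            # The same kernel weights are reused at every position.
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            output[out_r][out_c] = total
    return output


if __name__ == "__main__":
    # Made-up 4x4 "image" and 2x2 kernel, chosen only to exercise the code.
    image = [
        [1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0],
    ]
    kernel = [[1.0, 0.0], [0.0, -1.0]]
    print(conv2d_valid(image, kernel))            # 3x3 feature map
    print(conv2d_valid(image, kernel, stride=2))  # 2x2 feature map
```

A stride greater than 1 shrinks the feature map by skipping positions, which is one simple way to subsample without a separate averaging layer. A trainable layer would add a bias, a nonlinearity, and gradient computation on top of this loop.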
© 2003 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.