Mitigating the Paucity of Data Problem

Michele Banko; Eric Brill

January 2001

In this paper, we discuss experiments applying machine learning techniques to the task of confusion set disambiguation, using three orders of magnitude more training data than has previously been used for any disambiguation-in-string-context problem. In an attempt to determine when current learning methods will cease to benefit from additional training data, we analyze residual errors made by learners when issues of sparse data have been significantly mitigated. Finally, in the context of our results, we discuss possible directions for the empirical natural language research community.