Share on Facebook Tweet on Twitter Share on LinkedIn Share by email
Mitigating the Paucity of Data Problem

Eric Brill and Michele Banko


In this paper, we discuss experiments applying machine learning techniques to the task of confusion set disambiguation, using three orders of magnitude more training data than has previously been used for any disambiguation-in-string-context problem. In an attempt to determine when current learning methods will cease to benefit from additional training data, we analyze residual errors made by learners when issues of sparse data have been significantly mitigated. Finally, in the context of our results, we discuss possible directions for the empirical natural language research community.


Publication typeInproceedings
> Publications > Mitigating the Paucity of Data Problem