Ryan Georgi, Fei Xia, and William Lewis
As most of the world’s languages are under-resourced, projection algorithms offer an enticing way to bootstrap the resources available for one resource-poor language from a resource-rich language by means of parallel text and word alignment. These algorithms, however, make the strong assumption that the language pairs share common structures and that the parse trees will resemble one another. This assumption is useful but often leads to errors in projection. In this paper, we will address this weakness by using trees created from instances of Interlinear Glossed Text (IGT) to discover patterns of divergence between the languages. We will show that this method improves the performance of projection algorithms significantly in some languages by accounting for divergence between languages using only the partial supervision of a few corrected trees.
|Published in||Proceedings of ACL 2013|