Dan Roth and Wen-tau Yih
30 November 2007
Natural language decisions often involve assigning values to sets of variables, representing low level decisions and context dependent disambiguation. In most cases there are complex relationships among these variables representing dependencies that range from simple statistical correlations to those that are constrained by deeper structural, relational and semantic properties of the text. In this work we study a specific instantiation of this problem in the context of identifying named entities and relations between them in free form text. Given a collection of discrete random variables representing outcomes of learned local predictors for entities and relations, we seek an optimal global assignment to the variables that respects multiple constraints, including constraints on the type of arguments a relation can take, and the mutual activity of different relations. We develop a linear programming formulation to address this global inference problem and evaluate it in the context of simultaneously learning named entities and relations. We show that global inference improves stand-alone learning; in addition, our approach allows us to efficiently incorporate expressive domain and task specific constraints at decision time, resulting, beyond significant improvements in the accuracy, in “coherent” quality of the inference.
All copyrights reserved by MIT Press 2007.