A Supervised Learning Approach to Search of Definitions

  • Jun Xu
  • Yunbo Cao
  • Hang Li
  • Min Zhao
  • Yalou Huang

MSR-TR-2006-18

This paper addresses the problem of searching for definitions. Specifically, given a term, we find definition candidates for the term and rank them according to their likelihood of being good definitions. This contrasts with traditional approaches, which either generate a single combined definition or output all retrieved definitions. We point out the practical need for the task and argue that definition ranking is essential to it. We give a specification for judging the goodness of a definition, under which a definition is assigned to one of three levels: ‘good definition’, ‘indifferent definition’, or ‘bad definition’. We also propose methods for definition ranking that formalize the problem as either classification or ordinal regression. We employ SVM (Support Vector Machines) as the classification model and Ranking SVM as the ordinal regression model, each ranking definition candidates according to their likelihood of being good definitions. We define features for constructing the SVM and Ranking SVM models; these features represent the characteristics of the term, the definition candidate, and their relationship. Experimental results indicate that SVM and Ranking SVM significantly outperform the baseline methods: heuristic rules, the conventional information retrieval method Okapi, and SVM regression. This holds both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
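To make the ordinal regression formulation concrete, the following is a minimal sketch of a linear Ranking SVM trained on pairwise differences between definition candidates of the same term. It is an illustration under stated assumptions, not the implementation used in the report: the two binary features, the toy data, the function names, and the plain subgradient-descent training loop are all hypothetical stand-ins.

```python
import random

# Ordinal labels following the three-level specification in the abstract.
GOOD, INDIFFERENT, BAD = 2, 1, 0


def pairwise_differences(candidates):
    """Build difference vectors (better - worse) for one term's candidates.

    candidates: list of (feature_vector, label) pairs for a single term.
    Only pairs with strictly different labels produce a training example.
    """
    diffs = []
    for xi, yi in candidates:
        for xj, yj in candidates:
            if yi > yj:
                diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs


def train_ranking_svm(queries, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Minimize the pairwise hinge loss max(0, 1 - w.d) + (reg/2)||w||^2.

    Uses simple subgradient descent (an assumption made for brevity;
    a dedicated solver would normally be used instead).
    """
    rng = random.Random(seed)
    diffs = [d for cands in queries for d in pairwise_differences(cands)]
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            violated = margin < 1.0
            for k in range(dim):
                grad = reg * w[k] - (d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


def rank(w, feature_vectors):
    """Sort candidates by descending model score w.x."""
    def score(x):
        return sum(wk * xk for wk, xk in zip(w, x))
    return sorted(feature_vectors, key=score, reverse=True)


# Hypothetical features: [candidate starts with the term,
#                         candidate contains a first-person pronoun]
train_queries = [
    [([1.0, 0.0], GOOD), ([1.0, 1.0], INDIFFERENT), ([0.0, 1.0], BAD)],
    [([1.0, 0.0], GOOD), ([0.0, 1.0], BAD)],
]

w = train_ranking_svm(train_queries, dim=2)
ranked = rank(w, [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
print(ranked)
```

Pairs are formed only within a single term's candidates, so the model learns relative preferences and never compares candidates across different terms. The classification alternative mentioned in the abstract would instead train a binary SVM on individual candidates and rank by its decision value.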