Zhongyuan Wang, Haixun Wang, and Zhirui Hu
Head and modifier detection is an important problem for applications that handle short texts such as search queries, ads keywords, titles, captions, etc. In many cases, short texts such as search queries do not follow grammar rules, and existing approaches for head and modifier detection are coarse-grained, domain specific, and/or require labeling of large amounts of training data. In this paper, we introduce a semantic approach for head and modifier detection. We first obtain a large number of instance level head-modifier pairs from search log. Then, we develop a conceptualization mechanism to generalize the instance level pairs to concept level. Finally, we derive weighted concept patterns that are concise, accurate, and have strong generalization power in head and modifier detection. Furthermore, we identify a subset of modifiers that we call constraints. Constraints are usually specific and not negligible as far as the intent of the short text is concerned, while nonconstraint modifiers are more subjective. The mechanism we developed has been used in production for search relevance and ads matching.We use extensive experiment results to demonstrate the effectiveness of our approach.
In International Conference on Data Engineering (ICDE)