Yeye He and Dong Xin
In this paper, we study the problem of expanding a set of given seed entities into a more complete set by discovering other entities that also belong to the same concept set. A typical example is to use “Canon” and “Nikon” as seed entities, and derive other entities (e.g., “Olympus”) in the same concept set of camera brands. In order to discover such relevant entities, we exploit several web data sources, including lists extracted from web pages and user queries from a web search engine. While these web data are highly diverse with rich information that usually cover a wide range of the domains of interest, they tend to be very noisy. We observe that previously proposed random walk based approaches do not perform very well on these noisy data sources. Accordingly, we propose a new general framework based on iterative similarity aggregation, and present detailed experimental results to show that, when using general-purpose web data for set expansion, our approach outperforms previous techniques in terms of both precision and recall.
|Published in||Proceedings of International World Wide Web Conference (WWW)|