Targeted Disambiguation of Ad-hoc, Homogeneous Sets of Named Entities

Chi Wang, Kaushik Chakrabarti, Tao Cheng, and Surajit Chaudhuri

Abstract

In many entity extraction applications, the entities to be recognized are constrained to be from a list of “target entities”. In many cases, these target entities are (i) ad-hoc, i.e., they do not exist in a knowledge base, and (ii) homogeneous (e.g., all the entities are IT companies). We study the following novel disambiguation problem in this unique setting: given the candidate mentions of all the target entities, determine which ones are true mentions of a target entity. Prior techniques only consider target entities present in a knowledge base and/or having a rich set of attributes. In this paper, we develop novel techniques that require no knowledge about the entities except their names. Our main insight is to leverage the homogeneity constraint and disambiguate the candidate mentions collectively across all documents. We propose a graph-based model, called MentionRank, for that purpose. Furthermore, if additional knowledge is available for some or all of the entities, our model can leverage it to further improve quality. Our experiments demonstrate the effectiveness of our model. To the best of our knowledge, this is the first work on targeted entity disambiguation for ad-hoc entities.
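The abstract does not give MentionRank's formulation. As a rough illustration of what "collective, graph-based disambiguation" can mean, the sketch below runs a generic random walk with restart (personalized-PageRank style) over a graph of candidate mentions. It assumes, hypothetically, that edges connect candidate mentions whose contexts are similar, so true mentions of a homogeneous entity set reinforce one another while off-topic mentions stay weakly connected; the `prior` argument stands in for the optional extra knowledge about entities. The function `propagate_scores`, its parameters, and the mention ids are illustrative assumptions, not the paper's method.

```python
def propagate_scores(prior, weights, damping=0.85, iterations=50, tol=1e-9):
    """Random walk with restart over a hypothetical candidate-mention graph.

    Assumption for illustration only; this is NOT the MentionRank model.
    prior:   dict mention id -> nonnegative prior belief that the mention is
             true (uniform when only entity names are known).
    weights: dict mention id -> {neighbor id: nonnegative weight}; a weight is
             assumed to measure context similarity between two mentions.
    Returns a dict mention id -> score; the scores sum to 1.
    """
    nodes = sorted(prior)
    total_prior = sum(prior.values())
    if total_prior <= 0:
        raise ValueError("prior must have positive total mass")
    # Restart distribution: the normalized prior.
    restart = {n: prior[n] / total_prior for n in nodes}
    # Total outgoing weight per node, used to normalize each node's edges.
    out_sum = {n: sum(weights.get(n, {}).values()) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        new = {n: (1 - damping) * restart[n] for n in nodes}
        for u in nodes:
            if out_sum[u] == 0:
                # Node without edges: return its mass via the restart distribution.
                for n in nodes:
                    new[n] += damping * score[u] * restart[n]
                continue
            # Spread u's score over its neighbors in proportion to edge weight.
            for v, w in weights[u].items():
                new[v] += damping * score[u] * w / out_sum[u]
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    # Hypothetical mentions: m1..m3 appear in IT-company contexts and are
    # mutually similar; m4 appears in an unrelated context.
    weights = {
        "m1": {"m2": 1.0, "m3": 1.0},
        "m2": {"m1": 1.0, "m3": 1.0},
        "m3": {"m1": 1.0, "m2": 1.0, "m4": 0.1},
        "m4": {"m3": 0.1},
    }
    prior = {m: 1.0 for m in weights}  # names only: uniform prior
    scores = propagate_scores(prior, weights)
    for mention in sorted(scores, key=scores.get, reverse=True):
        print(mention, round(scores[mention], 3))
```

In this toy graph the weakly connected mention `m4` ends up with the lowest score, which is the kind of collective signal the homogeneity assumption is meant to exploit. Replacing the uniform prior with higher values for mentions backed by extra knowledge is one simple way such information could enter this sketch.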

Details

Publication type: Inproceedings
Published in: World Wide Web Conference