Stephen Guo, Ming-Wei Chang, and Emre Kiciman
10 June 2013
Information extraction from microblog posts is an important task, as microblogs capture an unprecedented amount of information and provide a view into the pulse of the world. Given that the current definition of named entity recognition is too limited, we consider the task of Twitter entity linking in this paper.
In the current entity linking literature, mention detection and entity disambiguation are frequently cast as equally important but distinct problems. However, in our task, we find that mention detection is often the performance bottleneck. The reason is that messages on microblogs are short, noisy, and informal texts with little context, and often contain phrases with ambiguous meanings.
To rigorously address the Twitter entity linking problem, we propose a structural SVM algorithm for entity linking that jointly optimizes mention detection and entity disambiguation as a single end-to-end task. By combining structural learning and a variety of first-order, second-order, and context-sensitive features, our system is able to outperform existing state-of-the-art entity linking systems by 15% F1.
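To make the joint formulation concrete, the following is a minimal sketch and not the system described in the paper. It assumes each candidate span may take either one of its candidate entities or a special NIL label meaning "not a mention", so that detection and disambiguation become one decision. The feature names, the 0/1 loss, the learning rate, and the subgradient-style update are all illustrative assumptions. Only first-order (per-span) features are used; second-order features would couple the decisions across spans and require joint inference, which this sketch omits.

```python
from collections import defaultdict

NIL = "NIL"  # assumed label meaning "this candidate span is not a mention"


def features(span, entity):
    # Hypothetical first-order features, invented for illustration only.
    if entity == NIL:
        return {"bias:NIL": 1.0}
    return {
        "bias:LINK": 1.0,
        "exact_title_match": 1.0 if span.lower() == entity.lower() else 0.0,
        "capitalized": 1.0 if span[:1].isupper() else 0.0,
    }


def score(w, span, entity):
    # Linear model: dot product of the weights with the feature vector.
    return sum(w[name] * value for name, value in features(span, entity).items())


def predict(w, candidates, gold=None):
    """Choose one label per span from its candidates plus NIL.

    If gold is given, add a 0/1 loss to every wrong label
    (loss-augmented inference, used only during training).
    """
    output = {}
    for span, entities in candidates.items():
        def augmented(entity):
            loss = 1.0 if gold is not None and entity != gold[span] else 0.0
            return score(w, span, entity) + loss
        output[span] = max([NIL] + entities, key=augmented)
    return output


def train(data, epochs=10, lr=0.1):
    # Subgradient-style updates on a structured hinge loss (no regularizer,
    # to keep the sketch short).
    w = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in data:
            violator = predict(w, candidates, gold)
            for span in candidates:
                if violator[span] == gold[span]:
                    continue
                # Move toward the gold label and away from the violating one.
                for name, value in features(span, gold[span]).items():
                    w[name] += lr * value
                for name, value in features(span, violator[span]).items():
                    w[name] -= lr * value
    return w
```

A toy usage, with made-up candidates, would be `w = train([({"Seattle": ["Seattle"], "the": ["The (band)"]}, {"Seattle": "Seattle", "the": NIL})])` followed by `predict(w, {"Seattle": ["Seattle"], "the": ["The (band)"]})`. Because NIL competes directly with the entity candidates, a span is rejected as a mention by the same model that ranks its entities, which is the sense in which the two subtasks are learned together here.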
In NAACL-HLT 2013