Generative Models for Name Disambiguation

Yang Song, Jian Huang, Isaac G. Councill, Jia Li, and C. Lee Giles

Abstract

Name ambiguity is a special case of identity uncertainty in which one person can be referenced by multiple name variations in different situations, or can even share the same name with other people. In this paper, we present an efficient framework that uses two novel topic-based models, extended from Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. Experiments indicate that our approach consistently outperforms other unsupervised methods, including spectral and DBSCAN clustering. Scalability is addressed by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.

Details

Publication type: Inproceedings
Published in: the 16th International Conference on World Wide Web (WWW 2007)
Publisher: Association for Computing Machinery, Inc.