Yanan Qian, Yunhua Hu, Jianling Cui, Qinghua Zheng, and Zaiqing Nie
Author disambiguation in digital libraries becomes increasingly difficult as the number of publications and consequently the number of ambiguous author names keep growing. The fully automatic author disambiguation approach could not give satisfactory results due to the lack of signals in many cases. Furthermore, human judgment on the basis of automatic algorithms is also not suitable because the automatically disambiguated results are often mixed and not understandable for humans. In this paper, we propose a Labeling Oriented Author Disambiguation approach, called LOAD, to combine machine learning and human judgment together in author disambiguation. LOAD exploits a framework which consists of high precision clustering, high recall clustering, and top dissimilar clusters selection and ranking. In the framework, supervised learning algorithms are used to train the similarity functions between publications and a clustering algorithm is further applied to generate clusters. To validate the effectiveness and efficiency of the proposed LOAD approach, comprehensive experiments are conducted. Comparing to conventional author disambiguation algorithms, the LOAD yields much more accurate results to assist human labeling. Further experiments show that the LOAD approach can save labeling time dramatically.
In Proceedings of International Conference on Information and Knowledge Management