University of Massachusetts Amherst
Computer Science Department
Joint Inference and Probabilistic Databases for Large-scale Knowledge-base Construction
Wikipedia’s impact has been revolutionary. The collaboratively edited encyclopedia has transformed the way many people learn, browse new interests, share knowledge and make decisions. Its information is mainly represented in natural language text. However, for many tasks more structured information is useful because it better supports pattern analysis and decision-making. In this talk I will describe multiple research components useful for building large, structured knowledge bases, including information extraction from text, entity resolution, joint inference with conditional random fields, probabilistic databases to manage uncertainty at scale, robust reasoning about human edits, tight integration of probabilistic inference and parallel/distributed processing, and probabilistic programming languages for easy specification of complex graphical models. I will also discuss applications of these methods to scientometrics and a new publishing model for science research.
Joint work with Michael Wick, Sameer Singh, Karl Schultz, Sebastian Riedel, Limin Yao, Brian Martin and Gerome Miklau.
Andrew McCallum is a Professor and Director of the Information Extraction and Synthesis Laboratory in the Computer Science Department at the University of Massachusetts Amherst. He has published over 200 papers in many areas of AI, including natural language processing, machine learning, data mining and reinforcement learning, and his work has received over 25,000 citations. He obtained his PhD from the University of Rochester in 1995 with Dana Ballard and held a postdoctoral fellowship at CMU with Tom Mitchell and Sebastian Thrun. In the early 2000s he was Vice President of Research and Development at WhizBang Labs, a 170-person start-up company that used machine learning for information extraction from the Web. He is a AAAI Fellow, the recipient of the UMass NSM Distinguished Research Award, the UMass Lilly Teaching Fellowship, and research awards from IBM, Microsoft and Google. He is the General Chair for the International Conference on Machine Learning (ICML) 2012, a member of the board of the International Machine Learning Society and the editorial board of the Journal of Machine Learning Research. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, co-reference, semi-supervised learning, topic models, and social network analysis. Work on search and bibliometric analysis of open-access research literature can be found at http://rexa.info. McCallum's web page is http://www.cs.umass.edu/~mccallum.