Learning to Construct and Reason with a Large Knowledge Base of Extracted Information

Carnegie Mellon University’s “Never Ending Language Learner” (NELL) has been running for over three years, and has automatically extracted from the web millions of facts concerning hundreds of thousands of entities and thousands of concepts. NELL works by coupling together many interrelated large-scale semi-supervised learning problems. In this talk I will discuss some of the technical problems we encountered in building NELL, and some of the issues involved in reasoning with this sort of large, diverse, and imperfect knowledge base. This is joint work with Tom Mitchell, Ni Lao, William Wang, and many other colleagues.

Speaker Details

William Cohen received his bachelor’s degree in Computer Science from Duke University in 1984, and a PhD in Computer Science from Rutgers University in 1990. From 1990 to 2000 Dr. Cohen worked at AT&T Bell Labs and later AT&T Labs-Research, and from April 2000 to May 2002 Dr. Cohen worked at Whizbang Labs, a company specializing in extracting information from the web. Dr. Cohen is President of the International Machine Learning Society, an Action Editor for the Journal of Machine Learning Research, and an Action Editor for the journal ACM Transactions on Knowledge Discovery from Data. He is also an editor, with Ron Brachman, of the AI and Machine Learning series of books published by Morgan Claypool. In the past he has also served as an action editor for the journal Machine Learning, the journal Artificial Intelligence, and the Journal of Artificial Intelligence Research. He was General Chair for the 2008 International Machine Learning Conference, held July 6-9 at the University of Helsinki, in Finland; Program Co-Chair of the 2006 International Machine Learning Conference; and Co-Chair of the 1994 International Machine Learning Conference. Dr. Cohen was also the co-Chair for the 3rd Int’l AAAI Conference on Weblogs and Social Media, which was held May 17-20, 2009 in San Jose, and was the co-Program Chair for the 4rd Int’l AAAI Conference on Weblogs and Social Media, which will be held May 23-26 at George Washington University in Washington, D. C. He is a AAAI Fellow, and in 2008, he won the SIGMOD “Test of Time” Award for the most influential SIGMOD paper of 1998.

Dr. Cohen’s research interests include information integration and machine learning, particularly information extraction, text categorization and learning from large datasets. He holds seven patents related to learning, discovery, information retrieval, and data integration, and is the author of more than 180 publications. and learning from large datasets.