Our goal is to make the information on the Web accessible and comprehensible at the entity-relationship level.
Imagine a knowledge discovery task that aims to retrieve commonalities or broad relations between two, three, or more entities of interest. An example is a query that asks for the relation between Niels Bohr, Richard Feynman, and Enrico Fermi. Possible answers are that all of them were quantum physicists, theoretical physicists, members of the Manhattan Project, etc. State-of-the-art search engines return relevant results for such a query only if the given entities and their relations are mentioned on the same Web page. In general, however, the relevant pieces of information may be distributed across several Web pages, so the standard page-oriented keyword-search paradigm is not sufficient for such tasks. Hence, our focus is on a more general, two-fold approach to accessing the knowledge on the Web.
Construction of Large-Scale Knowledge Bases: we extract knowledge records about entities and relationships from various Web sources and integrate them consistently into a knowledge graph. The nodes of such a graph represent entities (e.g., people, products, locations) and the edges represent facts about entities (e.g., birth dates, birth places, inventions, product prices). For each extracted fact we maintain further metadata that can help us compute the truth value of that fact.
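To make the graph view concrete, here is a minimal Python sketch of facts stored as edges with attached metadata, together with a commonality query over several entities. It is an illustration only, not the project's actual data model: the names `Fact`, `KnowledgeGraph`, `commonalities`, and `build_example_graph`, the `source` and `extractor` metadata fields, and the `page-N` source labels are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One edge of the graph: (subject, relation, object) plus metadata."""
    subject: str
    relation: str
    obj: str
    source: str = "unknown"     # hypothetical metadata: page the fact came from
    extractor: str = "unknown"  # hypothetical metadata: tool that extracted it


class KnowledgeGraph:
    """Toy in-memory graph; entities are nodes, facts are labeled edges."""

    def __init__(self):
        self.facts = []
        # entity -> set of (relation, object) pairs leaving that entity
        self._out = defaultdict(set)

    def add(self, fact):
        self.facts.append(fact)
        self._out[fact.subject].add((fact.relation, fact.obj))

    def commonalities(self, *entities):
        """Return the (relation, object) pairs shared by all given entities."""
        if not entities:
            return set()
        shared = set(self._out.get(entities[0], set()))
        for entity in entities[1:]:
            shared &= self._out.get(entity, set())
        return shared


def build_example_graph():
    """Each fact is attributed to a different (made-up) page label."""
    kg = KnowledgeGraph()
    physicists = ["Niels Bohr", "Richard Feynman", "Enrico Fermi"]
    for i, name in enumerate(physicists):
        kg.add(Fact(name, "occupation", "theoretical physicist", source=f"page-{i}"))
        kg.add(Fact(name, "memberOf", "Manhattan Project", source=f"page-{i + 3}"))
    kg.add(Fact("Niels Bohr", "bornIn", "Copenhagen", source="page-6"))
    return kg


if __name__ == "__main__":
    kg = build_example_graph()
    for relation, obj in sorted(kg.commonalities(*["Niels Bohr", "Richard Feynman", "Enrico Fermi"])):
        print(relation, "->", obj)
```

Because every fact in the example is attributed to a different page label, the query succeeds even though no single page mentions all three entities together, which is the situation a page-oriented keyword search cannot handle.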
Dealing with Uncertainty: the extracted facts come with uncertainties. There are different sources of uncertainty, e.g., the extraction tools, the Web pages from which the facts were extracted, or even the facts themselves. For example, it is difficult to say on what exact date Pythagoras was born; we can only estimate his birth date by investigating the historical context. We are investigating probabilistic models that allow us to derive truth values for the facts by taking the various sources of uncertainty into account.
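As a rough illustration of deriving a truth value from uncertain evidence, the sketch below combines a prior belief in a fact with votes from several sources using a naive-Bayes log-odds update. This is a deliberately simple stand-in, not one of the models developed in the project. It assumes that sources are independent and that each source has a single known reliability, i.e., the probability that its statement about the fact is correct. The function name `truth_probability` and all numbers are hypothetical.

```python
import math


def truth_probability(prior, votes):
    """Posterior probability that a fact is true.

    prior -- prior probability that the fact is true, strictly in (0, 1)
    votes -- iterable of (reliability, asserts_true) pairs, one per source;
             reliability is strictly in (0, 1) and is ASSUMED to be known

    Assumes independent sources (naive Bayes): a source with reliability r
    multiplies the odds by r / (1 - r) when it supports the fact and by
    (1 - r) / r when it contradicts it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for reliability, asserts_true in votes:
        if not 0.0 < reliability < 1.0:
            raise ValueError("reliability must be strictly between 0 and 1")
        weight = math.log(reliability / (1.0 - reliability))
        log_odds += weight if asserts_true else -weight
    # Convert log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Two fairly reliable sources support the fact, one weak source disputes it.
    votes = [(0.8, True), (0.8, True), (0.6, False)]
    print(round(truth_probability(0.5, votes), 3))
```

Under these assumptions, agreement between reliable sources pushes the estimate toward 1, while a supporting and a contradicting source of equal reliability cancel out. A realistic model would also have to estimate the reliabilities themselves rather than take them as given.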
- Gjergji Kasneci, Jurgen Van Gael, and Thore Graepel, DBrev: Dreaming of a Database Revolution, in the 5th Biennial Conference on Innovative Data Systems Research (CIDR 2011), Association for Computing Machinery, Inc., 2011
- Weiwei Cheng, Gjergji Kasneci, Thore Graepel, David Stern, and Ralf Herbrich, Automated Feature Generation from Structured Knowledge, in the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), ACM, 2011
- Gjergji Kasneci, Jurgen Van Gael, David Stern, and Thore Graepel, CoBayes: Bayesian Knowledge Corroboration with Assessors of Unknown Areas of Expertise, in the 4th ACM International Conference on Web Search and Data Mining (WSDM 2011), Association for Computing Machinery, Inc., 2011
- Gjergji Kasneci, Jurgen Van Gael, Ralf Herbrich, and Thore Graepel, Bayesian Knowledge Corroboration with Logical Rules and User Feedback, in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Springer Verlag, 2010