From Text to Entities and from Entities to Insight: a Perspective on Unstructured Big Data
Gerhard Weikum, Max-Planck Institute for Informatics
News, social media, web sites, and enterprise sources produce huge amounts of valuable content in the form of text and speech. To tap this wealth of unstructured big data and obtain insights, a decisive step is to identify the entities that are referred to and the relationships between them. This allows unstructured content to be linked with structured data. However, this step faces a fundamental problem: names and phrases are often highly ambiguous, so mapping them to entities and relations is a challenging task. The talk will discuss the state of the art and open problems in disambiguating named entities and relational phrases. It will also place this line of research within the bigger picture of big data analytics.