Simple Models, Lots of Data: Mining Semantics about Entities Using Web-Scale Data
Many areas of computer science, such as machine translation, speech recognition and computer vision, are becoming more data-driven: statistical techniques that combine simple models with lots of data trump approaches that rely on complex models, deep algorithms or hand-coded rules. I believe that this is also true for mining semantics about entities. I will give some examples of such tasks: mining alternate names (aka "synonyms") of entities, finding descriptive phrases about entities, extracting semantic mentions of entities in documents, understanding attributes of entities, and performing entity augmentation. I will discuss how we have used Web-scale data and simple, unsupervised algorithms to achieve high accuracy in these semantic tasks. This work raises several interesting research questions in statistical semantics and big data management.