Semi-unsupervised learning of taxonomic and non-taxonomic relationships from the web

Due to the size of the World Wide Web, it is necessary to develop tools for automatic or semi-automatic analyses of web data, such as finding patterns and implicit information in the web, a task usually known as Web Mining. In particular, web content mining consists of automatically mining data from textual web documents that can be represented with machine-readable semantic formalisms.

While more traditional approaches to Information Extraction from text, such as those developed for the Message Understanding Conferences during the nineties, relied on small collections of documents with many semantic annotations, the characteristics of the web (its size, its redundancy and the lack of semantic annotations in most texts) favor efficient algorithms that can learn from unannotated data. Furthermore, new types of web content, such as web forums, blogs and wikis, are also a source of textual information with an underlying structure from which specialist systems can benefit.

This talk will describe an ongoing project for automatically acquiring ontological knowledge (both taxonomic and non-taxonomic relationships) from the web in a partially unsupervised way. The proposed approach combines distributional semantics techniques with rote extractors. Particular focus will be placed on automatically adding semantic tags to Wikipedia, with the aim of transforming it, with little effort, into a Semantic Wikipedia.
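The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the two ingredients it names, not the project's actual implementation. It assumes a rote extractor that memorizes the text found between the two terms of a known seed pair and reuses it as a pattern, and a distributional comparison based on cosine similarity over context-word counts. All function names and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def learn_patterns(sentences, seeds):
    """Rote-extractor step: memorize the text between the terms of each seed pair."""
    patterns = set()
    for hook, target in seeds:
        for s in sentences:
            m = re.search(re.escape(hook) + r"(.+?)" + re.escape(target), s)
            if m:
                patterns.add(m.group(1))
    return patterns


def apply_patterns(sentences, patterns):
    """Reuse the memorized patterns to extract new (hook, target) pairs."""
    pairs = set()
    for p in patterns:
        rx = re.compile(r"(\w+)" + re.escape(p) + r"(\w+)")
        for s in sentences:
            pairs.update(rx.findall(s))
    return pairs


def context_vector(word, sentences, window=2):
    """Distributional step: count the words seen within `window` tokens of `word`."""
    counts = Counter()
    for s in sentences:
        tokens = re.findall(r"\w+", s.lower())
        for i, t in enumerate(tokens):
            if t == word:
                counts.update(tokens[max(0, i - window):i])
                counts.update(tokens[i + 1:i + 1 + window])
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy corpus and a single seed pair (made up for illustration).
sentences = [
    "Dickens was born in 1812",
    "Darwin was born in 1809",
    "Dickens wrote novels",
    "Darwin wrote books",
]
seeds = [("Dickens", "1812")]

patterns = learn_patterns(sentences, seeds)
pairs = apply_patterns(sentences, patterns)
similarity = cosine(context_vector("dickens", sentences), context_vector("darwin", sentences))
```

In this sketch the pattern learned from one seed pair also retrieves the Darwin pair (a non-taxonomic "birth year" relation), while the context-vector similarity indicates that the two names occur in similar contexts, which is the kind of evidence a distributional method could use to place a new term near known concepts in a taxonomy.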

Speaker Details

Enrique Alfonseca is a postdoctoral researcher at the Tokyo Institute of Technology, Japan. He holds a Ph.D. from the Universidad Autonoma de Madrid (2003), and has been working on text mining for ontology learning, text summarization and the application of the Semantic Web in e-learning. He has served as a program committee member for eight conferences and workshops, including two workshops on ontology learning, and has completed several research stays at the University of York. He has published some 45 papers in various fields of Natural Language Processing and e-learning, and is the coauthor of one book.

Speakers:
Enrique Alfonseca
Affiliation:
Tokyo Institute of Technology