Information Extraction Crossing Language, Robustness and Domain Barriers

Speaker  Imed Zitouni

Host  Geoffrey Zweig

Affiliation  Microsoft - Bing

Duration  01:00:00

Date recorded  2 November 2012

Modern communication technologies have made massive amounts of real-time news information in several languages readily available. This has created the need for news-monitoring systems that allow users to monitor multilingual news media in near real-time and search over stored content. One example of such a system is the Translingual Automatic Language Exploration System, codenamed TALES. In this talk I will briefly describe the architecture of TALES and focus on its information extraction component. Information extraction is a crucial step toward understanding a text, as it identifies the important conceptual objects in a discourse and the relations between them. I will address the portability of the approach to different languages and show a method of propagating information from resource-rich languages into low-resource ones. Unlike other approaches that focus on clean text, our technique is robust to less well-formed input, and I will show this as well. For example, information extraction in a multilingual broadcast processing system has to deal with inaccurate automatic transcription and translation. The resulting presence of non-target-language text yields many false alarms, which raises the research problem of making information extraction robust to such noisy input text. If time permits, I will also discuss the application and adaptation of these techniques to the health-care domain.

©2012 Microsoft Corporation. All rights reserved.