Natural Language Processing


Overview

The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.

This goal is not easy to reach. "Understanding" language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It's ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.

The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like "Flying planes can be dangerous". Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to people on the ground? Should "can" be analyzed as a verb or as a noun? Which of the many possible meanings of "plane" is relevant? Depending on context, "plane" could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?
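As a rough illustration of how quickly this ambiguity compounds, the sketch below enumerates every combination of word categories for the example sentence. The lexicon is a hypothetical toy written for this example, not a resource used by the group, and a real analyzer would rule out most combinations using grammar and context.

```python
from itertools import product

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to the categories it could plausibly take.
TOY_LEXICON = {
    "flying": ["verb (gerund)", "adjective (participle)"],
    "planes": ["noun", "verb"],
    "can": ["modal verb", "noun", "verb"],
    "be": ["verb"],
    "dangerous": ["adjective"],
}


def candidate_taggings(sentence, lexicon):
    """Return every way of assigning one category to each word."""
    words = sentence.lower().split()
    options = [lexicon[w] for w in words]
    return [list(zip(words, choice)) for choice in product(*options)]


readings = candidate_taggings("Flying planes can be dangerous", TOY_LEXICON)
print(len(readings))  # 2 * 2 * 3 * 1 * 1 = 12 candidate taggings
```

Even with five words and a tiny lexicon there are twelve candidate taggings before word senses such as airplane, geometric plane, or woodworking tool are considered.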

We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications like text critiquing, information retrieval, question answering, summarization, gaming, and translation. The grammar checkers in Office for English, French, German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve answers to user questions; IntelliShrink uses natural language technology to compress cell phone messages; and Microsoft Product Support uses our machine translation software to translate the Microsoft Knowledge Base into other languages. As our work evolves, we expect it to enable any area where human users can benefit by communicating with their computers in a natural way.

People

Primary Contact: Bill Dolan



Alvarez-Godinez, Eduardo

Eetemadi, Sauleh





   

Affiliate Members







Jessee, Andrea

Krivosheev, Gleb



  
 
Selected current projects

Machine Translation is currently a major focus of the group. In contrast to existing commercial MT systems, we are pursuing a data-driven approach in which all translation knowledge is learned from existing bilingual text.
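To make the idea of learning translation knowledge from bilingual text concrete, here is a minimal sketch of IBM Model 1, a classic word-alignment model trained with expectation-maximization. It is a textbook technique chosen for illustration and is not a description of the group's system; the three-sentence corpus and the function names are assumptions invented for the example.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """Estimate t[(target_word, source_word)] = P(target | source) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in pairs:
            for w_t in tgt:
                norm = sum(t[(w_t, w_s)] for w_s in src)
                for w_s in src:
                    frac = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] += frac
                    total[w_s] += frac
        # M-step: renormalize counts into probabilities.
        t = defaultdict(float, {k: c / total[k[1]] for k, c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest P(target | source)."""
    candidates = {k[0]: p for k, p in t.items() if k[1] == source_word}
    return max(candidates, key=candidates.get)


# Toy English-German corpus (illustrative only).
corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
table = train_ibm1(corpus)
print(best_translation(table, "book"))
```

No dictionary is supplied: the word correspondences emerge purely from which words co-occur across the sentence pairs.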

Textual Entailment Recognition was proposed recently as a generic task that captures major semantic inference needs across many natural language processing applications.

Paraphrase recognition and generation are crucial to creating applications that approximate our understanding of language. We have released a corpus of approximately 5000 sentence pairs that have been annotated by humans to indicate whether or not they can be considered paraphrases. Alignment phrase tables created using the data described in Quirk et al. (2004) and Dolan et al. (2004) are now also available for download.
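For readers new to the task, the sketch below shows a deliberately naive paraphrase-recognition baseline: it labels a sentence pair as a paraphrase when the Jaccard overlap of the two word sets exceeds a threshold. This is not the method used to build or annotate the corpus; the threshold of 0.5 and the example sentences are assumptions made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two sentences."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_paraphrase(a, b, threshold=0.5):
    """Naive baseline: call the pair a paraphrase if word overlap is high."""
    return jaccard(a, b) >= threshold


# Made-up sentences, not drawn from the released corpus.
s1 = "the company bought the startup"
s2 = "the startup was bought by the company"
s3 = "it rained all day"

print(is_paraphrase(s1, s2))  # shared words: 4 of 6 distinct words
print(is_paraphrase(s1, s3))  # no shared words
```

A baseline like this is fooled by pairs that share most words but differ in meaning, which is one reason human-annotated sentence pairs are valuable for evaluation.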

MindNet: we have done quite a bit of work aimed at formalizing the representation of word meanings, developing methods for automatically building semantic networks from text, and then exploring their structure.

The Japanese NLP project page summarizes areas of research we are working on in processing Japanese.

Older projects

Amalgam is a novel system developed in the Natural Language Processing group at Microsoft Research for sentence realization during natural language generation that employs machine learning techniques. Sentence realization is the process of generating (realizing) a fluent sentence from a semantic representation.

IntelliShrink is a product that uses linguistic analysis to abbreviate an email message so that it can be displayed on a cell phone. IntelliShrink analyses messages in English, French, German or Spanish.

 
Publications

A full list of publications by the Natural Language Processing group is available.

Natural Language Links

The Microsoft Research Paraphrase Corpus and Microsoft Research Paraphrase Phrase Tables are now available.

@




©2008 Microsoft Corporation. All rights reserved.