Embedding information networks into low-dimensional spaces is potentially useful in many applications, such as visualization, node classification, link prediction, and recommendation. In this project, we propose a large-scale information network embedding model called LINE, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted.
Building a computer system to automatically solve math word problems written in natural language.
Understanding MixEd LANguaGE
Our goal is to let normal users tell computers what to do using normal language. This problem space is strongly related to natural language understanding, program synthesis, and many other areas.
We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. Human judges consider the generated captions to be as good as or better than human-written captions 34% of the time.
This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans using human language in ways that appear seamless and natural to humans. Our research focuses on statistical methods by which devices can learn from human-human conversational interactions and can situate responses in the verbal context and in physical or virtual environments.
The Logical Form (LF) analysis produced by the NLPwin parser is very close in spirit to the level of semantic representation defined in Abstract Meaning Representation (AMR). The "NLPwin parses AMR" project is a conversion from LF to AMR in order to facilitate 1) evaluation of the NLPwin LF and 2) contribution to the ongoing discussion of the specification of AMR. In this project, we include publications, as well as links to our LF training data converted to AMR and to the LF-AMR parser for English.
NLPwin is a software project at Microsoft Research that aims to provide Natural Language Processing tools for Windows (hence, NLPwin). The project was started in 1991, just as Microsoft inaugurated the Microsoft Research group; while active development of NLPwin continued through 2002, it is still being updated regularly, primarily in service of Machine Translation.
Website for the CIKM2014 tutorial on Deep Learning for Natural Language Processing: Theory and Practice (more content to be added)
A tool that analyzes Arabic text and generates its parse tree.
This tool converts colloquial Arabic to Modern Standard Arabic.
Project CodaLab is an open source platform that empowers communities to explore experiments together and create competitions designed to advance the state-of-the-art in machine learning.
Definition: conversion of text from one script to another; translation of named entities; conversion of text from Romanized Arabic to native Arabic script.
Natural Language Processing (NLP) is a foundational infrastructure for processing written text. This processing revolves around text analysis and understanding serving a multitude of sophisticated tasks such as Text Search, Document Management, Automatic Translation, Proofreading, Text Summarization and many more…
The global hub for sustainable development at Microsoft Research
The goal of this project is to provide easily usable models for lexical semantic relations that have been developed at Microsoft Research. Currently the models include heterogeneous vector space models for measuring semantic word relatedness and the polarity inducing latent semantic analysis (LSA) model, which judges whether two words are synonyms or antonyms.
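As a rough illustration of how a vector space model can score word relatedness, the sketch below computes cosine similarity between word vectors and reads the sign of the score as a synonym-like or antonym-like signal, which is the general idea behind a polarity inducing space. The vectors in TOY_VECTORS and the helper names cosine and relation are hypothetical and made up for this example; they are not the project's released models or API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, invented for illustration only.
TOY_VECTORS = {
    "hot": [0.9, 0.1, 0.3],
    "warm": [0.8, 0.2, 0.4],
    "cold": [-0.9, -0.1, -0.2],
}


def relation(word1, word2, vectors=TOY_VECTORS):
    """Return (score, label); the label is read off the sign of the cosine."""
    score = cosine(vectors[word1], vectors[word2])
    label = "synonym-like" if score > 0 else "antonym-like"
    return score, label


if __name__ == "__main__":
    print(relation("hot", "warm"))
    print(relation("hot", "cold"))
```

With these toy vectors, "hot" and "warm" get a positive score and "hot" and "cold" a negative one; real models would learn the vectors from corpora and thesauri rather than hard-coding them.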
In most organizations, staff spend many hours in meetings. This project addresses all levels of analysis and understanding, from speaker tracking and robust speech transcription to meaning extraction and summarization, with the goal of increasing productivity both during the meeting and after, for both participants and nonparticipants.
Spoken language understanding (SLU) is an emerging field in between the areas of speech processing and natural language processing. The term spoken language understanding has largely been coined for targeted understanding of human speech directed at machines. This project covers our research on SLU tasks such as domain detection, intent determination, and slot filling, using data-driven methods.
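To make the slot filling task concrete, the sketch below assumes the common formulation in which each word of an utterance receives an IOB tag (B-name starts a slot, I-name continues it, O is outside any slot) and then collects the tagged words into slot values. The function name extract_slots, the slot names, and the example utterance are hypothetical; a data-driven tagger would predict the tags, which are written by hand here.

```python
def extract_slots(tokens, tags):
    """Group IOB-tagged tokens into (slot_name, value) pairs.

    An I- tag that does not continue the currently open slot is treated
    as outside any slot, which is one simple way to handle malformed tags.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    slots = []
    name, words = None, []

    def close_open_slot():
        if name is not None:
            slots.append((name, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_open_slot()
            name, words = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            words.append(token)
        else:
            close_open_slot()
            name, words = None, []
    close_open_slot()
    return slots


if __name__ == "__main__":
    tokens = "book a flight from seattle to new york".split()
    # Hand-written tags standing in for a tagger's predictions.
    tags = ["O", "O", "O", "O", "B-from_city", "O", "B-to_city", "I-to_city"]
    print(extract_slots(tokens, tags))
```

Domain detection and intent determination would typically be handled as utterance-level classification on top of this, for example assigning the whole utterance above to a travel domain with a flight booking intent.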
Intelligent Tutoring Systems (ITS) can significantly enhance the educational experience, both in the classroom and online. A key aspect of ITS is the ability to automatically generate problems that have a given difficulty level and exercise specific concepts. This can help avoid copyright or plagiarism issues and help generate personalized workflows. This project develops technologies for problem generation in various subject domains including math, logic, and even language learning.
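A minimal sketch of the problem generation idea, under simplifying assumptions: difficulty is approximated by the number of digits in each operand, and the exercised concept is whether the addition requires a carry. The function names needs_carry and generate_addition_problems are hypothetical, and this enumeration approach is only an illustration, not the technique developed in the project.

```python
from itertools import islice


def needs_carry(a, b):
    """True if adding a and b column by column in base 10 produces a carry.

    Any carry chain starts in a column whose two digits alone sum to 10
    or more, so checking digit pairs is sufficient.
    """
    while a > 0 and b > 0:
        if a % 10 + b % 10 >= 10:
            return True
        a //= 10
        b //= 10
    return False


def generate_addition_problems(digits, with_carry, limit):
    """Return up to `limit` problems (a, b, a + b) with `digits`-digit operands.

    Problems are enumerated in increasing order and kept only if their
    carry behaviour matches `with_carry`.
    """
    low, high = 10 ** (digits - 1), 10 ** digits
    candidates = (
        (a, b, a + b)
        for a in range(low, high)
        for b in range(low, high)
        if needs_carry(a, b) == with_carry
    )
    return list(islice(candidates, limit))


if __name__ == "__main__":
    for a, b, total in generate_addition_problems(2, with_carry=True, limit=3):
        print(f"{a} + {b} = ?   (answer: {total})")
```

A tutoring system would likely sample or rank candidates rather than take the first few, and would use a richer notion of difficulty than operand length.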
The goals of the MSRA Knowledge Service system and team are to: (1) build a large, high-quality, fresh, and easy-to-use knowledge layer; (2) provide knowledge services to the utility layer and the application layer; (3) coordinate the various knowledge extraction and refining efforts at MSRA (to reduce duplicated effort).