We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. Human judges consider the captions to be as good as or better than those written by humans 34% of the time.
This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans using human language in ways that appear seamless and natural to humans. Our research focuses on statistical methods by which devices can learn from human-human conversational interactions and can situate responses in the verbal context and in physical or virtual environments.
The Logical Form (LF) analysis produced by the NLPwin parser is very close in spirit to the level of semantic representation defined in Abstract Meaning Representation (AMR). The "NLPwin parses AMR" project is a conversion from LF to AMR in order to facilitate 1) evaluation of the NLPwin LF and 2) contribution to the ongoing discussion of the specification of AMR. In this project, we include publications, as well as links to our LF training data converted to AMR and to the LF-AMR parser for English.
NLPwin is a software project at Microsoft Research that aims to provide Natural Language Processing tools for Windows (hence, NLPwin). The project was started in 1991, just as Microsoft inaugurated the Microsoft Research group; while active development of NLPwin continued through 2002, it is still being updated regularly, primarily in service of Machine Translation.
Website for the CIKM2014 tutorial on Deep Learning for Natural Language Processing: Theory and Practice (more content to be added)
A tool that analyzes Arabic text and generates a parse tree.
This tool converts colloquial Arabic to Modern Standard Arabic.
Project CodaLab is an open source platform that empowers communities to explore experiments together and create competitions designed to advance the state-of-the-art in machine learning.
Definition • Conversion of text from one script to another • Translation of named entities • Conversion of text from Romanized Arabic to native Arabic script
Natural Language Processing (NLP) is a foundational infrastructure for processing written text. This processing revolves around text analysis and understanding, and it serves a multitude of sophisticated tasks such as Text Search, Document Management, Automatic Translation, Proofreading, Text Summarization and many more…
The global hub for sustainable development at Microsoft Research
The goal of this project is to provide easily usable models for lexical semantic relations, which have been developed at Microsoft Research. Currently the models include heterogeneous vector space models for measuring semantic word relatedness and the polarity inducing latent semantic analysis (LSA) model that judges whether two words are synonyms or antonyms.
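As a rough illustration of how a vector space model can score word relatedness, the sketch below compares word vectors with cosine similarity. It is not the project's implementation: the three-dimensional vectors are made up for the example, and the names `cosine_similarity`, `relatedness`, and `TOY_VECTORS` are hypothetical. Reading a negative score as a hint of antonymy is likewise an assumption about how a polarity-aware space might be used.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
# Real models would learn much higher-dimensional vectors from data.
TOY_VECTORS = {
    "hot": [1.0, 0.9, 0.1],
    "warm": [0.9, 1.0, 0.2],
    "cold": [-1.0, -0.8, 0.1],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_u * norm_v)


def relatedness(word_a, word_b, vectors=TOY_VECTORS):
    """Score two words by the cosine similarity of their vectors."""
    return cosine_similarity(vectors[word_a], vectors[word_b])


if __name__ == "__main__":
    print(relatedness("hot", "warm"))  # close to 1: similar direction
    print(relatedness("hot", "cold"))  # negative: opposite direction
```

In this toy space, "hot" and "warm" point in nearly the same direction while "hot" and "cold" point in opposite directions, which is the kind of signal a polarity-aware model could use to separate synonyms from antonyms.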
Spoken language understanding (SLU) is an emerging field in between the areas of speech processing and natural language processing. The term spoken language understanding has largely been coined for targeted understanding of human speech directed at machines. This project covers our research on SLU tasks such as domain detection, intent determination, and slot filling, using data-driven methods.
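To make the slot filling task concrete, the sketch below assumes a tagger has already labeled each token with an IOB tag (`B-` begins a slot, `I-` continues it, `O` is outside any slot) and shows how those tags can be collected into slot values. The function name `extract_slots`, the slot names, and the example utterance are hypothetical; the project's actual models and label sets are not described here.

```python
def extract_slots(tokens, tags):
    """Group IOB-tagged tokens into a dict of slot name -> list of values."""
    slots = {}
    current_name = None
    current_tokens = []

    def flush():
        # Store the span collected so far, if any.
        if current_name is not None:
            slots.setdefault(current_name, []).append(" ".join(current_tokens))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag closes the current span;
            # the inconsistent token itself is not added to any slot.
            flush()
            current_name, current_tokens = None, []
    flush()
    return slots


if __name__ == "__main__":
    tokens = "book a flight from Boston to New York tomorrow".split()
    tags = ["O", "O", "O", "O", "B-from_city", "O",
            "B-to_city", "I-to_city", "B-date"]
    print(extract_slots(tokens, tags))
```

Domain detection and intent determination would typically label the whole utterance (for example, a travel domain and a flight-booking intent), while slot filling labels individual tokens as shown.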
Intelligent Tutoring Systems (ITS) can significantly enhance the educational experience, both in the classroom and online. A key aspect of ITS is the ability to automatically generate problems that have a certain difficulty level and exercise certain concepts. This can help avoid copyright or plagiarism issues and help generate personalized workflows. This project develops technologies for problem generation in various subject domains including math, logic, and even language learning.
The goals of the MSRA Knowledge Service system and the team are: (1) build a large, high-quality, fresh, and easy to use knowledge layer; (2) provide knowledge service to the utility layer and the application layer; (3) coordinate various knowledge extraction and refining efforts at MSRA (to reduce duplicated effort).
This project focuses on advancing the state of the art in language processing with recurrent neural networks. We are currently applying these to language modeling, machine translation, speech recognition, language understanding and meaning representation. A special interest is adding side-channels of information as input, to model phenomena which are not easily handled in other frameworks.
Using analysis of social media posts, we look for linguistic markers that might indicate postpartum depression.
Statistical Parsing and Linguistic Analysis Toolkit is a linguistic analysis toolkit. Its main goal is to allow easy access to the linguistic analysis tools produced by the Natural Language Processing group at Microsoft Research. The tools include both traditional linguistic analysis tools such as part-of-speech taggers and parsers, and more recent developments, such as sentiment analysis (identifying whether a particular piece of text has positive or negative sentiment towards its focus).
Mood-based detection of affects in tweets
Extraction of structured information from biomedical text.