Project CodaLab is an open source platform that empowers communities to explore experiments together and create competitions designed to advance the state-of-the-art in machine learning.
Definition: (1) conversion of text from one script to another; (2) translation of named entities; (3) conversion of text from Romanized Arabic to native Arabic script.
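To make item (3) concrete, here is a minimal rule-based sketch of converting Romanized Arabic ("Arabizi") into Arabic script. It uses a greedy longest-match rule table. The table is a small, hypothetical subset of common conventions, such as the digits 3 for ع and 7 for ح, and `transliterate` is an illustrative name. This is not the method used by this project. Latin vowels are simply dropped as if they were unwritten short vowels. As the "salam" example shows, character rules alone cannot tell a long vowel letter from an unwritten short vowel, which is one reason data-driven models are used for this task.

```python
# Hypothetical rule table: a small subset of common Arabizi conventions.
# Latin vowels are treated as unwritten short vowels and dropped.
RULES = {
    "sh": "ش", "kh": "خ", "gh": "غ", "th": "ث",
    "2": "ء", "3": "ع", "5": "خ", "7": "ح",
    "b": "ب", "d": "د", "f": "ف", "h": "ه", "j": "ج", "k": "ك",
    "l": "ل", "m": "م", "n": "ن", "q": "ق", "r": "ر", "s": "س",
    "t": "ت", "w": "و", "y": "ي", "z": "ز",
    "a": "", "e": "", "i": "", "o": "", "u": "",
}
LONGEST_KEY = max(len(key) for key in RULES)


def transliterate(text: str) -> str:
    """Greedy longest-match conversion; characters without a rule pass through."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        # Try the longest rule first so "sh" wins over "s" followed by "h".
        for size in range(LONGEST_KEY, 0, -1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in RULES:
                out.append(RULES[chunk])
                i += size
                break
        else:
            out.append(text[i])  # no rule matched: keep the character unchanged
            i += 1
    return "".join(out)


print(transliterate("7ob"))    # حب ("love")
print(transliterate("salam"))  # سلم, although the correct spelling is سلام
```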
Natural Language Processing (NLP) provides foundational infrastructure for processing written text. This processing centers on text analysis and understanding, and it serves many sophisticated tasks such as text search, document management, automatic translation, proofreading, text summarization, and many more.
The global hub for sustainable development at Microsoft Research
The goal of this project is to provide easily usable models for lexical semantic relations developed at Microsoft Research. The models currently include heterogeneous vector space models for measuring semantic word relatedness, and the polarity inducing latent semantic analysis (PILSA) model, which judges whether two words are synonyms or antonyms.
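The following is a minimal sketch of querying such a model once word vectors are available. In vector space models, relatedness is typically measured with cosine similarity. In PILSA, as we understand it, the space is built so that the sign of the cosine separates synonyms (positive) from antonyms (negative), while its magnitude reflects degree of relatedness. The three-dimensional vectors and the helper `judge_pair` are invented for illustration. They are not trained PILSA vectors or part of any released API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def judge_pair(u, v):
    """Read a PILSA-style score: sign gives polarity, magnitude gives relatedness."""
    score = cosine(u, v)
    label = "synonym" if score > 0 else "antonym"
    return label, abs(score)


# Toy vectors, invented for illustration (not trained values).
VECTORS = {
    "hot": [0.9, 0.1, 0.3],
    "warm": [0.8, 0.2, 0.25],
    "cold": [-0.85, 0.15, -0.3],
}

print(judge_pair(VECTORS["hot"], VECTORS["warm"]))
print(judge_pair(VECTORS["hot"], VECTORS["cold"]))
```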
Spoken language understanding (SLU) is an emerging field between speech processing and natural language processing. The term has been used largely for targeted understanding of human speech directed at machines. This project covers our research on SLU tasks such as domain detection, intent determination, and slot filling, using data-driven methods.
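Domain detection and intent determination are usually framed as utterance-level classification. Slot filling is commonly framed as token-level sequence labeling with BIO tags, where B- begins a slot, I- continues it, and O marks words outside any slot. The sketch below shows only the final decoding step, which turns a tag sequence into slot values. The utterance, the slot names `fromloc` and `toloc`, and the function `bio_to_slots` are hypothetical. The tags are written by hand rather than predicted by a model.

```python
from typing import Dict, List, Optional


def bio_to_slots(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Collect contiguous B-/I- spans into slot-name -> value pairs."""
    slots: Dict[str, str] = {}
    name: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and name == tag[2:]:
            words.append(token)  # continue the currently open span
            continue
        if name is not None:
            slots[name] = " ".join(words)  # close the previous span
        if tag.startswith("B-"):
            name, words = tag[2:], [token]  # open a new span
        else:
            name, words = None, []  # "O", or an "I-" tag with no matching open span
    if name is not None:
        slots[name] = " ".join(words)
    return slots


tokens = "show flights from new york to denver".split()
tags = ["O", "O", "O", "B-fromloc", "I-fromloc", "O", "B-toloc"]
print(bio_to_slots(tokens, tags))  # {'fromloc': 'new york', 'toloc': 'denver'}
```

If the same slot name occurs twice in one utterance, this simple version keeps only the last value.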
Intelligent Tutoring Systems (ITS) can significantly enhance the educational experience, both in the classroom and online. A key capability of an ITS is automatically generating problems that have a given difficulty level and exercise given concepts. This can help avoid copyright and plagiarism issues and support personalized workflows. This project develops technologies for problem generation in various subject domains, including math, logic, and even language learning.
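The paragraph does not describe how the project generates problems. The following template-based sketch only illustrates how a difficulty level can be made an explicit parameter of a generator for one concept, linear equations. The mapping from level to number range (`COEFF_LIMIT`) and the names `LinearProblem` and `generate_linear_problem` are hypothetical. Choosing the answer first guarantees that every generated equation has an integer solution.

```python
import random
from dataclasses import dataclass

# Hypothetical difficulty levels: a higher level only widens the range of numbers.
COEFF_LIMIT = {1: 5, 2: 12, 3: 50}


@dataclass
class LinearProblem:
    a: int
    b: int
    c: int
    answer: int

    def text(self) -> str:
        sign = "+" if self.b >= 0 else "-"
        return f"Solve for x: {self.a}x {sign} {abs(self.b)} = {self.c}"


def generate_linear_problem(level: int, rng: random.Random) -> LinearProblem:
    """Build a*x + b = c with an integer answer by choosing the answer first."""
    limit = COEFF_LIMIT[level]
    x = rng.randint(-limit, limit)
    a = rng.choice([k for k in range(-limit, limit + 1) if k != 0])  # a != 0 keeps x unique
    b = rng.randint(-limit, limit)
    return LinearProblem(a=a, b=b, c=a * x + b, answer=x)


rng = random.Random(42)  # fixed seed so the generated set is reproducible
for level in (1, 2, 3):
    print(generate_linear_problem(level, rng).text())
```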
The goals of the MSRA Knowledge Service system and team are to (1) build a large, high-quality, fresh, and easy-to-use knowledge layer; (2) provide knowledge services to the utility layer and the application layer; and (3) coordinate the various knowledge extraction and refinement efforts at MSRA to reduce duplicated effort.
This project focuses on advancing the state of the art in language processing with recurrent neural networks. We are currently applying them to language modeling, machine translation, speech recognition, language understanding, and meaning representation. A particular interest is adding side channels of information as input, to model phenomena that are not easily handled in other frameworks.
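One common way to add a side channel to a recurrent language model, used for example in context-dependent RNN language models, is to feed an extra feature vector into both the hidden layer and the output layer. In the equations below, $\mathbf{w}(t)$ is the one-hot encoding of the current word and $\mathbf{s}(t)$ is the hidden state. $\mathbf{c}(t)$ is the side-channel vector, for instance a topic vector for the surrounding text. $\mathbf{y}(t)$ is the predicted distribution over the next word, and $\mathbf{U}, \mathbf{W}, \mathbf{F}, \mathbf{V}, \mathbf{G}$ are weight matrices. The symbols are our own, and this is an illustration rather than a description of a specific model from this project.

```latex
\begin{aligned}
\mathbf{s}(t) &= \sigma\!\left(\mathbf{U}\,\mathbf{w}(t) + \mathbf{W}\,\mathbf{s}(t-1) + \mathbf{F}\,\mathbf{c}(t)\right)\\
\mathbf{y}(t) &= \operatorname{softmax}\!\left(\mathbf{V}\,\mathbf{s}(t) + \mathbf{G}\,\mathbf{c}(t)\right)
\end{aligned}
```

Setting $\mathbf{F}$ and $\mathbf{G}$ to zero recovers a plain RNN language model, so the side channel only adds information to the prediction.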
By analyzing social media posts, we look for linguistic markers that might indicate postpartum depression.
Statistical Parsing and Linguistic Analysis Toolkit is a linguistic analysis toolkit whose main goal is to provide easy access to the linguistic analysis tools produced by the Natural Language Processing group at Microsoft Research. The tools include both traditional linguistic analysis tools, such as part-of-speech taggers and parsers, and more recent developments, such as sentiment analysis (identifying whether a particular piece of text has positive or negative sentiment toward its focus).
Mood-based detection of affect in tweets
Extraction of structured information from biomedical text.
MS Afkar is an initiative to turn the technology and innovations from the Cairo Microsoft Innovation Lab (CMIC) into cool ideas and scenarios aimed at Arabic Internet users, and to collect data that enables further R&D at CMIC. These ideas come from our researchers and developers and span a wide range of applications, including content authoring, language tools, Internet browsing, search, and much more.
To drive and support the strategic initiatives of over 800 researchers in MSR worldwide through partnerships and communities, Microsoft Research Connections holds a yearly Think Tank meeting with academics from universities worldwide. The Winter 2011 Think Tank focused on semantic computing.
Contact search is important in many scenarios. In this project, we focus on making contact search easy and painless. Specifically, we address two issues: (a) spelling mistakes and (b) multilinguality. Users make many kinds of mistakes when typing names and aliases, and contact search must be tolerant of these mistakes. Further, users often type a name in one language while the name is stored in a different language in the name directory or contact list.
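The spelling-mistake half of the problem can be sketched with a standard technique: Levenshtein edit distance between the query and each contact's full name and name tokens, keeping matches within a small distance. This is not necessarily the project's method. The contact list, the `max_distance` threshold, and the function `search_contacts` are hypothetical. The multilingual half would additionally require mapping names into a common script before matching, and that step is not shown here.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def search_contacts(query: str, contacts: List[str], max_distance: int = 2) -> List[Tuple[int, str]]:
    """Return (distance, name) pairs within max_distance of the query, best first."""
    q = query.lower()
    hits = []
    for name in contacts:
        candidates = [name.lower()] + name.lower().split()  # full name and each token
        d = min(edit_distance(q, c) for c in candidates)
        if d <= max_distance:
            hits.append((d, name))
    return sorted(hits)


contacts = ["Jonathan Smith", "Joanna Smyth", "Priya Raman"]
print(search_contacts("smyth", contacts))  # [(0, 'Joanna Smyth'), (1, 'Jonathan Smith')]
```

Matching against individual name tokens, as well as the full name, lets a misspelled first or last name still find the contact.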
Building a Better Speller: Bing and Microsoft Research Offer Prizes for Best Search Engine Spelling Alteration Services
This research project explores the core technologies and applications of semantic computing.
"Twahpic" shows what tweets on Twitter™ are about, both in terms of topics (such as sports, politics, and the Internet) and along the axes of Substance, Social, Status, and Style. Twahpic uses Partially Labeled Dirichlet Allocation (PLDA) to identify 200 topics used on Twitter.
WikiBhasha beta is a multilingual content creation tool for Wikipedia. Developed by Microsoft Research, WikiBhasha beta enables contributors to Wikipedia to find content from other Wikipedia articles, translate the content into other languages, and then either compose new articles or enhance existing articles in multilingual Wikipedias.