We are studying how we can get regular people to do simple tasks at specific locations. An example task is to take a picture of a sign at a certain location. We are interested in who to ask and how much to pay.
We work on questions motivated by machine learning, in particular from the theoretical and computational perspectives. Our goals are to mathematically understand the effectiveness of existing learning algorithms and to design new learning algorithms. We combine expertise from diverse fields such as algorithms and complexity, statistics, and convex geometry.
Tabular is an Excel add-in that brings the power of model-based machine learning to data enthusiasts. It allows the user to write a simple model that explains their data and perform Bayesian inference. Tabular is built on top of Infer.NET.
Natural Language Processing (NLP) is a foundational infrastructure for processing written text. This processing revolves around text analysis and understanding serving a multitude of sophisticated tasks such as Text Search, Document Management, Automatic Translation, Proofreading, Text Summarization and many more…
Labs: ATL Cairo
Crowded: Digital Piecework and the Politics of Platform Responsibility in Precarious Times looks at crowdsourcing as a focal point for many of the issues that are raised by the structure of our current information economy: economic value, cultural meaning, and ethics.
Labs: New England
This is an umbrella project for our activity in machine learning with an exploration-exploitation tradeoff. Most of us are at MSR-NYC.
We work on fundamental issues in crowdsourcing, in particular incentive mechanisms for paid crowdsourcing and algorithms and theory for crowdsourced problem solving.
The LKW project aims to design low-power algorithms and systems for admission control to speech systems, i.e., detecting foreground speech, recognizing leading keywords, and verifying speakers on a continuously-on wearable device. Our goal is to consume under 10 mW on average on generic embedded hardware available today and under 100 µW on custom hardware.
Big Sky is a web service for exploratory data analysis.
The goal of this project is to provide easily usable models for lexical semantic relations, which have been developed at Microsoft Research. Currently the models include heterogeneous vector space models for measuring semantic word relatedness and the polarity-inducing latent semantic analysis (LSA) model that judges whether two words are synonyms or antonyms.
GeoS is a Windows application for interactive, semi-automated segmentation of medical images such as CT (Computed Tomography) and MR (Magnetic Resonance) scans.
Search TrailBlazer is a project that aims at redefining the way people think about search. We propose to model user search behavior using tasks rather than the traditional queries or sessions. Our framework contains components that impact multiple core areas of search engines, including relevance ranking, metric design, user satisfaction prediction, DSAT mining, and competitive analysis.
R2 is a research project within the Programming Languages and Tools group at Microsoft Research India on probabilistic programming. Our goal is to build a user-friendly and scalable probabilistic programming system by employing powerful techniques from language design, program analysis, and verification.
The 7-Scenes dataset is a collection of tracked RGB-D camera frames. The dataset may be used for evaluation of methods for different applications such as dense tracking and mapping and relocalization techniques.
There is some evidence that a gap exists between the neural network research and software development communities. Source code examples available to software developers are often incomplete, misleading, or just plain incorrect. The goal of this project is to bridge that gap by providing a series of high-quality demo programs. The basic C# demo can be accessed from: http://research.microsoft.com/NeuralNetworks/BackPropDemo.aspx
Accurate localization and identification of vertebrae in spinal CT imaging is important for many clinical tasks such as diagnosis, surgical planning, and post-operative assessment. Clinical datasets raise many difficulties for automatic methods. These arise from the frequent presence of abnormal spine curvature, small field of view, and image artifacts caused by surgical implants. To facilitate the advance of research on this topic, we provide a database of 242 annotated spine CT scans.
Spoken language understanding (SLU) is an emerging field in between the areas of speech processing and natural language processing. The term spoken language understanding has largely been coined for targeted understanding of human speech directed at machines. This project covers our research on SLU tasks such as domain detection, intent determination, and slot filling, using data-driven methods.
Commerce search is fundamentally different from web search. The main differences lie in both the needs of the users posing the queries and the characteristics of the underlying product data that is served in response to the query. Product data has inherent structure, in that products have typed properties (for instance, brand, color, weight) and are often categorized into a taxonomy. User queries to the search engine also have semantics and implicit structure. Often, users posing
We investigate how people's behaviour online can be characterized in terms of psychometric measurements such as the Big-5 personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism) as well as general intelligence and satisfaction with life. We investigate patterns of Facebook usage, website preferences, query logs, and Facebook Likes and look for interesting correlations which can be used to predict users' behaviours, preferences, or characteristics.
Distribution Modeller (temporary name only!) is CEES' end-to-end browser tool that lets the researcher rapidly import data, supplement that data with environmental info from FetchClimate, specify an arbitrary model by point and click or in code, parameterize the model against the data using Filzbach, make and visualize predictions with a full propagation of parameter uncertainty, and then package and share everything in a way that is inspectable, repeatable, and modifiable.
With the emergence of abundant online content, cloud computing, and electronic reading devices, textbooks are poised for transformative changes. Taking into account the vast number of existing textbooks designed for the traditional printed medium and the potential for enabling new kinds of functionality through the medium of electronic textbooks, we present the results of our research into algorithmically diagnosing and enhancing the quality of textbooks.
Intelligent Tutoring Systems (ITS) can significantly enhance the educational experience, both in the classroom and online. A key aspect of ITS is the ability to automatically generate problems that have a certain difficulty level and exercise the use of certain concepts. This can help avoid copyright or plagiarism issues and help generate personalized workflows. This project develops technologies for problem generation in various subject domains including math, logic, and even language learning.
Probabilistic inference made easy by a direct integration inside the tools where the data is found: spreadsheets and databases.
This project focuses on advancing the state of the art in language processing with recurrent neural networks. We are currently applying these to language modeling, machine translation, speech recognition, language understanding, and meaning representation. A special interest is adding side-channels of information as input, to model phenomena which are not easily handled in other frameworks.
Research around information aggregation and prediction, including polls, probability elicitation, and prediction markets. These methods, broadly defined as wisdom of the crowds, are utilized for a range of outcomes: elections, marketing, internal corporate, military intelligence, etc. We demonstrate some serious advances. (1) Combinatorial Prediction Markets: frontend, backend, and unique questions. (2) Experimental Prediction Markets and Polling. (3) Forecasts, Sentiment, and Data Analytics.
Labs: New York