We present a new real-time articulated hand tracker that opens up new possibilities for human-computer interaction (HCI). Our system accurately reconstructs complex hand poses across a variety of subjects using only a single depth camera. It is also highly robust, continually recovering from tracking failures. Its most distinctive aspect, however, is its flexibility in terms of camera placement and operating range.
Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The system leverages big data to find examples that maximize the training value of its interaction with the teacher.
CityNoise is a project led by Dr. Yu Zheng at Microsoft Research. The project aims to diagnose a city's noise pollution with crowdsensing and ubiquitous data. It reveals the fine-grained noise situation throughout a city and analyzes the composition of noises in a particular location by using 311 complaint data together with road network data, points of interest, and social media.
We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near-infrared intensity images to absolute, metric depth in real time. We demonstrate a variety of human-computer interaction scenarios.
Website for the CIKM2014 tutorial on Deep Learning for Natural Language Processing: Theory and Practice (more content to be added)
This paper presents a method for acquiring dense nonrigid shape and deformation from a single monocular depth sensor. We focus on modeling the human hand, and assume that a single rough template model is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand models from as few as 15 frames of depth data.
We introduce an efficient camera relocalization approach which can be easily integrated into real-time 3D reconstruction methods, such as KinectFusion. Our approach makes use of compact encoding of whole image frames which enables both online harvesting of keyframes in tracking mode, and fast retrieval of pose proposals when tracking is lost. The encoding scheme is based on randomized ferns and simple binary feature tests.
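The fern-based frame encoding described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the project's actual implementation: the class names `FernEncoder` and `KeyframeStore`, the single-channel frames stored as nested lists with values in [0, 1], the fern and test counts, and the 0.2 harvesting threshold are all hypothetical. Each fern applies a few binary tests (pixel value above a threshold) and packs the results into a small integer; two frames are compared by the fraction of ferns whose codes differ.

```python
import random


class FernEncoder:
    """Encodes a frame as one small integer code per fern (illustrative only)."""

    def __init__(self, height, width, n_ferns=16, tests_per_fern=4, seed=0):
        rng = random.Random(seed)
        # Each binary test is (row, column, threshold), chosen at random once.
        self.ferns = [
            [(rng.randrange(height), rng.randrange(width), rng.random())
             for _ in range(tests_per_fern)]
            for _ in range(n_ferns)
        ]

    def encode(self, frame):
        codes = []
        for fern in self.ferns:
            code = 0
            for y, x, threshold in fern:
                # Append one bit per test: 1 if the pixel exceeds the threshold.
                code = (code << 1) | (1 if frame[y][x] > threshold else 0)
            codes.append(code)
        return tuple(codes)


def dissimilarity(code_a, code_b):
    """Fraction of ferns whose codes differ (0.0 = identical, 1.0 = all differ)."""
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)


class KeyframeStore:
    """Harvests keyframes while tracking and proposes a pose when tracking is lost."""

    def __init__(self, encoder, min_dissimilarity=0.2):
        self.encoder = encoder
        self.min_dissimilarity = min_dissimilarity  # assumed threshold
        self.keyframes = []  # list of (code, pose) pairs

    def maybe_add(self, frame, pose):
        """Store the frame only if it differs enough from every stored keyframe."""
        code = self.encoder.encode(frame)
        if all(dissimilarity(code, c) >= self.min_dissimilarity
               for c, _ in self.keyframes):
            self.keyframes.append((code, pose))
            return True
        return False

    def propose_pose(self, frame):
        """Return the pose of the most similar keyframe, or None if none stored."""
        if not self.keyframes:
            return None
        code = self.encoder.encode(frame)
        return min(self.keyframes, key=lambda kf: dissimilarity(code, kf[0]))[1]


# Usage with two synthetic 8x8 frames and placeholder pose labels.
dark = [[0.0] * 8 for _ in range(8)]
bright = [[1.0] * 8 for _ in range(8)]
store = KeyframeStore(FernEncoder(8, 8))
added_dark = store.maybe_add(dark, "pose_A")
added_dark_again = store.maybe_add(dark, "pose_A")  # duplicate, rejected
added_bright = store.maybe_add(bright, "pose_B")
recovered = store.propose_pose(bright)
```

A real system would score several nearby keyframes and refine the retrieved pose; this sketch only returns the single closest match.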
Site under construction
ViiBoard uses vision techniques to significantly enhance the user experience on large touch displays (e.g. Microsoft Perceptive Pixel) in two directions: human-computer interaction and immersive remote collaboration. the first
Alternating minimization is a popular approach to solving optimization problems over two or more blocks of variables. In this work, we explore theoretical properties of this method (and its variants) for non-convex optimization problems that feature prominently in areas such as recommendation systems, compressive sensing, and computer vision.
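As a concrete illustration of alternating minimization on a non-convex problem, the sketch below fits a rank-1 factorization to a partially observed matrix, the kind of problem that arises in recommendation systems. It is a toy example under stated assumptions, not the method analyzed in this work: the function name `alt_min_rank1`, the dictionary input format, and the iteration count are all hypothetical. With one factor fixed, each entry of the other factor has a closed-form least-squares solution, and the two updates are alternated.

```python
import random


def alt_min_rank1(observed, n_rows, n_cols, iters=50, seed=0):
    """Fit M[i][j] ~ u[i] * v[j] using only the observed entries.

    observed: dict mapping (row, col) -> value.
    Returns the factor lists (u, v).
    """
    rng = random.Random(seed)
    u = [1.0] * n_rows
    v = [rng.uniform(0.5, 1.5) for _ in range(n_cols)]  # random positive start
    for _ in range(iters):
        # Step 1: hold v fixed; each u[i] is a 1-D least-squares solution.
        for i in range(n_rows):
            num = sum(val * v[j] for (r, j), val in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in observed if r == i)
            if den > 0:
                u[i] = num / den
        # Step 2: hold u fixed; each v[j] is a 1-D least-squares solution.
        for j in range(n_cols):
            num = sum(val * u[i] for (i, c), val in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            if den > 0:
                v[j] = num / den
    return u, v


# Usage: a 3x3 rank-1 matrix (outer product of a and b) with two entries hidden.
a, b = [1.0, 2.0, 3.0], [2.0, 1.0, 4.0]
hidden = {(0, 2), (2, 0)}
observed = {(i, j): a[i] * b[j]
            for i in range(3) for j in range(3) if (i, j) not in hidden}
u, v = alt_min_rank1(observed, 3, 3)
predicted_02 = u[0] * v[2]  # estimate of the hidden entry a[0] * b[2]
predicted_20 = u[2] * v[0]  # estimate of the hidden entry a[2] * b[0]
```

The joint objective is non-convex in (u, v), yet each half-step is a convex least-squares problem, which is what makes the alternating scheme attractive in practice.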
Microsoft Research in partnership with Bing is happy to launch the second MSR-Bing Challenge on Image Retrieval. Do you have what it takes to build the best image retrieval system? Enter the MSR-Bing Image Retrieval Challenge in ACM Multimedia and/or ICME to develop an image scoring system for a search query. Current Challenge: MSR-Bing IRC @ ACM Multimedia 2014
Using a diversity of big data to infer and predict fine-grained air quality throughout a city, and ultimately to tackle air pollution.
Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion
Project CodaLab is an open source platform that empowers communities to explore experiments together and create competitions designed to advance the state-of-the-art in machine learning.
Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively or in conjunction with additional modalities like gesture; where language is spoken or in text; and in a variety of settings, such as conversational systems in apps or devices, and situated interactions in the real world.
Filter forests (FF) are an efficient new discriminative approach for predicting continuous variables given a signal and its context. FF can be used for general signal restoration tasks that can be tackled via convolutional filtering, where it attempts to learn the optimal filtering kernels to be applied to each data point. The model can learn both the size of the kernel and its values, conditioned on the observation and its spatial or temporal context.
We are studying how we can get regular people to do simple tasks at specific locations. An example task is to take a picture of a sign at a certain location. We are interested in who to ask and how much to pay.
We work on questions motivated by machine learning, in particular from the theoretical and computational perspectives. Our goals are to mathematically understand the effectiveness of existing learning algorithms and to design new learning algorithms. We combine expertise from diverse fields such as algorithms and complexity, statistics, and convex geometry.
Tabular is an Excel add-in that brings the power of model-based machine learning to data enthusiasts. It allows the user to write a simple model that explains their data and perform Bayesian inference. Tabular is built on top of Infer.NET.
Natural Language Processing (NLP) is a foundational infrastructure for processing written text. This processing revolves around text analysis and understanding serving a multitude of sophisticated tasks such as Text Search, Document Management, Automatic Translation, Proofreading, Text Summarization and many more…
Labs: ATL Cairo
Face In The Crowd examines the social impact of crowdsourcing platforms—cloud-based computational systems that allow the outsourcing of work through open requests—and how they might shape the future of work.
Labs: New England
This is an umbrella project for our activity in machine learning with an exploration-exploitation tradeoff. Most of us are at MSR-NYC.
We are working toward a theoretical foundation for developing large-scale human-machine systems that combine the intelligence of humans and the computing power of machines to address tasks that are difficult for either humans or machines to complete alone.
The LKW project is aimed at designing low-power algorithms and systems for admission control to speech systems, i.e., detecting foreground speech, recognizing leading keywords, and verifying speakers on a continuously-on wearable device. Our goal is to consume under 10 mW on average on generic embedded hardware available today and under 100 µW on custom hardware.
Tempe is a web service for exploratory data analysis.