This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans using human language in ways that appear seamless and natural to humans. Our research focuses on statistical methods by which devices can learn from human-human conversational interactions and can situate responses in the verbal context and in physical or virtual environments.
This project investigates the design and evaluation of efficient, deployable algorithms for assigning complex workloads to resources in modern cloud service platforms.
Deep Structured Semantic Model / Deep Semantic Similarity Model
FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a set of tools for performing genome-wide association studies (GWAS) on large data sets. FaST-LMM runs on both Windows and Linux, and contains code for (1) univariate GWAS, (2) testing sets of SNPs, (3) feature selection for background correction, (4) epistatic association scans, and (5) a correction method for cellular heterogeneity in methylation and similar data.
We present a new real-time articulated hand tracker which can enable new possibilities for human-computer interaction (HCI). Our system accurately reconstructs complex hand poses across a variety of subjects using only a single depth camera. It also allows for a high degree of robustness, continually recovering from tracking failures. However, the most distinctive aspect of our tracker is its flexibility in terms of camera placement and operating range.
Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The system leverages big data to find examples that maximize the training value of its interaction with the teacher.
CityNoise is a project led by Dr. Yu Zheng at Microsoft Research. The project aims to diagnose a city's noise pollution with crowdsensing and ubiquitous data. It reveals the fine-grained noise situation throughout a city and analyzes the composition of noise in a particular location, using 311 complaint data together with road network data, points of interest, and social media.
We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near-infrared intensity images to absolute, metric depth in real time. We demonstrate a variety of human-computer interaction scenarios.
Website for the CIKM2014 tutorial on Deep Learning for Natural Language Processing: Theory and Practice (more content to be added)
This paper presents a method for acquiring dense nonrigid shape and deformation from a single monocular depth sensor. We focus on modeling the human hand, and assume that a single rough template model is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand models from as few as 15 frames of depth data.
We introduce an efficient camera relocalization approach which can be easily integrated into real-time 3D reconstruction methods, such as KinectFusion. Our approach makes use of a compact encoding of whole image frames, which enables both online harvesting of keyframes in tracking mode and fast retrieval of pose proposals when tracking is lost. The encoding scheme is based on randomized ferns and simple binary feature tests.
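As a rough illustration of the idea of encoding a whole frame with randomized ferns, the sketch below applies a few random binary threshold tests to a grayscale frame and compares frames by the fraction of ferns whose codes differ. It is a simplified sketch under stated assumptions, not the project's implementation: the function names (`make_ferns`, `encode`, `dissimilarity`, `nearest_keyframe`), the grayscale input, the random thresholds, and the distance measure are all hypothetical choices made for this example.

```python
import random


def make_ferns(num_ferns, tests_per_fern, width, height, seed=0):
    """Draw random binary tests; each test is (x, y, threshold). Hypothetical layout."""
    rng = random.Random(seed)
    return [
        [(rng.randrange(width), rng.randrange(height), rng.randrange(256))
         for _ in range(tests_per_fern)]
        for _ in range(num_ferns)
    ]


def encode(frame, ferns):
    """Encode a frame (rows of 0-255 intensities) as one small integer code per fern."""
    codes = []
    for fern in ferns:
        code = 0
        for x, y, threshold in fern:
            bit = 1 if frame[y][x] >= threshold else 0  # simple binary feature test
            code = (code << 1) | bit
        codes.append(code)
    return codes


def dissimilarity(codes_a, codes_b):
    """Fraction of ferns whose codes differ between two encoded frames."""
    differing = sum(a != b for a, b in zip(codes_a, codes_b))
    return differing / len(codes_a)


def nearest_keyframe(query_codes, keyframes):
    """Return (index, dissimilarity) of the stored keyframe closest to the query."""
    return min(
        ((i, dissimilarity(query_codes, codes)) for i, codes in enumerate(keyframes)),
        key=lambda pair: pair[1],
    )


# Usage: store two keyframes, then look up the best match for a new frame.
ferns = make_ferns(num_ferns=8, tests_per_fern=4, width=4, height=4)
bright = [[255] * 4 for _ in range(4)]
dark = [[0] * 4 for _ in range(4)]
keyframes = [encode(bright, ferns), encode(dark, ferns)]
best_index, best_score = nearest_keyframe(encode(bright, ferns), keyframes)
```

Because each frame reduces to a short list of integers, both adding a keyframe and comparing a query against stored keyframes are cheap, which is the property the description above relies on.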
We publish a subset of the data from the paper "Discriminative Ferns Ensemble for Hand Pose Recognition".
ViiBoard uses vision techniques to significantly enhance the user experience on large touch displays (e.g. Microsoft Perceptive Pixels) in two directions: human-computer interaction and immersive remote collaboration.
Alternating minimization is a popular approach to solving optimization problems. In this work, we explore theoretical properties of this method (and its variants) for several non-convex optimization problems that feature prominently in important areas such as recommendation systems, compressive sensing, and computer vision.
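To make the term concrete, here is a minimal sketch of alternating minimization on one small non-convex problem: fitting a rank-one factorization to a partially observed matrix, as in a toy recommendation setting. The problem choice, the function name `alternating_minimization`, and its parameters are assumptions made for this example and are not taken from the project. With one factor held fixed, the other has a closed-form least-squares solution, so the method simply alternates between the two.

```python
def alternating_minimization(observed, num_rows, num_cols, iterations=50):
    """Fit M[i][j] ~ u[i] * v[j] to observed entries given as {(i, j): value}.

    Illustrative sketch only: rank-one model, squared error, fixed iteration count.
    """
    u = [1.0] * num_rows
    v = [1.0] * num_cols
    for _ in range(iterations):
        # Step 1: hold v fixed; each u[i] is a one-dimensional least-squares fit.
        for i in range(num_rows):
            num = sum(val * v[j] for (r, j), val in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j), _ in observed.items() if r == i)
            if den > 0:
                u[i] = num / den
        # Step 2: hold u fixed; each v[j] is a one-dimensional least-squares fit.
        for j in range(num_cols):
            num = sum(val * u[i] for (i, c), val in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c), _ in observed.items() if c == j)
            if den > 0:
                v[j] = num / den
    return u, v


# Usage: observe every entry of a rank-one matrix except the bottom-right one,
# then predict the missing entry as u[2] * v[2].
u_true, v_true = [1.0, 2.0, 3.0], [2.0, 1.0, 4.0]
observed = {
    (i, j): u_true[i] * v_true[j]
    for i in range(3) for j in range(3) if (i, j) != (2, 2)
}
u, v = alternating_minimization(observed, num_rows=3, num_cols=3)
predicted_missing = u[2] * v[2]
```

The joint problem in `u` and `v` is non-convex, but each half-step is a convex least-squares problem, which is what makes the alternating scheme simple to implement.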
Microsoft Research in partnership with Bing is happy to launch the second MSR-Bing Challenge on Image Retrieval. Do you have what it takes to build the best image retrieval system? Enter the MSR-Bing Image Retrieval Challenge in ACM Multimedia and/or ICME to develop an image scoring system for a search query. Last Challenge: MSR-Bing IRC @ ACM Multimedia 2014. Current Challenge: MSR-Bing IRC @ ICME 2015. Next Challenge: MSR-Bing IRC @ ACM Multimedia 2015
This project uses a diversity of big data to infer and predict fine-grained air quality throughout a city, with the ultimate goal of tackling air pollution.
Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion
Project CodaLab is an open source platform that empowers communities to explore experiments together and create competitions designed to advance the state-of-the-art in machine learning.
Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively or in conjunction with additional modalities like gesture; where language is spoken or in text; and in a variety of settings, such as conversational systems in apps or devices, and situated interactions in the real world.
Filter forests (FF) are an efficient new discriminative approach for predicting continuous variables given a signal and its context. FF can be used for general signal restoration tasks that can be tackled via convolutional filtering, where they attempt to learn the optimal filtering kernels to be applied to each data point. The model can learn both the size of the kernel and its values, conditioned on the observation and its spatial or temporal context.
We are studying how we can get regular people to do simple tasks at specific locations. An example task is to take a picture of a sign at a certain location. We are interested in who to ask and how much to pay.
We work on questions motivated by machine learning, in particular from the theoretical and computational perspectives. Our goals are to mathematically understand the effectiveness of existing learning algorithms and to design new learning algorithms. We combine expertise from diverse fields such as algorithms and complexity, statistics, and convex geometry.
Tabular is an Excel add-in that brings the power of model-based machine learning to data enthusiasts. It allows the user to write a simple model that explains their data and perform Bayesian inference. Tabular is built on top of Infer.NET.
Natural Language Processing (NLP) is a foundational infrastructure for processing written text. This processing revolves around text analysis and understanding serving a multitude of sophisticated tasks such as Text Search, Document Management, Automatic Translation, Proofreading, Text Summarization and many more…
Labs: ATL Cairo