Can we develop artificial intelligence that learns to make sense of complex environments? That learns from others, including humans, how to interact with the world? That learns transferable skills throughout its existence, and applies them to solve new, challenging problems? Project Malmo sets out to address these core challenges of artificial intelligence research. We address them by integrating (deep) reinforcement learning, cognitive science, and many ideas from artificial intelligence.
This project studies the problem of visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space. Much success has been reported recently with techniques that first compute a similarity structure of the data points and then project them into a low-dimensional space with the structure preserved. These two steps incur considerable computational costs, preventing state-of-the-art methods such as t-SNE from scaling to large-scale and high-dimensional
Several studies have demonstrated the need for the world’s food production to double by 2050. However, there is a limited amount of additional arable land, and water levels have also been receding at a fast rate. Although technology could help farmers, its adoption is limited because farms usually do not have power or Internet connectivity, and farmers are typically not technology-savvy. We are working towards an end-to-end approach, from sensors to the cloud, to solve the problem.
Homomorphic Encryption (HE) refers to a special type of encryption technique that allows for computations to be done on encrypted data, without requiring access to a decryption key.
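To make the idea concrete, here is a minimal sketch of computing on encrypted data. The description above does not name a scheme, so the choice of the Paillier cryptosystem (which supports addition only) is an assumption made for illustration, as are the function names and the tiny primes, which are far too small to be secure. Note that the addition step uses only the public key.

```python
import math
import random

def keygen(p, q):
    """Build a toy Paillier key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                      # common simple choice of generator
    mu = pow(lam, -1, n)           # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(public_key, m):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = public_key
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def add_encrypted(public_key, c1, c2):
    """Add two plaintexts by multiplying their ciphertexts; no private key needed."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

def decrypt(public_key, private_key, c):
    """Recover the plaintext with the private key."""
    n, _ = public_key
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

if __name__ == "__main__":
    public_key, private_key = keygen(61, 53)
    total = add_encrypted(public_key, encrypt(public_key, 12), encrypt(public_key, 30))
    print(decrypt(public_key, private_key, total))
```

Schemes that also support multiplication on ciphertexts are considerably more involved; this sketch only shows the basic principle.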
Holoportation is a new type of 3D capture technology that allows high-quality 3D models of people to be reconstructed, compressed, and transmitted anywhere in the world in real time. When combined with mixed reality displays such as HoloLens, this technology allows users to see and interact with remote participants in 3D as if they were actually present in their physical space. Communicating and interacting with remote users becomes as simple as face-to-face communication.
Friendships are dynamic. In this project, we uncover the dynamics of tie strength and online social interactions in terms of various aspects, such as reciprocity, temporality, and contextuality. Based upon these dynamics, we build a learning-to-rank framework to predict social interactions in online social networks.
One out of four people in the world has experienced mental illness at some point in their life. DiPsy is a digital psychologist presented as a personalized chatbot that can evaluate, diagnose, treat and study users' mental processes through natural conversations.
Images are becoming a popular medium for user communication on social networks. It is therefore a natural requirement for a chatbot to be able to chat about images in addition to textual input. Building on MS XiaoIce (微软小冰), we explore image chat and have iterated over several rounds to improve her ability to talk about images.
Dogs are among humans' closest companions: there are an estimated 400 million dogs in the world, spanning hundreds of breeds. Because there are so many breeds, it is hard for ordinary users to recognize most of them. We therefore developed a dog recognizer to help users learn more about dogs.
In the field of computer science, large-scale experimentation on users is not new: there have been many efforts in both the public and private sectors to analyze users and to create experimental conditions to provoke changes in their behavior. However, new autonomous and semi-autonomous systems for experimentation, driven by techniques from AI and machine learning, raise important questions for the field. Many of these questions are about the social and ethical implications of these systems.
Labs: New York
Click-through data accumulated by search engines contains rich connections between images and semantics, built from massive numbers of user clicks. The data comes for free as search engines serve users, and it naturally scales to million or even billion scale. Unlike purpose-built datasets, click-through data is noisy, unstructured and unbalanced. In this project, we aim to use click-through data effectively to solve image understanding problems.
This project aims to apply recent deep learning methods to conversational understanding tasks, such as those in Cortana.
The Dual Embedding Space Model (DESM) is an information retrieval model that uses two word embeddings, one for query words and one for document words. It takes into account the vector similarity between each query word vector and all document word vectors.
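The scoring idea can be sketched as follows. This is an illustrative reading rather than the model's reference implementation: it assumes the document word vectors are summarized by the centroid of their unit-normalized vectors, and that the score is the mean cosine similarity between each query word vector and that centroid. The function names and the dictionaries `query_emb` and `doc_emb` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def desm_score(query_words, doc_words, query_emb, doc_emb):
    """Mean cosine similarity between query word vectors and the document centroid.

    query_emb and doc_emb are the two separate embedding tables:
    one for query words and one for document words.
    """
    doc_vecs = [normalize(doc_emb[w]) for w in doc_words if w in doc_emb]
    known_query = [w for w in query_words if w in query_emb]
    if not doc_vecs or not known_query:
        return 0.0
    # Centroid of the normalized document word vectors (assumed aggregation).
    centroid = [sum(col) / len(doc_vecs) for col in zip(*doc_vecs)]
    return sum(cosine(query_emb[w], centroid) for w in known_query) / len(known_query)
```

With toy two-dimensional embeddings, a document whose words point in the same direction as the query words receives a higher score than one whose words point away from them.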
Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention in visual interpretation. In this project, we present a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding.
Uncertainty is a C# library that uses LINQ to let developers easily express probabilistic computations and then perform inference over those computations. See our recorded Research In Focus talk from the Microsoft Faculty Summit (http://research.microsoft.com/apps/video/?id=251861) this past year for more information. Uncertain
Seabed is a project to provide analytics over encrypted Big Data. The challenge is to develop fast yet secure cryptographic techniques that support a suite of applications such as Business Intelligence tools and large-scale Machine Learning frameworks. Currently, we are building Seabed into Apache Spark.
Language is one of the fundamental ways in which intelligence can be demonstrated, and seeking to build AI systems that can use language effectively helps focus our efforts on a number of hard research problems: Where does knowledge come from and how is it stored? What representations, learning, and inference are required to build flexible goal-directed conversational systems? How do we build conversational systems that people want to interact with? How do we learn from these interactions?
The goal of this project is to study and devise methods for low-rank matrix completion and, more generally, for estimating low-rank matrices from a small number of observations.
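As a small illustration of the problem setting, the sketch below fills in a missing entry of a rank-1 matrix from its observed entries using alternating least squares. Both the choice of algorithm and the restriction to rank 1 are assumptions made for brevity, not the project's method, and the function name `complete_rank1` is hypothetical.

```python
def complete_rank1(observed, n_rows, n_cols, iters=200):
    """Fit M[i][j] ~ u[i] * w[j] to the observed entries by alternating least squares.

    observed maps (row, col) -> value for the known entries only.
    """
    u = [1.0] * n_rows
    w = [1.0] * n_cols
    for _ in range(iters):
        # Fix w and solve a one-dimensional least-squares problem for each u[i].
        for i in range(n_rows):
            num = sum(v * w[j] for (r, j), v in observed.items() if r == i)
            den = sum(w[j] ** 2 for (r, j) in observed if r == i)
            if den > 0:
                u[i] = num / den
        # Fix u and solve for each w[j] in the same way.
        for j in range(n_cols):
            num = sum(v * u[i] for (i, c), v in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            if den > 0:
                w[j] = num / den
    return u, w

if __name__ == "__main__":
    # Rank-1 matrix [1, 2, 3]^T [4, 5, 6] with the bottom-right entry hidden.
    rows, cols = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    observed = {(i, j): rows[i] * cols[j]
                for i in range(3) for j in range(3) if (i, j) != (2, 2)}
    u, w = complete_rank1(observed, 3, 3)
    print(u[2] * w[2])  # estimate of the hidden entry
```

Higher ranks and noisy observations call for more careful methods; this sketch only shows how a few observations can determine the rest of a low-rank matrix.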
Numiscan is a project to scan, process and sort coins using machine learning.
Embedding information networks into low-dimensional spaces is potentially useful in many applications such as visualization, node classification, link prediction and recommendation. In this project, we propose a large-scale information network embedding model called LINE, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted.
Building a computer system to automatically solve math word problems written in natural language.
Platform for Situated Interaction
We present a new interactive approach to 3D scene understanding. Our system, SemanticPaint, allows users to simultaneously scan their environment, whilst interactively segmenting the scene simply by reaching out and touching any desired object or surface. Our system continuously learns from these segmentations, and labels new unseen parts of the environment. Unlike offline systems, where capture, labeling and batch learning often take hours or even days to perform, our approach is fully online.
MWT is a toolbox of machine learning technology for principled and efficient experimentation, plausibly applicable to most Microsoft services that interact with customers.
The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conference "venues", and fields of study. This data is available as a set of zipped text files stored in Microsoft Azure blob storage and accessible via HTTP.