Mobile video is quickly becoming a mass consumer phenomenon. More and more people are using their smartphones to search and browse video content while on the move. This project aims to develop an innovative instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture just a few seconds of what they are watching.
Stroke Recovery with Kinect is an interactive rehabilitation system that helps stroke patients improve their upper-limb motor functioning in the comfort of their own home. By using the Microsoft Kinect sensor’s gesture recognition technology, the system recognizes and interprets the user’s movements, assesses their rehabilitation progress, and adjusts the level of difficulty for subsequent therapy sessions.
Exploratory queries on a database often return too few or too many results (e.g., a home search query on a database of available homes). In such cases, the user faces the challenges of (i) navigating through too many results and/or (ii) refining the query. This project focuses on innovative ways to help users when they face these challenges.
The same entity is often referred to in a variety of ways. For example, the camera Canon 600d is also referred to as "canon rebel t3i", the celebrity Jennifer Lopez is also referred to as "jlo" and Seattle Tacoma International Airport is also referred to as "sea tac". These are known as synonyms. Without knowledge of synonyms, many applications like e-commerce search will fail to return relevant results. We leverage the data assets amassed by Bing to automatically mine such synonyms.
AutoTag ‘n Search My Photos is a Windows 8 application that helps manage your personal photos. It authenticates to Facebook using your Facebook ID, and uses photos tagged in Facebook to automatically tag photos in your Pictures Library. Once tagged, photos can be searched by people tags using the search charm and selectively uploaded to Facebook with their tags. Tagging accuracy improves as more photos are tagged on Facebook and as tags are edited or confirmed in the application.
Big Sky is a web service for exploratory data analysis.
Search TrailBlazer is a project that aims at redefining the way people think about search. We propose to model user search behavior in terms of tasks, rather than the queries or sessions used traditionally. Our framework contains components that affect multiple core areas of search engines, including relevance ranking, metric design, user satisfaction prediction, DSAT mining, and competitive analysis.
In this project, we investigate near-duplicate document detection, focusing primarily on the detection of evolving news stories. These stories often consist primarily of syndicated information, with local replacement of headlines, captions, and the addition of locally-relevant content. By detecting near-duplicates, we can offer users only those stories with content materially different from previously-viewed versions of the story.
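The description above does not say which detection technique the project uses. As a purely illustrative sketch, one common approach to near-duplicate detection compares documents by the overlap of their word shingles (Jaccard similarity). The function names, the shingle size `k`, and the `threshold` value below are assumptions chosen for the example, not details of the project.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle (or none, if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.5):
    """Flag two documents as near-duplicates when their shingle sets overlap
    by at least `threshold` (a hypothetical cut-off for this example)."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

Two versions of a syndicated story that differ only in a locally edited headline or caption would share most of their shingles and score close to 1, while unrelated stories would score close to 0.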
Our team from Microsoft Research is studying social information seeking behavior.
We investigated heuristics for automatically identifying "spam" web pages, i.e. pages that are created to enrich the publisher rather than to provide utility to the consumer.
The Scalable Hyperlink Store is a specialized "database" for the web graph. SHS maintains the web graph in main memory, distributed over many machines. The system is available as C# source code as well as precompiled binaries.
A Web page is not atomic but rich in structure. In this project, we take advantage of the HTML DOM structure and associated visual features, such as the font size, width and height of a DOM element, to understand the authors' purpose in creating a page. We model the importance of blocks in the page; we extract structured data from pages across websites; we learn templates from a set of mixed pages from a website; we also identify the article title, body and images in pages to improve the reading experience.
An automated, unsupervised, scalable solution to language identification based on publicly available data.
Our research team is studying how users seek health information using both traditional search engines and emerging social platforms, and how the experience of health information seeking can be improved.
There is some evidence that a gap exists between the neural network research and software development communities. Source code examples available to software developers are often incomplete, misleading, or just plain incorrect. The goal of this project is to bridge that gap by providing a series of high quality demo programs. The basic C# demo can be accessed from: http://research.microsoft.com/NeuralNetworks/BackPropDemo.aspx
One Click Access evaluation at NTCIR
Users who submit the same query may have different intents. For an ambiguous query, users may seek different interpretations. For a faceted topic, users may be interested in different subtopics. In this project, we investigate how many queries are ambiguous in real search logs; we propose methods to diversify search results; we experiment with new metrics to measure diversity; we also organize the NTCIR INTENT and IMINE tasks to provide common data for the IR community.
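To make the idea of diversification concrete, here is a minimal sketch, not the project's own method. It pairs a simple greedy re-ranking that favours results covering subtopics not yet seen with subtopic recall, a standard diversity metric (the fraction of a query's subtopics covered by the top k results). All names and the toy data structures are hypothetical.

```python
def subtopic_recall(ranking, subtopics_of, all_subtopics, k):
    """Fraction of the query's subtopics covered by the top-k results."""
    covered = set()
    for doc in ranking[:k]:
        covered |= subtopics_of.get(doc, set())
    return len(covered & all_subtopics) / len(all_subtopics)


def greedy_diversify(candidates, relevance, subtopics_of, k):
    """Select up to k documents, each time taking the one that covers the
    most not-yet-covered subtopics and breaking ties by relevance score."""
    selected, covered = [], set()
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda d: (len(subtopics_of.get(d, set()) - covered), relevance[d]),
        )
        selected.append(best)
        covered |= subtopics_of.get(best, set())
        remaining.remove(best)
    return selected
```

For an ambiguous query such as "jaguar", where the two most relevant results are both about the car, a pure relevance ranking covers one interpretation in its top two, whereas the greedy selection would pull in a result about a second interpretation and raise subtopic recall accordingly.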
Evaluating summaries, ranked retrieval and sessions seamlessly
Commerce search is fundamentally different from web search. The main differences lie in both the needs of the users posing the queries and the characteristics of the underlying product data that is served in response to the query. Product data has inherent structure, in that products have typed properties (for instance, brand, color, weight) and are often categorized into a taxonomy. User queries to the search engine also have semantics and implicit structure. Often, users posing
We investigate how people's behaviour online can be characterized in terms of psychometric measurements such as the Big-5 personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism) as well as general intelligence and satisfaction-with-life. We investigate patterns of Facebook usage, website preferences, query logs, and Facebook Likes, and look for interesting correlations that can be used to predict users' behaviours, preferences or characteristics.
In recent years the Web has evolved substantially, transforming from a place where we primarily find information to a place where we also leave, share and keep it. This presents a fresh set of challenges for the management of personal information, which include how to underpin greater awareness and more control over digital belongings and other personally meaningful content that is hosted online.
An increased dependence on medical imaging for patient diagnosis and treatment places new challenges upon the clinical community. Existing image processing workflows struggle to keep up with the pace at which imaging technology is developing. Microsoft Research is working with top research institutes around the world to make available data and tools and advance the state of the art in automatic analysis of medical scans.
Distribution Modeller (temporary name only!) is CEES' end-to-end browser tool that lets the researcher rapidly import data, supplement that data with environmental info from FetchClimate, specify an arbitrary model by point and click or in code, parameterize the model against the data using Filzbach, make and visualize predictions with a full propagation of parameter uncertainty, and then package and share everything in a way that is inspectable, repeatable, and modifiable.
With the emergence of abundant online content, cloud computing, and electronic reading devices, textbooks are poised for transformative changes. Taking into account the vast number of existing textbooks designed for the traditional printed medium and the potential for enabling new kinds of functionality through the medium of electronic textbooks, we present the results of our research into algorithmically diagnosing and enhancing the quality of textbooks.
Identifying and Visualizing Viral Content
Labs: New York