We develop novel eye-gaze tracking technologies to make eye-gaze tracking ubiquitously available for improved natural user interaction (NUI).
We built Sketch2Cartoon, an automatic cartoon-making system. Users sketch the major curves of the characters and props they have in mind, and the system searches millions of clipart images in real time so that matching results can be selected to compose the cartoon image. The selected components are vectorized and can therefore be edited further. Because input is sketch-based, even a child who is too young to read or write can draw whatever he or she imagines and get interesting cartoon images.
Microsoft Research, in partnership with Bing, is happy to launch the second MSR-Bing Challenge on Image Retrieval. Do you have what it takes to build the best image retrieval system? Enter the MSR-Bing Image Retrieval Challenge in ACM Multimedia and/or ICME to develop an image scoring system for a search query.
We argue that the massive amount of click data from commercial search engines provides a data set that is uniquely suited to bridging the semantic and intent gaps. Search engines generate millions of click records (a.k.a. image-query pairs), which provide almost "unlimited" yet strong connections between semantics and images, as well as connections between users' intents and queries. This site introduces one such dataset, Clickture.
Mobile video is quickly becoming a mass consumer phenomenon. More and more people are using their smartphones to search and browse video content while on the move. This project develops an innovative instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture just a few seconds of what they are watching.
Gigapixel ArtZoom is an interactive panoramic image of Seattle that captures artists and performers in action throughout a 360-degree view of the city. You can zoom into the image to see dancers, acrobats, painters, performance artists, actors, jugglers, and sculptors—all appearing simultaneously within a single 20-gigapixel image. Visit the web site to explore the panorama and find out more.
We address the fundamental challenge of scalability for real-time volumetric surface reconstruction methods. We present a memory-efficient, streamable, hierarchical GPU data structure for 3D reconstruction of large-scale scenes with fine geometric details in real time. The system fuses live depth maps from a moving Kinect camera to generate high-quality 3D models of unbounded size.
The goal of the Spin project is to enable users to capture photorealistic 3D models of objects using just an ordinary camera -- with no special lighting, sensors, or other equipment. Our approach works equally well for a mobile phone, a point-and-shoot camera, or a digital SLR camera. The results can be shared and viewed on a phone, in a web browser, or in a desktop application.
We demonstrate a novel method for real-time 3D scene capture and reconstruction. Using several live color images, we build a high-resolution voxelization of the visible surfaces. The key to our approach is an efficient sparse voxel representation ideally suited to Graphics Processing Unit (GPU) acceleration. We store only those voxels that contain the visible surfaces, leading to a compact representation for the 3D model.
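To illustrate the idea of storing only occupied voxels, here is a minimal CPU-side sketch. It is not the project's GPU data structure: the class name `SparseVoxelGrid`, its methods, and the choice of a hash map keyed by integer grid coordinates are all assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

VoxelKey = Tuple[int, int, int]
Color = Tuple[float, float, float]


class SparseVoxelGrid:
    """Hypothetical sparse grid: memory grows with the number of
    occupied voxels, not with the volume of the bounding box."""

    def __init__(self, voxel_size: float) -> None:
        if voxel_size <= 0:
            raise ValueError("voxel_size must be positive")
        self.voxel_size = voxel_size
        self.voxels: Dict[VoxelKey, Color] = {}  # grid coordinate -> RGB

    def key(self, x: float, y: float, z: float) -> VoxelKey:
        # Quantize a world-space point to the voxel that contains it.
        s = self.voxel_size
        return (math.floor(x / s), math.floor(y / s), math.floor(z / s))

    def insert(self, point: Tuple[float, float, float], color: Color) -> None:
        # Only voxels that receive a surface sample are ever allocated.
        self.voxels[self.key(*point)] = color

    def occupied(self, point: Tuple[float, float, float]) -> bool:
        return self.key(*point) in self.voxels


grid = SparseVoxelGrid(voxel_size=0.5)
grid.insert((0.1, 0.1, 0.1), (1.0, 0.0, 0.0))
grid.insert((0.4, 0.2, 0.3), (0.0, 1.0, 0.0))   # same voxel as the first point
grid.insert((1.2, 0.0, -0.3), (0.0, 0.0, 1.0))  # a different voxel
```

Empty space costs nothing in this layout, which is the property the description attributes to the sparse representation.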
How can we communicate our biometric information, in non-verbal ways, to others, to ourselves, to places, and across time? Under the assumption that a person's face is a sound window into that person's emotions, we constructed different art pieces and interactive prototypes that use different communication channels (aural, visual, haptic) and aim to help us reflect on communication itself, its poetry, and ourselves. This project starts with the internship work of Tomas Laurenzo http://laurenz.net.
Xiang Cao, Xin Tong, Yang Liu, Takaaki Shiratori, Yupeng Zhang, Zhimin Ren, Teng Han, Nobuyuki Umetani
VidWiki is a project out of MSR India that leverages the crowd to improve the quality and content of online video lectures like those produced by Khan Academy, Coursera, EdX, and Udacity. Through the online platform, users annotate videos by overlaying content on top of the video. Annotations can be typed text, LaTeX equations, shapes, images, or custom pen-drawn notes directly on the canvas.
A set of embeddable apps that let Excel users experiment with new data visualizations against their own data.
Using Blink for Windows Phone 8, you'll never miss the best shot. Blink captures a burst of images that span the moments before and after you press the shutter. No problem if you push the shutter a few moments too early or too late. With Blink, a simple finger swipe lets you rewind or fast-forward.
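Capturing frames from before the shutter press implies buffering recent frames continuously. The sketch below shows one way that could work, using a fixed-size ring buffer. It is not Blink's actual implementation: the class `BurstBuffer`, its method names, and the `before`/`after` frame counts are hypothetical.

```python
from collections import deque
from typing import Any, List, Optional


class BurstBuffer:
    """Hypothetical pre/post-shutter burst capture.

    Keeps the last `before` frames at all times; after the shutter is
    pressed it collects `after` more frames and returns the full burst.
    """

    def __init__(self, before: int, after: int) -> None:
        if before < 0 or after < 1:
            raise ValueError("need before >= 0 and after >= 1")
        self.after = after
        self._recent: deque = deque(maxlen=before)  # oldest frames drop off
        self._burst: Optional[List[Any]] = None
        self._remaining = 0

    def press_shutter(self) -> None:
        # Freeze the frames seen so far and start counting post-press frames.
        self._burst = list(self._recent)
        self._remaining = self.after
        self._recent.clear()

    def add_frame(self, frame: Any) -> Optional[List[Any]]:
        """Feed one camera frame; returns the burst once it is complete."""
        if self._burst is None:
            self._recent.append(frame)
            return None
        self._burst.append(frame)
        self._remaining -= 1
        if self._remaining == 0:
            done, self._burst = self._burst, None
            return done
        return None


buf = BurstBuffer(before=3, after=2)
for i in range(5):          # frames 0..4 arrive before the press
    buf.add_frame(i)
buf.press_shutter()
first = buf.add_frame(5)    # burst not yet complete
burst = buf.add_frame(6)    # burst complete: 3 frames before, 2 after
```

Rewinding or fast-forwarding then amounts to moving an index through the returned list.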
IllumiRoom is a proof-of-concept system from Microsoft Research. It augments the area surrounding a television screen with projected visualizations to enhance the traditional living room entertainment experience.
The Multimedia, Interaction, and Communication (MIC) Group at Microsoft Research, Redmond, has several openings for summer internships in 2013. We are looking for experienced and highly motivated students in all areas of multimedia signal processing, computer vision, and graphics, including (but not limited to) the following areas:
Building a six-degree-of-freedom (6-DoF) tracker using the Kinect projector.
Use Kinect to make movies: watch yourself against a virtual background and interact with virtual props. Scenery may be scripted and can include scanned models of real-world objects. It may also be used in 'weatherman' scenarios, green screen / blue screen scenarios, and in making live or recorded presentations (e.g. PowerPoint slides, Excel charts); elements of the presentation may be enabled for (scripted) interaction by the presenter.
Participating phones are positioned in a stack, each phone held by a different player. The top phone's screen displays its camera's viewpoint, overlaid with a maze incorporating an AR tag, and a ball guided by accelerometers. When the ball is guided to the end of the maze it 'falls' through a hole, at which point the second phone can 'catch' it, but only if it is positioned correctly. At this point the phones swap positions to continue the game.
Use face tracking and a webcam to detect when a user is looking at the monitor. Switch the monitor (and other systems) to sleep mode when no face is tracked or the user's gaze is not directed at the screen. Use novel visualizations of power saved to encourage the user to keep using the tool. Provide a server-based reporting tool that lets IT managers track power saved across an organization.
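The sleep decision described above can be sketched as a small state machine driven by the face tracker's output. Everything here is an assumption for illustration: the class `AttentionPowerManager`, the idle timeout (the description does not say how long to wait before sleeping), and the monitor wattage used to estimate energy saved. The tracker itself is not shown; the caller supplies its two boolean results.

```python
from typing import Optional


class AttentionPowerManager:
    """Hypothetical controller: sleep when nobody has looked at the
    screen for `idle_timeout_s` seconds, wake when attention returns."""

    def __init__(self, idle_timeout_s: float = 30.0) -> None:
        self.idle_timeout_s = idle_timeout_s  # assumed grace period
        self.asleep = False
        self.seconds_asleep = 0.0
        self._last_attention_s = 0.0
        self._last_update_s: Optional[float] = None

    def update(self, now_s: float, face_tracked: bool, gaze_on_screen: bool) -> bool:
        """Feed one tracker sample; returns True if the monitor should sleep."""
        if self._last_update_s is None:
            self._last_attention_s = now_s  # start the idle clock at first sample
        elif self.asleep:
            self.seconds_asleep += now_s - self._last_update_s
        self._last_update_s = now_s

        if face_tracked and gaze_on_screen:
            self._last_attention_s = now_s
            self.asleep = False
        elif now_s - self._last_attention_s >= self.idle_timeout_s:
            self.asleep = True
        return self.asleep

    def energy_saved_wh(self, monitor_watts: float) -> float:
        # Watt-hours saved, assuming the monitor draws nothing while asleep.
        return self.seconds_asleep / 3600.0 * monitor_watts


mgr = AttentionPowerManager(idle_timeout_s=10.0)
states = [
    mgr.update(0.0, True, True),     # user looking at the screen
    mgr.update(5.0, True, False),    # face found, gaze elsewhere
    mgr.update(10.0, False, False),  # idle for 10 s: go to sleep
    mgr.update(20.0, False, False),  # still asleep
    mgr.update(25.0, True, True),    # user returns: wake up
]
```

The accumulated `seconds_asleep` is the kind of figure a power-saved visualization or an organization-wide report could be built on.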
A Windows Phone consumer entertainment app that finds 'closest match' celebrity results for a face in a photo. The app searches a pre-built database of celebrity images.
We exploit the falloff of visual acuity away from the gaze direction in the human visual system for dynamic 3D rendering. Through user studies, we have honed our system parameters and demonstrated the effectiveness of the system. We have also shown the system to bring significant performance increases, or equivalent reductions in hardware and power requirements, in typical 3D rendering applications on existing hardware. Finally, the method is easily integrated into existing 3D applications.
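As a rough illustration of rendering less detail away from the gaze point, the sketch below maps a pixel's angular distance from the gaze direction to a resolution scale. The function names, the eccentricity thresholds, and the scale factors are hypothetical; they are not the parameters tuned in the user studies.

```python
import math
from typing import Sequence, Tuple


def eccentricity_deg(pixel: Tuple[float, float],
                     gaze: Tuple[float, float],
                     viewing_distance_px: float) -> float:
    """Angle in degrees between the gaze direction and a pixel, for a flat
    screen viewed head-on from `viewing_distance_px` (distance in pixel units)."""
    offset = math.hypot(pixel[0] - gaze[0], pixel[1] - gaze[1])
    return math.degrees(math.atan2(offset, viewing_distance_px))


def sampling_scale(ecc_deg: float,
                   layers: Sequence[Tuple[float, float]] = ((5.0, 1.0), (15.0, 0.5)),
                   periphery_scale: float = 0.25) -> float:
    """Pick a resolution scale from assumed (max eccentricity, scale) layers;
    anything beyond the last layer uses `periphery_scale`."""
    for max_ecc, scale in layers:
        if ecc_deg <= max_ecc:
            return scale
    return periphery_scale


gaze = (960.0, 540.0)
center_scale = sampling_scale(eccentricity_deg((960.0, 540.0), gaze, 1000.0))
corner_scale = sampling_scale(eccentricity_deg((0.0, 0.0), gaze, 1000.0))
```

A renderer would shade the region around the gaze point at full resolution and the periphery at the reduced scales, which is where the performance savings come from.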
Microsoft Research Connections partnered with the University of Southern California Annenberg Innovation Lab, Brown University, University of Iowa Digital Studio for Public Humanities, National Endowment for the Humanities, NAMES Project Foundation, and others to create several interactive digital exhibits that allow the public to explore the largest work of community-created folk art in the world.
We study the problem of 3D object reconstruction and recognition. For reconstruction, we aim to develop algorithms and systems that lower the barrier to 3D reconstruction for ordinary users. In this way, we can build a world-class 3D object repository through crowdsourcing. For recognition, we aim to handle large-scale tasks (e.g. identifying thousands of objects) while providing real-time performance.
"Components for End Users" - Visualization, Data, and UI components that users can put together without needing to write code. Mercury comes with over 200 prebuilt components and hundreds of sample applets, including the Mercury IDE (shown here).