Mano-a-Mano is a unique spatial augmented reality system that combines dynamic projection mapping, multiple perspective views and device-less interaction to support face-to-face, or dyadic, interaction with 3D virtual objects. Its main advantage over more traditional AR approaches is that users can interact with 3D virtual objects and each other without cumbersome devices that obstruct face-to-face interaction.
RoomAlive is a proof-of-concept prototype that transforms any room into an immersive, augmented, magical entertainment experience. RoomAlive presents a unified, scalable approach for interactive projection mapping that dynamically adapts content to any room. Users can touch, shoot, stomp, dodge and steer projected content that seamlessly co-exists with their existing physical environment.
Animated computer graphics are projected onto the base of a fiber optic tree to create a sparse 3D display within the tree. This was done as an entry into Microsoft Research's MakeFest and demonstrated on 1/10/2014 to the MSRMakeFest community.
Using the Internet as a (noisy) knowledge base to mine semantics for multimedia data.
This paper presents a method for acquiring dense nonrigid shape and deformation from a single monocular depth sensor. We focus on modeling the human hand, and assume that a single rough template model is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand models from as few as 15 frames of depth data.
Online 3D reconstruction is gaining newfound interest due to the availability of real-time consumer depth cameras. The basic problem takes live overlapping depth maps as input and incrementally fuses these into a single 3D model. This is particularly challenging when real-time performance is desired without trading off quality or scale. We contribute an online system for large- and fine-scale volumetric reconstruction based on a memory- and speed-efficient data structure.
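As an illustration of what "incrementally fuses" can mean for a single voxel, here is a minimal sketch. The blurb does not say how this system fuses depth maps; the sketch assumes a per-voxel weighted running average of truncated signed distances, a common choice in volumetric reconstruction. The function name `fuse`, the `(distance, weight)` tuple layout, and the `truncation` value are all hypothetical.

```python
def fuse(voxel, new_distance, new_weight=1.0, truncation=0.05):
    """Fold one new signed-distance observation into a voxel.

    voxel is a (distance, weight) pair. This is an assumed update rule
    (weighted running average), not the project's documented method.
    """
    # Clamp the observation so far-away surfaces cannot dominate the average.
    d = max(-truncation, min(truncation, new_distance))
    old_distance, old_weight = voxel
    weight = old_weight + new_weight
    distance = (old_distance * old_weight + d * new_weight) / weight
    return (distance, weight)


# Start with an empty voxel and fold in observations from three depth maps.
voxel = (0.0, 0.0)
for observed in (0.02, 0.04, 0.5):  # the last value is clamped to 0.05
    voxel = fuse(voxel, observed)
```

Each new depth map only touches the voxels it observes, so the model can be updated frame by frame without storing past frames.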
We develop novel technologies that make eye-gaze tracking ubiquitously available for improved natural user interaction (NUI).
We built Sketch2Cartoon, an automatic cartoon-making system. Users sketch the major curves of the characters and props they have in mind, and select from real-time search results drawn from millions of clipart images to compose a cartoon image. The selected components are vectorized and can therefore be edited further. Because input is sketch-based, even a child who is too young to read or write can draw whatever he or she imagines and get interesting cartoon images.
Microsoft Research in partnership with Bing is happy to launch the second MSR-Bing Challenge on Image Retrieval. Do you have what it takes to build the best image retrieval system? Enter the MSR-Bing Image Retrieval Challenge in ACM Multimedia and/or ICME to develop an image scoring system for a search query. Last Challenge: MSR-Bing IRC @ ACM Multimedia 2014. Current Challenge: MSR-Bing IRC @ ICME 2015. Next Challenge: MSR-Bing IRC @ ACM Multimedia 2015
We argue that the massive amount of click data from commercial search engines provides a dataset that is uniquely suited to bridging the semantic and intent gaps. Search engines generate millions of click records (a.k.a. image-query pairs), which provide almost "unlimited" yet strong connections between semantics and images, as well as connections between users' intents and queries. This site introduces one such dataset, Clickture.
Mobile video is quickly becoming a mass consumer phenomenon. More and more people are using their smartphones to search and browse video content while on the move. This project develops an innovative instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture a few seconds of what they are watching.
Gigapixel ArtZoom is an interactive panoramic image of Seattle that captures artists and performers in action throughout a 360-degree view of the city. You can zoom into the image to see dancers, acrobats, painters, performance artists, actors, jugglers, and sculptors—all appearing simultaneously within a single 20-gigapixel image. Visit the web site to explore the panorama and find out more.
We address the fundamental challenge of scalability for real-time volumetric surface reconstruction methods. We present a memory-efficient, streamable, hierarchical GPU data structure for 3D reconstruction of large-scale scenes with fine geometric details in real time. The system fuses live depth maps from a moving Kinect camera to generate high-quality 3D models of unbounded size.
The goal of the Spin project is to enable users to capture photorealistic 3D models of objects using just an ordinary camera -- with no special lighting, sensors, or other equipment. Our approach works equally well for a mobile phone, a point-and-shoot camera, or a digital SLR camera. The results can be shared and viewed on a phone, in a web browser, or in a desktop application.
We demonstrate a novel method for real-time 3D scene capture and reconstruction. Using several live color images, we build a high resolution voxelization of the visible surfaces. The key to our approach is an efficient sparse voxel representation ideally suited to Graphics Processing Unit (GPU) acceleration. We store only those voxels that contain the visible surfaces, leading to a compact representation for the 3D model.
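The idea of storing only the voxels that contain visible surfaces can be sketched on the CPU as follows. This is an illustrative assumption, not the project's GPU implementation: the function name `voxelize`, the dictionary layout, and the sample points are hypothetical.

```python
import math


def voxelize(points, voxel_size):
    """Return a sparse map from voxel index to the number of surface points in it.

    Only voxels that contain at least one point get an entry, so memory
    grows with the visible surface area rather than with the scene volume.
    """
    occupied = {}
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        occupied[key] = occupied.get(key, 0) + 1
    return occupied


# Three hypothetical surface points; the first two fall in the same voxel.
points = [(0.01, 0.02, 0.03), (0.02, 0.01, 0.04), (0.51, 0.02, 0.03)]
grid = voxelize(points, voxel_size=0.1)
```

A dense grid covering the same extent would allocate every cell between the two occupied voxels, whereas the sparse map holds just two entries.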
How can we communicate one's biometric information, in non-verbal ways, to others, to ourselves, to places, and across time? Under the assumption that a person's face is a sound window onto their emotions, we constructed different art pieces and interactive prototypes that span different communication channels (aural, visual, haptic) and aim to prompt reflection on communication itself, its poetry, and ourselves. This project starts with the internship work of Tomas Laurenzo (http://laurenz.net).
Xiang Cao, Xin Tong, Yang Liu, Takaaki Shiratori, Yupeng Zhang, Zhimin Ren, Teng Han, Nobuyuki Umetani
VidWiki is a project out of MSR India that leverages the crowd to improve the quality and content of online video lectures like those produced by Khan Academy, Coursera, EdX, and Udacity. Through the online platform, users annotate videos by overlaying content on top of the video. Annotations can be typed text, LaTeX equations, shapes, images, or custom pen-drawn notes directly on the canvas.
A set of embeddable apps that let Excel users experiment with new data visualizations against their own data.
Using Blink for Windows Phone 8, you'll never miss the best shot. Blink captures a burst of images that span the moments before and after you press the shutter. No problem if you push the shutter a few moments too early or too late. With Blink, a simple finger swipe lets you rewind or fast-forward.
IllumiRoom is a proof-of-concept system from Microsoft Research. It augments the area surrounding a television screen with projected visualizations to enhance the traditional living room entertainment experience.
The Multimedia, Interaction, and Communication (MIC) Group at Microsoft Research, Redmond, has several openings for summer internships in 2013. We are looking for experienced and highly motivated students in all areas of multimedia signal processing, computer vision, and graphics, including (but not limited to) the following areas:
Building a 6-DoF tracker using the Kinect projector.
Use Kinect to make movies: watch yourself against a virtual background, and interact with virtual props. Scenery may be scripted, and can include scanned models of real-world objects. May also be used in 'weatherman' scenarios, green screen / blue screen scenarios, and in making live or recorded presentations (e.g. PowerPoint slides, Excel charts); elements of the presentation may be enabled for (scripted) interaction by the presenter.
Participating phones are positioned in a stack, each phone held by a different player. The top phone's screen displays its camera's viewpoint, overlaid with a maze incorporating an AR tag, and a ball guided by accelerometers. When the ball is guided to the end of the maze it 'falls' through a hole, at which point the second phone can 'catch' it, but only if it is positioned correctly. At this point the phones swap positions to continue the game.