Mobile video is quickly becoming a mass consumer phenomenon. More and more people use their smartphones to search for and browse video content while on the move. This project aims to develop an innovative instant mobile video search system that lets users discover videos simply by pointing their phones at a screen to capture a few seconds of what they are watching.
Gigapixel ArtZoom is an interactive panoramic image of Seattle that captures artists and performers in action throughout a 360-degree view of the city. You can zoom into the image to see dancers, acrobats, painters, performance artists, actors, jugglers, and sculptors—all appearing simultaneously within a single 20-gigapixel image. Visit the web site to explore the panorama and find out more.
We address the fundamental challenge of scalability for real-time volumetric surface reconstruction methods. We present a memory-efficient, streamable, hierarchical GPU data structure for 3D reconstruction of large-scale scenes with fine geometric details in real time. The system fuses live depth maps from a moving Kinect camera to generate high-quality 3D models of unbounded size.
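Fusing depth maps into a volumetric model is commonly done with a truncated signed distance function (TSDF), where each voxel keeps a running weighted average of its distance to the observed surface. The sketch below shows that standard update over a flat hash map of voxels that are allocated only near the surface. It is an illustrative assumption, not the hierarchical GPU structure described above. The truncation band, weight cap, class names, and the single-ray "camera" are hypothetical choices made to keep the example small.

```python
from __future__ import annotations

from dataclasses import dataclass

TRUNCATION = 0.05  # truncation band in metres (assumed value)
MAX_WEIGHT = 64.0  # weight cap so the model can still adapt (assumed value)


@dataclass
class Voxel:
    tsdf: float = 0.0    # normalised truncated signed distance in [-1, 1]
    weight: float = 0.0  # accumulated observation confidence


class SparseTSDF:
    """Voxels are allocated lazily in a hash map, and only inside the truncation band."""

    def __init__(self, voxel_size: float) -> None:
        self.voxel_size = voxel_size
        self.voxels: dict[tuple[int, int, int], Voxel] = {}

    def integrate(self, key: tuple[int, int, int], signed_distance: float,
                  obs_weight: float = 1.0) -> None:
        # Skip observations outside the band; nothing is stored for empty space.
        if abs(signed_distance) > TRUNCATION:
            return
        d = signed_distance / TRUNCATION
        v = self.voxels.setdefault(key, Voxel())
        total = v.weight + obs_weight
        # Running weighted average over all depth maps fused so far.
        v.tsdf = (v.weight * v.tsdf + obs_weight * d) / total
        v.weight = min(total, MAX_WEIGHT)

    def integrate_depth_sample(self, ix: int, iy: int, depth: float) -> None:
        # Toy camera looking down +z: voxel (ix, iy, iz) has its centre at
        # z = (iz + 0.5) * voxel_size, so only voxels near `depth` are touched.
        first = max(int((depth - TRUNCATION) / self.voxel_size), 0)
        last = int((depth + TRUNCATION) / self.voxel_size)
        for iz in range(first, last + 1):
            z = (iz + 0.5) * self.voxel_size
            self.integrate((ix, iy, iz), depth - z)
```

A real system would project every candidate voxel into each depth map using the tracked camera pose and intrinsics. The one-ray helper only shows why storage grows with surface area rather than with scene volume.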
The goal of the Spin project is to enable users to capture photorealistic 3D models of objects using just an ordinary camera, with no special lighting, sensors, or other equipment. Our approach works equally well for a mobile phone, a point-and-shoot camera, or a digital SLR camera. The results can be shared and viewed on a phone, in a web browser, or in a desktop application.
We demonstrate a novel method for real-time 3D scene capture and reconstruction. Using several live color images, we build a high-resolution voxelization of the visible surfaces. The key to our approach is an efficient sparse voxel representation ideally suited to Graphics Processing Unit (GPU) acceleration. We store only those voxels that contain the visible surfaces, which yields a compact representation of the 3D model.
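The saving from storing only surface voxels can be illustrated on the CPU. The sketch below assumes an occupancy set has already been computed (how it is obtained from the color images is not shown) and keeps only occupied voxels that touch empty space. The 6-neighbour test and the function name are hypothetical simplifications, not the method's actual GPU data structure.

```python
from itertools import product

# 6-connected neighbourhood used to decide whether a voxel touches empty space.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surface_voxels(occupied):
    """Keep only occupied voxels with at least one empty 6-neighbour (the visible shell)."""
    return {
        (x, y, z)
        for (x, y, z) in occupied
        if any((x + dx, y + dy, z + dz) not in occupied for dx, dy, dz in NEIGHBOURS)
    }


# Example: a solid 10x10x10 block of occupied voxels.
solid = set(product(range(10), repeat=3))
shell = surface_voxels(solid)
print(len(solid), len(shell))  # 1000 488
```

The interior 8x8x8 block is discarded, so storage scales with surface area (roughly n² voxels) instead of volume (n³), which is what makes a sparse representation compact at high resolution.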
How can we communicate a person's biometric information, in non-verbal ways, to others, to ourselves, to places, and across time? Under the assumption that a person's face is a reliable window onto their emotions, we constructed art pieces and interactive prototypes that use different communication channels (aural, visual, haptic) and aim to help us reflect on communication itself, its poetry, and ourselves. This project began with the internship work of Tomas Laurenzo (http://laurenz.net).
Xiang Cao, Xin Tong, Yang Liu, Takaaki Shiratori, Yupeng Zhang, Zhimin Ren, Teng Han, Nobuyuki Umetani
VidWiki is a project out of MSR India that leverages the crowd to improve the quality and content of online video lectures like those produced by Khan Academy, Coursera, EdX, and Udacity. Through the online platform, users annotate videos by overlaying content on top of the video. Annotations can be typed text, LaTeX equations, shapes, images, or custom pen-drawn notes directly on the canvas.
A set of embeddable apps that let Excel users experiment with new data visualizations against their own data.
Using Blink for Windows Phone 8, you'll never miss the best shot. Blink captures a burst of images spanning the moments before and after you press the shutter, so it doesn't matter if you press it a moment too early or too late. With Blink, a simple finger swipe lets you rewind or fast-forward through the burst.
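Capturing frames from *before* the shutter press implies that the camera keeps buffering frames while the viewfinder is open. A minimal sketch of that idea, assuming a fixed-size ring buffer, is shown below. The class name and the `before`/`after` frame counts are hypothetical and are not taken from the Blink app.

```python
from collections import deque


class BurstCapture:
    """Ring buffer of recent frames; pressing the shutter freezes them and records more."""

    def __init__(self, before: int, after: int) -> None:
        self.after = after
        self.buffer = deque(maxlen=before)  # oldest frames drop off automatically
        self.burst = None                   # frames of the captured burst
        self.remaining = 0                  # post-shutter frames still to record

    def on_frame(self, frame) -> None:
        if self.remaining > 0:
            self.burst.append(frame)
            self.remaining -= 1
        else:
            self.buffer.append(frame)

    def press_shutter(self) -> None:
        # Frames already in the ring buffer become the "before" part of the burst.
        self.burst = list(self.buffer)
        self.remaining = self.after

    def done(self) -> bool:
        return self.burst is not None and self.remaining == 0
```

With `before=3, after=2`, feeding frames 0 to 4, pressing the shutter, and feeding frames 5 to 9 yields the burst `[2, 3, 4, 5, 6]`. Swiping to rewind or fast-forward then amounts to moving an index through `burst`.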
IllumiRoom is a proof-of-concept system from Microsoft Research. It augments the area surrounding a television screen with projected visualizations to enhance the traditional living room entertainment experience.
The Multimedia, Interaction, and Communication (MIC) Group at Microsoft Research, Redmond, has several openings for summer internships in 2013. We are looking for experienced and highly motivated students in all areas of multimedia signal processing, computer vision, and graphics, including (but not limited to) the following:
Building a 6-DoF tracker using the Kinect projector.
Use Kinect to make movies: watch yourself against a virtual background and interact with virtual props. Scenery may be scripted and can include scanned models of real-world objects. The system may also be used in 'weatherman' scenarios, green-screen/blue-screen scenarios, and live or recorded presentations (e.g., PowerPoint slides, Excel charts), where elements of the presentation may be enabled for (scripted) interaction by the presenter.
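Placing a user against a virtual background requires separating the person from the real scene. One simple way to do this with a depth camera is a depth threshold, sketched below on tiny nested lists. This is an assumed technique for illustration: the function name, the `max_depth` cutoff, and the convention that a depth of 0 means "no reading" are all hypothetical, and a production system would use a proper body mask and edge refinement.

```python
def composite(color, depth, background, max_depth):
    """Keep camera pixels closer than max_depth; show the virtual background elsewhere.

    A depth of 0 is treated as "no reading" (assumed convention) and replaced too.
    """
    return [
        [c if 0 < d <= max_depth else b for c, d, b in zip(c_row, d_row, b_row)]
        for c_row, d_row, b_row in zip(color, depth, background)
    ]
```

For a 2x2 frame `[[1, 2], [3, 4]]` with depths `[[1.0, 3.0], [0, 1.5]]`, a background of 9s, and `max_depth=2.0`, the result is `[[1, 9], [9, 4]]`: the far pixel and the pixel without a depth reading both show the virtual scene.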
Participating phones are positioned in a stack, each held by a different player. The top phone's screen displays its camera's view, overlaid with a maze that incorporates an AR tag and a ball whose movement is controlled by the phone's accelerometer. When the ball is guided to the end of the maze, it 'falls' through a hole, and the second phone can 'catch' it, but only if it is positioned correctly. The phones then swap positions to continue the game.
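The ball mechanics can be sketched as a simple physics step driven by tilt readings, plus two geometric checks for the hole and the catch. Everything below is a hypothetical illustration: maze walls are ignored, the friction factor and tolerances are made-up values, and `can_catch` assumes some estimate of the lower phone's offset (for example, derived from the AR tag) is already available.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def step(ball: Ball, ax: float, ay: float, dt: float, friction: float = 0.98) -> None:
    """Advance the ball using the accelerometer's tilt acceleration (ax, ay)."""
    ball.vx = (ball.vx + ax * dt) * friction
    ball.vy = (ball.vy + ay * dt) * friction
    ball.x += ball.vx * dt
    ball.y += ball.vy * dt


def falls_into_hole(ball: Ball, hole: tuple, radius: float) -> bool:
    # The ball drops once its centre is inside the hole's circle.
    return (ball.x - hole[0]) ** 2 + (ball.y - hole[1]) ** 2 <= radius ** 2


def can_catch(offset: tuple, tolerance: float) -> bool:
    # Hypothetical rule: the lower phone must sit within `tolerance` of the hole.
    return (offset[0] ** 2 + offset[1] ** 2) ** 0.5 <= tolerance
```

Integrating velocity before position (semi-implicit Euler) keeps the motion stable at typical frame rates, which matters more here than physical accuracy.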
Use face tracking and a webcam to detect when a user is looking at the monitor. Switch the monitor (and other systems) to sleep mode when no face is tracked or the user's gaze is not directed at the screen. Use novel visualizations of the power saved to encourage users to keep using the tool, and provide a server-based reporting tool that lets IT managers track power savings across an organization.
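The sleep decision and the power-saved figure can be sketched as a small timeout state machine fed by the face tracker, plus a watt-hour calculation. The timeout, the wattages, and the class and function names are hypothetical; the face tracker itself is assumed and not shown.

```python
class PresenceMonitor:
    """Puts the display to sleep after `timeout_s` seconds without an attentive face."""

    def __init__(self, timeout_s: float, start: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen = start      # last time a face was looking at the screen
        self.asleep = False
        self.sleep_started = 0.0
        self.seconds_asleep = 0.0   # completed sleep intervals, for reporting

    def update(self, now: float, face_toward_screen: bool) -> bool:
        """Feed one tracker result; return True if the display should be asleep."""
        if face_toward_screen:
            if self.asleep:
                self.seconds_asleep += now - self.sleep_started
                self.asleep = False
            self.last_seen = now
        elif not self.asleep and now - self.last_seen >= self.timeout_s:
            self.asleep = True
            self.sleep_started = now
        return self.asleep


def energy_saved_wh(seconds_asleep: float, active_watts: float, sleep_watts: float) -> float:
    """Energy saved in watt-hours, given the display's active and sleep power draw."""
    return (active_watts - sleep_watts) * seconds_asleep / 3600.0
```

For example, one hour asleep for a monitor drawing 30 W active and 0.5 W asleep saves 29.5 Wh. A server-side report could simply sum these per-machine figures.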
A consumer entertainment app for Windows Phone that finds the 'closest match' celebrities for a face in a photo. The app searches a pre-built database of celebrity images.
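A 'closest match' search over a pre-built database is commonly framed as nearest-neighbour ranking of face feature vectors. The sketch below assumes each face has already been converted to a numeric feature vector by some face-recognition model (not shown) and ranks celebrities by cosine similarity. The database layout and function names are hypothetical, not the app's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def closest_matches(query, database, k=3):
    """Return the k celebrity names whose stored features best match `query`.

    `database` maps a celebrity name to a face feature vector (assumed format).
    """
    return sorted(database, key=lambda name: cosine(query, database[name]), reverse=True)[:k]
```

A linear scan is enough for a small bundled database; a large one would typically use an approximate nearest-neighbour index instead.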
We exploit the falloff of visual acuity away from the gaze direction in the human visual system for dynamic 3D rendering. Through user studies, we have tuned our system parameters and demonstrated the effectiveness of the system. We have also shown that the system delivers significant performance gains, or equivalent reductions in hardware and power requirements, in typical 3D rendering applications on existing hardware. Finally, the method integrates easily into existing 3D applications.
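Acuity falloff is often modelled by letting the minimum angle of resolution (MAR) grow linearly with eccentricity, ω(e) = m·e + ω₀. The sketch below uses that common model to compute how much resolution a region needs relative to the fovea. The slope `m` and the 1-arcminute foveal MAR `ω₀` are illustrative values, not the parameters tuned in the user studies. The function names are hypothetical.

```python
import math


def eccentricity_deg(gaze, direction):
    """Angle in degrees between the gaze ray and the view ray through a pixel."""
    dot = sum(g * d for g, d in zip(gaze, direction))
    norm = math.sqrt(sum(g * g for g in gaze)) * math.sqrt(sum(d * d for d in direction))
    # Clamp to guard against rounding slightly outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def mar_deg(ecc_deg, slope=0.022, mar0_deg=1 / 60):
    """Linear minimum-angle-of-resolution model w = m*e + w0 (illustrative constants)."""
    return slope * ecc_deg + mar0_deg


def resolution_scale(ecc_deg, **kw):
    """Fraction of foveal resolution needed at this eccentricity (1.0 at the fovea)."""
    return mar_deg(0.0, **kw) / mar_deg(ecc_deg, **kw)
```

A renderer could then draw a few nested layers, each at the resolution that `resolution_scale` requires at its inner edge, and blend them around the tracked gaze point.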
Microsoft Research Connections partnered with the University of Southern California Annenberg Innovation Lab, Brown University, University of Iowa Digital Studio for Public Humanities, National Endowment for the Humanities, NAMES Project Foundation, and others to create several interactive digital exhibits that allow the public to explore the largest work of community-created folk art in the world.
We study the problem of 3D object reconstruction and recognition. For reconstruction, we aim to develop algorithms and systems that lower the barrier to 3D reconstruction for ordinary users. In this way, we can build a world-class 3D object repository by leveraging crowdsourcing. For recognition, we aim to handle large-scale tasks (e.g., identifying thousands of objects) while providing real-time performance.
"Components for End Users": visualization, data, and UI components that users can assemble without writing code. Mercury comes with over 200 prebuilt components and hundreds of sample applets, including the Mercury IDE.
MSRA-CFW is a dataset of celebrity face images collected from the web. Starting from any face image, we obtain its near-duplicate images and their associated surrounding text. We then detect the dominant person names by matching against a large list of celebrity names from public websites such as Wikipedia. A classifier is applied to further identify the celebrities appearing in the web images. The final dataset contains 202,792 faces of 1,583 people.
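The name-detection step can be sketched as counting, across the surrounding texts of a face's near-duplicates, how many texts mention each name from a celebrity list, and then keeping the most frequent name. The whole-word matching, the `min_share` threshold, and the function name are hypothetical choices for illustration. The classifier stage that follows in the pipeline is not shown.

```python
import re
from collections import Counter


def dominant_name(surrounding_texts, celebrity_names, min_share=0.5):
    """Return the name mentioned in at least `min_share` of the texts, else None."""
    counts = Counter()
    for text in surrounding_texts:
        lowered = text.lower()
        for name in celebrity_names:
            # Whole-word, case-insensitive match; each text counts once per name.
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered):
                counts[name] += 1
    if not counts:
        return None
    name, hits = counts.most_common(1)[0]
    return name if hits / len(surrounding_texts) >= min_share else None
```

Counting texts rather than raw mentions keeps a single caption that repeats a name many times from dominating the vote.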
Microsoft Research Cliplets is an interactive app that uses semi-automated methods to let users create "Cliplets" from handheld videos: a type of imagery that sits between stills and video. The app provides a creative lens for focusing on the important aspects of a moment by mixing static and dynamic elements from a video clip.
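Mixing static and dynamic elements can be sketched as per-pixel compositing: inside a user-chosen mask the output follows the video, and everywhere else it stays frozen on one still frame. The sketch below shows only that compositing idea on nested lists. It assumes the clip is already stabilized and omits the app's semi-automated steps (such as alignment and seamless looping). The function name and the `still_index` parameter are hypothetical.

```python
def make_cliplet(frames, mask, still_index=0):
    """Build output frames: pixels where mask is True play; the rest stay frozen.

    frames: list of images, each a list of rows of pixel values.
    mask:   one image-sized list of rows of booleans marking the dynamic region.
    """
    still = frames[still_index]
    return [
        [
            [f if m else s for f, s, m in zip(f_row, s_row, m_row)]
            for f_row, s_row, m_row in zip(frame, still, mask)
        ]
        for frame in frames
    ]
```

For three 1x2 frames `[[0, 0]]`, `[[1, 1]]`, `[[2, 2]]` and mask `[[True, False]]`, the left pixel animates (0, 1, 2) while the right pixel stays at its first-frame value of 0.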
We automatically create Bing homepage-style text pop-ups for most of the interesting images on the web.
There is intense interest in intelligent online 3-D exploration and navigation of urban areas, which makes understanding the 3-D structure of urban areas from captured images or videos indispensable. Automatic Building Parsing in Urban Areas is a tool that automatically detects the façades in a single image or in multiple images. Beyond locating each façade, it can also compute its geometry (the orientation of its plane) without human interaction.
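One standard way to recover a plane's orientation from a single image uses vanishing points. If the camera's focal length f and principal point (cx, cy) are known, each vanishing point back-projects to a 3-D direction, and the façade normal is the cross product of the horizontal and vertical directions. The sketch below illustrates that textbook technique. It is not necessarily what the tool does. It assumes both vanishing points are finite and already detected, and the function names are hypothetical.

```python
import math


def back_project(vp, f, cx, cy):
    """3-D direction (camera frame) of a finite vanishing point (u, v) in pixels."""
    u, v = vp
    return ((u - cx) / f, (v - cy) / f, 1.0)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def facade_normal(vp_horizontal, vp_vertical, f, cx, cy):
    """Unit normal of the plane spanned by the façade's horizontal and vertical lines."""
    d1 = back_project(vp_horizontal, f, cx, cy)
    d2 = back_project(vp_vertical, f, cx, cy)
    n = cross(d1, d2)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)
```

Vanishing points near infinity (for example, vertical lines in a level, frontal view) are better handled in homogeneous coordinates. The finite-point form is used here only to keep the example short.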