The SIGGRAPH conference and exhibition is renowned as the most prestigious forum for the publication of computer graphics research, and this year our researchers will present a broad spectrum of new research, detailed below, developed across our global labs.
Our research presented at SIGGRAPH
The acoustic wave field in a complex scene is a chaotic 7D function of time and the positions of source and listener, making it difficult to compress and interpolate. This hampers precomputed approaches that tabulate impulse responses (IRs) to allow immersive, real-time sound propagation in static scenes. We encode the field of time-varying IRs in terms of a few perceptual parameters derived from the IR’s energy decay. The resulting parameter fields are spatially smooth and are compressed using a lossless scheme similar to PNG. We show that this encoding removes two of the seven dimensions, making it possible to handle large scenes such as entire game maps within 100 MB of memory. Run-time decoding is fast, taking 100 μs per source. We introduce an efficient and scalable method for convolutionally rendering acoustic parameters that generates artifact-free audio even for fast motion and sudden changes in reverberance. We demonstrate convincing spatially varying effects, including occlusion/obstruction and reverberation, in complex scenes in our system integrated with Unreal Engine 3™.
Download the paper (5 MB .pdf)
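The perceptual parameters are derived from each IR's energy decay. As a minimal sketch of that step (not the paper's actual coder), the classic Schroeder backward integration turns an IR into an energy-decay curve from which a decay time such as T60 can be read off; the sampling rate and the synthetic exponential IR below are illustrative assumptions:

```python
import numpy as np

def estimate_t60(ir, fs):
    """Estimate T60 (time for the sound energy to decay by 60 dB)
    from an impulse response via Schroeder backward integration."""
    edc = np.cumsum((ir ** 2)[::-1])[::-1]      # energy-decay curve
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(ir)) / fs
    # Fit a line to the -5 dB .. -35 dB range, extrapolate to -60 dB.
    mask = (edc_db <= -5) & (edc_db >= -35)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope

fs = 16000
t = np.arange(fs) / fs                          # 1 s of samples
tau = 0.5 * 8.6859 / 60                         # exponential with T60 = 0.5 s
ir = np.exp(-t / tau)
t60 = estimate_t60(ir, fs)
```

A handful of such scalars per IR, varying smoothly over space, is what makes the parameter fields so much more compressible than the raw IRs.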
Controllable High-Fidelity Facial Performance Transfer
Feng Xu, Xin Tong, Microsoft Research Asia
Yilong Liu, Tsinghua University
Jinxiang Chai, Texas A&M University
Recent technological advances in facial capture have made it possible to acquire high-fidelity 3D facial performance data with stunningly high spatial-temporal resolution. Current methods for facial expression transfer, however, are often limited to large-scale facial deformation. This paper introduces a novel facial expression transfer and editing technique for high-fidelity facial performance data. The key idea of our approach is to decompose high-fidelity facial performances into high-level facial feature lines, large-scale facial deformation and fine-scale motion details and transfer them appropriately to reconstruct the retargeted facial animation in an efficient optimization framework. The system also allows the user to quickly modify and control the retargeted facial sequences in the spatial-temporal domain. We demonstrate the power of our approach by transferring and editing high-fidelity facial animation data from high-resolution source models to a wide range of target models, including both human faces and non-human faces such as “monster” and “dog”.
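A minimal 1-D sketch of the decomposition-and-transfer idea, with made-up signals standing in for per-vertex facial motion (the paper operates on meshes and additionally handles feature lines): split motion into a large-scale component and a fine-scale residual, then graft the source's detail onto the target's large-scale motion.

```python
import numpy as np

def decompose(offsets, kernel=11):
    """Split per-vertex offsets into a large-scale component (moving
    average) and a fine-scale residual."""
    pad = kernel // 2
    padded = np.pad(offsets, pad, mode='edge')
    large = np.convolve(padded, np.ones(kernel) / kernel, mode='valid')
    return large, offsets - large

x = np.linspace(0, 1, 200)
# Source performance: broad expression motion plus fine wrinkle detail.
source = np.sin(2 * np.pi * x) + 0.05 * np.sin(60 * np.pi * x)
# Target already has its own (retargeted) large-scale motion.
target_large = 0.5 * np.sin(2 * np.pi * x)
_, source_detail = decompose(source)
# Graft the source's fine-scale detail onto the target.
retargeted = target_large + source_detail
```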
We present a method for converting first-person videos, for example, captured with a helmet camera during activities such as rock climbing or bicycling, into hyper-lapse videos, i.e., time-lapse videos with a smoothly moving camera. At high speed-up rates, simple frame sub-sampling coupled with existing video stabilization methods does not work, because the erratic camera shake present in first-person videos is amplified by the speed-up. Our algorithm first reconstructs the 3D input camera path as well as dense, per-frame proxy geometries. We then optimize a novel camera path for the output video that passes near the input cameras while ensuring that the virtual camera looks in directions that can be rendered well from the input. Finally, we generate the novel smoothed, time-lapse video by rendering, stitching, and blending appropriately selected source frames for each output frame. We present a number of results for challenging videos that cannot be processed using traditional techniques.
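The path optimization can be caricatured as a least-squares trade-off between staying near the input camera positions and penalizing curvature; the real objective additionally constrains view direction and renderability. A minimal sketch with an assumed smoothness weight `lam`:

```python
import numpy as np

def smooth_path(p, lam=100.0):
    """Solve min_q ||q - p||^2 + lam * ||second differences of q||^2,
    a linear least-squares surrogate for the output-camera-path
    optimization (the real objective also scores renderability)."""
    n = len(p)
    D = np.zeros((n - 2, n))                    # second-difference operator
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, p)

rng = np.random.default_rng(1)
# A shaky 2-D camera track: a straight line plus per-frame jitter.
p = np.linspace(0, 10, 100)[:, None] * np.array([1.0, 0.5]) \
    + 0.3 * rng.standard_normal((100, 2))
q = smooth_path(p)
```

The smoothed path `q` stays close to the input track while its curvature drops by orders of magnitude, which is exactly the behavior needed at high speed-up rates.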
The Visual Microphone: Passive Recovery of Sound from Video
Abe Davis, MIT CSAIL
Michael Rubinstein, Microsoft Research, MIT CSAIL
Neal Wadhwa, MIT CSAIL
Gautham J. Mysore, Adobe Research
Frédo Durand, William T. Freeman, MIT CSAIL
When sound hits an object, it causes small vibrations of the object’s surface. We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects—a glass of water, a potted plant, a box of tissues, or a bag of chips—into visual microphones. We recover sounds from high-speed footage of a variety of objects with different properties, and use both real and simulated data to examine some of the factors that affect our ability to visually recover sound. We evaluate the quality of recovered sounds using intelligibility and SNR metrics and provide input and recovered audio samples for direct comparison. We also explore how to leverage the rolling shutter in regular consumer cameras to recover audio from standard frame-rate videos, and use the spatial resolution of our method to visualize how sound-related vibrations vary over an object’s surface, which we can use to recover the vibration modes of an object.
Download the paper (74 MB .pdf)
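A toy version of the recovery idea, on simulated frames: track the sub-pixel displacement of an image feature over time and treat that displacement signal as audio. The paper uses phase-based motion analysis on real high-speed footage; the centroid tracker, the 2,000 fps frame rate, and the synthetic 440 Hz tone here are illustrative assumptions.

```python
import numpy as np

fs = 2000                                   # camera frame rate (frames/s)
t = np.arange(fs) / fs
# Ground-truth audio: a 440 Hz tone causing ~0.05-pixel vibrations.
audio = 0.05 * np.sin(2 * np.pi * 440 * t)
x = np.linspace(-5, 5, 200)                 # 1-D image coordinate
# Each "frame" is an image feature displaced by the instantaneous audio.
frames = [np.exp(-(x - a) ** 2) for a in audio]

def recover_motion(frames, x):
    """Per-frame sub-pixel displacement via intensity centroid."""
    return np.array([np.sum(x * f) / np.sum(f) for f in frames])

sig = recover_motion(frames, x)
sig -= sig.mean()                           # recovered "sound"
```

Even this crude tracker recovers the tone almost perfectly; the hard part in practice is doing so from noisy, fractional-pixel motions of real surfaces.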
Learning to Be a Depth Camera for Close-Range Human Capture and Interaction
Cem Keskin, Shahram Izadi, Pushmeet Kohli, Jamie Shotton, Antonio Criminisi, David Kim, David Sweeney, Microsoft Research Cambridge
Sing Bing Kang, Microsoft Research Redmond
Sean Ryan Fanello, Istituto Italiano di Tecnologia
We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near infrared intensity images to absolute, metric depth in real time. We demonstrate a variety of human-computer interaction and capture scenarios. Experiments show an accuracy that outperforms a conventional light fall-off baseline, and is comparable to high-quality consumer depth cameras, but with a dramatically reduced cost, power consumption, and form-factor.
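A toy stand-in for the learned mapping, assuming a synthetic inverse-square near-IR falloff with unknown per-pixel albedo: a piecewise-constant regressor over intensity bins plays the role of the classification-regression forest (which additionally exploits spatial context).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic training set: near-IR intensity falls off as 1/depth^2,
# scaled by an unknown per-pixel albedo the regressor must absorb.
depth = rng.uniform(0.2, 1.0, 50000)                  # metres
intensity = rng.uniform(0.8, 1.0, 50000) / depth ** 2

# Piecewise-constant regressor over intensity quantile bins; each bin
# stores the mean training depth (a crude stand-in for the forest).
edges = np.quantile(intensity, np.linspace(0, 1, 65))
idx = np.clip(np.digitize(intensity, edges) - 1, 0, 63)
bin_depth = np.array([depth[idx == b].mean() for b in range(64)])

def predict(i):
    b = np.clip(np.digitize(i, edges) - 1, 0, 63)
    return bin_depth[b]

# Held-out evaluation.
test_depth = rng.uniform(0.2, 1.0, 5000)
test_intensity = rng.uniform(0.8, 1.0, 5000) / test_depth ** 2
mae = np.mean(np.abs(predict(test_intensity) - test_depth))
```

The residual error here comes from the unknown albedo, which is precisely the ambiguity the paper's forests resolve far better than a plain light fall-off baseline.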
Sensitivity-Optimized Rigging for Example-Based Real-Time Clothing Synthesis
Weiwei Xu, Hangzhou Normal University
Nobuyuki Umetani, Autodesk Research, The University of Tokyo
Qianwen Chao, Zhejiang University
Jie Mao, Google, Inc.
Xiaogang Jin, Zhejiang University
Xin Tong, Microsoft Research Asia
We present a real-time solution for generating detailed clothing deformations from pre-computed clothing shape examples. Given an input pose, it synthesizes a clothing deformation by blending skinned clothing deformations of nearby examples controlled by the body skeleton. Observing that cloth deformation can be well modeled with sensitivity analysis driven by the underlying skeleton, we introduce a sensitivity-based method to construct a pose-dependent rigging solution from sparse examples. We also develop a sensitivity-based blending scheme to find nearby examples for the input pose and evaluate their contributions to the result. Finally, we propose a greedy scheme based on stochastic optimization for sampling the pose space and generating example clothing shapes. Our solution is fast, compact and can generate realistic clothing animation results for various kinds of clothes in real time.
Download the paper (60 MB .pdf)
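The blending step can be sketched as weighting nearby examples by pose distance; the Gaussian-RBF weights and the tiny example set below are hypothetical stand-ins for the paper's sensitivity-based weights:

```python
import numpy as np

def blend_examples(pose, example_poses, example_offsets, sigma=0.2):
    """Blend example clothing deformations with Gaussian-RBF weights on
    pose distance (a stand-in for sensitivity-based blending)."""
    d2 = np.sum((example_poses - pose) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum()
    return np.tensordot(w, example_offsets, axes=1)

# Three example poses (2-D joint-angle vectors), each with per-vertex
# clothing offsets relative to the skinned body.
example_poses = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
example_offsets = np.array([[[0.0, 0.0, 0.0]],
                            [[0.1, 0.0, 0.0]],
                            [[0.0, 0.1, 0.0]]])
# Querying exactly at the second example should reproduce its offsets.
result = blend_examples(np.array([1.0, 0.0]), example_poses, example_offsets)
```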
Reflectance Scanning: Estimating Shading Frame and BRDF with Generalized Linear Light Sources
Guojun Chen, Tianjin University
Yue Dong, Microsoft Research Asia
Pieter Peers, College of William & Mary
Jiawan Zhang, Tianjin University
Xin Tong, Microsoft Research Asia
We present a generalized linear light source solution to estimate both the local shading frame and anisotropic surface reflectance of a planar spatially varying material sample.
We generalize linear light source reflectometry by modulating the intensity along the linear light source, and show that a constant and two sinusoidal lighting patterns are sufficient for estimating the local shading frame and anisotropic surface reflectance. We propose a novel reconstruction algorithm based on the key observation that after factoring out the tangent rotation, the anisotropic surface reflectance lies in a low rank subspace. We exploit the differences in tangent rotation between surface points to infer the low rank subspace and fit each surface point’s reflectance function in the projected low rank subspace to the observations. We propose two prototype acquisition devices for capturing surface reflectance that differ on whether the camera is fixed with respect to the linear light source or fixed with respect to the material sample.
We demonstrate convincing results obtained from reflectance scans of surfaces with different reflectance and shading frame variations.
Download the paper (14 MB .pdf)
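The modulation idea can be illustrated in 1-D: measure a point's response under a constant and two sinusoidal intensity patterns along the light, then demodulate the phase of the first Fourier coefficient to locate the reflected lobe (which encodes the tangent rotation). The Gaussian lobe and single-period patterns are assumptions; the paper's estimator is more involved.

```python
import numpy as np

def lobe_position(response, y, L):
    """Locate a reflectance lobe along a linear light of length L from
    responses under a constant and two sinusoidal intensity patterns:
    the phase of the first Fourier coefficient gives the position."""
    dy = y[1] - y[0]
    c0 = np.sum(response) * dy                              # constant pattern
    c1 = np.sum(response * np.cos(2 * np.pi * y / L)) * dy  # cosine pattern
    c2 = np.sum(response * np.sin(2 * np.pi * y / L)) * dy  # sine pattern
    phase = np.arctan2(c2, c1) % (2 * np.pi)
    return phase * L / (2 * np.pi), np.hypot(c1, c2) / c0   # position, sharpness

L = 1.0
y = np.linspace(0.0, L, 2000)
y0 = 0.37                                   # ground-truth lobe centre
response = np.exp(-((y - y0) / 0.05) ** 2)  # narrow anisotropic lobe
pos, sharpness = lobe_position(response, y, L)
```

Only three measurements per pattern set are needed, which is why a constant plus two sinusoids suffice for the shading-frame estimate.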
Image Completion Using Planar Structure Guidance
Jia-Bin Huang, University of Illinois at Urbana-Champaign
Sing Bing Kang, Microsoft Research
Narendra Ahuja, University of Illinois at Urbana-Champaign
Johannes Kopf, Microsoft Research
We propose a method for automatically guiding patch-based image completion using mid-level structural cues. Our method first estimates planar projection parameters, softly segments the known region into planes, and discovers translational regularity within these planes. This information is then converted into soft constraints for the low-level completion algorithm by defining prior probabilities for patch offsets and transformations. Our method handles multiple planes, and in the absence of any detected planes falls back to a baseline fronto-parallel image completion algorithm. We validate our technique through extensive comparisons with state-of-the-art algorithms on a variety of scenes.
Download the paper (23 MB .pdf)
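Translational regularity can be sketched in 1-D as finding the offsets at which a signal best matches a shifted copy of itself; in the paper, such statistics become prior probabilities over patch offsets within each detected plane. A hypothetical normalized-autocorrelation detector:

```python
import numpy as np

def dominant_offsets(row, min_off=3, max_off=40, k=2):
    """Rank candidate translational offsets of a 1-D signal by
    normalized autocorrelation and return the k strongest."""
    x = row - row.mean()
    scores = []
    for d in range(min_off, max_off):
        a, b = x[:-d], x[d:]
        scores.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    order = np.argsort(scores)[::-1] + min_off
    return order[:k]

# A texture row that repeats every 10 pixels.
row = np.sin(2 * np.pi * np.arange(200) / 10)
offs = dominant_offsets(row)
```

Biasing the completion toward these offsets is what lets repeated structure (windows, bricks, tiles) continue plausibly into the hole.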
Real-Time Non-Rigid Reconstruction Using an RGB-D Camera
Michael Zollhöfer, Marc Stamminger, Friedrich-Alexander-Universität Erlangen-Nürnberg
Matthias Nießner, Matthew Fisher, Stanford University
Shahram Izadi, Christoph Rhemann, Christopher Zach, Andrew Fitzgibbon, Microsoft Research Cambridge
Charles Loop, Microsoft Research
Chenglei Wu, Christian Theobalt, Max-Planck-Institut für Informatik
We present a combined hardware and software solution for markerless reconstruction of non-rigidly deforming physical objects with arbitrary shape in real-time. Our system uses a single self-contained stereo camera unit built from off-the-shelf components and consumer graphics hardware to generate spatio-temporally coherent 3D models at 30 Hz. A new stereo matching algorithm estimates real-time RGB-D data. We start by scanning a smooth template model of the subject as they move rigidly. This geometric surface prior avoids strong scene assumptions, such as a kinematic human skeleton or a parametric shape model. Next, a novel GPU pipeline performs non-rigid registration of live RGB-D data to the smooth template using an extended non-linear as-rigid-as-possible (ARAP) framework. High-frequency details are fused onto the final mesh using a linear deformation model. The system is an order of magnitude faster than state-of-the-art methods, while matching the quality and robustness of many offline algorithms. We show precise real-time reconstructions of diverse scenes, including: large deformations of users’ heads, hands, and upper bodies; fine-scale wrinkles and folds of skin and clothing; and non-rigid interactions performed by users on flexible objects such as toys. We demonstrate how acquired models can be used for many interactive scenarios, including re-texturing, online performance capture and preview, and real-time shape and motion re-targeting.
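The local step of the ARAP framework the paper extends can be sketched directly: fit a best rotation per node to its edge fan (Kabsch via SVD) and sum the squared residuals; the full GPU pipeline alternates this with a global solve. A minimal sketch on a synthetic point graph (the graph and poses below are made up):

```python
import numpy as np

def arap_energy(rest, deformed, neighbors):
    """As-rigid-as-possible energy: per node, fit the best rotation of
    its rest-pose edge fan to the deformed edges (Kabsch via SVD) and
    sum the squared residuals (the 'local step' of ARAP)."""
    E = 0.0
    for i, nbrs in neighbors.items():
        P = rest[nbrs] - rest[i]            # rest edges
        Q = deformed[nbrs] - deformed[i]    # deformed edges
        U, _, Vt = np.linalg.svd(P.T @ Q)
        if np.linalg.det(Vt.T @ U.T) < 0:   # avoid reflections
            U[:, -1] *= -1
        R = Vt.T @ U.T                      # best rotation: q_j ~ R p_j
        E += np.sum((Q - P @ R.T) ** 2)
    return E

rng = np.random.default_rng(2)
rest = rng.standard_normal((8, 3))
neighbors = {i: [j for j in range(8) if j != i] for i in range(8)}
c, s = np.cos(0.5), np.sin(0.5)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
# A global rigid motion costs ~nothing; a non-rigid stretch does not.
e_rigid = arap_energy(rest, rest @ Rz.T + np.array([1.0, 2.0, 3.0]), neighbors)
e_stretch = arap_energy(rest, rest * np.array([2.0, 1.0, 1.0]), neighbors)
```

Penalizing deviation from local rigidity is what lets the system track large deformations without assuming a skeleton or parametric shape model.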