Our research
1–25 of 233
Dongwook Yoon, Nicholas Chen, François Guimbretière, and Abigail Sellen

This paper introduces a novel document annotation system that aims to enable the kinds of rich communication that usually only occur in face-to-face meetings. Our system, RichReview, lets users create annotations on top of digital documents using three main modalities: freeform inking, voice for narration, and deictic gestures in support of voice. RichReview uses novel visual representations and time synchronization between modalities to simplify annotation access and navigation. Moreover, RichReview’s...

Publication details
Date: 1 October 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Gerard Pons-Moll, Jonathan Taylor, Jamie Shotton, Aaron Hertzmann, and Andrew Fitzgibbon

We present a new method for inferring dense data-to-model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective based on the classification of depth pixels to body parts. In contrast, we introduce Metric Space Information Gain (MSIG), a new decision forest training...

Publication details
Date: 1 August 2015
Type: Article
Publisher: Springer
Kyungmin Lee, David Chu, Eduardo Cuervo, Johannes Kopf, Yury Degtyarev, Sergey Grizan, Alec Wolman, and Jason Flinn

Gaming on phones, tablets and laptops is very popular. Cloud gaming -- where remote servers perform game execution and rendering on behalf of thin clients that simply send input and display output frames -- promises any device the ability to play any game any time. Unfortunately, the reality is that wide-area network latencies are often prohibitive; cellular, Wi-Fi and even wired residential end host round trip times (RTTs) can exceed 100ms, a threshold above which many gamers tend to deem...

Publication details
Date: 3 June 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery
Sameh Khamis, Jonathan Taylor, Jamie Shotton, Cem Keskin, Shahram Izadi, and Andrew Fitzgibbon

We describe how to learn a compact and efficient model of the surface deformation of human hands. The model is built from a set of noisy depth images of a diverse set of subjects performing different poses with their hands. We represent the observed surface using Loop subdivision of a control mesh that is deformed by our learned parametric shape and pose model. The model simultaneously accounts for variation in subject-specific shape and subject-agnostic pose. Specifically, hand shape is...

Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Jarrod Knibbe, Hrvoje Benko, and Andrew D. Wilson

Projector-camera (pro-cam) systems afford a wide range of interactive possibilities, combining both natural and mixed-reality 3D interaction. However, the latency inherent within these systems can cause the projection to ‘slip’ from its intended target, detracting from the overall experience. Because of this, pro-cam systems have typically shied away from truly dynamic scenarios. In turn, research has been exploring latency reduction techniques across a range of domains, but these techniques typically...

Publication details
Date: 11 May 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-35
Felicia Lim, Mark R. P. Thomas, and Ivan J. Tashev

Reverberation time is an important parameter for characterizing acoustic environments. It is useful in many applications including acoustic scene analysis, robust automatic speech recognition and dereverberation. Given knowledge of the acoustic impulse response, reverberation time can be measured using Schroeder’s backward integration method. Since it is not always practical to obtain impulse responses, blind estimation algorithms are sometimes desirable. In this work, the reverberation problem is...

Publication details
Date: 21 April 2015
Type: Inproceeding
Publisher: IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
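
The abstract above mentions Schroeder's backward integration method for measuring reverberation time from a known impulse response. The following is a minimal sketch of that classical method only, not of the blind estimator the paper proposes. The function names, the -5 dB to -25 dB fitting range, and the synthetic exponential test signal are all illustrative assumptions.

```python
import math


def schroeder_edc_db(h):
    """Energy decay curve (dB re. total energy) by backward integration of h[n]**2."""
    edc = []
    running = 0.0
    for sample in reversed(h):          # integrate from the tail towards n = 0
        running += sample * sample
        edc.append(running)
    edc.reverse()
    if not edc or edc[0] <= 0.0:
        raise ValueError("impulse response has no energy")
    # Zero-energy entries can only occur at the very end, so indices stay aligned.
    return [10.0 * math.log10(e / edc[0]) for e in edc if e > 0.0]


def rt60_from_edc(edc_db, fs, hi_db=-5.0, lo_db=-25.0):
    """Fit a line to the decay between hi_db and lo_db and extrapolate to -60 dB."""
    pts = [(n / fs, d) for n, d in enumerate(edc_db) if lo_db <= d <= hi_db]
    if len(pts) < 2:
        raise ValueError("not enough decay range to fit a slope")
    m = len(pts)
    st = sum(t for t, _ in pts)
    sd = sum(d for _, d in pts)
    stt = sum(t * t for t, _ in pts)
    std = sum(t * d for t, d in pts)
    slope = (m * std - st * sd) / (m * stt - st * st)   # dB per second, negative
    return -60.0 / slope


def synthetic_rir(rt60, fs, seconds):
    """Hypothetical test signal: a noiseless exponential decay with a known RT60."""
    a = 3.0 * math.log(10.0) / rt60     # amplitude decay rate giving -60 dB at t = rt60
    return [math.exp(-a * n / fs) for n in range(int(seconds * fs))]


if __name__ == "__main__":
    fs = 8000
    h = synthetic_rir(rt60=0.5, fs=fs, seconds=1.0)
    print(round(rt60_from_edc(schroeder_edc_db(h), fs), 3))
```

Fitting over a limited decay range and extrapolating to -60 dB is a common practical choice because the tail of a measured response is usually dominated by noise; the range used here is an assumption, not something the abstract specifies.
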
Dinei Florencio and Zhengyou Zhang

Estimating room impulse responses (RIRs) has a number of applications, including personalized audio, analyzing and improving acoustic behavior of concert halls, listening room compensation, sound source localization, and many others. RIRs have been estimated in essentially the same fashion for the last 50 years: Compute the cross correlation between a signal played at point A, and the signal received at point B. Best results are obtained when the signal played is white noise, or a maximum length...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
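
The abstract above describes the conventional approach to RIR estimation: cross-correlate the signal played at point A with the signal received at point B, ideally using white noise. A minimal sketch of that conventional baseline (not the paper's own contribution) might look as follows. The function names, the four-tap toy response, and the noise length are hypothetical choices made for illustration.

```python
import random


def convolve(x, h):
    """Direct-form FIR filtering: y[n] = sum_k h[k] * x[n - k] (same length as x)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def estimate_rir(x, y, taps):
    """Cross-correlate played signal x with received signal y, normalised by the power of x."""
    usable = len(x) - taps + 1          # keeps n + k inside y for every lag k
    power = sum(x[n] * x[n] for n in range(usable))
    return [
        sum(x[n] * y[n + k] for n in range(usable)) / power
        for k in range(taps)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # white-noise excitation
    true_h = [1.0, 0.5, 0.0, -0.25]                   # toy "room" response (assumed)
    y = convolve(x, true_h)
    print([round(v, 2) for v in estimate_rir(x, y, len(true_h))])
```

Because white noise is uncorrelated across lags, the cross-correlation at lag k is approximately the noise power times the k-th tap, which is why dividing by the power of x recovers the response up to an error that shrinks as the excitation gets longer.
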
Cha Zhang, Dinei Florencio, and Philip Chou

This theoretical paper aims to provide a probabilistic framework for graph signal processing. By modeling signals on graphs as Gaussian Markov Random Fields, we present numerous important aspects of graph signal processing, including graph construction, graph transform, graph downsampling, graph prediction, and graph-based regularization, from a probabilistic point of view. As examples, we discuss a number of methods for constructing graphs based on statistics from input data sets; we show that the...

Publication details
Date: 1 April 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-31
Keita Higuchi, Yinpeng Chen, Philip A. Chou, Zhengyou Zhang, and Zicheng Liu

ImmerseBoard is a system for remote collaboration through a digital whiteboard that gives participants a 3D immersive experience, enabled only by an RGBD camera (Microsoft Kinect) mounted on the side of a large touch display. Using 3D processing of the depth images, life-sized rendering, and novel visualizations, ImmerseBoard emulates writing side-by-side on a physical whiteboard, or alternatively on a mirror. User studies involving three tasks show that compared to standard video conferencing with a...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Lijun Zhu and Dinei Florencio

Parametric speakers produce sound by emitting ultrasound, and using the small nonlinearity in air to demodulate it back to audible sound. The use of ultrasound allows for producing very narrow audio beams, which finds application in a number of military and consumer scenarios. However, designing better parametric speakers has been hard: closed-form solution of the nonlinear wave equation for generic geometries is nearly impossible, and the only existing solution was derived for the simple case of a...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Ivan J. Tashev

Voice Activity Detectors (VAD) play an important role in audio processing algorithms. Most of the algorithms are designed to be causal, i.e. to work in real time using only current and past audio samples. Off-line processing, where the entire voice utterance is available, allows different types of approaches to be used for increased precision. In this paper we propose an algorithm for off-line VAD based on the different probability density functions (PDFs) of the speech and noise. While a Gaussian...

Publication details
Date: 3 February 2015
Type: Inproceeding
Publisher: University of California - San Diego
Hao Cui, Ruiqin Xiong, Chong Luo, Zhihai Song, and Feng Wu
Publication details
Date: 1 February 2015
Type: Article
Number: 1
Xian-Sheng Hua and Jin Li

With advances in distributed computation, machine learning and deep neural networks, we are entering an era in which it is possible to build a real-world image recognition system. There are three essential components to building such a system: 1) creating representative features, 2) designing powerful learning approaches, and 3) identifying massive training data. While extensive research has been done on the first two aspects, much less attention has been paid to the third. In...

Publication details
Date: 1 January 2015
Type: Inproceeding
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
Bin Li and Jizheng Xu

This document presents an efficient screen content coding scheme based on the HEVC framework. The major techniques in the scheme include hash-based large-scale block matching, dictionary mode, palette mode, adaptive color space coding, and several improvements to intra block copy mode. This scheme was submitted as our response to the joint Call for Proposals (CfP) for coding of screen content issued by ISO/IEC JTC1/SC29/WG11, i.e. MPEG, and ITU-T Q6/16, i.e. VCEG. Compared with other coding schemes in...

Publication details
Date: 1 January 2015
Type: Technical report
Number: MSR-TR-2015-3
Phil Pitts, Arrigo Benedetti, Malcolm Slaney, and Phil Chou

This document describes a ray tracing application that calculates backscatter information useful in understanding time-of-flight phenomena. This system is particularly useful in understanding the effects of multipath (or global illumination) in time-of-flight (ToF) camera depth calculations. The Time-of-Flight Tracer (ToF Tracer) system is based on path tracing. In path tracing, rays are traced from their point of origin until they intersect with scene geometry. At these intersections, the ray...

Publication details
Date: 8 November 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-142
Tianjun Xiao, Jiaxing Zhang, Kuiyuan Yang, Yuxin Peng, and Zheng Zhang

Supervised learning using deep convolutional neural networks has shown promise in large-scale image classification tasks. As a building block, it is now well positioned to be part of a larger system that tackles real-life multimedia tasks. An unresolved issue is that such a model is trained on a static snapshot of data. Instead, this paper positions the training as a continuous learning process as new classes of data arrive. A system with such capability is useful in practical scenarios, as it gradually...

Publication details
Date: 1 November 2014
Type: Inproceeding
Xiao Lin Liu, Wenjun Hu, Chong Luo, Qifan Pu, Feng Wu, and Yongguang Zhang
Publication details
Date: 1 November 2014
Type: Article
Number: 7
Pengfei Wan, Gene Cheung, Dinei Florencio, Cha Zhang, and Oscar Au

While modern displays offer high dynamic range (HDR) with large bit-depth for each rendered pixel, the bulk of legacy image and video content was captured using cameras with shallower bit-depth. In this paper, we study the bit-depth enhancement problem for images, so that a high bit-depth (HBD) image can be reconstructed from an input low bit-depth (LBD) image. The key idea is to apply appropriate smoothing given the constraint that the reconstructed signal must lie within the per-pixel quantization...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Philippe Moquin, Kevin Venalainen, and Dinei Florencio

Publication details
Date: 1 October 2014
Type: Article
Publisher: Acoustical Society of America
Cha Zhang, Dinei Florencio, and Charles Loop

Compressing attributes on 3D point clouds, such as colors or normal directions, has been a challenging problem, since these attribute signals are unstructured. In this paper, we propose to compress such attributes with a graph transform. We construct graphs on small neighborhoods of the point cloud by connecting nearby points, and treat the attributes as signals over the graph. The graph transform, which is equivalent to the Karhunen-Loève Transform on such graphs, is then adopted to decorrelate the signal....

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Daniel Khashabi, Sebastian Nowozin, Jeremy Jancsary, and Andrew Fitzgibbon

We introduce a machine learning approach to demosaicing, the reconstruction of color images from incomplete color filter array samples. There are two challenges to overcome by a demosaicing method: first, it needs to model and respect the statistics of natural images in order to reconstruct natural looking images; second, it needs to be able to perform well in the presence of noise. To facilitate an objective assessment of current methods we introduce a public ground truth data set of natural images...

Publication details
Date: 1 October 2014
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Ismael Daribo, Dinei Florencio, and Gene Cheung

Depth image compression is important for compact representation of 3D visual data in texture-plus-depth format, where texture and depth maps from one or more viewpoints are encoded and transmitted. A decoder can then synthesize a freely chosen virtual view via depth-image-based rendering using nearby coded texture and depth maps as reference. Further, depth information can be used in other image processing applications beyond view synthesis, such as object identification, segmentation, and so on. In...

Publication details
Date: 23 September 2014
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Mark R. P. Thomas, Felicia Lim, Ivan J. Tashev, and Patrick A. Naylor

Signals captured by microphone arrays provide spatial diversity that can be exploited by multichannel processing algorithms to suppress noise and reverberation. Beamforming is a class of approaches that treats the problem with respect to the spatial location of wanted and competing sources, leveraging properties of propagation of waves in free space. A related class of algorithms is channel equalization that exploits knowledge of the acoustic impulse response between a source and microphones with a view...

Publication details
Date: 9 September 2014
Type: Inproceeding
Publisher: International Workshop on Acoustic Signal Enhancement (IWAENC)
Nathan Silberman, Lior Shapira, Ran Gal, and Pushmeet Kohli

The availability of commodity depth sensors such as Kinect has enabled development of methods which can densely reconstruct arbitrary scenes. While the results of these methods are accurate and visually appealing, they are quite often incomplete. This is either due to the fact that only part of the space was visible during the data capture process or due to the surfaces being occluded by other objects in the scene. In this paper, we address the problem of completing and refining such reconstructions. We...

Publication details
Date: 6 September 2014
Type: Proceedings
Publisher: Springer
Ran Gal, Lior Shapira, Eyal Ofek, and Pushmeet Kohli

Creating a layout for an augmented reality (AR) application which embeds virtual objects in a physical environment is difficult as it must adapt to any physical space. We propose a rule-based framework for generating object layouts for AR applications. Under our framework, the developer of an AR application specifies a set of rules (constraints) which enforce self-consistency (rules regarding the inter-relationships of application components) and scene consistency (application components are consistent...

Publication details
Date: 6 September 2014
Type: Proceedings
Publisher: IEEE – Institute of Electrical and Electronics Engineers