Our research
1–25 of 216
Lijun Zhu and Dinei Florencio

Parametric speakers produce sound by emitting ultrasound and using the small nonlinearity in air to demodulate it back to audible sound. The use of ultrasound allows very narrow audio beams to be produced, which find application in a number of military and consumer scenarios. However, designing better parametric speakers has been hard: a closed-form solution of the nonlinear wave equation for generic geometries is nearly impossible, and the only existing solution was derived for the simple case of a...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Dinei Florencio and Zhengyou Zhang

Estimating room impulse responses (RIRs) has a number of applications, including personalized audio, analyzing and improving acoustic behavior of concert halls, listening room compensation, sound source localization, and many others. RIRs have been estimated in essentially the same fashion for the last 50 years: Compute the cross correlation between a signal played at point A, and the signal received at point B. Best results are obtained when the signal played is white noise, or a maximum length...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
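
The cross-correlation estimate described in the room impulse response abstract above can be illustrated with a minimal Python sketch. It assumes an idealized, noise-free setup: a synthetic white-noise excitation and a short made-up impulse response `true_rir`. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
import random

def convolve(x, h):
    """Direct-form linear convolution of two lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def estimate_rir(played, received, num_taps):
    """Estimate an impulse response by cross-correlating the played and
    received signals, normalized by the energy of the played signal.
    The estimate is only accurate when `played` is approximately white."""
    energy = sum(v * v for v in played)
    taps = []
    for lag in range(num_taps):
        acc = 0.0
        for n, v in enumerate(played):
            if n + lag < len(received):
                acc += v * received[n + lag]
        taps.append(acc / energy)
    return taps

# Hypothetical experiment: white noise "played at point A" ...
rng = random.Random(0)
played = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# ... passes through an assumed 4-tap response to reach "point B".
true_rir = [1.0, 0.5, 0.0, -0.25]
received = convolve(played, true_rir)

estimated_rir = estimate_rir(played, received, len(true_rir))
```

With a white excitation, the autocorrelation of the played signal is close to an impulse, so each entry of `estimated_rir` approaches the corresponding tap of `true_rir` as the signal gets longer.
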
Ivan J. Tashev

Voice Activity Detectors (VAD) play an important role in audio processing algorithms. Most of the algorithms are designed to be causal, i.e. to work in real time using only current and past audio samples. Off-line processing, when we have access to the entire voice utterance, allows using different types of approaches for increased precision. In this paper we propose an algorithm for off-line VAD based on the different probability density functions (PDFs) of the speech and noise. While a Gaussian...

Publication details
Date: 3 February 2015
Type: Inproceeding
Publisher: University of California - San Diego
Xian-Sheng Hua and Jin Li

With the advances in distributed computation, machine learning and deep neural networks, we are entering an era in which it is possible to build a real-world image recognition system. There are three essential components to build a real-world image recognition system: 1) creating representative features, 2) designing powerful learning approaches, and 3) identifying massive training data. While extensive research has been done on the first two aspects, much less attention has been paid to the third. In...

Publication details
Date: 1 January 2015
Type: Inproceeding
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
Bin Li and Jizheng Xu

This document presents an efficient screen content coding scheme based on the HEVC framework. The major techniques in the scheme include hash-based large-scale block matching, dictionary mode, palette mode, adaptive color space coding, and several improvements to intra block copy mode. This scheme was submitted as our response to the joint Call for Proposals (CfP) for coding of screen content issued by ISO/IEC JTC1/SC29/WG11, i.e. MPEG and ITU-T Q6/16, i.e. VCEG. Compared with other coding schemes in...

Publication details
Date: 1 January 2015
Type: Technical report
Number: MSR-TR-2015-3
Phil Pitts, Arrigo Benedetti, Malcolm Slaney, and Phil Chou

This document describes a ray tracing application that calculates backscatter information useful in understanding time-of-flight phenomena. This system is particularly useful in understanding the effects of multipath (or global illumination) in time-of-flight (ToF) camera depth calculations. The Time-of-Flight Tracer (ToF Tracer) system is based on path tracing. In path tracing, rays are traced from their point of origin until they intersect with scene geometry. At these intersections, the ray...

Publication details
Date: 8 November 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-142
Tianjun Xiao, Jiaxing Zhang, Kuiyuan Yang, Yuxin Peng, and Zheng Zhang

Supervised learning using deep convolutional neural networks has shown promise in large-scale image classification tasks. As a building block, it is now well positioned to be part of a larger system that tackles real-life multimedia tasks. An unresolved issue is that such a model is trained on a static snapshot of data. Instead, this paper positions the training as a continuous learning process as new classes of data arrive. A system with such capability is useful in practical scenarios, as it gradually...

Publication details
Date: 1 November 2014
Type: Inproceeding
Philippe Moquin, Kevin Venalainen, and Dinei Florencio

Publication details
Date: 1 October 2014
Type: Article
Publisher: Acoustical Society of America
Daniel Khashabi, Sebastian Nowozin, Jeremy Jancsary, and Andrew Fitzgibbon

We introduce a machine learning approach to demosaicing, the reconstruction of color images from incomplete color filter array samples. There are two challenges to overcome by a demosaicing method: first, it needs to model and respect the statistics of natural images in order to reconstruct natural looking images; second, it needs to be able to perform well in the presence of noise. To facilitate an objective assessment of current methods we introduce a public ground truth data set of natural images...

Publication details
Date: 1 October 2014
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Cha Zhang, Dinei Florencio, and Charles Loop

Compressing attributes on 3D point clouds such as colors or normal directions has been a challenging problem, since these attribute signals are unstructured. In this paper, we propose to compress such attributes with a graph transform. We construct graphs on small neighborhoods of the point cloud by connecting nearby points, and treat the attributes as signals over the graph. The graph transform, which is equivalent to the Karhunen-Loève transform on such graphs, is then adopted to decorrelate the signal....

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Pengfei Wan, Gene Cheung, Dinei Florencio, Cha Zhang, and Oscar Au

While modern displays offer high dynamic range (HDR) with large bit-depth for each rendered pixel, the bulk of legacy image and video content was captured using cameras with shallower bit-depth. In this paper, we study the bit-depth enhancement problem for images, so that a high bit-depth (HBD) image can be reconstructed from an input low bit-depth (LBD) image. The key idea is to apply appropriate smoothing given the constraints that the reconstructed signal must lie within the per-pixel quantization...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
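
The general idea in the bit-depth enhancement abstract above, smoothing while keeping each reconstructed sample consistent with its original quantized value, can be illustrated with a small Python sketch. This is not the paper's method: it swaps in a plain moving average followed by clamping on a 1D row of samples, and the function name `enhance_bit_depth`, the `radius` parameter, and the sample data are all hypothetical.

```python
def enhance_bit_depth(lbd, low_bits, high_bits, radius=2):
    """Map low bit-depth samples to a higher bit depth.

    Each sample is first placed at the center of its quantization bin,
    then smoothed with a moving average (an assumed stand-in smoother),
    and finally clamped so it still falls inside the bin implied by the
    original low bit-depth value."""
    step = 1 << (high_bits - low_bits)          # HBD levels per LBD level
    centers = [v * step + (step - 1) / 2.0 for v in lbd]
    out = []
    for i, v in enumerate(lbd):
        lo = max(0, i - radius)
        hi = min(len(lbd), i + radius + 1)
        smoothed = sum(centers[lo:hi]) / (hi - lo)
        bin_min = v * step                      # smallest HBD value in bin
        bin_max = v * step + step - 1           # largest HBD value in bin
        out.append(int(round(min(max(smoothed, bin_min), bin_max))))
    return out

# Hypothetical 2-bit staircase expanded to 8 bits.
lbd_row = [0, 0, 0, 1, 1, 1, 2, 2, 2]
hbd_row = enhance_bit_depth(lbd_row, low_bits=2, high_bits=8)
```

Because of the clamp, re-quantizing `hbd_row` back to 2 bits returns `lbd_row` exactly, while the smoothing softens the staircase edges between neighboring levels.
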
Ismael Daribo, Dinei Florencio, and Gene Cheung

Depth image compression is important for compact representation of 3D visual data in texture-plus-depth format, where texture and depth maps from one or more viewpoints are encoded and transmitted. A decoder can then synthesize a freely chosen virtual view via depth-image-based rendering using nearby coded texture and depth maps as reference. Further, depth information can be used in other image processing applications beyond view synthesis, such as object identification, segmentation, and so on. In...

Publication details
Date: 23 September 2014
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Mark R. P. Thomas, Felicia Lim, Ivan J. Tashev, and Patrick A. Naylor

Signals captured by microphone arrays provide spatial diversity that can be exploited by multichannel processing algorithms to suppress noise and reverberation. Beamforming is a class of approaches that treats the problem with respect to the spatial location of wanted and competing sources, leveraging properties of propagation of waves in free space. A related class of algorithms is channel equalization that exploits knowledge of the acoustic impulse response between a source and microphones with a view...

Publication details
Date: 9 September 2014
Type: Inproceeding
Publisher: International Workshop on Acoustic Signal Enhancement (IWAENC)
Ran Gal, Lior Shapira, Eyal Ofek, and Pushmeet Kohli

Creating a layout for an augmented reality (AR) application which embeds virtual objects in a physical environment is difficult as it must adapt to any physical space. We propose a rule-based framework for generating object layouts for AR applications. Under our framework, the developer of an AR application specifies a set of rules (constraints) which enforce self-consistency (rules regarding the inter-relationships of application components) and scene consistency (application components are consistent...

Publication details
Date: 6 September 2014
Type: Proceedings
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Nathan Silberman, Lior Shapira, Ran Gal, and Pushmeet Kohli

The availability of commodity depth sensors such as Kinect has enabled development of methods which can densely reconstruct arbitrary scenes. While the results of these methods are accurate and visually appealing, they are quite often incomplete. This is either due to the fact that only part of the space was visible during the data capture process or due to the surfaces being occluded by other objects in the scene. In this paper, we address the problem of completing and refining such reconstructions. We...

Publication details
Date: 6 September 2014
Type: Proceedings
Publisher: Springer
X. Xiong, Q. Cai, Z. Liu, and Z. Zhang

Most commercial eye gaze tracking systems are based on the use of infrared lights. However, such systems may not work outdoors or may have a very limited head box for them to work. This paper proposes a non-infrared based approach to track one's eye gaze with an RGBD camera (in our case, Kinect). The proposed method adopts a personalized 3D face model constructed off-line. To detect the eye gaze, our system tracks the iris center and a set of 2D facial landmarks whose 3D locations are provided by the...

Publication details
Date: 1 September 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Mar Gonzalez-Franco and Philip A. Chou

In this paper, we further the characterization of a fundamental limit of human perception: the accuracy of human estimation of others’ eye gaze directions. In particular, we introduce a non-linear model that describes how both the head direction and the gaze direction of a looker relative to an observer jointly affect the observer’s perception of the looker’s gaze direction. Ours is the first to explain in a single model the biases introduced by the looker’s head direction, the relative accuracy of eye...

Publication details
Date: 1 September 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Kyungmin Lee, David Chu, Eduardo Cuervo, Johannes Kopf, Sergey Grizan, Alec Wolman, and Jason Flinn

Gaming is very popular. Cloud gaming – where remote servers perform game execution and rendering on behalf of thin clients that simply send input and display output frames – promises any device the ability to play any game any time. Unfortunately, the reality is that wide-area network latencies are often prohibitive; cellular, Wi-Fi and even wired residential end host round trip times (RTTs) can exceed 100ms, a threshold above which many gamers tend to deem responsiveness unacceptable.

In this...

Publication details
Date: 21 August 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-115
Johannes Kopf, Michael Cohen, and Richard Szeliski

Publication details
Date: 1 August 2014
Type: Article
Publisher: ACM – Association for Computing Machinery
Number: 4
Wenxiu Sun, Gene Cheung, Phil Chou, Dinei Florencio, Cha Zhang, and Oscar Au

Transmitting compactly represented geometry of a dynamic 3D scene from a sender can enable a multitude of imaging functionalities at a receiver, such as synthesis of virtual images at freely chosen viewpoints via depth-image-based rendering. While depth maps (projections of 3D geometry onto 2D image planes at chosen camera viewpoints) can nowadays be readily captured by inexpensive depth sensors, they are often corrupted by non-negligible acquisition noise. Given that depth maps need to be denoised and...

Publication details
Date: 1 July 2014
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Sudipta Sinha, Daniel Scharstein, and Richard Szeliski

We present a stereo algorithm designed for speed and efficiency that uses local slanted plane sweeps to propose disparity hypotheses for a semi-global matching algorithm. Our local plane hypotheses are derived from initial sparse feature correspondences followed by an iterative clustering step. Local plane sweeps are then performed around each slanted plane to produce out-of-plane parallax and matching-cost estimates. A final global optimization stage, implemented using semi-global matching, assigns...

Publication details
Date: 21 June 2014
Type: Inproceeding
Publisher: Computer Vision and Pattern Recognition
Jaesik Park, Sudipta N. Sinha, Yasuyuki Matsushita, Yu-Wing Tai, and In So Kweon

We show that a non-isotropic near point light source rigidly attached to a camera can be calibrated using multiple images of a weakly textured planar scene. We prove that if the radiant intensity distribution (RID) of a light source is radially symmetric with respect to its dominant direction, then the shading observed on a Lambertian scene plane is bilaterally symmetric with respect to a 2D line on the plane. The symmetry axis detected in an image provides a linear constraint for estimating the...

Publication details
Date: 21 June 2014
Type: Inproceeding
Publisher: Computer Vision and Pattern Recognition
Abner Guzman-Rivera, Pushmeet Kohli, Ben Glocker, Jamie Shotton, Toby Sharp, Andrew Fitzgibbon, and Shahram Izadi

Publication details
Date: 1 June 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Neel Joshi and C. Lawrence Zitnick

Tradeoffs exist between the baseline, or distance between cameras, and the difficulty of matching corresponding points in stereo and structure from motion. Smaller baselines result in reduced disparities, reducing the accuracy of depth estimation. Larger baselines increase the range of observed disparities, but also increase the difficulty of finding corresponding points. In this paper, we explore the use of very small baselines, called micro-baselines. Micro-baselines, typically just a few millimeters,...

Publication details
Date: 22 May 2014
Type: Technical report
Publisher: Microsoft Research Technical Report
Number: MSR-TR-2014-73
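
The baseline tradeoff in the micro-baseline abstract above can be made concrete with the standard rectified pinhole stereo relation, disparity = focal length × baseline / depth. The Python sketch below is a worked example under that textbook model, not the report's own analysis; the focal length, depth, baselines, and matching error are hypothetical values chosen for illustration.

```python
def disparity_px(focal_px, baseline_m, depth_m):
    """Disparity in pixels for a rectified pinhole stereo pair."""
    return focal_px * baseline_m / depth_m

def depth_error_m(focal_px, baseline_m, depth_m, disparity_error_px):
    """First-order depth uncertainty caused by a disparity error,
    obtained by differentiating depth = focal * baseline / disparity."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)

focal_px = 1000.0        # assumed focal length in pixels
depth_m = 2.0            # assumed scene depth in meters
match_error_px = 0.25    # assumed disparity matching error in pixels

# Compare a conventional 10 cm baseline with a 5 mm micro-baseline.
wide_disparity = disparity_px(focal_px, 0.1, depth_m)
micro_disparity = disparity_px(focal_px, 0.005, depth_m)
wide_error = depth_error_m(focal_px, 0.1, depth_m, match_error_px)
micro_error = depth_error_m(focal_px, 0.005, depth_m, match_error_px)
```

Under these assumed numbers the micro-baseline shrinks the disparity by the same factor of 20 as the baseline itself, and the depth uncertainty for a fixed matching error grows by that factor.
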
Ivan Dokmanic and Ivan Tashev

Depth imaging is commonly based on light. For example, LIDAR and Kinect use infrared light, while stereo cameras use visible light. These systems require hardware operating at high sampling frequencies, precise calibration, and they dissipate significant power. In this paper, we investigate the potential of ultrasound for image and depth acquisition, with applications to human-computer interaction and skeletal tracking in mind. We use a loudspeaker array and a microphone array to sense the scene. We...

Publication details
Date: 9 May 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)