Our research
Pengfei Wan, Gene Cheung, Dinei Florencio, Cha Zhang, and Oscar Au

While modern displays offer high dynamic range (HDR) with large bit-depth for each rendered pixel, the bulk of legacy image and video contents were captured using cameras with shallower bit-depth. In this paper, we study the bit-depth enhancement problem for images, so that a high bit-depth (HBD) image can be reconstructed from an input low bit-depth (LBD) image. The key idea is to apply appropriate smoothing given the constraints that reconstructed signal must lie within the per-pixel quantization...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
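
The constraint mentioned in the abstract above can be illustrated with a very small sketch. This is not the authors' method: it swaps in a plain moving average on a 1-D signal as the smoother, and it assumes the constraint is that each reconstructed sample must stay inside the quantization bin of its low bit-depth input. The function name `enhance_bit_depth` and its parameters are hypothetical.

```python
def enhance_bit_depth(lbd, in_bits=8, out_bits=16, radius=2):
    """Smooth a 1-D low bit-depth signal, then clamp each sample to its bin.

    Illustrative only: a moving average stands in for whatever smoothing
    the paper actually uses (an assumption, not the published algorithm).
    """
    step = 1 << (out_bits - in_bits)  # width of one LBD bin in HBD units
    # Start each sample at the centre of its bin in the high bit-depth range.
    centres = [v * step + step // 2 for v in lbd]
    out = []
    for i, v in enumerate(lbd):
        window = centres[max(0, i - radius):min(len(lbd), i + radius + 1)]
        smoothed = round(sum(window) / len(window))
        # Constraint: the result must quantize back to the observed value v.
        lo, hi = v * step, v * step + step - 1
        out.append(min(max(smoothed, lo), hi))
    return out


if __name__ == "__main__":
    # A step edge in the 8-bit input becomes a softer ramp in 16 bits.
    print(enhance_bit_depth([10, 10, 10, 11, 11, 11]))
```

Clamping after smoothing guarantees that requantizing the output reproduces the input exactly, which is the property the abstract's constraint describes.
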
Cha Zhang, Dinei Florencio, and Charles Loop

Compressing attributes on 3D point clouds such as colors or normal directions has been a challenging problem, since these attribute signals are unstructured. In this paper, we propose to compress such attributes with a graph transform. We construct graphs on small neighborhoods of the point cloud by connecting nearby points, and treat the attributes as signals over the graph. The graph transform, which is equivalent to the Karhunen-Loève Transform on such graphs, is then adopted to decorrelate the signal....

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Mark R. P. Thomas, Felicia Lim, Ivan J. Tashev, and Patrick A. Naylor

Signals captured by microphone arrays provide spatial diversity that can be exploited by multichannel processing algorithms to suppress noise and reverberation. Beamforming is a class of approaches that treats the problem with respect to the spatial location of wanted and competing sources, leveraging properties of propagation of waves in free space. A related class of algorithms is channel equalization that exploits knowledge of the acoustic impulse response between a source and microphones with a view...

Publication details
Date: 9 September 2014
Type: Inproceeding
Publisher: International Workshop on Acoustic Signal Enhancement (IWAENC)
Ran Gal, Lior Shapira, Eyal Ofek, and Pushmeet Kohli

Creating a layout for an augmented reality (AR) application which embeds virtual objects in a physical environment is difficult as it must adapt to any physical space. We propose a rule-based framework for generating object layouts for AR applications. Under our framework, the developer of an AR application specifies a set of rules (constraints) which enforce self-consistency (rules regarding the inter-relationships of application components) and scene consistency (application components are consistent...

Publication details
Date: 6 September 2014
Type: Proceedings
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Nathan Silberman, Lior Shapira, Ran Gal, and Pushmeet Kohli

The availability of commodity depth sensors such as Kinect has enabled development of methods which can densely reconstruct arbitrary scenes. While the results of these methods are accurate and visually appealing, they are quite often incomplete. This is either due to the fact that only part of the space was visible during the data capture process or due to the surfaces being occluded by other objects in the scene. In this paper, we address the problem of completing and refining such reconstructions. We...

Publication details
Date: 6 September 2014
Type: Proceedings
Publisher: Springer
X. Xiong, Q. Cai, Z. Liu, and Z. Zhang

Most commercial eye gaze tracking systems are based on the use of infrared lights. However, such systems may not work outdoors or may have a very limited head box for them to work. This paper proposes a non-infrared based approach to track one's eye gaze with an RGBD camera (in our case, Kinect). The proposed method adopts a personalized 3D face model constructed off-line. To detect the eye gaze, our system tracks the iris center and a set of 2D facial landmarks whose 3D locations are provided by the...

Publication details
Date: 1 September 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Mar Gonzalez-Franco and Philip A. Chou

In this paper, we further the characterization of a fundamental limit of human perception: the accuracy of human estimation of others’ eye gaze directions. In particular, we introduce a non-linear model that describes how both the head direction and the gaze direction of a looker relative to an observer jointly affect the observer’s perception of the looker’s gaze direction. Ours is the first to explain in a single model the biases introduced by the looker’s head direction, the relative accuracy of eye...

Publication details
Date: 1 September 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Kyungmin Lee, David Chu, Eduardo Cuervo, Johannes Kopf, Sergey Grizan, Alec Wolman, and Jason Flinn

Gaming is very popular. Cloud gaming – where remote servers perform game execution and rendering on behalf of thin clients that simply send input and display output frames – promises any device the ability to play any game any time. Unfortunately, the reality is that wide-area network latencies are often prohibitive; cellular, Wi-Fi and even wired residential end host round trip times (RTTs) can exceed 100 ms, a threshold above which many gamers tend to deem responsiveness unacceptable.

In this...

Publication details
Date: 21 August 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-115
Abner Guzman-Rivera, Pushmeet Kohli, Ben Glocker, Jamie Shotton, Toby Sharp, Andrew Fitzgibbon, and Shahram Izadi

Publication details
Date: 1 June 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Neel Joshi and C. Lawrence Zitnick

Tradeoffs exist between the baseline or distance between cameras and the difficulty of matching corresponding points in stereo and structure from motion. Smaller baselines result in reduced disparities, reducing the accuracy of depth estimation. Larger baselines increase the range of observed disparities, but also increase the difficulty of finding corresponding points. In this paper, we explore the use of very small baselines, called micro-baselines. Micro-baselines, typically just a few millimeters,...

Publication details
Date: 22 May 2014
Type: Technical report
Publisher: Microsoft Research Technical Report
Number: MSR-TR-2014-73
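
The baseline tradeoff described in the abstract above follows from the standard rectified pinhole stereo relation, disparity = focal length × baseline / depth. The sketch below is only that textbook relation, not the report's method; the function names and the numeric values (focal length, baselines, matching error) are hypothetical.

```python
def disparity_px(focal_px, baseline_m, depth_m):
    """Disparity in pixels for a rectified stereo pair (pinhole model)."""
    return focal_px * baseline_m / depth_m


def depth_error_m(focal_px, baseline_m, depth_m, disparity_error_px):
    """First-order depth error caused by a given disparity matching error."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    focal_px, depth_m, match_err_px = 1000.0, 2.0, 0.25  # assumed values
    for baseline_m in (0.1, 0.002):  # a wide baseline vs. a 2 mm micro-baseline
        d = disparity_px(focal_px, baseline_m, depth_m)
        e = depth_error_m(focal_px, baseline_m, depth_m, match_err_px)
        print(f"baseline={baseline_m} m  disparity={d:.2f} px  depth error={e:.3f} m")
```

Shrinking the baseline shrinks the disparity in proportion and inflates the depth error by the same factor, which is why small baselines make depth estimation less accurate even though matching becomes easier.
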
Ivan Dokmanic and Ivan Tashev

Depth imaging is commonly based on light. For example, LIDAR and Kinect use infrared light, while stereo cameras use visible light. These systems require hardware operating at high sampling frequencies, precise calibration, and they dissipate significant power. In this paper, we investigate the potential of ultrasound for image and depth acquisition, with applications to human-computer interaction and skeletal tracking in mind. We use a loudspeaker array and a microphone array to sense the scene. We...

Publication details
Date: 9 May 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Publication details
Date: 4 May 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Ha Q. Nguyen, Philip A. Chou, and Yinpeng Chen

The next step in immersive communication beyond video from a single camera is object-based free viewpoint video, which is the capture and compression of a dynamic object such that it can be reconstructed and viewed from an arbitrary viewpoint. The moving human body is a particularly useful subclass of dynamic object for object-based free viewpoint video relevant to both telepresence and entertainment. In this paper, we compress moving human body sequences by applying recently developed Graph Wavelet...

Publication details
Date: 1 May 2014
Type: Proceedings
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Z. Zhang and Q. Cai

The cross-ratio approach has recently attracted increasing attention in eye-gaze tracking due to its simplicity in setting up a tracking system. Its accuracy, however, is lower than that of the model-based approach, and substantial efforts have been devoted to improving its accuracy. Binocular fixation is essential for humans to have good depth perception, and this paper presents a technique leveraging this constraint. It is used in two ways: First, in estimating jointly the homography matrices for both...

Publication details
Date: 1 March 2014
Type: Inproceeding
Publisher: ACM
J.-B. Huang, Q. Cai, Z. Liu, N. Ahuja, and Z. Zhang

Cross-ratio (CR) based methods offer many attractive properties for remote gaze estimation using a single camera in an uncalibrated setup by exploiting invariance of a plane projectivity. Unfortunately, due to several simplification assumptions, the performance of CR-based eye gaze trackers decays significantly as the subject moves away from the calibration position. In this paper, we introduce an adaptive homography mapping for achieving gaze prediction with higher accuracy at the calibration position...

Publication details
Date: 1 March 2014
Type: Inproceeding
Publisher: ACM
Ivan Tashev

We propose a method for the synthesis of the phases of Head-Related Transfer Functions (HRTFs) using a sparse representation of anthropometric features. Our approach treats the HRTF synthesis problem as finding a sparse representation of the subject's anthropometric features w.r.t. the anthropometric features in the training set. The fundamental assumption is that the group delay of a given HRTF set can be described by the same sparse combination as the anthropometric data. Thus, we learn a sparse vector...

Publication details
Date: 13 February 2014
Type: Inproceeding
Publisher: University of California - San Diego
Tao Mei, Yong Rui, Shipeng Li, and Qi Tian

The explosive growth and widespread accessibility of community contributed media content on the Internet have led to a surge of research activity in multimedia search. Approaches that apply text search techniques for multimedia search have achieved limited success as they entirely ignore visual content as a ranking signal. Multimedia search re-ranking, which reorders visual documents based on multimodal cues to improve initial text-only searches, has received increasing attention in recent years. Such a...

Publication details
Date: 1 January 2014
Type: Article
Jaesik Park, Sudipta N. Sinha, Yasuyuki Matsushita, Yu-Wing Tai, and In So Kweon

We propose a method for accurate 3D shape reconstruction using uncalibrated multiview photometric stereo. A coarse mesh reconstructed using multiview stereo is first parameterized using a planar mesh parameterization technique. Subsequently, multiview photometric stereo is performed in the 2D parameter domain of the mesh, where all geometric and photometric cues from multiple images can be treated uniformly. Unlike traditional methods, there is no need for merging view-dependent surface normal maps. Our...

Publication details
Date: 3 December 2013
Type: Inproceeding
Publisher: International Conference on Computer Vision
Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun
Publication details
Date: 1 November 2013
Type: Article
Publisher: IEEE Computer Society
Xian-Sheng Hua, Linjun Yang, Jingdong Wang, Jing Wang, Ming Ye, Kuansan Wang, Yong Rui, and Jin Li

The semantic gap between low-level visual features and high-level semantics has been investigated for decades but still remains a big challenge in multimedia. When "search" became one of the most frequently used applications, "intent gap", the gap between query expressions and users' search intents, emerged. Researchers have been focusing on three approaches to bridge the semantic and intent gaps: 1) developing more representative features, 2) exploiting better learning approaches or statistical models...

Publication details
Date: 21 October 2013
Type: Inproceeding
Publisher: ACM Conference on Multimedia
Wenyuan Yin, Tao Mei, and Chang Wen Chen

The ongoing revolution in media consumption from traditional PCs to the pervasiveness of mobile devices is driving the adoption of social media in our daily lives. More and more people are using their mobile devices to enjoy social media content while on the move. However, mobile display constraints create challenges for presenting and authoring the rich media content on screens with limited display size. This paper presents an innovative system to automatically generate magazine-like social media...

Publication details
Date: 1 October 2013
Type: Inproceeding
Publisher: ACM Multimedia
Wu Liu, Tao Mei, Yongdong Zhang, Jintao Li, and Shipeng Li

Mobile video is quickly becoming a mass consumer phenomenon. More and more people are using their smartphones to search and browse video content while on the move. In this paper, we have developed an innovative instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture a very few seconds of what they are watching. The system is able to index large-scale video data using a new layered audio-video indexing approach in the cloud, as...

Publication details
Date: 1 October 2013
Type: Inproceeding
Publisher: ACM Multimedia
Ting Yao, Tao Mei, Chong-Wah Ngo, and Shipeng Li

The problem of tagging is mostly considered from the perspectives of machine learning and data-driven philosophy. A fundamental issue that underlies the success of these approaches is the visual similarity, ranging from the nearest neighbor search to manifold learning, to identify similar instances of an example for tag completion. The need to search for millions of visual examples in high-dimensional feature space, however, makes the task computationally expensive. Moreover, the results can suffer...

Publication details
Date: 1 October 2013
Type: Inproceeding
Publisher: ACM Multimedia
Ivan Tashev

Kinect is a device for human-machine interaction, which adds two more input modalities to the palette of the user interface designer: gestures and speech. Kinect is transforming how people interact with computers, kiosks, and other motion-controlled devices from fun applications like playing a virtual violin, to applications in health care and physical therapy, retail, education, and training. The Kinect for Windows SDK and toolkit contain drivers, tools, APIs, device interfaces, and code samples to...

Publication details
Date: 1 September 2013
Type: Article
Publisher: IEEE
Publication details
Date: 1 September 2013
Type: Inproceeding
Publisher: British Machine Vision Conference (BMVC)