Our research
1–25 of 255
Dorina Thanou, Philip A. Chou, and Pascal Frossard

This paper addresses the problem of compression of 3D point cloud sequences that are characterized by moving 3D positions and color attributes. As temporally successive point cloud frames share some similarities, motion estimation is key to effective compression of these sequences. It remains a challenging problem, however, as the point cloud frames have varying numbers of points without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs,...

Publication details
Date: 1 December 2016
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Philip A. Chou and Ricardo L. de Queiroz

We introduce the Gaussian Process Transform (GPT), an orthogonal transform for signals defined on a finite but otherwise arbitrary set of points in a Euclidean domain. The GPT is obtained as the Karhunen-Loève Transform (KLT) of the marginalization of a Gaussian Process defined on the domain. Compared to the Graph Transform (GT), which is the KLT of a Gauss Markov Random Field over the same set of points whose neighborhood structure is inherited from the Euclidean domain, the GPT retains up to 6 dB higher energy in...

Publication details
Date: 1 September 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Mingsong Dou, Sameh Khamis, Yury Degtyarev, Philip Davidson, Sean Fanello, Adarsh Kowdle, Sergio Orts Escolano, Christoph Rhemann, David Kim, Jonathan Taylor, Pushmeet Kohli, Vladimir Tankovich, and Shahram Izadi
Publication details
Date: 1 July 2016
Type: Inproceeding
Publisher: SIGGRAPH
Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, and Yong Rui

Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate a word locally with the given previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually...

Publication details
Date: 1 June 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Sean Fanello, Christoph Rhemann, Vladimir Tankovich, Adarsh Kowdle, Sergio Orts Escolano, David Kim, and Shahram Izadi

Structured light sensors are popular due to their robustness to untextured scenes and multipath. These systems triangulate depth by solving a correspondence problem between each camera and projector pixel. This is often framed as a local stereo matching task, correlating patches of pixels in the observed and reference image. However, this is computationally intensive, leading to reduced depth accuracy and framerate. We contribute an algorithm for solving this correspondence problem efficiently, without...

Publication details
Date: 1 June 2016
Type: Inproceeding
Publisher: CVPR
Awards: Oral
Jun Xu, Tao Mei, Ting Yao, and Yong Rui

While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos and simple descriptions. While researchers have provided several benchmark datasets for image...

Publication details
Date: 1 June 2016
Type: Inproceeding
Publisher: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)
Ting Yao, Tao Mei, and Yong Rui

The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record lifelogging first-person videos. Browsing such long unstructured videos is time-consuming and tedious. This paper studies the discovery of moments of a user’s major or special interest (i.e., highlights) in a video, for generating the summarization of first-person videos. Specifically, we propose a novel pairwise deep ranking model that employs deep learning techniques to learn the...

Publication details
Date: 1 June 2016
Type: Inproceeding
Tomislav Pejsa, Julian Kantor, Hrvoje Benko, Eyal Ofek, and Andrew Wilson

Room2Room is a telepresence system that leverages projected augmented reality to enable life-size, co-present interaction between two remote participants. Our solution recreates the experience of a face-to-face conversation by performing 3D capture of the local user with color + depth cameras and projecting their life-size virtual copy into the remote space. This creates an illusion of the remote person’s physical presence in the local space, as well as a shared understanding of verbal and...

Publication details
Date: 1 March 2016
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Aamir Anis, Philip A. Chou, and Antonio Ortega

The advent of advanced acquisition techniques in 3D media applications has led to an increasing trend of capturing dynamic objects and scenes via 3D point cloud sequences. This form of data is composed of time-indexed frames, each consisting of a collection of points with position and color attributes. Compression of such datasets is challenging because of the lack of efficient techniques for exploiting spatial and temporal correlations between the attributes. In our approach, we create an intermediate...

Publication details
Date: 1 March 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Bo Wu, Tao Mei, Wen-Huang Cheng, and Yongdong Zhang

Time information plays a crucial role in social media popularity. Existing research on popularity prediction, though effective, ignores temporal information, which is highly related to user-item associations, and thus often has limited success. An essential step is to consider all of these factors (user, item, and time), which capture the dynamic nature of photo popularity. In this paper, we present a novel approach to factorize the popularity into user-item context and time-sensitive context for...

Publication details
Date: 1 February 2016
Type: Inproceeding
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, and Yong Rui

The development of deep learning has given machines a capability comparable to that of human beings in recognizing limited image categories. However, most existing approaches rely heavily on human-curated training data, which hinders scalability to large and unlabeled vocabularies in image tagging. In this paper, we propose a weakly-supervised deep learning model which can be trained from the readily available Web images to relax the dependence on human labor and scale up to arbitrary tags...

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE International Conference on Computer Vision
Danhang Tang, Jonathan Taylor, Pushmeet Kohli, Cem Keskin, Tae-Kyun Kim, and Jamie Shotton

We address the problem of hand pose estimation, formulated as an inverse problem. Typical approaches optimize an energy function over pose parameters using a ‘black box’ image generation procedure. This procedure knows little about either the relationships between the parameters or the form of the energy function. In this paper, we show that we can significantly improve upon black box optimization by exploiting high-level knowledge of the parameter structure and using a local surrogate energy function....

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Ting Yao, Tao Mei, and Chong-Wah Ngo

One of the fundamental problems in image search is to learn the ranking functions, i.e., similarity between the query and image. The research on this topic has evolved through two paradigms: feature-based vector model and image ranker learning. The former relies on the text surrounding an image, while the latter learns a ranker based on human-labeled query-image pairs. Each of the paradigms has its own limitation. The vector model is sensitive to the quality of text descriptions, and the learning paradigm...

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE International Conference on Computer Vision
Xuyong Yang, Tao Mei, Ying-Qing Xu, Yong Rui, and Shipeng Li

Visual-textual presentation layout (e.g., digital magazine cover, poster, PowerPoint slides, and any other rich media), which combines beautiful images with overlaid readable text, can add an eye-catching touch that attracts users' attention. The design of visual-textual presentation layout is therefore becoming ubiquitous in both commercially printed publications and online digital magazines. However, handcrafting aesthetically compelling layouts remains challenging for many small businesses...

Publication details
Date: 1 November 2015
Type: Article
Publisher: ACM – Association for Computing Machinery
Dongwook Yoon, Nicholas Chen, François Guimbretière, and Abigail Sellen

This paper introduces a novel document annotation system that aims to enable the kinds of rich communication that usually only occur in face-to-face meetings. Our system, RichReview, lets users create annotations on top of digital documents using three main modalities: freeform inking, voice for narration, and deictic gestures in support of voice. RichReview uses novel visual representations and time-synchronization between modalities to simplify annotation access and navigation. Moreover, RichReview’s...

Publication details
Date: 1 October 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Jinkyu Lee and Ivan Tashev

This paper presents a speech emotion recognition system using a recurrent neural network (RNN) model trained by an efficient learning algorithm. The proposed system takes into account the long-range context effect and the uncertainty of emotional label expressions. To extract a high-level representation of emotional states with regard to their temporal dynamics, a powerful learning method with a bidirectional long short-term memory (BLSTM) model is adopted. To overcome the uncertainty of emotional labels,...

Publication details
Date: 8 September 2015
Type: Inproceeding
Publisher: ISCA - International Speech Communication Association
Songtao He, Yunxin Liu, and Hucheng Zhou

The extremely high display density of modern smartphones imposes a significant burden on power consumption, yet does not always provide an improved user experience and may even lead to a compromised user experience. As human visually-perceivable ability highly depends on the user-screen distance, a reduced display resolution may still achieve the same user experience when the user-screen distance is large. This provides new power-saving opportunities. In this paper, we present a flexible dynamic...

Publication details
Date: 1 September 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery
Dorina Thanou, Philip A. Chou, and Pascal Frossard

This paper addresses the problem of motion estimation in 3D point cloud sequences that are characterized by moving 3D positions and color attributes. Motion estimation is key to effective compression of these sequences, but it remains a challenging problem as the temporally successive frames have varying sizes without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs, and consider 3D positions and color attributes of the point clouds as...

Publication details
Date: 1 September 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Steven M. Drucker and Roland Fernandez

We use the term Unit Visualizations to describe a class of visualizations that explicitly represent every row in a data set. They have been around in one form or another for hundreds of years, usually in static form (e.g. tallies, scatterplots, Dot Plots, Unit Charts, Pixel Charts, or Isotypes.) We characterize their design space and propose a unifying framework that can produce common types of Unit Visualizations. In addition, we introduce SandDance, a tool built to explore the effectiveness of...

Publication details
Date: 6 August 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-65
Gerard Pons-Moll, Jonathan Taylor, Jamie Shotton, Aaron Hertzmann, and Andrew Fitzgibbon

We present a new method for inferring dense data to model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective based on the classification of depth pixels to body parts. In contrast, we introduce Metric Space Information Gain (MSIG), a new decision forest training...

Publication details
Date: 1 August 2015
Type: Article
Publisher: Springer
Kyungmin Lee, David Chu, Eduardo Cuervo, Johannes Kopf, Yury Degtyarev, Sergey Grizan, Alec Wolman, and Jason Flinn

Gaming on phones, tablets and laptops is very popular. Cloud gaming -- where remote servers perform game execution and rendering on behalf of thin clients that simply send input and display output frames -- promises any device the ability to play any game any time. Unfortunately, the reality is that wide-area network latencies are often prohibitive; cellular, Wi-Fi and even wired residential end host round trip times (RTTs) can exceed 100ms, a threshold above which many gamers tend to deem...

Publication details
Date: 3 June 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery
Sameh Khamis, Jonathan Taylor, Jamie Shotton, Cem Keskin, Shahram Izadi, and Andrew Fitzgibbon

We describe how to learn a compact and efficient model of the surface deformation of human hands. The model is built from a set of noisy depth images of a diverse set of subjects performing different poses with their hands. We represent the observed surface using Loop subdivision of a control mesh that is deformed by our learned parametric shape and pose model. The model simultaneously accounts for variation in subject-specific shape and subject-agnostic pose. Specifically, hand shape is...

Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Eduardo Cuervo, Alec Wolman, Landon P. Cox, Kiron Lebeck, Ali Razeen, Stefan Saroiu, and Madanlal Musuvathi

This paper presents Kahawai, a system that provides high-quality gaming on mobile devices, such as tablets and smartphones, by offloading a portion of the GPU computation to server-side infrastructure. In contrast with previous thin-client approaches that require a server-side GPU to render the entire content, Kahawai uses collaborative rendering to combine the output of a mobile GPU and a server-side GPU into the displayed output. Compared to a thin client, collaborative rendering requires...

Publication details
Date: 19 May 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Jarrod Knibbe, Hrvoje Benko, and Andrew D. Wilson

Projector-camera (pro-cam) systems afford a wide range of interactive possibilities, combining both natural and mixed-reality 3D interaction. However, the latency inherent within these systems can cause the projection to ‘slip’ from its intended target, detracting from the overall experience. Because of this, pro-cam systems have typically shied away from truly dynamic scenarios. In turn, research has been exploring latency reduction techniques across a range of domains, but these techniques typically...

Publication details
Date: 11 May 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-35
Felicia Lim, Mark R. P. Thomas, and Ivan J. Tashev

Reverberation time is an important parameter for characterizing acoustic environments. It is useful in many applications including acoustic scene analysis, robust automatic speech recognition and dereverberation. Given knowledge of the acoustic impulse response, reverberation time can be measured using Schroeder’s backward integration method. Since it is not always practical to obtain impulse responses, blind estimation algorithms are sometimes desirable. In this work, the reverberation problem is...

Publication details
Date: 21 April 2015
Type: Inproceeding
Publisher: IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)