Our research
1–25 of 250
Dorina Thanou, Philip A. Chou, and Pascal Frossard

This paper addresses the problem of compressing 3D point cloud sequences that are characterized by moving 3D positions and color attributes. Because temporally successive point cloud frames share similarities, motion estimation is key to effective compression of these sequences. However, it remains a challenging problem, as the point cloud frames have varying numbers of points and no explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs,...

Publication details
Date: 1 December 2016
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Ricardo L. de Queiroz and Philip A. Chou

In free-viewpoint video, there is a recent trend to represent scene objects as solids rather than as multiple depth maps. Point clouds have long been used in computer graphics, and with the recent possibility of real-time capture and rendering, they have been favored over meshes in order to save computation. Each point in the cloud is associated with its 3D position and its color. We devise a method to compress the colors in point clouds which is based on a hierarchical transform...

Publication details
Date: 1 December 2016
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
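The abstract above names a hierarchical transform for point cloud colors but does not give its form. As an illustration only, the Python sketch below shows a weighted two-point Haar butterfly, the building block of region-adaptive hierarchical transforms; the function names and the use of point counts as weights are assumptions for this example, not details stated in the abstract.

```python
import math

def weighted_haar_pair(c1, w1, c2, w2):
    """Merge two color coefficients whose regions contain w1 and w2 points.

    Returns (low, high, w1 + w2). In a hierarchical scheme the low-pass
    coefficient moves up to the next level with the merged weight, and the
    high-pass coefficient is quantized and entropy coded.
    """
    total = w1 + w2
    a = math.sqrt(w1 / total)
    b = math.sqrt(w2 / total)
    low = a * c1 + b * c2
    high = -b * c1 + a * c2
    return low, high, total

def inverse_weighted_haar_pair(low, high, w1, w2):
    """Undo the butterfly: the 2x2 matrix [[a, b], [-b, a]] is orthonormal,
    so its transpose is its inverse."""
    total = w1 + w2
    a = math.sqrt(w1 / total)
    b = math.sqrt(w2 / total)
    c1 = a * low - b * high
    c2 = b * low + a * high
    return c1, c2
```

Because a² + b² = 1, the butterfly preserves energy, and when two neighboring colors are equal the high-pass coefficient is exactly zero, which is what makes smooth regions cheap to code.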
Philip A. Chou and Ricardo L. de Queiroz

We introduce the Gaussian Process Transform (GPT), an orthogonal transform for signals defined on a finite but otherwise arbitrary set of points in a Euclidean domain. The GPT is obtained as the Karhunen-Loève Transform (KLT) of the marginalization of a Gaussian Process defined on the domain. Compared to the Graph Transform (GT), which is the KLT of a Gauss Markov Random Field over the same set of points whose neighborhood structure is inherited from the Euclidean domain, the GPT retains up to 6 dB higher energy in...

Publication details
Date: 1 September 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
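As a rough illustration of the construction described above, the sketch evaluates a Gaussian Process covariance on an arbitrary point set (the GP's finite marginal) and takes its eigenvectors, i.e. the KLT, as the transform basis. The squared-exponential kernel, its `length_scale`, and all function names are assumptions chosen for the example; the abstract does not specify the covariance function. A small Jacobi eigensolver keeps the code dependency-free.

```python
import math

def se_covariance(points, length_scale=1.0):
    """Finite marginal of a GP with an (assumed) squared-exponential kernel:
    K[i][j] = exp(-|p_i - p_j|^2 / (2 * length_scale^2))."""
    n = len(points)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                      / (2.0 * length_scale ** 2)) for j in range(n)] for i in range(n)]

def jacobi_eigh(A, tol=1e-20, max_sweeps=100):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) with the eigenvector for eigenvalues[k] in column k of V."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-150:
                    continue
                # Rotation that zeroes A[p][q]; the smaller root of t keeps it stable.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V

def gaussian_process_transform(points, length_scale=1.0):
    """KLT of the GP marginal: basis vectors (rows) sorted by decreasing variance."""
    eigvals, V = jacobi_eigh(se_covariance(points, length_scale))
    order = sorted(range(len(eigvals)), key=lambda k: -eigvals[k])
    basis = [[V[i][k] for i in range(len(points))] for k in order]
    return [eigvals[k] for k in order], basis

def forward_transform(basis, signal):
    """Transform coefficients: inner product of the signal with each basis vector."""
    return [sum(b * x for b, x in zip(row, signal)) for row in basis]
```

Since the basis diagonalizes the assumed covariance, the coefficients are decorrelated under that model, and signals that follow it concentrate their energy in the first few coefficients.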
Aamir Anis, Philip A. Chou, and Antonio Ortega

The advent of advanced acquisition techniques in 3D media applications has led to an increasing trend of capturing dynamic objects and scenes via 3D point cloud sequences. This form of data is composed of time-indexed frames, each consisting of a collection of points with position and color attributes. Compression of such datasets is challenging because of the lack of efficient techniques for exploiting spatial and temporal correlations between the attributes. In our approach, we create an intermediate...

Publication details
Date: 1 March 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Bo Wu, Tao Mei, Wen-Huang Cheng, and Yongdong Zhang

Time information plays a crucial role in social media popularity. Existing research on popularity prediction, though effective, ignores temporal information, which is highly related to user-item associations, and thus often achieves limited success. An essential step is to consider all these factors (user, item, and time), which capture the dynamic nature of photo popularity. In this paper, we present a novel approach to factorize popularity into user-item context and time-sensitive context for...

Publication details
Date: 1 February 2016
Type: Inproceeding
Publisher: AAAI – Association for the Advancement of Artificial Intelligence
Danhang Tang, Jonathan Taylor, Pushmeet Kohli, Cem Keskin, Tae-Kyun Kim, and Jamie Shotton

We address the problem of hand pose estimation, formulated as an inverse problem. Typical approaches optimize an energy function over pose parameters using a ‘black box’ image generation procedure. This procedure knows little about either the relationships between the parameters or the form of the energy function. In this paper, we show that we can significantly improve upon black box optimization by exploiting high-level knowledge of the parameter structure and using a local surrogate energy function....

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, and Yong Rui

The development of deep learning has enabled machines to recognize a limited set of image categories with accuracy comparable to that of human beings. However, most existing approaches rely heavily on human-curated training data, which hinders scalability to large and unlabeled vocabularies in image tagging. In this paper, we propose a weakly-supervised deep learning model which can be trained from readily available Web images to relax the dependence on human labor and scale up to arbitrary tags...

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE International Conference on Computer Vision
Ting Yao, Tao Mei, and Chong-Wah Ngo

One of the fundamental problems in image search is learning the ranking function, i.e., the similarity between the query and the image. Research on this topic has evolved through two paradigms: the feature-based vector model and image ranker learning. The former relies on the text surrounding images, while the latter learns a ranker from human-labeled query-image pairs. Each paradigm has its own limitation. The vector model is sensitive to the quality of text descriptions, and the learning paradigm...

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE International Conference on Computer Vision
Xuyong Yang, Tao Mei, Ying-Qing Xu, Yong Rui, and Shipeng Li

Visual-textual presentation layouts (e.g., digital magazine covers, posters, PowerPoint slides, and other rich media), which combine an attractive image with readable overlaid text, can catch users' attention. Designing visual-textual presentation layouts is therefore becoming ubiquitous in both commercially printed publications and online digital magazines. However, handcrafting aesthetically compelling layouts remains challenging for many small businesses...

Publication details
Date: 1 November 2015
Type: Article
Publisher: ACM – Association for Computing Machinery
Dongwook Yoon, Nicholas Chen, François Guimbretière, and Abigail Sellen

This paper introduces a novel document annotation system that aims to enable the kinds of rich communication that usually only occur in face-to-face meetings. Our system, RichReview, lets users create annotations on top of digital documents using three main modalities: freeform inking, voice for narration, and deictic gestures in support of voice. RichReview uses novel visual representations and time synchronization between modalities to simplify annotation access and navigation. Moreover, RichReview’s...

Publication details
Date: 1 October 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Jinkyu Lee and Ivan Tashev

This paper presents a speech emotion recognition system using a recurrent neural network (RNN) model trained by an efficient learning algorithm. The proposed system takes into account the long-range context effect and the uncertainty of emotional label expressions. To extract high-level representations of emotional states with regard to their temporal dynamics, a powerful learning method with a bidirectional long short-term memory (BLSTM) model is adopted. To overcome the uncertainty of emotional labels,...

Publication details
Date: 8 September 2015
Type: Inproceeding
Publisher: ISCA – International Speech Communication Association
Songtao He, Yunxin Liu, and Hucheng Zhou

The extremely high display density of modern smartphones imposes a significant burden on power consumption, yet does not always improve the user experience and may even compromise it. Because the human eye's ability to perceive detail depends strongly on the user-screen distance, a reduced display resolution may still achieve the same user experience when the user-screen distance is large. This provides new power-saving opportunities. In this paper, we present a flexible dynamic...

Publication details
Date: 1 September 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery
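The argument in the abstract above rests on visual acuity: past some pixel density, extra pixels are invisible at a given viewing distance. The sketch below makes that arithmetic concrete under the common rule-of-thumb assumption that normal (20/20) vision resolves about 1 arcminute; the acuity value, the function names, and the idea of scaling resolution by this ratio are illustrative assumptions, not the paper's model.

```python
import math

def max_useful_ppi(distance_in, acuity_arcmin=1.0):
    """Pixel density (pixels per inch) above which one pixel subtends less than
    the assumed acuity angle at the given viewing distance (in inches)."""
    pixel_pitch_in = distance_in * math.tan(math.radians(acuity_arcmin / 60.0))
    return 1.0 / pixel_pitch_in

def resolution_scale(native_ppi, distance_in, acuity_arcmin=1.0):
    """Fraction of the native linear resolution that is still perceivable, capped at 1."""
    return min(1.0, max_useful_ppi(distance_in, acuity_arcmin) / native_ppi)
```

Under this assumption the limit is roughly 286 ppi at 12 inches and half that at 24 inches, so a phone held farther away could, in principle, render at a lower resolution without a visible difference.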
Dorina Thanou, Philip A. Chou, and Pascal Frossard

This paper addresses the problem of motion estimation in 3D point cloud sequences that are characterized by moving 3D positions and color attributes. Motion estimation is key to effective compression of these sequences, but it remains a challenging problem as the temporally successive frames have varying sizes without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs, and consider 3D positions and color attributes of the point clouds as...

Publication details
Date: 1 September 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Steven M. Drucker and Roland Fernandez

We use the term Unit Visualizations to describe a class of visualizations that explicitly represent every row in a data set. They have been around in one form or another for hundreds of years, usually in static form (e.g., tallies, scatterplots, Dot Plots, Unit Charts, Pixel Charts, or Isotypes). We characterize their design space and propose a unifying framework that can produce common types of Unit Visualizations. In addition, we introduce SandDance, a tool built to explore the effectiveness of...

Publication details
Date: 6 August 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-65
Gerard Pons-Moll, Jonathan Taylor, Jamie Shotton, Aaron Hertzmann, and Andrew Fitzgibbon

We present a new method for inferring dense data to model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective based on the classification of depth pixels to body parts. In contrast, we introduce Metric Space Information Gain (MSIG), a new decision forest training...

Publication details
Date: 1 August 2015
Type: Article
Publisher: Springer
Kyungmin Lee, David Chu, Eduardo Cuervo, Johannes Kopf, Yury Degtyarev, Sergey Grizan, Alec Wolman, and Jason Flinn

Gaming on phones, tablets and laptops is very popular. Cloud gaming -- where remote servers perform game execution and rendering on behalf of thin clients that simply send input and display output frames -- promises any device the ability to play any game any time. Unfortunately, the reality is that wide-area network latencies are often prohibitive; cellular, Wi-Fi and even wired residential end host round trip times (RTTs) can exceed 100ms, a threshold above which many gamers tend to deem...

Publication details
Date: 3 June 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery
Sameh Khamis, Jonathan Taylor, Jamie Shotton, Cem Keskin, Shahram Izadi, and Andrew Fitzgibbon

We describe how to learn a compact and efficient model of the surface deformation of human hands. The model is built from a set of noisy depth images of a diverse set of subjects performing different poses with their hands. We represent the observed surface using Loop subdivision of a control mesh that is deformed by our learned parametric shape and pose model. The model simultaneously accounts for variation in subject-specific shape and subject-agnostic pose. Specifically, hand shape is...

Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, and Yong Rui

Automatically describing video content with natural language is a fundamental challenge of multimedia. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate each word locally from the previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct...

Publication details
Date: 1 June 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-92
Eduardo Cuervo, Alec Wolman, Landon P. Cox, Kiron Lebeck, Ali Razeen, Stefan Saroiu, and Madanlal Musuvathi

This paper presents Kahawai, a system that provides high-quality gaming on mobile devices, such as tablets and smartphones, by offloading a portion of the GPU computation to server-side infrastructure. In contrast with previous thin-client approaches that require a server-side GPU to render the entire content, Kahawai uses collaborative rendering to combine the output of a mobile GPU and a server-side GPU into the displayed output. Compared to a thin client, collaborative rendering requires...

Publication details
Date: 19 May 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Jarrod Knibbe, Hrvoje Benko, and Andrew D. Wilson

Projector-camera (pro-cam) systems afford a wide range of interactive possibilities, combining both natural and mixed-reality 3D interaction. However, the latency inherent within these systems can cause the projection to ‘slip’ from its intended target, detracting from the overall experience. Because of this, pro-cam systems have typically shied away from truly dynamic scenarios. In turn, research has been exploring latency reduction techniques across a range of domains, but these techniques typically...

Publication details
Date: 11 May 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-35
Felicia Lim, Mark R. P. Thomas, and Ivan J. Tashev

Reverberation time is an important parameter for characterizing acoustic environments. It is useful in many applications including acoustic scene analysis, robust automatic speech recognition and dereverberation. Given knowledge of the acoustic impulse response, reverberation time can be measured using Schroeder’s backward integration method. Since it is not always practical to obtain impulse responses, blind estimation algorithms are sometimes desirable. In this work, the reverberation problem is...

Publication details
Date: 21 April 2015
Type: Inproceeding
Publisher: IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
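The non-blind baseline mentioned in the abstract above, Schroeder's backward integration, is simple enough to sketch. The code integrates the squared impulse response backward in time to obtain an energy decay curve in dB, fits a line between -5 dB and -35 dB, and extrapolates to a 60 dB decay. The fitting range and function names are assumptions for illustration; the blind estimator proposed in the paper is not shown.

```python
import math

def schroeder_edc_db(h):
    """Energy decay curve: backward cumulative sum of h^2, in dB relative to t = 0."""
    edc = [0.0] * len(h)
    acc = 0.0
    for n in range(len(h) - 1, -1, -1):
        acc += h[n] * h[n]
        edc[n] = acc
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0 else float("-inf") for e in edc]

def estimate_t60(h, fs, start_db=-5.0, end_db=-35.0):
    """Least-squares line fit to the EDC between start_db and end_db,
    extrapolated to a 60 dB decay. Returns seconds."""
    edc = schroeder_edc_db(h)
    idx = [n for n, v in enumerate(edc) if end_db <= v <= start_db]
    if len(idx) < 2:
        raise ValueError("impulse response does not cover the fitting range")
    ts = [n / fs for n in idx]
    ys = [edc[n] for n in idx]
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))  # dB per second (negative)
    return -60.0 / slope
```

For a synthetic exponential decay h[n] = 10^(-3n / (fs * T60)), the estimate recovers T60 almost exactly; measured responses usually need noise-floor truncation before integration.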
Lijun Zhu and Dinei Florencio

Parametric speakers produce sound by emitting ultrasound, and using the small nonlinearity in air to demodulate it back to audible sound. The use of ultrasound allows for producing very narrow audio beams, which finds application in a number of military and consumer scenarios. However, designing better parametric speakers has been hard: closed-form solution of the nonlinear wave equation for generic geometries is nearly impossible, and the only existing solution was derived for the simple case of a...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Dinei Florencio and Zhengyou Zhang

Estimating room impulse responses (RIRs) has a number of applications, including personalized audio, analyzing and improving acoustic behavior of concert halls, listening room compensation, sound source localization, and many others. RIRs have been estimated in essentially the same fashion for the last 50 years: compute the cross-correlation between a signal played at point A and the signal received at point B. Best results are obtained when the signal played is white noise, or a maximum length...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
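The classical procedure the abstract above describes can be sketched with a simulated room: play white noise, convolve it with a known impulse response, and recover the taps by cross-correlation. The binary noise, the helper names, and the simulated room are assumptions for illustration; the maximum length sequence variant and the paper's own method are not shown.

```python
import random

def white_noise(n, seed=0):
    """Zero-mean, unit-power binary (+1/-1) white noise from a seeded generator."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def convolve(x, h):
    """Simulated room: y[n] = sum_k h[k] * x[n - k] (full linear convolution)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += hk * xn
    return y

def estimate_rir(x, y, length):
    """Cross-correlate the played signal x with the received signal y.

    For white x, sum_i x[i] * y[i + k] is approximately h[k] * sum_i x[i]^2,
    so normalizing the cross-correlation by the energy of x estimates h[k].
    """
    energy = sum(v * v for v in x)
    return [sum(x[i] * y[i + k] for i in range(len(x))) / energy for k in range(length)]
```

The estimation error of each tap shrinks roughly as one over the square root of the excitation length, which is why long excitation signals are used in practice.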
Cha Zhang, Dinei Florencio, and Philip Chou

This theoretical paper aims to provide a probabilistic framework for graph signal processing. By modeling signals on graphs as Gaussian Markov Random Fields, we present numerous important aspects of graph signal processing, including graph construction, graph transform, graph downsampling, graph prediction, and graph-based regularization, from a probabilistic point of view. As examples, we discuss a number of methods for constructing graphs based on statistics from input data sets; we show that the...

Publication details
Date: 1 April 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-31
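One of the aspects listed in the abstract above, graph prediction, follows directly from the Gaussian Markov Random Field view: for a zero-mean GMRF with precision matrix Q, the conditional mean of the unobserved nodes U given observed values x_O is -inv(Q_UU) Q_UO x_O. The sketch assumes a precision matrix of the form graph Laplacian plus delta times the identity, a common choice that the abstract does not confirm; all function names are hypothetical.

```python
def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - W from weighted undirected edges (i, j, w)."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L

def precision(L, delta):
    """Q = L + delta * I; any delta > 0 makes Q positive definite."""
    n = len(L)
    return [[L[i][j] + (delta if i == j else 0.0) for j in range(n)] for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gmrf_predict(Q, observed):
    """Conditional mean of unobserved nodes given {node: value} observations."""
    U = [i for i in range(len(Q)) if i not in observed]
    Q_uu = [[Q[i][j] for j in U] for i in U]
    rhs = [-sum(Q[i][j] * v for j, v in observed.items()) for i in U]
    return dict(zip(U, solve(Q_uu, rhs)))
```

On a four-node path graph with delta = 0 and the end nodes observed at 0 and 3, the prediction is the linear (harmonic) interpolation 1 and 2; a positive delta shrinks the predictions toward the zero prior mean.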
Keita Higuchi, Yinpeng Chen, Philip A. Chou, Zhengyou Zhang, and Zicheng Liu

ImmerseBoard is a system for remote collaboration through a digital whiteboard that gives participants a 3D immersive experience, enabled only by an RGBD camera (Microsoft Kinect) mounted on the side of a large touch display. Using 3D processing of the depth images, life-sized rendering, and novel visualizations, ImmerseBoard emulates writing side-by-side on a physical whiteboard, or alternatively on a mirror. User studies involving three tasks show that compared to standard video conferencing with a...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery