Our research
1–25 of 265
Ricardo L. de Queiroz and Philip A. Chou

In free-viewpoint video, there is a recent trend to represent scene objects as solids rather than using multiple depth maps. Point clouds have long been used in computer graphics, and now that real-time capture and rendering are possible, they have been favored over meshes in order to save computation. Each point in the cloud is associated with its 3D position and its color. We devise a method to compress the colors in point clouds which is based on a hierarchical transform...

Publication details
Date: 1 December 2016
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Philip A. Chou and Ricardo L. de Queiroz

We introduce the Gaussian Process Transform (GPT), an orthogonal transform for signals defined on a finite but otherwise arbitrary set of points in a Euclidean domain. The GPT is obtained as the Karhunen-Loève Transform (KLT) of the marginalization of a Gaussian Process defined on the domain. Compared to the Graph Transform (GT), which is the KLT of a Gauss Markov Random Field over the same set of points whose neighborhood structure is inherited from the Euclidean domain, the GPT has up to 6 dB higher coding...

Publication details
Date: 1 September 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
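
The Gaussian Process Transform described in the entry above can be illustrated with a minimal sketch. Marginalizing a Gaussian Process onto a finite set of points yields a covariance matrix over those points, and the KLT basis is the set of eigenvectors of that matrix. The exponential kernel, the `length_scale` parameter, the function name, and the sample points below are assumptions made for illustration; the listing does not state which kernel the authors use.

```python
import numpy as np

def gaussian_process_transform(points, length_scale=1.0):
    """Return (eigenvalues, basis) for an assumed GP covariance over `points`."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all points.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Assumed stationary exponential kernel (hypothetical choice, not from the paper).
    cov = np.exp(-dist / length_scale)
    # The KLT basis is the eigenvector set of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical 3D positions and one color channel per point.
points = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1], [2, 1, 0]]
signal = np.array([10.0, 12.0, 11.0, 15.0, 14.0])

eigvals, basis = gaussian_process_transform(points)
coeffs = basis.T @ signal        # forward transform
reconstructed = basis @ coeffs   # inverse transform (the basis is orthogonal)
```

Because the basis is orthogonal, the inverse transform is simply the transpose, so the signal is recovered exactly when no coefficients are quantized or discarded.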
Dong Liu, Lizhi Wang, Li Li, Zhiwei Xiong, Feng Wu, and Wenjun Zeng
Publication details
Date: 1 July 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Mingsong Dou, Sameh Khamis, Yury Degtyarev, Philip Davidson, Sean Fanello, Adarsh Kowdle, Sergio Orts Escolano, Christoph Rhemann, David Kim, Jonathan Taylor, Pushmeet Kohli, Vladimir Tankovich, and Shahram Izadi
Publication details
Date: 1 July 2016
Type: Inproceeding
Publisher: SIGGRAPH
Ting Yao, Tao Mei, and Yong Rui

The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life-logging first-person videos. Browsing such long unstructured videos is time-consuming and tedious. This paper studies the discovery of moments of the user’s major or special interest (i.e., highlights) in a video, for generating summaries of first-person videos. Specifically, we propose a novel pairwise deep ranking model that employs deep learning techniques to learn the...

Publication details
Date: 1 June 2016
Type: Inproceeding
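
The entry above mentions a pairwise ranking model for highlight detection. As a rough illustration of the general idea of pairwise ranking (not the authors' formulation, which is truncated in this listing), a generic margin-based loss penalizes a highlight segment that is not scored sufficiently above a non-highlight segment. The function name, the margin value, and the scores are hypothetical.

```python
def pairwise_ranking_loss(highlight_score, non_highlight_score, margin=1.0):
    """Generic hinge loss on a (highlight, non-highlight) score pair.

    The loss is zero once the highlight outscores the non-highlight
    segment by at least `margin` (an assumed hyperparameter).
    """
    return max(0.0, margin - highlight_score + non_highlight_score)

# Correctly ordered pair with enough separation: no penalty.
well_ranked = pairwise_ranking_loss(2.0, 0.5)
# Mis-ordered pair: penalized in proportion to the violation.
mis_ranked = pairwise_ranking_loss(0.2, 0.5)
```

In a learned model, the two scores would come from a network applied to each video segment, and the loss would be averaged over many sampled pairs.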
Sean Fanello, Christoph Rhemann, Vladimir Tankovich, Adarsh Kowdle, Sergio Orts Escolano, David Kim, and Shahram Izadi

Structured light sensors are popular due to their robustness to untextured scenes and multipath. These systems triangulate depth by solving a correspondence problem between each camera and projector pixel. This is often framed as a local stereo matching task, correlating patches of pixels in the observed and reference image. However, this is computationally intensive, leading to reduced depth accuracy and framerate. We contribute an algorithm for solving this correspondence problem efficiently, without...

Publication details
Date: 1 June 2016
Type: Inproceeding
Publisher: CVPR
Awards: Oral
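
The entry above notes that structured light systems triangulate depth from camera-projector correspondences. Once a correspondence is found in a rectified setup, depth follows from the standard triangulation relation: depth equals focal length times baseline divided by disparity. The sketch below assumes a rectified camera-projector pair; the function name and the numeric values are hypothetical, and it does not reproduce the paper's correspondence algorithm.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Triangulate depth (meters) for a rectified camera-projector pair.

    focal_length_px: focal length in pixels (assumed known from calibration)
    baseline_m:      camera-projector distance in meters
    disparity_px:    horizontal offset between matched pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical calibration: 600 px focal length, 7.5 cm baseline.
depth_m = depth_from_disparity(600.0, 0.075, 30.0)
```

Because depth is inversely proportional to disparity, small matching errors translate into larger depth errors at greater distances, which is why the accuracy of the correspondence step matters.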
Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, and Yong Rui

Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate a word locally with the given previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually...

Publication details
Date: 1 June 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Jun Xu, Tao Mei, Ting Yao, and Yong Rui

While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos and simple descriptions. While researchers have provided several benchmark datasets for image...

Publication details
Date: 1 June 2016
Type: Inproceeding
Publisher: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)
Philip A. Chou and Ricardo L. de Queiroz

Graphs are often used to model signals defined on a set of points embedded in a Euclidean domain. Examples are distributed sensor readings, measures of congestion in a transportation network, samples in a feature space, and colors on 3D point clouds. However, it may be better to model such signals as samples of a Gaussian Process defined on the Euclidean domain. We show, on a 3D point cloud example, that Karhunen-Loève Transforms (KLTs) based on Gaussian Process models can have significantly higher...

Publication details
Date: 1 May 2016
Type: Inproceeding
Xiaosong Lan, Zhiwei Xiong, Wei Zhang, Shuxiao Li, Hongxing Chang, and Wenjun Zeng
Publication details
Date: 1 May 2016
Type: Proceedings
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Dorina Thanou, Philip A. Chou, and Pascal Frossard

This paper addresses the problem of compression of 3D point cloud sequences that are characterized by moving 3D positions and color attributes. As temporally successive point cloud frames share some similarities, motion estimation is key to effective compression of these sequences. It however remains a challenging problem as the point cloud frames have varying numbers of points without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs,...

Publication details
Date: 1 April 2016
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Ricardo L. de Queiroz and Philip A. Chou

We propose using stationary Gaussian Processes (GPs) to model the statistics of the signal on points in a point cloud, which can be considered samples of a GP at the positions of the points. Further, we propose using Gaussian Process Transforms (GPTs), which are Karhunen-Loeve Transforms of the GP, as the basis of transform coding of the signal. Focusing on colored 3D point clouds, we propose a transform coder that breaks the point cloud into blocks, transforms the blocks using GPTs, and entropy codes...

Publication details
Date: 1 April 2016
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Yanghao Li, Cuiling Lan, Junliang Xing, Wenjun Zeng, Chunfeng Yuan, and Jiaying Liu
Publication details
Date: 1 April 2016
Type: Proceedings
Publisher: arXiv:1604.05633
Aamir Anis, Philip A. Chou, and Antonio Ortega

The advent of advanced acquisition techniques in 3D media applications has led to an increasing trend of capturing dynamic objects and scenes via 3D point cloud sequences. This form of data is composed of time-indexed frames, each consisting of a collection of points with position and color attributes. Compression of such datasets is challenging because of the lack of efficient techniques for exploiting spatial and temporal correlations between the attributes. In our approach, we create an intermediate...

Publication details
Date: 1 March 2016
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Tomislav Pejsa, Julian Kantor, Hrvoje Benko, Eyal Ofek, and Andrew Wilson

Room2Room is a telepresence system that leverages projected augmented reality to enable life-size, co-present interaction between two remote participants. Our solution recreates the experience of a face-to-face conversation by performing 3D capture of the local user with color + depth cameras and projecting their life-size virtual copy into the remote space. This creates an illusion of the remote person’s physical presence in the local space, as well as a shared understanding of verbal and...

Publication details
Date: 1 March 2016
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Wentao Zhu, Cuiling Lan, Junliang Xing, Wenjun Zeng, Yanghao Li, Li Shen, and Xiaohui Xie
Publication details
Date: 1 February 2016
Type: Inproceeding
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
Bo Wu, Tao Mei, Wen-Huang Cheng, and Yongdong Zhang

Time information plays a crucial role in social media popularity. Existing research on popularity prediction, though effective, ignores temporal information, which is highly related to user-item associations, and thus often has limited success. It is essential to consider all of these factors (user, item, and time), which capture the dynamic nature of photo popularity. In this paper, we present a novel approach to factorize the popularity into user-item context and time-sensitive context for...

Publication details
Date: 1 February 2016
Type: Inproceeding
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
Danhang Tang, Jonathan Taylor, Pushmeet Kohli, Cem Keskin, Tae-Kyun Kim, and Jamie Shotton

We address the problem of hand pose estimation, formulated as an inverse problem. Typical approaches optimize an energy function over pose parameters using a ‘black box’ image generation procedure. This procedure knows little about either the relationships between the parameters or the form of the energy function. In this paper, we show that we can significantly improve upon black box optimization by exploiting high-level knowledge of the parameter structure and using a local surrogate energy function....

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Ting Yao, Tao Mei, and Chong-Wah Ngo

One of the fundamental problems in image search is to learn the ranking functions, i.e., similarity between the query and image. The research on this topic has evolved through two paradigms: feature-based vector model and image ranker learning. The former relies on the text surrounding the image, while the latter learns a ranker based on human-labeled query-image pairs. Each of the paradigms has its own limitation. The vector model is sensitive to the quality of text descriptions, and the learning paradigm...

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE International Conference on Computer Vision
Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, and Yong Rui

The development of deep learning has given machines a capability comparable to that of human beings in recognizing a limited set of image categories. However, most existing approaches rely heavily on human-curated training data, which hinders scalability to large and unlabeled vocabularies in image tagging. In this paper, we propose a weakly-supervised deep learning model which can be trained from readily available Web images to relax the dependence on human labor and scale up to arbitrary tags...

Publication details
Date: 1 December 2015
Type: Inproceeding
Publisher: IEEE International Conference on Computer Vision
Xuyong Yang, Tao Mei, Ying-Qing Xu, Yong Rui, and Shipeng Li

Visual-textual presentation layout (e.g., digital magazine covers, posters, PowerPoint slides, and other rich media), which combines attractive images with readable overlaid text, can add eye-catching appeal that attracts users' attention. The design of visual-textual presentation layouts is therefore becoming ubiquitous in both commercially printed publications and online digital magazines. However, handcrafting aesthetically compelling layouts remains challenging for many small businesses...

Publication details
Date: 1 November 2015
Type: Article
Publisher: ACM – Association for Computing Machinery
Dongwook Yoon, Nicholas Chen, François Guimbretière, and Abigail Sellen

This paper introduces a novel document annotation system that aims to enable the kinds of rich communication that usually only occur in face-to-face meetings. Our system, RichReview, lets users create annotations on top of digital documents using three main modalities: freeform inking, voice for narration, and deictic gestures in support of voice. RichReview uses novel visual representations and time-synchronization between modalities to simplify annotation access and navigation. Moreover, RichReview’s...

Publication details
Date: 1 October 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Jinkyu Lee and Ivan Tashev

This paper presents a speech emotion recognition system using a recurrent neural network (RNN) model trained by an efficient learning algorithm. The proposed system takes into account the long-range context effect and the uncertainty of emotional label expressions. To extract high-level representation of emotional states with regard to its temporal dynamics, a powerful learning method with a bidirectional long short-term memory (BLSTM) model is adopted. To overcome the uncertainty of emotional labels,...

Publication details
Date: 8 September 2015
Type: Inproceeding
Publisher: ISCA - International Speech Communication Association
Dorina Thanou, Philip A. Chou, and Pascal Frossard

This paper addresses the problem of motion estimation in 3D point cloud sequences that are characterized by moving 3D positions and color attributes. Motion estimation is key to effective compression of these sequences, but it remains a challenging problem as the temporally successive frames have varying sizes without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs, and consider 3D positions and color attributes of the point clouds as...

Publication details
Date: 1 September 2015
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Songtao He, Yunxin Liu, and Hucheng Zhou

The extremely high display density of modern smartphones imposes a significant burden on power consumption, yet does not always improve the user experience and may even compromise it. Because what a human can visually perceive depends heavily on the user-screen distance, a reduced display resolution may still achieve the same user experience when the user-screen distance is large. This provides new power-saving opportunities. In this paper, we present a flexible dynamic...

Publication details
Date: 1 September 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery
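
The reasoning in the entry above, that a lower display resolution can look identical at a larger viewing distance, can be made concrete with a small calculation. The sketch assumes the common rule of thumb that the eye resolves detail down to about one arcminute; that threshold, the function names, and the example numbers are assumptions for illustration and are not taken from the paper.

```python
import math

# One arcminute expressed in radians.
ARCMIN_RAD = math.radians(1 / 60)

def max_useful_ppi(distance_in, acuity_arcmin=1.0):
    """Pixel density beyond which extra pixels are assumed imperceptible.

    distance_in:   user-screen distance in inches
    acuity_arcmin: assumed smallest resolvable visual angle, in arcminutes
    """
    pixel_pitch_in = distance_in * math.tan(acuity_arcmin * ARCMIN_RAD)
    return 1.0 / pixel_pitch_in

def choose_scale(native_ppi, distance_in):
    """Fraction of native linear resolution worth rendering at this distance."""
    return min(1.0, max_useful_ppi(distance_in) / native_ppi)

ppi_at_12in = max_useful_ppi(12.0)      # roughly 286 ppi under this assumption
scale_at_24in = choose_scale(500.0, 24.0)
```

Under these assumptions, a hypothetical 500 ppi panel viewed from 24 inches could render at well under a third of its native linear resolution without a perceptible difference.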