Our research
1–25 of 699
Jianpeng Cheng, Zhongyuan Wang, Ji-Rong Wen, Jun Yan, and Zheng Chen

Representing discrete words in a continuous vector space turns out to be useful for natural language applications related to text understanding. Meanwhile, it poses extensive challenges, one of which is due to the polysemous nature of human language. A common solution (a.k.a. word sense induction) is to separate each word into multiple senses and create a representation for each sense respectively. However, this approach is usually computationally expensive and prone to data sparsity, since each sense...

Publication details
Date: 1 October 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Bhanu Vattikonda, Vacha Dave, Saikat Guha, and Alex C. Snoeren
Publication details
Date: 1 October 2015
Type: Inproceeding
Bhaskar Mitra and Nick Craswell

Query auto-completion (QAC) systems typically suggest queries that have previously been observed in search logs. Given a partial user query, the system looks up this query prefix against a precomputed set of candidates, then orders them using ranking signals such as popularity. Such systems can only recommend queries for prefixes that have been previously seen by the search engine with adequate frequency. They fail to recommend if the prefix is sufficiently rare such that it has no matches in the...

Publication details
Date: 1 October 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
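
The lookup-and-rank behaviour described in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: the toy query log, the function names `build_prefix_index` and `suggest`, and the use of raw frequency as the only ranking signal are all assumptions made for illustration. It shows why a prefix that never appears in the log yields no suggestions.

```python
from collections import defaultdict


def build_prefix_index(query_counts):
    """Map every prefix of each logged query to the (query, count) pairs that start with it."""
    index = defaultdict(list)
    for query, count in query_counts.items():
        for end in range(1, len(query) + 1):
            index[query[:end]].append((query, count))
    return index


def suggest(index, prefix, k=3):
    """Return up to k completions for the prefix, most popular first (ties broken alphabetically)."""
    candidates = index.get(prefix, [])
    ranked = sorted(candidates, key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ranked[:k]]


# Hypothetical search log: query -> number of times it was observed.
QUERY_COUNTS = {
    "new york": 50,
    "new york times": 30,
    "new movies": 20,
    "news": 40,
}
INDEX = build_prefix_index(QUERY_COUNTS)

print(suggest(INDEX, "new"))  # completions seen in the log, ordered by popularity
print(suggest(INDEX, "zyx"))  # unseen prefix: nothing to recommend
```

A precomputed index like this answers seen prefixes with a single dictionary lookup, which is the property that also makes it unable to handle rare or unseen prefixes.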
Alistair Moffat, Falk Scholer, Paul Thomas, and Peter Bailey

Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus; a set of topics; and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive judging is not feasible. Therefore an approach called pooling is typically used where, for example, the documents to be judged can be determined by taking the union of all documents returned in the top positions of the answer lists returned...

Publication details
Date: 1 October 2015
Type: Article
Publisher: ACM – Association for Computing Machinery
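
The pooling step mentioned in the abstract above can be sketched as follows. This is a minimal illustration of standard depth-based pooling, assuming each contributing system supplies a ranked answer list for one topic; the run names, document identifiers, and the `pool_documents` helper are hypothetical and are not taken from the paper.

```python
def pool_documents(runs, depth):
    """Return the union of the top `depth` documents from every ranked answer list."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


# Hypothetical ranked answer lists from two systems for a single topic.
RUNS = {
    "system_a": ["d1", "d2", "d3", "d4"],
    "system_b": ["d3", "d5", "d1", "d6"],
}

POOL = pool_documents(RUNS, depth=2)
print(sorted(POOL))  # only these documents would be sent to assessors
```

Documents outside the pool are never judged, so the choice of pool depth trades judging cost against the completeness of the relevance judgments.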
Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao

Humans understand the world by classifying objects into an appropriate level of categories. This process is often automatic and subconscious. Psychologists and linguists call it Basic-level Categorization (BLC). BLC can benefit many applications, such as knowledge panels, advertising, and recommendation. However, how to quantify basic-level concepts is still an open problem. Recently, much work has focused on constructing knowledge bases or semantic networks from web-scale text corpora, which makes it...

Publication details
Date: 1 October 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Vasileios Lampos, Elad Yom-Tov, Richard Pebody, and Ingemar J. Cox

Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the...

Publication details
Date: 7 September 2015
Type: Article
Publisher: Springer
Publication details
Date: 1 September 2015
Type: Proceedings
Publisher: ACL – Association for Computational Linguistics
Paul N. Bennett, Milad Shokouhi, and Rich Caruana

Interaction data such as clicks and dwells provide valuable signals for learning and evaluating personalized models. However, while models of personalization typically distinguish between clicked and non-clicked results, no preference distinctions within the non-clicked results are made and all are treated as equally non-relevant.

In this paper, we demonstrate that failing to enforce a prior on preferences among non-clicked results leads to learning models that often personalize with no measurable...

Publication details
Date: 1 September 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Fiana Raiber, Oren Kurland, Filip Radlinski, and Milad Shokouhi

Several applications in information retrieval rely on asymmetric co-relevance estimation; that is, estimating the relevance of a document to a query under the assumption that another document is relevant. We present a supervised model for learning an asymmetric co-relevance estimate. The model uses different types of similarities with the assumed relevant document and the query, as well as document-quality measures. Empirical evaluation demonstrates the merits of using the co-relevance estimate in...

Publication details
Date: 1 September 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Young-Bum Kim, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong

In natural language understanding (NLU), a user utterance can be labeled differently depending on the domain or application (e.g., weather vs. calendar). Standard domain adaptation techniques are not directly applicable to take advantage of the existing annotations because they assume that the label set is invariant. We propose a solution based on label embeddings induced from canonical correlation analysis (CCA) that reduces the problem to a standard domain adaptation task and allows use of a number of...

Publication details
Date: 29 August 2015
Type: Proceedings
Publisher: ACL – Association for Computational Linguistics
Young-Bum Kim, Karl Stratos, and Ruhi Sarikaya

In this paper, we apply the concept of pre-training to hidden-unit conditional random fields (HUCRFs) to enable learning on unlabeled data. We present a simple yet effective pre-training technique that learns to associate words with their clusters, which are obtained in an unsupervised manner. The learned parameters are then used to initialize the supervised learning process. We also propose a word clustering technique based on canonical correlation analysis (CCA) that is sensitive to multiple word...

Publication details
Date: 28 August 2015
Type: Proceedings
Publisher: ACL – Association for Computational Linguistics
Manish Gupta

The 2011 Cricket World Cup final match was watched by around 135 million people. Such a huge viewership demands a great experience for users of online cricket portals. Many portals like espncricinfo.com host a variety of content related to recent matches including match reports and ball-by-ball commentaries. When reading a match report, reader experience can be significantly improved by augmenting (on demand) the event mentions in the report with detailed commentaries. We build an event linking system...

Publication details
Date: 9 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Chi Wang, Xueqing Liu, Yanglei Song, and Jiawei Han

Automatic construction of user-desired topical hierarchies over large volumes of text data is a highly desirable but challenging task. This study proposes to give users freedom to construct topical hierarchies via interactive operations such as expanding a branch and merging several branches. Existing hierarchical topic modeling techniques are inadequate for this purpose because (1) they cannot consistently preserve the topics when the hierarchy structure is modified; and (2) the slow inference prevents...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Peter Bailey, Alistair Moffat, Falk Scholer, and Paul Thomas

Test collection design eliminates sources of user variability to make statistical comparisons among information retrieval (IR) systems more affordable. Does this choice unnecessarily limit generalizability of the outcomes to real usage scenarios? We explore two aspects of user variability with regard to evaluating the relative performance of IR systems, assessing effectiveness in the context of a subset of topics from three TREC collections, with the embodied information needs categorized against three...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Bhaskar Mitra

Search logs contain examples of frequently occurring patterns of user reformulations of queries. Intuitively, the reformulation "san francisco" → "san francisco 49ers" is semantically similar to "detroit" → "detroit lions". Likewise, "london" → "things to do in london" and "new york" → "new york tourist attractions" can also be considered similar transitions in intent. The reformulations "movies" → "new movies" and "york" → "new york", however, are clearly different despite the lexical similarities in the two...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Emre Kıcıman and Matthew Richardson

Every day, people take action, trying to achieve their personal, high-order goals. People decide what actions to take based on their personal experience, knowledge and gut instinct. While this leads to positive outcomes for some people, many others do not have the necessary experience, knowledge and instinct to make good decisions. What if, rather than making decisions based solely on their own personal experience, people could take advantage of the reported experiences of hundreds of millions of other...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Anne Schuth, Katja Hofmann, and Filip Radlinski

The gold standard for online retrieval evaluation is AB testing. Rooted in the idea of a controlled experiment, AB tests compare the performance of an experimental system (treatment) on one sample of the user population, to that of a baseline system (control) on another sample. Given an online evaluation metric that accurately reflects user satisfaction, these tests enjoy high validity. However, due to the high variance across users, these comparisons often have low sensitivity, requiring millions of...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Bhanu Vattikonda, Santhosh Kodipaka, Hongyan Zhou, Vacha Dave, Saikat Guha, and Alex C. Snoeren
Publication details
Date: 1 August 2015
Type: Inproceeding
Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, Heng Ji, and Jiawei Han

Entity recognition is an important but challenging research problem. In reality, many text collections are from specific, dynamic, or emerging domains, which poses significant new challenges for entity recognition with increase in name ambiguity and context sparsity, requiring entity detection without domain restriction. In this paper, we investigate entity recognition (ER) with distant-supervision and propose a novel relation phrase-based ER framework, called ClusType, that runs...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Paul Bennett, Alexander Fishkov, and Emre Kıcıman

Many information retrieval tasks involve searching on behalf of others. Example scenarios include searching for a present to give a friend, trying to find “cool” clothes for a teenage child, looking for medical supplies for an elderly relative [1], or planning a group activity that many friends will enjoy. In this paper, we use demographically annotated web search logs to present a large-scale study of such “on behalf of” searches. We develop an exploratory technique for recognizing such searches, and...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen

The goal of query conceptualization is to map instances in a query to concepts defined in a certain ontology or knowledge base. Queries usually do not observe the syntax of a written language, nor do they contain enough signals for statistical inference. However, the available context, i.e., the verbs related to the instances and the adjectives and attributes of the instances, does provide valuable clues for understanding instances. In this paper, we first mine a variety of relations among terms from a large web...

Publication details
Date: 1 July 2015
Type: Inproceeding
Elad Yom-Tov, Ingemar Johansson-Cox, Vasileios Lampos, and Andrew C. Hayward

Knowledge of the secondary attack rate (SAR) and serial interval (SI) of influenza is important for assessing the severity of seasonal epidemics of the virus. To date, such estimates have required extensive surveys of target populations. Here, we propose a method for estimating the intrafamily SAR and SI from postings on the Twitter social network. This estimate is derived from a large number of people reporting ILI symptoms in themselves and/or their immediate family members.

We analyze data from the...

Publication details
Date: 9 June 2015
Type: Article
Publisher: Wiley
Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han

Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality...

Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Sarah K. Tyler, Jaime Teevan, Peter Bailey, Sebastian de la Chica, and Nikhil Dandekar

Information on almost any given topic can be found on the Web, often accessible via many different websites. But even when the topical content is similar across websites, the websites can have different characteristics that appeal to different people. As a result, individuals can develop preferred websites to visit for certain topics. While it has long been speculated that such preferences exist, little is understood about how prevalent, clear, and stable these preferences actually are. We characterize...

Publication details
Date: 1 June 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-48
Fotis Psallidas, Bolin Ding, Kaushik Chakrabarti, and Surajit Chaudhuri

An enterprise information worker is often aware of a few example tuples that should be present in the output of the query. Query discovery systems have been developed to discover project-join queries that contain the given example tuples in their output. However, they require the output to exactly contain all the example tuples and do not perform any ranking. To address this limitation, we study the problem of efficiently discovering top-k project join queries which approximately contain the...

Publication details
Date: 1 June 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery