Our research
Bhanu Vattikonda, Vacha Dave, Saikat Guha, and Alex C. Snoeren
Publication details
Date: 1 October 2015
Type: Inproceeding
Alistair Moffat, Falk Scholer, Paul Thomas, and Peter Bailey

Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus; a set of topics; and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive judging is not feasible. Therefore an approach called pooling is typically used where, for example, the documents to be judged can be determined by taking the union of all documents returned in the top positions of the answer lists returned...

Publication details
Date: 1 October 2015
Type: Article
Publisher: ACM – Association for Computing Machinery
Vasileios Lampos, Elad Yom-Tov, Richard Pebody, and Ingemar J. Cox

Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the...

Publication details
Date: 7 September 2015
Type: Article
Publisher: Springer
Publication details
Date: 1 September 2015
Type: Proceedings
Publisher: ACL – Association for Computational Linguistics
Young-Bum Kim, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong

In natural language understanding (NLU), a user utterance can be labeled differently depending on the domain or application (e.g., weather vs. calendar). Standard domain adaptation techniques are not directly applicable to take advantage of the existing annotations because they assume that the label set is invariant. We propose a solution based on label embeddings induced from canonical correlation analysis (CCA) that reduces the problem to a standard domain adaptation task and allows use of a number of...

Publication details
Date: 29 August 2015
Type: Proceedings
Publisher: ACL – Association for Computational Linguistics
Young-Bum Kim, Karl Stratos, and Ruhi Sarikaya

In this paper, we apply the concept of pre-training to hidden-unit conditional random fields (HUCRFs) to enable learning on unlabeled data. We present a simple yet effective pre-training technique that learns to associate words with their clusters, which are obtained in an unsupervised manner. The learned parameters are then used to initialize the supervised learning process. We also propose a word clustering technique based on canonical correlation analysis (CCA) that is sensitive to multiple word...

Publication details
Date: 28 August 2015
Type: Proceedings
Publisher: ACL – Association for Computational Linguistics
Bhaskar Mitra

Search logs contain examples of frequently occurring patterns of user reformulations of queries. Intuitively, the reformulation "san francisco" → "san francisco 49ers" is semantically similar to "detroit" → "detroit lions". Likewise, "london" → "things to do in london" and "new york" → "new york tourist attractions" can also be considered similar transitions in intent. The reformulations "movies" → "new movies" and "york" → "new york", however, are clearly different despite the lexical similarities in the two...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Chi Wang, Xueqing Liu, Yanglei Song, and Jiawei Han

Automatic construction of user-desired topical hierarchies over large volumes of text data is a highly desirable but challenging task. This study proposes to give users freedom to construct topical hierarchies via interactive operations such as expanding a branch and merging several branches. Existing hierarchical topic modeling techniques are inadequate for this purpose because (1) they cannot consistently preserve the topics when the hierarchy structure is modified; and (2) the slow inference prevents...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Emre Kıcıman and Matthew Richardson

Every day, people take action, trying to achieve their personal, high-order goals. People decide what actions to take based on their personal experience, knowledge and gut instinct. While this leads to positive outcomes for some people, many others do not have the necessary experience, knowledge and instinct to make good decisions. What if, rather than making decisions based solely on their own personal experience, people could take advantage of the reported experiences of hundreds of millions of other...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Anne Schuth, Katja Hofmann, and Filip Radlinski

The gold standard for online retrieval evaluation is AB testing. Rooted in the idea of a controlled experiment, AB tests compare the performance of an experimental system (treatment) on one sample of the user population, to that of a baseline system (control) on another sample. Given an online evaluation metric that accurately reflects user satisfaction, these tests enjoy high validity. However, due to the high variance across users, these comparisons often have low sensitivity, requiring millions of...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Bhanu Vattikonda, Santhosh Kodipaka, Hongyan Zhou, Vacha Dave, Saikat Guha, and Alex C. Snoeren
Publication details
Date: 1 August 2015
Type: Inproceeding
Peter Bailey, Alistair Moffat, Falk Scholer, and Paul Thomas

Test collection design eliminates sources of user variability to make statistical comparisons among information retrieval (IR) systems more affordable. Does this choice unnecessarily limit generalizability of the outcomes to real usage scenarios? We explore two aspects of user variability with regard to evaluating the relative performance of IR systems, assessing effectiveness in the context of a subset of topics from three TREC collections, with the embodied information needs categorized against three...

Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen

The goal of query conceptualization is to map instances in a query to concepts defined in a certain ontology or knowledge base. Queries usually do not observe the syntax of a written language, nor do they contain enough signals for statistical inference. However, the available context, i.e., the verbs related to the instances, the adjectives and attributes of the instances, do provide valuable clues to understand instances. In this paper, we first mine a variety of relations among terms from a large web...

Publication details
Date: 1 July 2015
Type: Inproceeding
Elad Yom-Tov, Ingemar Johansson-Cox, Vasileios Lampos, and Andrew C. Hayward

Knowledge of the secondary attack rate (SAR) and serial interval (SI) of influenza is important for assessing the severity of seasonal epidemics of the virus. To date, such estimates have required extensive surveys of target populations. Here, we propose a method for estimating the intrafamily SAR and SI from postings on the Twitter social network. This estimate is derived from a large number of people reporting influenza-like illness (ILI) symptoms in themselves and/or their immediate family members.

We analyze data from the...

Publication details
Date: 9 June 2015
Type: Article
Publisher: Wiley
Sarah K. Tyler, Jaime Teevan, Peter Bailey, Sebastian de la Chica, and Nikhil Dandekar

Information on almost any given topic can be found on the Web, often accessible via many different websites. But even when the topical content is similar across websites, the websites can have different characteristics that appeal to different people. As a result, individuals can develop preferred websites to visit for certain topics. While it has long been speculated that such preferences exist, little is understood about how prevalent, clear, and stable these preferences actually are. We characterize...

Publication details
Date: 1 June 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-48
Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han

Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency of manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality...

Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Fotis Psallidas, Bolin Ding, Kaushik Chakrabarti, and Surajit Chaudhuri

An enterprise information worker is often aware of a few example tuples that should be present in the output of the query. Query discovery systems have been developed to discover project-join queries that contain the given example tuples in their output. However, they require the output to exactly contain all the example tuples and do not perform any ranking. To address this limitation, we study the problem of efficiently discovering top-k project join queries which approximately contain the...

Publication details
Date: 1 June 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery
Yiwei Chen and Katja Hofmann

Online learning to rank holds great promise for learning personalized search result rankings. First algorithms have been proposed, namely absolute feedback approaches, based on contextual bandits learning; and relative feedback approaches, based on gradient methods and inferred preferences between complete result rankings. Both types of approaches have shown promise, but they have not previously been compared to each other. It is therefore unclear which type of...

Publication details
Date: 20 May 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Elad Yom-Tov

Syndromic surveillance refers to the analysis of medical information for the purpose of detecting outbreaks of disease earlier than would have been possible otherwise and to estimate the prevalence of the disease in a population. Internet data, especially search engine queries and social media postings, have shown promise in contributing to syndromic surveillance for influenza and dengue fever. Here we focus on the recent outbreak of Ebola Virus Disease and ask whether three major sources of Internet...

Publication details
Date: 18 May 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, and Kuansan Wang

In this paper we describe a new release of a Web scale entity graph that serves as the backbone of Microsoft Academic Service (MAS), a major production effort with a broadened scope to the namesake vertical search engine that has been publicly available since 2008 as a research prototype. At the core of MAS is a heterogeneous entity graph comprised of six types of entities that model the scholarly activities: field of study, author, institution, paper, venue, and event. In addition to obtaining these...

Publication details
Date: 18 May 2015
Type: Inproceeding
Publisher: WWW – World Wide Web Consortium (W3C)
Kuansan Wang

Humans are the only species on earth that has mastered the technologies of writing and printing to capture ephemeral thoughts and scientific discoveries. The capabilities to pass along knowledge, not only geographically but also generationally, have formed the bedrock of our civilizations. We are in the midst of a silent revolution driven by technological advancements: no longer are computers just a fixture of our physical world; they have been so deeply woven into our daily routines that they are...

Publication details
Date: 18 May 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Yi Wei, Nirupama Chandrasekaran, Sumit Gulwani, and Youssef Hamadi

Software developers heavily rely on code snippets and API usage examples searched on the Internet. This paper presents Bing Code Search, a Visual Studio extension that allows developers to write, within an IDE, free-form natural language questions, and get C# code snippets answering those questions. Bing Code Search automatically adapts the suggested snippets into the user’s programming context via variable renaming, and records users’ interactions to improve its suggestions. Compared to prior related...

Publication details
Date: 11 May 2015
Type: Technical report
Number: MSR-TR-2015-36
Helen J. Wang, Alexander Moshchuk, Michael Gamon, Mona Haraty, Shamsi Iqbal, Eli T. Brown, Ashish Kapoor, Chris Meek, Eric Chen, Yuan Tian, Jaime Teevan, Mary Czerwinski, and Susan Dumais

In this paper, we advocate “activity” to be a central abstraction between people and computing instead of applications. We outline the vision of the activity platform as the next-generation social platform.

Publication details
Date: 8 May 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-38
Ryen W. White, Matthew Richardson, and Wen-tau Yih

Search systems traditionally require searchers to formulate information needs as keywords rather than in a more natural form, such as questions. Recent studies have found that Web search engines are observing an increase in the fraction of queries phrased as natural language. As part of building better search engines, it is important to understand the nature and prevalence of these intentions, and the impact of this increase on search engine performance. In this work, we show that while 10.3% of queries...

Publication details
Date: 1 May 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Chi Wang, Kaushik Chakrabarti, Yeye He, Kris Ganjam, Zhimin Chen, and Phil A. Bernstein

We study the following problem: given the name of an ad-hoc concept as well as a few seed entities belonging to the concept, output all entities belonging to it. Since producing the exact set of entities is hard, we focus on returning a ranked list of entities belonging to the concept. Previous approaches either use seed entities as the only input, or inherently require negative examples. They suffer from input ambiguity and semantic drift, or are not viable options for ad-hoc tail concepts. In this...

Publication details
Date: 1 May 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery