Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou

Understanding short texts is crucial to many applications, but challenges abound. First, short texts do not always observe the syntax of a written language. As a result, traditional natural language processing methods cannot be easily applied. Second, short texts usually do not contain sufficient statistical signals to support many state-of-the-art approaches for text processing such as topic modeling. Third, short texts are usually more ambiguous. We argue that knowledge is needed in order to better...

Publication details
Date: 1 April 2015
Type: Inproceeding
Publication details
Date: 1 February 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Larry Heck and Hongzhao Huang

This paper presents an unsupervised neural knowledge graph embedding model and a coherence-based approach for semantic parsing of Twitter dialogs. The approach learns embeddings directly from knowledge graphs and scales to all of Wikipedia. Experiments show a 23.6% reduction in semantic parsing errors compared to the previously best reported results.

Publication details
Date: 1 December 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Fang Wang, Zhongyuan Wang, Senzhang Wang, and Zhoujun Li

Keyphrase extraction is essential for many IR and NLP tasks. Existing methods usually use the phrases of the document separately without distinguishing the potential semantic correlations among them, or other statistical features from knowledge bases such as WordNet and Wikipedia. However, the mutual semantic information between phrases is also important, and exploiting their correlations may potentially help us more effectively extract the keyphrases. Generally, phrases in the title are more likely to...

Publication details
Date: 1 December 2014
Type: Inproceeding
Michael J. Paul, Ryen W. White, and Eric Horvitz

We seek to understand the evolving needs of people who are faced with a life-changing medical diagnosis based on analyses of queries extracted from an anonymized search query log. Focusing on breast cancer, we manually tag a set of Web searchers as showing disruptive shifts in focus of attention and long-term patterns of search behavior consistent with the diagnosis and treatment of breast cancer. We build and apply probabilistic classifiers to detect these searchers from multiple sessions and to detect...

Publication details
Date: 15 November 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-144
Rakesh Agrawal, Sreenivas Gollapudi, Anitha Kannan, and Krishnaram Kenthapadi

The rapid proliferation of hand-held devices has led to the development of rich, interactive and immersive applications, such as e-readers for electronic books. These applications motivate retrieval systems that can implicitly satisfy any information need of the reader by exploiting the context of the user’s interactions. Such retrieval systems differ from traditional search engines in that the queries constructed using the context are typically complex objects (including the document and its...

Publication details
Date: 4 November 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Sreenivas Gollapudi and Debmalya Panigrahi

A key characteristic of a successful online market is the large participation of agents (producers and consumers) on both sides of the market. While there has been a long line of impressive work on understanding such markets in terms of revenue maximizing (also called max-sum) objectives, particularly in the context of allocating online impressions to interested advertisers, fairness considerations have surprisingly not received much attention in online...

Publication details
Date: 4 November 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Gregoire Mesnil

In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. In order to capture the rich contextual structures in a query or a document, we start with each word within a temporal context window in a word sequence to directly capture contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are...

Publication details
Date: 1 November 2014
Type: Inproceeding
Publisher: CIKM
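
The convolutional-pooling idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: the tanh activation, the use of max pooling, the zero padding, and the names `conv_max_pool`, `word_vectors`, and `filters` are all assumptions made for illustration. The sketch slides a window over the word sequence, scores each window with every filter, and keeps the strongest response per filter, so the output has a fixed length regardless of how many words the input has.

```python
import math


def conv_max_pool(word_vectors, filters, window=3):
    """Convolve over word windows, then max-pool each filter over positions.

    word_vectors: list of equal-length lists of floats, one per word.
    filters: list of weight lists, each of length window * dim.
    window: odd window size (assumption: zero padding at both ends).
    """
    dim = len(word_vectors[0])
    pad = [0.0] * dim
    half = window // 2
    padded = [pad] * half + list(word_vectors) + [pad] * half

    pooled = [-math.inf] * len(filters)
    for start in range(len(word_vectors)):
        # Concatenate the vectors in the current window into one context vector.
        context = [x for vec in padded[start:start + window] for x in vec]
        for i, weights in enumerate(filters):
            activation = math.tanh(sum(w * x for w, x in zip(weights, context)))
            # Max pooling keeps the most salient window response per filter.
            pooled[i] = max(pooled[i], activation)
    return pooled


# Toy input: three one-dimensional "word vectors" and two hand-written filters.
toy_words = [[1.0], [2.0], [3.0]]
toy_filters = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
toy_output = conv_max_pool(toy_words, toy_filters)
```

In a real system the filter weights would be learned and the word vectors would come from a trained representation; here both are hand-written toy values.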
Katja Hofmann, Bhaskar Mitra, Filip Radlinski, and Milad Shokouhi

Query Auto Completion (QAC) suggests possible queries to web search users from the moment they start entering a query. This popular feature of web search engines is thought to reduce physical and cognitive effort when formulating a query.

Perhaps surprisingly, despite QAC being widely used, users’ interactions with it are poorly understood. This paper begins to address this gap. We present the results of an in-depth user study of user interactions with QAC in web search. While study participants...

Publication details
Date: 1 November 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Emine Yilmaz, Manisha Verma, Nick Craswell, Filip Radlinski, and Peter Bailey

Relevance judgments sit at the core of test collection construction, and are assumed to model the utility of documents to real users. However, comparisons of judgments with signals of relevance obtained from real users, such as click counts and dwell time, have demonstrated a systematic mismatch.

In this paper, we study one important source of the mismatch between user data and relevance judgments: the high degree of effort required by users to identify and consume the information in...

Publication details
Date: 1 November 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Chi Wang, Kaushik Chakrabarti, Yeye He, Kris Ganjam, Zhimin Chen, and Philip A. Bernstein

We study the following problem: given the name of an ad-hoc concept as well as a few seed entities belonging to the concept, output all entities belonging to it. Since producing the exact set of entities is hard, we focus on returning a ranked list of entities belonging to the concept. Previous approaches either use seed entities as the only input, or inherently require negative examples. They suffer from input ambiguity and semantic drift, or are not viable options for ad-hoc tail concepts. In this...

Publication details
Date: 1 November 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-143
Publication details
Date: 1 November 2014
Type: Inproceeding
Publication details
Date: 1 November 2014
Type: Inproceeding
Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen

Most existing approaches for text classification represent texts as vectors of words, namely “Bag-of-Words.” This text representation results in a very high dimensionality of feature space and frequently suffers from surface mismatching. Short texts make these issues even more serious, due to their shortness and sparsity. In this paper, we propose using “Bag-of-Concepts” in short text representation, aiming to avoid the surface mismatching and handle the synonym and polysemy problem. Based on...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
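
To make the contrast between the two representations in the abstract above concrete, here is a minimal sketch. The lexicon `TOY_LEXICON`, its weights, and the function names are hypothetical and stand in for whatever knowledge base the paper actually uses, which the truncated abstract does not specify. The sketch shows surface mismatch: two short texts with no words in common have zero Bag-of-Words similarity, yet can match once their words are mapped to shared concepts.

```python
import math
from collections import Counter

# Hypothetical term-to-concept lexicon with made-up weights (illustration only).
TOY_LEXICON = {
    "apple": {"company": 0.6, "fruit": 0.4},
    "microsoft": {"company": 1.0},
    "banana": {"fruit": 1.0},
    "pear": {"fruit": 1.0},
}


def bag_of_words(text):
    """Represent a text as lowercase word counts."""
    return Counter(text.lower().split())


def bag_of_concepts(text, lexicon=TOY_LEXICON):
    """Represent a text as weighted concept counts; unknown words are skipped."""
    concepts = Counter()
    for word, count in bag_of_words(text).items():
        for concept, weight in lexicon.get(word, {}).items():
            concepts[concept] += count * weight
    return concepts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


word_similarity = cosine(bag_of_words("banana"), bag_of_words("pear"))
concept_similarity = cosine(bag_of_concepts("banana"), bag_of_concepts("pear"))
```

The ambiguous entry for "apple" hints at the polysemy problem the abstract mentions: a word can contribute weight to several concepts, and a fuller method would use context to choose among them.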
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen

We examine the embedding approach to reason new relational facts from a large-scale knowledge graph and a text corpus. We propose a novel method of jointly embedding entities and words into the same continuous vector space. The embedding process attempts to preserve the relations between entities in the knowledge graph and the concurrences of words in the text corpus. Entity names and Wikipedia anchors are utilized to align the embeddings of entities and words in the same space. Large scale experiments...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
James Cook, Abhimanyu Das, Krishnaram Kenthapadi, and Nina Mishra

A discussion group is a repeated, synchronized conversation organized around a specific topic. Groups are extremely valuable to the attendees, creating a sense of community among like-minded users. While groups may involve many users, there are many outside the group that would benefit from participation. However, finding the right group is not easy given their quantity and given topic overlap. We study the following problem: given a search query, find a good ranking of discussion groups. We...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Jianfeng Gao, Patrick Pantel, Michael Gamon, Xiaodong He, Li Deng, and Yelong Shen

This paper presents a deep semantic model (DSM) for recommending target documents to be of interest to a user based on a source document she is reading. We observe, identify, and detect naturally occurring signals of interestingness in click transitions on the Web between source and target documents, which we collect from commercial Web browser logs. The DSM is trained on millions of Web transitions, and maps source-target document pairs to feature vectors in a latent space in such a...

Publication details
Date: 1 October 2014
Type: Proceedings
Publisher: EMNLP
Ahmad Bassiouny and Motaz El Saban

We introduce a novel approach towards scene recognition using semantic segmentation maps as image representation. Given a set of images and a list of possible categories for each image, our goal is to assign a category from that list to each image. Our approach is based on representing an image by its semantic segmentation map, which is a mapping from each pixel to a pre-defined set of labels. Among similar high-level approaches, ours has the capability of not only representing what semantic labels the...

Publication details
Date: 1 September 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Mihai Budiu, Gordon Plotkin, Yuan Yu, and Li Zhang

We present JPath, a JSON database query language, and its syntax, semantics, and implementation. We introduce an indexing data structure for answering JPath queries, and provide a theory unifying query execution on data and index trees using operations on matrices with lattice-valued elements.

Publication details
Date: 1 September 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-129
Siyu Qiu, Qing Cui, Jiang Bian, Bin Gao, and Tie-Yan Liu

The techniques of using neural networks to learn distributed word representations (i.e., word embeddings) have been used to solve a variety of natural language processing tasks. The recently proposed methods, such as CBOW and Skip-gram, have demonstrated their effectiveness in learning word embeddings based on context information such that the obtained word embeddings can capture both semantic and syntactic relationships between words. However, it is quite challenging to produce high-quality word...

Publication details
Date: 1 August 2014
Type: Inproceeding
Edith Cohen, Daniel Delling, Thomas Pajor, and Renato Werneck

Propagation of contagion through networks is a fundamental process. It is used to model the spread of information, influence, or a viral infection. Diffusion patterns can be specified by a probabilistic model, such as Independent Cascade (IC), or captured by a set of representative traces.

Basic computational problems in the study of diffusion are influence queries (determining the potency of a specified seed set of nodes) and Influence Maximization (identifying the...

Publication details
Date: 1 August 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-110
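
An influence query under the Independent Cascade model described above can be approximated by plain Monte Carlo simulation, sketched below. This is a baseline illustration, not the technique of the report: the uniform activation probability `prob`, the adjacency-dictionary graph format, and the function names are assumptions. Each newly activated node gets one chance to activate each of its inactive out-neighbors, and the influence of a seed set is estimated as the average number of activated nodes over repeated runs.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, prob, rng):
    """Run one Independent Cascade simulation and return the activated nodes.

    graph: dict mapping each node to a list of out-neighbors.
    prob: activation probability, assumed identical for every edge.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        for neighbor in graph.get(node, ()):
            # A node tries each inactive neighbor exactly once, when it is dequeued.
            if neighbor not in active and rng.random() < prob:
                active.add(neighbor)
                frontier.append(neighbor)
    return active


def estimate_influence(graph, seeds, prob, runs=1000, seed=0):
    """Average cascade size over `runs` simulations (Monte Carlo estimate)."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, prob, rng)) for _ in range(runs))
    return total / runs


toy_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
certain_spread = estimate_influence(toy_graph, ["a"], 1.0, runs=10)
no_spread = estimate_influence(toy_graph, ["a"], 0.0, runs=10)
```

Repeated simulation costs time proportional to the number of runs times the cascade size, which is the kind of cost that motivates more scalable approaches to influence queries and Influence Maximization.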
Fei Tian, Hanjun Dai, Jiang Bian, Bin Gao, Rui Zhang, Enhong Chen, and Tie-Yan Liu

Distributed word representations have been widely used and proven to be useful in quite a few natural language processing and text mining tasks. Most existing word embedding models aim at generating only one embedding vector for each individual word, which, however, limits their effectiveness because a huge number of words are polysemous (such as bank and star). To address this problem, it is necessary to build multiple embedding vectors to represent different meanings of a word...

Publication details
Date: 1 August 2014
Type: Inproceeding
Sandeep Panem, Manish Gupta, and Vasudeva Varma

As soon as natural disaster events happen, users are eager to know more about them. However, search engines currently provide a ten blue links interface for queries related to such events. Relevance of results for such queries can be significantly improved if users are shown a structured summary of the fresh events related to such queries. This would not just reduce the number of user clicks to get the relevant information but would also help users get updated with more fine grained attribute-level...

Publication details
Date: 1 August 2014
Type: Inproceeding
Rishiraj Saha Roy, Rahul Katare, Niloy Ganguly, Srivatsan Laxman, and Monojit Choudhury

Identifying and interpreting user intent are fundamental to semantic search. In this paper, we investigate the association of intent with individual words of a search query. We propose that words in queries can be classified as either content or intent, where content words represent the central topic of the query, while users add intent words to make their requirements more explicit. We argue that intelligent processing of intent words can be vital to improving the result quality, and...

Publication details
Date: 1 August 2014
Type: Article
Publisher: Elsevier
Edith Cohen

Distance queries are a basic tool in data analysis. They are used for detection and localization of change for the purpose of anomaly detection, monitoring, or planning. Distance queries are particularly useful when data sets such as measurements, snapshots of a system, content, traffic matrices, and activity logs are collected repeatedly.

Random sampling, which can be efficiently performed over streamed or distributed data, is an important tool for scalable data analysis. The sample...

Publication details
Date: 1 August 2014
Type: Technical report
Publisher: ACM – Association for Computing Machinery
Number: MSR-TR-2014-111