Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Gregoire Mesnil

In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. In order to capture the rich contextual structures in a query or a document, we start with each word within a temporal context window in a word sequence to directly capture contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are...

Publication details
Date: 1 November 2014
Type: Inproceeding
Publisher: CIKM
Jianfeng Gao, Patrick Pantel, Michael Gamon, Xiaodong He, Li Deng, and Yelong Shen

This paper presents a deep semantic model (DSM) for recommending target documents to be of interest to a user based on a source document she is reading. We observe, identify, and detect naturally occurring signals of interestingness in click transitions on the Web between source and target documents, which we collect from commercial Web browser logs. The DSM is trained on millions of Web transitions, and maps source-target document pairs to feature vectors in a latent space in such a...

Publication details
Date: 1 October 2014
Type: Proceedings
Publisher: EMNLP
Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek

While relation extraction has traditionally been viewed as a task relying solely on textual data, recent work has shown that by taking as input existing facts in the form of entity-relation triples from both knowledge bases and textual data, the performance of relation extraction can be improved significantly. Following this new paradigm, we propose a tensor decomposition approach for knowledge base embedding that is highly scalable, and is especially suitable for relation extraction. By leveraging...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Publication details
Date: 1 August 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-109
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen

We deal with embedding a large scale knowledge graph composed of entities and relations into a continuous vector space. TransE is a promising method proposed recently, which is very efficient while achieving state-of-the-art predictive performance. We discuss some mapping properties of relations which should be considered in embedding, such as reflexive, one-to-many, many-to-one, and many-to-many. We note that TransE does not do well in dealing with these properties. Some complex models are capable of...

Publication details
Date: 1 July 2014
Type: Inproceeding
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, and Chengqing Zong
Publication details
Date: 1 July 2014
Type: Proceedings
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
Patrick Pantel, Michael Gamon, and Ariel Fuxman

Natural touch interfaces, common now in devices such as tablets and smartphones, make it cumbersome for users to select text. There is a need for a new text selection paradigm that goes beyond the high acuity selection-by-mouse that we have relied on for decades. In this paper, we introduce such a paradigm, called Smart Selection, which aims to recover a user’s intended text selection from her touch input. We model the problem using an ensemble learning approach, which leverages multiple linguistic...

Publication details
Date: 6 June 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Wen-tau Yih, Xiaodong He, and Christopher Meek

We develop a semantic parsing framework based on semantic similarity for open domain question answering (QA). We focus on single-relation questions and decompose each question into an entity mention and a relation pattern. Using convolutional neural network models, we measure the similarity of entity mentions with entities in the knowledge base (KB) and the similarity of relation patterns and relations in the KB. We score relational triples in the KB using these measures and select the top scoring...

Publication details
Date: 1 June 2014
Type: Inproceeding
Publisher: Association for Computational Linguistics
Jianfeng Gao, Xiaodong He, Wen-tau Yih, and Li Deng

This paper tackles the sparsity problem in estimating phrase translation probabilities by learning continuous phrase representations, whose distributed nature enables the sharing of related phrases in their representations. A pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent space, where their translation score is computed by the distance between the pair in this new space. The projection is performed by a neural network whose...

Publication details
Date: 1 June 2014
Type: Inproceeding
Publisher: Association for Computational Linguistics
Yangfeng Ji, Dilek Hakkani-Tur, Asli Celikyilmaz, Larry Heck, and Gokhan Tur

State-of-the-art spoken language understanding models that automatically capture user intents in human-to-machine dialogs are often trained with a small number of manually annotated examples collected from the application domain. Search query logs provide a large number of unlabeled queries that would be beneficial to improve such supervised classification. Furthermore, the contents of user queries as well as the URLs they click provide information about the user’s intent. In this paper, we propose a...

Publication details
Date: 1 May 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Ali El-Kahky, Derek Liu, Ruhi Sarikaya, Gokhan Tur, Dilek Hakkani-Tur, and Larry Heck

This paper proposes a new technique to enable Natural Language Understanding (NLU) systems to handle user queries beyond their original semantic schemas defined by their intents and slots. A knowledge graph and search query logs are used to extend an NLU system’s coverage by transferring intents from other domains to a given domain. The transferred intents as well as existing intents are then applied to a set of new slots that they are not trained with. The knowledge graph and search click logs are used to...

Publication details
Date: 1 May 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Yann Dauphin, Gokhan Tur, Dilek Hakkani-Tur, and Larry Heck

We propose a novel zero-shot learning method for semantic utterance classification (SUC). It learns a classifier f : X -> Y for problems where none of the semantic categories Y are present in the training set. The framework uncovers the link between categories and utterances through a semantic space. We show that this semantic space can be learned by deep neural networks trained on large amounts of search engine query log data. What’s more, we propose a novel method that can learn discriminative...

Publication details
Date: 1 April 2014
Type: Inproceeding
Publisher: International Conference on Learning Representations (ICLR)
Larry Heck

The past decade has seen the emergence of web-scale structured and linked semantic knowledge graphs (KGs). These KGs provide a scalable “schema for the web,” representing a significant opportunity for the NLP and conversational-interaction (CI) research communities. This lecture describes new research that leverages KGs to bootstrap web-scale CI with no requirement for semantic schema design, no data collection, and no manual annotations. In effect, the method completes a "join" of semantic KGs to...

Publication details
Date: 1 March 2014
Type: Technical report
Number: MSR-TR-2014-70
Lu Wang, Larry Heck, and Dilek Hakkani-Tur

Training statistical dialog models in spoken dialog systems (SDS) requires large amounts of annotated data. The lack of scalable methods for data mining and annotation poses a significant hurdle for state-of-the-art SDS. This paper presents an approach that directly leverages billions of web search and browse sessions to overcome this hurdle. The key insight is that task completion through web search and browse sessions is (a) predictable and (b) generalizes to spoken dialog task completion. The new...

Publication details
Date: 1 January 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Seyed Omid Sadjadi and Larry Heck

Co-channel speech, which occurs in monaural audio recordings of two or more overlapping talkers, poses a great challenge for automatic speech applications. Automatic speech recognition (ASR) performance, in particular, has been shown to degrade significantly in the presence of a competing talker. In this paper, assuming a known target talker scenario, we present two different masking strategies based on speaker verification to alleviate the impact of the competing talker (a.k.a. masker) interference on...

Publication details
Date: 1 January 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Michael Gamon, Tae Yano, Xinying Song, Johnson Apacible, and Patrick Pantel

We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages, only a small subset of entities are central to the document, which can lead to degraded relevance for entity triggered experiences. We address this problem by devising a system that scores each entity on a web page according to its centrality to the page content. We propose salience...

Publication details
Date: 1 November 2013
Type: Inproceeding
Publisher: ACM International Conference on Information and Knowledge Management (CIKM)
Michael Gamon, Tae Yano, Xinying Song, Johnson Apacible, and Patrick Pantel

We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages, only a small subset of entities are important, or central, to the document, which can lead to degraded relevance for entity triggered experiences. We address this problem by devising a system that scores each entity on a web page according to its centrality to the page content. We propose...

Publication details
Date: 27 October 2013
Type: Technical report
Publisher: Microsoft Technical Report
Number: MSR-TR-2013-73
Joshua L. Moore, Christopher J.C. Burges, Erin Renshaw, and Wen-tau Yih

Animacy detection is a problem whose solution has been shown to be beneficial for a number of syntactic and semantic tasks. We present a state-of-the-art system for this task which uses a number of simple classifiers with heterogeneous data sources in a voting scheme. We show how this framework can give us direct insight into the behavior of the system, allowing us to more easily diagnose sources of error.

Publication details
Date: 1 October 2013
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Kai-Wei Chang, Wen-tau Yih, and Chris Meek

We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a low-rank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific...

Publication details
Date: 1 October 2013
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Publication details
Date: 1 October 2013
Type: Inproceeding
Publisher: Association for Computational Linguistics
Publication details
Date: 1 October 2013
Type: Proceedings
Publisher: ACM International Conference on Information and Knowledge Management (CIKM)
Michel Galley, Chris Quirk, Colin Cherry, and Kristina Toutanova

Minimum Error Rate Training (MERT) remains one of the preferred methods for tuning linear parameters in machine translation systems, yet it faces significant issues. First, MERT is an unregularized learner and is therefore prone to overfitting. Second, it is commonly used on a noisy, non-convex loss function that becomes more difficult to optimize as the number of parameters increases. To address these issues, we study the addition of a regularization term to the MERT objective function. Since standard...

Publication details
Date: 1 October 2013
Type: Inproceeding
Publisher: Association for Computational Linguistics
Seyed Omid Sadjadi, Malcolm Slaney, and Larry Heck

This report serves as a user manual for the tools available in the Microsoft Research (MSR) Identity Toolbox. This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. It provides researchers with a test bed for developing new front-end and back-end techniques, allowing replicable evaluation of new advancements. It will also help newcomers in the field by lowering the “barrier to entry”, enabling them to quickly build baseline...

Publication details
Date: 1 September 2013
Type: Technical report
Publisher: Microsoft Research Technical Report
Number: MSR-TR-2013-133
Riham Hassan Mansour, Nesma Refaei, Michael Gamon, Khaled Sami, and Ahmed Abdel-Hamid

In this paper we undertake a large cross-domain investigation of sentiment domain adaptation, challenging the practical necessity of sentiment domain adaptation algorithms. We first show that across a wide set of domains, a simple “all-in-one” classifier that utilizes all available training data from all but the target domain tends to outperform published domain adaptation methods. A very simple ensemble classifier also performs well in these scenarios. Combined with the fact that labeled data nowadays...

Publication details
Date: 1 September 2013
Type: Inproceeding
Publisher: ACL/SIGPARSE
Jianfeng Gao, Xiaodong He, Wen-tau Yih, and Li Deng

This paper presents a novel semantic-based phrase translation model. A pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent semantic space, where their translation score is computed by the distance between the pair in this new space. The projection is performed by a multi-layer neural network whose weights are learned on parallel training data. The learning is aimed to directly optimize the quality of end-to-end machine translation...

Publication details
Date: 1 September 2013
Type: Technical report
Number: MSR-TR-2013-88