Our research
Qi Li, Gokhan Tur, Dilek Hakkani-Tur, Xiang Li, Tim Paek, Asela Gunawardana, and Chris Quirk

Traditional spoken dialog systems are usually based on a centralized architecture, in which the number of domains is predefined, and the provider is fixed for a given domain and intent. The spoken language understanding (SLU) component is responsible for detecting domain and intents, and filling domain-specific slots. It is expensive and time-consuming for this architecture to add new and/or competing domains, intents, or providers. The rapid growth of service providers in the mobile computing market calls...

Publication details
Date: 1 December 2014
Type: Inproceeding
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, and Chengqing Zong

Measuring the quality of the translation rules and their composition is an essential issue in the conventional statistical machine translation (SMT) framework. To express the translation quality, the previous lexical and phrasal probabilities are calculated only according to the co-occurrence statistics in the bilingual corpus, and may not be reliable due to the data sparseness problem. To address this issue, we propose to measure the quality of the translation rules and their composition in the...

Publication details
Date: 1 November 2014
Type: Article
Publisher: ACM – Association for Computing Machinery
Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Gregoire Mesnil

In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. In order to capture the rich contextual structures in a query or a document, we start with each word within a temporal context window in a word sequence to directly capture contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are...

Publication details
Date: 1 November 2014
Type: Inproceeding
Publisher: CIKM
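
As a rough illustration of the convolutional-pooling idea in the abstract above, the sketch below slides a window over word vectors, applies one shared projection to each window, and keeps the strongest value per feature. The names `sliding_windows`, `convolve`, `max_pool`, and `encode`, the tanh activation, the use of max pooling, and all toy values are assumptions made for this example; the truncated abstract does not specify them.

```python
import math

def sliding_windows(word_vectors, size=3):
    """Concatenate each run of `size` neighbouring word vectors.

    Edges are zero-padded; `size` is assumed to be odd.
    """
    pad = [0.0] * len(word_vectors[0])
    half = size // 2
    padded = [pad] * half + word_vectors + [pad] * half
    return [sum(padded[i:i + size], []) for i in range(len(word_vectors))]

def convolve(window, weights):
    """Apply one shared linear projection plus tanh to a window vector."""
    return [math.tanh(sum(w * x for w, x in zip(row, window))) for row in weights]

def max_pool(feature_vectors):
    """Keep, for each feature dimension, the largest value over all windows."""
    return [max(column) for column in zip(*feature_vectors)]

def encode(word_vectors, weights, size=3):
    """Map a word sequence of any length to one fixed-length vector."""
    windows = sliding_windows(word_vectors, size)
    return max_pool([convolve(window, weights) for window in windows])

# Toy inputs (hypothetical): three 2-dimensional word vectors and a
# projection with two output features over 3 * 2 = 6 window inputs.
toy_words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_weights = [[0.5] * 6, [-0.5] * 6]
sentence_vector = encode(toy_words, toy_weights)
```

Because pooling takes a maximum over windows, the output length depends only on the number of rows in `toy_weights`, not on the length of the word sequence.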
Jianfeng Gao, Patrick Pantel, Michael Gamon, Xiaodong He, Li Deng, and Yelong Shen

This paper presents a deep semantic model (DSM) for recommending target documents likely to interest a user based on the source document she is reading. We observe, identify, and detect naturally occurring signals of interestingness in click transitions on the Web between source and target documents, which we collect from commercial Web browser logs. The DSM is trained on millions of Web transitions, and maps source-target document pairs to feature vectors in a latent space in such a...

Publication details
Date: 1 October 2014
Type: Proceedings
Publisher: EMNLP
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen

We examine the embedding approach to reasoning about new relational facts from a large-scale knowledge graph and a text corpus. We propose a novel method of jointly embedding entities and words into the same continuous vector space. The embedding process attempts to preserve the relations between entities in the knowledge graph and the co-occurrences of words in the text corpus. Entity names and Wikipedia anchors are utilized to align the embeddings of entities and words in the same space. Large-scale experiments...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek

While relation extraction has traditionally been viewed as a task relying solely on textual data, recent work has shown that by taking as input existing facts in the form of entity-relation triples from both knowledge bases and textual data, the performance of relation extraction can be improved significantly. Following this new paradigm, we propose a tensor decomposition approach for knowledge base embedding that is highly scalable, and is especially suitable for relation extraction. By leveraging...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Zhenghao Wang, Shengquan Yan, Huaming Wang, and Xuedong Huang

Question answering (QA) over an existing knowledge base (KB) such as Microsoft Satori or open Freebase is one of the most important natural language processing applications. There are approaches based on statistical techniques motivated by web search as well as linguistically oriented knowledge engineering. Both methods face the key challenge of how to handle diverse ways of naturally expressing predicates and entities existing in the KB. The domain independent web information extracted from the massive...

Publication details
Date: 3 September 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-121
Hany Hassan, Lee Schwartz, Dilek Hakkani-Tur, and Gokhan Tur

In this paper we focus on the effect of on-line speech segmentation and disfluency removal methods on conversational speech translation. In a real-time conversational speech-to-speech translation system, on-line segmentation of speech is required to avoid latency beyond a few seconds. While sentential unit segmentation and disfluency removal have been heavily studied mainly for off-line speech processing, to the best of our knowledge, the combined effect of these tasks on conversational speech translation...

Publication details
Date: 1 September 2014
Type: Inproceeding
Publisher: ISCA - International Speech Communication Association
Publication details
Date: 1 August 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-109
A Kumaran, Melissa Dunsmore, and Shaishav Kumar

We propose the use of a game with a purpose (GWAP) to facilitate crowd-sourcing of phrase-equivalents, as an alternative to expert or paid crowd-sourcing. Doodling is an online multi-player game, in which one player (the drawer) draws pictures on a shared board to get the other players (the guessers) to guess the meaning behind an assigned phrase. In this paper we describe the system and results from several experiments intended to improve the quality of information generated by the play. In...

Publication details
Date: 1 August 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Rishiraj Saha Roy, Rahul Katare, Niloy Ganguly, and Monojit Choudhury

Natural languages (NL) can be classified as prepositional or postpositional based on the order of the noun phrase and the adposition. Categorizing a language by its adposition typology helps in addressing several challenges in linguistics and natural language processing (NLP). Understanding the adposition typologies for less-studied languages by manual analysis of large text corpora can be quite expensive, yet automatic discovery of the same has received very little attention to date. This research...

Publication details
Date: 1 August 2014
Type: Inproceeding
Publisher: Coling 2014
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen

We deal with embedding a large-scale knowledge graph composed of entities and relations into a continuous vector space. TransE is a promising method proposed recently, which is very efficient while achieving state-of-the-art predictive performance. We discuss some mapping properties of relations which should be considered in embedding, such as reflexive, one-to-many, many-to-one, and many-to-many. We note that TransE does not do well in dealing with these properties. Some complex models are capable of...

Publication details
Date: 1 July 2014
Type: Inproceeding
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
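
To make the one-to-many difficulty mentioned in the abstract above concrete, here is a minimal sketch of the TransE scoring idea, in which a relation acts as a translation vector and a triple is plausible when head + relation lands near tail. The function name `transe_distance` and the two-dimensional toy embeddings are assumptions for illustration only; this shows plain TransE, not the method this paper proposes.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance between (head + relation) and tail; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Toy embeddings (hypothetical values).
head = [1.0, 0.0]
relation = [0.0, 1.0]
tail_a = [1.0, 1.0]   # exactly head + relation
tail_b = [2.0, 1.0]   # a second, different tail for the same head and relation

distance_a = transe_distance(head, relation, tail_a)
distance_b = transe_distance(head, relation, tail_b)
```

With a single translation vector per relation, only one point can sit exactly at head + relation, so two distinct tails of a one-to-many relation cannot both reach distance zero unless their embeddings coincide.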
Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, and Chengqing Zong
Publication details
Date: 1 July 2014
Type: Proceedings
Publisher: AAAI - Association for the Advancement of Artificial Intelligence
Rishiraj Saha Roy, Yogarshi Vyas, Niloy Ganguly, and Monojit Choudhury

We present a generic method for augmenting unsupervised query segmentation by incorporating Parts-of-Speech (POS) sequence information to detect meaningful but rare n-grams. Our initial experiments with an existing English POS tagger employing two different POS tagsets and an unsupervised POS induction technique specifically adapted for queries show that POS information can significantly improve query segmentation performance in all these cases.

Publication details
Date: 1 July 2014
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Patrick Pantel, Michael Gamon, and Ariel Fuxman

Natural touch interfaces, common now in devices such as tablets and smartphones, make it cumbersome for users to select text. There is a need for a new text selection paradigm that goes beyond the high acuity selection-by-mouse that we have relied on for decades. In this paper, we introduce such a paradigm, called Smart Selection, which aims to recover a user’s intended text selection from her touch input. We model the problem using an ensemble learning approach, which leverages multiple linguistic...

Publication details
Date: 6 June 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Jianfeng Gao, Xiaodong He, Wen-tau Yih, and Li Deng

This paper tackles the sparsity problem in estimating phrase translation probabilities by learning continuous phrase representations, whose distributed nature enables the sharing of related phrases in their representations. A pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent space, where their translation score is computed by the distance between the pair in this new space. The projection is performed by a neural network whose...

Publication details
Date: 1 June 2014
Type: Inproceeding
Publisher: Association for Computational Linguistics
Wen-tau Yih, Xiaodong He, and Christopher Meek

We develop a semantic parsing framework based on semantic similarity for open domain question answering (QA). We focus on single-relation questions and decompose each question into an entity mention and a relation pattern. Using convolutional neural network models, we measure the similarity of entity mentions with entities in the knowledge base (KB) and the similarity of relation patterns and relations in the KB. We score relational triples in the KB using these measures and select the top scoring...

Publication details
Date: 1 June 2014
Type: Inproceeding
Publisher: Association for Computational Linguistics
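
The scoring step described in the abstract above can be sketched roughly as follows: each knowledge-base triple is scored from how well its entity matches the question's entity mention and how well its relation matches the question's relation pattern, and the best triple is returned. The function name `best_triple`, the precomputed similarity tables, the placeholder identifiers, and the choice to combine the two similarities by multiplication are all assumptions for this example; in the paper the similarities come from convolutional neural network models.

```python
def best_triple(triples, entity_similarity, relation_similarity):
    """Return the (subject, relation, object) triple with the highest combined score."""
    def score(triple):
        subject, relation, _ = triple
        # Assumed combination rule: product of the two similarity scores.
        return entity_similarity.get(subject, 0.0) * relation_similarity.get(relation, 0.0)
    return max(triples, key=score)

# Toy knowledge base and similarity scores (placeholder names and values).
toy_triples = [
    ("entity_a", "rel_x", "value_1"),
    ("entity_a", "rel_y", "value_2"),
    ("entity_b", "rel_y", "value_3"),
]
toy_entity_similarity = {"entity_a": 0.9, "entity_b": 0.4}
toy_relation_similarity = {"rel_x": 0.2, "rel_y": 0.8}

answer = best_triple(toy_triples, toy_entity_similarity, toy_relation_similarity)
```

Here the object of the winning triple (`answer[2]`) would serve as the answer to the single-relation question.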
Yangfeng Ji, Dilek Hakkani-Tur, Asli Celikyilmaz, Larry Heck, and Gokhan Tur

State-of-the-art spoken language understanding models that automatically capture user intents in human-to-machine dialogs are often trained with a small number of manually annotated examples collected from the application domain. Search query logs provide a large number of unlabeled queries that would be beneficial for improving such supervised classification. Furthermore, the contents of user queries as well as the URLs they click provide information about the user’s intent. In this paper, we propose a...

Publication details
Date: 1 May 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Ali El-Kahky, Derek Liu, Ruhi Sarikaya, Gokhan Tur, Dilek Hakkani-Tur, and Larry Heck

This paper proposes a new technique to enable Natural Language Understanding (NLU) systems to handle user queries beyond their original semantic schemas defined by their intents and slots. A knowledge graph and search query logs are used to extend an NLU system’s coverage by transferring intents from other domains to a given domain. The transferred intents as well as existing intents are then applied to a set of new slots that they were not trained on. The knowledge graph and search click logs are used to...

Publication details
Date: 1 May 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Rishiraj Saha Roy, M. Dastagiri Reddy, Niloy Ganguly, and Monojit Choudhury

Web search queries have been observed to exhibit properties of a rudimentary language system, distinct from the mother language from which words of the queries are drawn. It has been hypothesized that the language of search queries is fast growing in complexity, reflected in the steady increase of query lengths over the years. In this research, we make the first attempts to quantify change in the linguistic structure of search queries by examining large query logs spaced four years apart. We adopt a...

Publication details
Date: 1 April 2014
Type: Inproceeding
Publisher: EVOLANG
Yann Dauphin, Gokhan Tur, Dilek Hakkani-Tur, and Larry Heck

We propose a novel zero-shot learning method for semantic utterance classification (SUC). It learns a classifier f : X -> Y for problems where none of the semantic categories Y are present in the training set. The framework uncovers the link between categories and utterances through a semantic space. We show that this semantic space can be learned by deep neural networks trained on large amounts of search engine query log data. What’s more, we propose a novel method that can learn discriminative...

Publication details
Date: 1 April 2014
Type: Inproceeding
Publisher: International Conference on Learning Representations (ICLR)
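
A minimal sketch of the zero-shot idea in the abstract above: if utterances and category names live in one semantic space, an utterance can be assigned to a category that never appeared in training by picking the nearest category vector. The names `cosine` and `zero_shot_classify`, the use of cosine similarity, and the hand-written toy vectors are assumptions for illustration; in the paper the semantic space is learned by deep neural networks from query log data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def zero_shot_classify(utterance_vector, category_vectors):
    """Return the category whose embedding is most similar to the utterance."""
    return max(category_vectors,
               key=lambda name: cosine(utterance_vector, category_vectors[name]))

# Toy embeddings in a shared 3-dimensional semantic space (hypothetical values).
toy_categories = {
    "weather": [0.9, 0.1, 0.0],
    "restaurants": [0.1, 0.9, 0.2],
}
toy_utterance = [0.2, 0.8, 0.1]

predicted = zero_shot_classify(toy_utterance, toy_categories)
```

No labeled utterances for either category are needed at prediction time; adding a new category only requires its vector in the same space.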
Larry Heck

The past decade has seen the emergence of web-scale structured and linked semantic knowledge graphs (KGs). These KGs provide a scalable “schema for the web,” representing a significant opportunity for the NLP and conversational-interaction (CI) research communities. This lecture describes new research that leverages KGs to bootstrap web-scale CI with no requirement for semantic schema design, no data collection, and no manual annotations. In effect, the method completes a "join" of semantic KGs to...

Publication details
Date: 1 March 2014
Type: Technical report
Number: MSR-TR-2014-70
Lu Wang, Larry Heck, and Dilek Hakkani-Tur

Training statistical dialog models in spoken dialog systems (SDS) requires large amounts of annotated data. The lack of scalable methods for data mining and annotation poses a significant hurdle for state-of-the-art SDS. This paper presents an approach that directly leverages billions of web search and browse sessions to overcome this hurdle. The key insight is that task completion through web search and browse sessions is (a) predictable and (b) generalizes to spoken dialog task completion. The new...

Publication details
Date: 1 January 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Seyed Omid Sadjadi and Larry Heck

Co-channel speech, which occurs in monaural audio recordings of two or more overlapping talkers, poses a great challenge for automatic speech applications. Automatic speech recognition (ASR) performance, in particular, has been shown to degrade significantly in the presence of a competing talker. In this paper, assuming a known target talker scenario, we present two different masking strategies based on speaker verification to alleviate the impact of the competing talker (a.k.a. masker) interference on...

Publication details
Date: 1 January 2014
Type: Inproceeding
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Michael Gamon, Tae Yano, Xinying Song, Johnson Apacible, and Patrick Pantel

We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages, only a small subset of entities are central to the document, which can lead to degraded relevance for entity triggered experiences. We address this problem by devising a system that scores each entity on a web page according to its centrality to the page content. We propose salience...

Publication details
Date: 1 November 2013
Type: Inproceeding
Publisher: ACM International Conference on Information and Knowledge Management (CIKM)