Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, and Bill Dolan
Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL-HLT 2015)
Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han

Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency of manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality...

Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt, Lawrence Zitnick, and Geoffrey Zweig

This paper presents a novel approach for automatically generating image descriptions: visual detectors and language models learn directly from a dataset of image captions. We use Multiple Instance Learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image...

Publication details
Date: 1 June 2015
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Lucy Vanderwende, Arul Menezes, and Chris Quirk

In this demonstration, we will present our online parser that allows users to submit any sentence and obtain an analysis following the specification of AMR (Banarescu et al., 2014) to a large extent. This AMR analysis is generated by a small set of rules that convert a native Logical Form analysis provided by a pre-existing parser (see Vanderwende, 2015) into the AMR format. While we demonstrate the performance of our AMR parser on data sets annotated by the LDC, we will focus attention in the demo on...

Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: Proceedings of NAACL 2015
Publication details
Date: 1 May 2015
Type: Article
Publisher: NAACL
刘树杰, 董力, 张家俊, 韦福如, 李沐, and 周明
Publication details
Date: 1 April 2015
Type: Article
Lucy Vanderwende

In this technical report, we provide an introduction to the NLPwin system, an NLP system under development at Microsoft Research. We describe the development methodology and the linguistic representations captured by NLPwin, and we discuss some of the design decisions that were made in the NLPwin project. A full bibliography is included that covers the papers written about NLPwin as well as the papers that make use of the NLPwin system output.

Publication details
Date: 1 March 2015
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2015-23
Grégoire Mesnil, Yann Dauphin, Kaisheng Yao, Yoshua Bengio, Li Deng, Dilek Hakkani-Tur, Xiaodong He, Larry Heck, Gokhan Tur, Dong Yu, and Geoffrey Zweig

Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available...

Publication details
Date: 1 March 2015
Type: Article
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, and Rabab Ward

This paper develops a model that addresses sentence embedding using recurrent neural networks (RNN) with Long Short-Term Memory (LSTM) cells. The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long-term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic...

Publication details
Date: 1 February 2015
Type: Article
Publisher: arXiv
Spandana Gella, Kalika Bali, and Monojit Choudhury

Language identification is a necessary prerequisite for processing any user-generated text where the language is unknown. It becomes even more challenging when the text is code-mixed, i.e., two or more languages are used within the same text. Such data is commonly seen in social media, where further challenges might arise due to contractions and transliterations. The existing language identification systems are not designed to deal with code-mixed text, and as our experiments show, perform poorly on a...

Publication details
Date: 1 December 2014
Type: Inproceeding
Publisher: NLPAI
Xiaohu Liu and Ruhi Sarikaya

Spoken language understanding (SLU) systems use various features to detect the domain, intent and semantic slots of a query. In addition to n-grams, features generated from entity dictionaries are often used in model training. Clean or properly weighted dictionaries are critical to improving the model’s coverage and accuracy for unseen entities during test time. However, clean dictionaries are hard to obtain for some applications since they are automatically generated and can potentially contain millions of...

Publication details
Date: 1 December 2014
Type: Proceedings
Publisher: IEEE – Institute of Electrical and Electronics Engineers
Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Gregoire Mesnil

In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. In order to capture the rich contextual structures in a query or a document, we start with each word within a temporal context window in a word sequence to directly capture contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are...

Publication details
Date: 1 November 2014
Type: Inproceeding
Publisher: CIKM
Sauleh Eetemadi and Kristina Toutanova

Distinct properties of translated text have been the subject of research in linguistics for many years (Baker, 1993). In recent years computational methods have been developed to empirically verify the linguistic theories about translated text (Baroni and Bernardini, 2006). While many characteristics of translated text are more apparent in comparison to the original text, most of the prior research has focused on monolingual features of translated and original text. The contribution of this work is...

Publication details
Date: 1 November 2014
Type: Inproceeding
Publisher: Association for Computational Linguistics
Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, and Chengqing Zong

Measuring the quality of the translation rules and their composition is an essential issue in the conventional statistical machine translation (SMT) framework. To express the translation quality, the previous lexical and phrasal probabilities are calculated only according to the co-occurrence statistics in the bilingual corpus, and may not be reliable due to the data sparseness problem. To address this issue, we propose to measure the quality of the translation rules and their composition in the...

Publication details
Date: 1 November 2014
Type: Article
Publisher: ACM – Association for Computing Machinery
Publication details
Date: 1 November 2014
Type: Inproceeding
Publisher: CIKM
Jianfeng Gao, Patrick Pantel, Michael Gamon, Xiaodong He, Li Deng, and Yelong Shen

This paper presents a deep semantic model (DSM) for recommending target documents of interest to a user based on a source document she is reading. We observe, identify, and detect naturally occurring signals of interestingness in click transitions on the Web between source and target documents, which we collect from commercial Web browser logs. The DSM is trained on millions of Web transitions, and maps source-target document pairs to feature vectors in a latent space in such a...

Publication details
Date: 1 October 2014
Type: Proceedings
Publisher: EMNLP
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen

We examine the embedding approach to reasoning about new relational facts from a large-scale knowledge graph and a text corpus. We propose a novel method of jointly embedding entities and words into the same continuous vector space. The embedding process attempts to preserve the relations between entities in the knowledge graph and the co-occurrences of words in the text corpus. Entity names and Wikipedia anchors are utilized to align the embeddings of entities and words in the same space. Large scale experiments...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek

While relation extraction has traditionally been viewed as a task relying solely on textual data, recent work has shown that by taking as input existing facts in the form of entity-relation triples from both knowledge bases and textual data, the performance of relation extraction can be improved significantly. Following this new paradigm, we propose a tensor decomposition approach for knowledge base embedding that is highly scalable, and is especially suitable for relation extraction. By leveraging...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: ACL – Association for Computational Linguistics
Yogarshi Vyas, Spandana Gella, Jatin Sharma, Kalika Bali, and Monojit Choudhury

Code-mixing is frequently observed in user-generated content on social media, especially from multilingual users. The linguistic complexity of such content is compounded by the presence of spelling variations, transliteration and non-adherence to formal grammar. We describe our initial efforts to create a multi-level annotated corpus of Hindi-English code-mixed text collated from Facebook forums, and explore language identification, back-transliteration, normalization and POS tagging of this data. Our...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: Association for Computational Linguistics
Gokul Chittaranjan, Yogarshi Vyas, Kalika Bali, and Monojit Choudhury

We describe a CRF-based system for word-level language identification of code-mixed text. Our method uses lexical, contextual, character n-gram, and special character features, and therefore can easily be replicated across languages. Its performance is benchmarked against the test sets provided by the shared task on code-mixing (Solorio et al., 2014) for four language pairs, namely, English-Spanish (En-Es), English-Nepali (En-Ne), English-Mandarin (En-Cn), and Standard Arabic-Arabic (Ar-Ar) Dialects....

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: Association for Computational Linguistics
Kalika Bali, Jatin Sharma, Monojit Choudhury, and Yogarshi Vyas

Code-Mixing is a frequently observed phenomenon in social media content generated by multi-lingual users. The processing of such data for linguistic analysis as well as computational modelling is challenging due to the linguistic complexity resulting from the nature of the mixing as well as the presence of non-standard variations in spellings and grammar, and transliteration. Our analysis shows the extent of Code-Mixing in English-Hindi data. The classification of Code-Mixed words based on frequency and...

Publication details
Date: 1 October 2014
Type: Inproceeding
Publisher: Association for Computational Linguistics
Zhenghao Wang, Shengquan Yan, Huaming Wang, and Xuedong Huang

Question answering (QA) over an existing knowledge base (KB) such as Microsoft Satori or open Freebase is one of the most important natural language processing applications. There are approaches based on web-search-motivated statistical techniques as well as linguistically oriented knowledge engineering. Both methods face the key challenge of how to handle diverse ways of naturally expressing predicates and entities existing in the KB. The domain independent web information extracted from the massive...

Publication details
Date: 3 September 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-121
Hany Hassan, Lee Schwartz, Dilek Hakkani-Tur, and Gokhan Tur

In this paper we focus on the effect of on-line speech segmentation and disfluency removal methods on conversational speech translation. In a real-time conversational speech-to-speech translation system, on-line segmentation of speech is required to avoid latency beyond a few seconds. While sentential unit segmentation and disfluency removal have been heavily studied mainly for off-line speech processing, to the best of our knowledge, the combined effect of these tasks on conversational speech translation...

Publication details
Date: 1 September 2014
Type: Inproceeding
Publisher: ISCA - International Speech Communication Association
Rishiraj Saha Roy, Rahul Katare, Niloy Ganguly, and Monojit Choudhury

Natural languages (NL) can be classified as prepositional or postpositional based on the order of the noun phrase and the adposition. Categorizing a language by its adposition typology helps in addressing several challenges in linguistics and natural language processing (NLP). Understanding the adposition typologies for less-studied languages by manual analysis of large text corpora can be quite expensive, yet automatic discovery of the same has received very little attention to date. This research...

Publication details
Date: 1 August 2014
Type: Inproceeding
Publisher: Coling 2014
Publication details
Date: 1 August 2014
Type: Technical report
Publisher: Microsoft Research
Number: MSR-TR-2014-109