Our research
Downloads: Computational linguistics (13)
1–13 of 13
We introduce a new corpus of descriptions of Xbox avatars created by actual gamers. Each avatar is specified by 19 attributes, including clothing and body type, allowing for more than 10^20 possibilities. Using Amazon Mechanical Turk, we collected literal and sentimental descriptions of complete avatars and many of their component parts. In all, there are over 100K descriptions, including relative and multilingual descriptions, which will support different tasks, such as learning to automatically describe...
Details
Date: 25 June 2013
Version: 1.0
Size: 3.03 MB
Type: Download
The Query Representation and Understanding (QRU) data set contains a set of similar queries that can be used in web research such as query transformation and relevance ranking. QRU contains similar queries that are related to existing benchmark data sets, such as TREC query sets. The QRU data set was created by extracting 100 TREC queries, training a query-generation model and a commercial search engine, generating similar queries from TREC queries with the model, and removing mistakenly generated...
Details
Date: 9 August 2011
Version: 1.0
Size: 0.01 MB
Type: Download
This data set is used to test various models for creating translingual document representations. We sampled 60,730 English Wikipedia articles and their Spanish counterparts and transformed each of them into a 20,000-dimensional sparse term vector. The data set does not contain the original articles, only the term vectors and the vocabulary file.
Details
Date: 8 August 2011
Version: 1.0.0
Size: 218.44 MB
Type: Download
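As a rough illustration of what a sparse term vector is, the sketch below maps a document to a dictionary of term index to count and compares two such vectors by cosine similarity. It is not the procedure used to build this data set: the listing does not describe the file format, tokenization, or term weighting, so the toy vocabulary, the whitespace tokenizer, the raw counts, and the function names `to_sparse_term_vector` and `cosine` are all assumptions made for the example.

```python
from collections import Counter


def to_sparse_term_vector(text, vocabulary):
    """Return {term_index: count} for the vocabulary terms found in text.

    Assumption: lowercase whitespace tokenization and raw counts; the
    actual data set may use a different tokenizer and weighting.
    """
    counts = Counter(text.lower().split())
    return {vocabulary[term]: n for term, n in counts.items() if term in vocabulary}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(index, 0) for index, weight in u.items())
    norm_u = sum(weight * weight for weight in u.values()) ** 0.5
    norm_v = sum(weight * weight for weight in v.values()) ** 0.5
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical three-term vocabulary; the real one has 20,000 entries.
vocabulary = {"research": 0, "data": 1, "query": 2}
vec = to_sparse_term_vector("Data about data and research", vocabulary)
print(vec, cosine(vec, vec))
```

Storing only the non-zero entries is what makes a 20,000-dimensional representation practical, since a single article uses only a small part of the vocabulary.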
This download is provided for the purpose of the Speller Challenge. This is a development dataset based on the publicly available TREC queries (2008 Million Query Track). Queries are annotated by using the same guidelines and processes as in the creation of the Bing Test Dataset.
Details
Date: 14 January 2011
Version: 1.0
Size: 0.32 MB
Type: Download
This data consists of about 120K sentences collected during the summer of 2010. Workers on Mechanical Turk were paid to watch a short video snippet and then summarize the action in a single sentence. The result is a set of roughly parallel descriptions of more than 2,000 video snippets. Because the workers were urged to complete the task in the language of their choice, both paraphrase and bilingual alternations are captured in the data. We expect this data to be useful for training and testing translation...
Details
Date: 12 November 2010
Version: 1.0
Size: 2.54 MB
Type: Download
This data takes the output of eight translation systems that Microsoft Research combined for the 2008 National Institute of Standards and Technology Open Machine Translation Evaluation and masks it completely, so that you can determine only how each system re-ordered the source sentence and when the systems disagreed in word choice. This enables researchers to explore approaches to combining or ranking re-ordering decisions from machine-translation systems.
Details
Date: 28 March 2008
Version: 1.0
Size: 3.11 MB
Type: Download
Data sets for comparative study of parameter-estimation methods for statistical natural-language processing.
Details
Date: 2 June 2007
Version: 1.0
Size: 45.08 MB
Type: Download
This archive contains phrase tables generated by aligning the two paraphrase data sets described in Quirk, Brockett & Dolan (2004) and Dolan, Quirk & Brockett (2004). The alignments are bidirectional, were created using the method described in Och & Ney (2000), and are up to seven grams in length.
Details
Date: 10 October 2006
Version: 1.0
Size: 698.27 MB
Type: Download
The ESL_123_MASS_NOUN dataset is a set of 123 sentences, found on the World Wide Web, that apparently were written by native speakers of languages spoken in China. Each sentence contains an example of at least one of a set of 14 mass nouns. Not all sentences contain errors specifically relating to these words. These data are being made available to the research community for non-commercial, scholarly purposes only and must not be used for any other purpose. Each sentence is accompanied by the targeted...
Details
Date: 18 July 2006
Version: 1.0
Size: 0.20 MB
Type: Download
This download consists of data only: it provides a test data set for the task of Japanese character conversion for text input. The data set consists of: (1) reference files, which consist of Japanese sentences that are randomly extracted from news articles (no more than one sentence has been extracted per news article); (2) reading files, which consist of corresponding kana readings for the sentences in the reference files; (3) n-best files, which contain 100-best conversion candidates for each sentence in...
Details
Date: 21 December 2005
Version: 1.0
Size: 4.29 MB
Type: Download
This download consists of data only: a text file containing 5,800 pairs of sentences that have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. No more than one sentence has been extracted from any given news article. We have made a concerted effort to correctly associate with each sentence information about its provenance and any associated information about its author. If any attribution...
Details
Date: 3 March 2005
Version: 1.0
Size: 1.30 MB
Type: Download
When people translate documents from one language to another, not all sentences are translated one-for-one. This Perl code implements an algorithm for finding which sentences do translate one-for-one in a parallel bilingual corpus.
Details
Date: 14 May 2003
Version: 1.0
Size: 0.02 MB
Type: Download
Prolog implementations of two versions of the unification grammar sentence realization algorithm described in "A Complete, Efficient Sentence Realization Algorithm for Unification Grammar," appearing in the Proceedings of the International Natural Language Generation Conference, INLG'02, plus other associated code and data files.
Details
Date: 6 May 2003
Version: 1.0
Size: 0.03 MB
Type: Download