Data Set of English-Spanish Term Vectors from Wikipedia

This data set consists of term vectors extracted from 60,730 English Wikipedia articles and their comparable Spanish articles, sampled in 2009. We used this data set to test various models for creating translingual document representations, in work published in [Platt et al. EMNLP-2010] and [Yih et al. CoNLL-2011]. More details about this data set can be found in the ReadMe file.

Download Details

File Name: EN-ES_Wiki.zip
Version: 1.0.0
Date Published: 8 August 2011
Download Size: 218.44 MB

Note: By installing, copying, or otherwise using this software, you agree to be bound by the terms of its license. Read the license.