Microsoft Research IME Corpus

This download consists of data only: a test data set for the task of Japanese character conversion for text input. The data set consists of:
(1) reference files, containing Japanese sentences randomly extracted from news articles (at most one sentence per article);
(2) reading files, containing the corresponding kana readings of the sentences in the reference files;
(3) n-best files, containing the 100-best conversion candidates for each sentence in the reading files.
More detailed information about the corpus can be found in the technical report Microsoft Research IME Corpus, MSR-TR-2005-168.
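The three file types line up sentence by sentence: each reference sentence has a kana reading, and each reading has a list of 100-best conversion candidates. One common use of data aligned this way is to measure how often the reference sentence appears among the top k candidates. The sketch below is illustrative only. It assumes the files have already been parsed into aligned Python lists (the on-disk format is described in MSR-TR-2005-168, not on this page), and the function name top_k_accuracy and the toy sentences are hypothetical, not taken from the corpus.

```python
from typing import Sequence


def top_k_accuracy(references: Sequence[str],
                   nbest_lists: Sequence[Sequence[str]],
                   k: int) -> float:
    """Return the fraction of sentences whose reference string appears
    among the first k conversion candidates of its n-best list.

    Hypothetical helper: assumes references[i] and nbest_lists[i] describe
    the same sentence, with candidates ordered best-first.
    """
    if len(references) != len(nbest_lists):
        raise ValueError("references and n-best lists must be aligned one-to-one")
    if not references:
        return 0.0
    # Count sentences where an exact-match candidate is ranked within the top k.
    hits = sum(1 for ref, cands in zip(references, nbest_lists) if ref in cands[:k])
    return hits / len(references)


# Toy, made-up data standing in for parsed reference and n-best files.
refs = ["今日は晴れ", "記者会見"]
nbest = [["今日は晴れ", "京は晴れ"], ["記者解剖", "記者会見"]]
print(top_k_accuracy(refs, nbest, 1))  # → 0.5
print(top_k_accuracy(refs, nbest, 2))  # → 1.0
```

Exact string match is the simplest criterion; character-level accuracy against the reference is a finer-grained alternative if partial conversions should earn credit.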

Download Details

File Name: MSRIMECorpus.zip
Version: 1.0
Date Published: 21 December 2005
Download Size: 4.29 MB

Note: By installing, copying, or otherwise using this software, you agree to be bound by the terms of its license. Read the license.
