Microsoft Research IME Corpus

This download consists of data only. It provides a test data set for Japanese character conversion in text input. The data set comprises three kinds of files: (1) reference files, containing Japanese sentences randomly extracted from news articles (at most one sentence per article); (2) reading files, containing the kana reading of each sentence in the reference files; and (3) n-best files, containing the 100 best conversion candidates for each sentence in the reading files. The corpus is described in more detail in the technical report Microsoft Research IME Corpus, MSR-TR-2005-168.
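Each reference sentence is paired with a kana reading and with that reading's 100-best candidate list. One natural use of this pairing is to check how often the reference appears near the top of its candidate list. The sketch below assumes the references and n-best lists have already been loaded into Python lists in matching order. The on-disk file layout is documented in the technical report and is not assumed here. The function name `top_k_accuracy` and the toy sentences are hypothetical illustrations, not part of the corpus.

```python
from typing import Sequence


def top_k_accuracy(references: Sequence[str],
                   nbest_lists: Sequence[Sequence[str]],
                   k: int) -> float:
    """Fraction of sentences whose reference string appears among the
    first k candidates of its n-best list (exact string match).

    Assumption: references[i] and nbest_lists[i] describe the same sentence,
    and each n-best list is ordered from best to worst candidate.
    """
    if len(references) != len(nbest_lists):
        raise ValueError("each reference needs exactly one n-best list")
    if not references:
        return 0.0
    hits = sum(1 for ref, cands in zip(references, nbest_lists)
               if ref in cands[:k])
    return hits / len(references)


# Hypothetical toy data (not taken from the corpus): two references,
# each with a short candidate list standing in for a 100-best list.
references = ["今日は晴れ", "東京に行く"]
nbest = [
    ["今日は晴れ", "京は晴れ"],    # reference at rank 1
    ["東京にいく", "東京に行く"],  # reference at rank 2
]

print(top_k_accuracy(references, nbest, 1))  # 0.5
print(top_k_accuracy(references, nbest, 2))  # 1.0
```

Exact string matching is the simplest criterion. A character-level measure would give partial credit to near-misses, but that is a separate design choice.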

Download details

File Name: MSRIMECorpus.zip
Version: 1.0
Date Published: 21 December 2005
Download Size: 4.29 MB

Note: By installing, copying, or otherwise using this software, you agree to be bound by the terms of its license. Read the license.
