Microsoft Research IME Corpus

This download consists of data only: it provides a test data set for the task of Japanese character conversion for text input. The data set consists of three kinds of files: (1) reference files, containing Japanese sentences randomly extracted from news articles (at most one sentence per article); (2) reading files, containing the kana reading of each sentence in the reference files; and (3) n-best files, containing the 100 best conversion candidates for each sentence in the reading files. The corpus is described in more detail in the technical report Microsoft Research IME Corpus, MSR-TR-2005-168.
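The sketch below shows one way the three file types could fit together in an evaluation. It assumes that the files align item by item: the i-th reading belongs to the i-th reference sentence, and the i-th n-best list holds the ranked candidates for that reading. The on-disk file format is not described on this page, so the sketch skips file loading and works on in-memory data. The names `Item`, `top1_accuracy` and `oracle_coverage` are hypothetical. The two measures, exact-match accuracy of the top candidate and whether the reference appears anywhere in the n-best list, are common illustrative choices and are not necessarily the metrics defined in MSR-TR-2005-168.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Item:
    """One hypothetical test item: reference sentence, kana reading, ranked candidates."""
    reference: str
    reading: str
    candidates: List[str]  # conversion candidates, best first (assumed ordering)


def top1_accuracy(items: Sequence[Item]) -> float:
    """Fraction of items whose first-ranked candidate exactly matches the reference."""
    if not items:
        return 0.0
    hits = sum(1 for it in items if it.candidates and it.candidates[0] == it.reference)
    return hits / len(items)


def oracle_coverage(items: Sequence[Item], n: int = 100) -> float:
    """Fraction of items whose reference appears anywhere in the top-n candidates."""
    if not items:
        return 0.0
    hits = sum(1 for it in items if it.reference in it.candidates[:n])
    return hits / len(items)


if __name__ == "__main__":
    # Toy items invented for illustration; they are not taken from the corpus.
    items = [
        Item("東京に行く", "とうきょうにいく", ["東京に行く", "東京にいく"]),
        Item("橋を渡る", "はしをわたる", ["箸を渡る", "橋を渡る", "端を渡る"]),
    ]
    print(top1_accuracy(items))    # 0.5
    print(oracle_coverage(items))  # 1.0
```

With real data, the gap between the two numbers indicates how often a better ranking of the existing candidates could fix an error.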

Details

Type: Download
File Name: MSRIMECorpus.zip
Version: 1.0
Date Published: 21 December 2005
Download Size: 4.29 MB

Note: By installing, copying, or otherwise using this software, you agree to be bound by the terms of its license. Read the license.