The MSR sentence completion challenge is intended to stimulate research in the area of semantic modeling. The challenge set consists of fill-in-the-blank questions similar to those found on the widely used Scholastic Aptitude Test. The sentence completion questions we focus on test the student's ability to select words that are meaningful and coherent in the context of a complete sentence. In general, this determination cannot be made on the basis of grammatical correctness alone.
We have selected a set of 1,040 sentences from five Sherlock Holmes novels by Sir Arthur Conan Doyle. In each sentence, an infrequent word is chosen as the focus of the question. Four alternates for each word were chosen by hand from a list of thirty possibilities suggested by an N-gram language model, so each question offers five candidate words. Both training and test data are available and are described in a companion technical report.
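As a rough illustration of the question format described above, the following Python sketch represents one question as a sentence with a blank plus five candidate words, and scores each candidate with simple bigram counts. Everything here is an assumption made for illustration: the names `Question`, `bigram_counts`, `score`, `choose`, and `accuracy` are hypothetical, the toy sentences are invented and are not taken from the challenge set, and the bigram-count scorer is only a stand-in, not the method used in the publications listed on this page.

```python
from collections import Counter
from dataclasses import dataclass

BLANK = "_____"  # hypothetical marker for the missing word


@dataclass
class Question:
    template: str     # sentence with BLANK in place of the focus word
    candidates: list  # five words: the original plus four alternates
    answer: str       # the word from the original sentence


def bigram_counts(corpus):
    """Count adjacent word pairs in a list of training sentences."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def score(sentence, counts):
    """Sum the training counts of every bigram in the sentence."""
    tokens = sentence.lower().split()
    return sum(counts[pair] for pair in zip(tokens, tokens[1:]))


def choose(question, counts):
    """Return the candidate whose filled-in sentence scores highest."""
    return max(
        question.candidates,
        key=lambda word: score(question.template.replace(BLANK, word), counts),
    )


def accuracy(questions, counts):
    """Fraction of questions where the chosen word is the original one."""
    correct = sum(choose(q, counts) == q.answer for q in questions)
    return correct / len(questions)


# Toy data, invented for this sketch (not from the challenge set).
TOY_CORPUS = [
    "the detective examined the muddy footprints",
    "the doctor examined the patient carefully",
]
TOY_QUESTION = Question(
    template="the detective _____ the muddy footprints",
    candidates=["examined", "painted", "married", "boiled", "sang"],
    answer="examined",
)
TOY_COUNTS = bigram_counts(TOY_CORPUS)
```

With five candidates per question, choosing at random would be correct about 20% of the time, which gives a natural floor against which any scoring function can be compared.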
- Scott Yih, Geoffrey Zweig, and John Platt, Polarity Inducing Latent Semantic Analysis, in Empirical Methods in Natural Language Processing 2012, ACL/SIGPARSE, July 2012.
- Geoffrey Zweig, John C. Platt, Christopher Meek, Christopher J.C. Burges, Ainur Yessenalina, and Qiang Liu, Computational Approaches to Sentence Completion, in ACL 2012, ACL/SIGPARSE, July 2012.
- Geoffrey Zweig and Chris J.C. Burges, A Challenge Set for Advancing Language Modeling, in Workshop on the Future of Language Modeling for HLT, NAACL-HLT 2012, ACL/SIGPARSE, June 2012.
- Geoffrey Zweig and Christopher J.C. Burges, The Microsoft Research Sentence Completion Challenge, no. MSR-TR-2011-129, December 2011.