This site contains further information about our work on
computer-assisted grading at Microsoft Research. We invite you to read the papers below,
and then, if interested, download the corpus we developed for use in your own research.
Videos and Demos
The 5-minute video below gives an overview of our power-assisted
grading research approach (Paper 1) as well as a demonstration of our
user interface, including results from a study of teachers using the
system (Paper 2). For further details, please read the papers.
Powergrading-1.0 Short Answer Grading Corpus
The Powergrading-1.0 short answer corpus
contains the original data analyzed in our TACL
paper. It consists of responses from 100 + 698 Mechanical Turk
workers (a pilot set of 100 and a main set of 698) to each of 20 questions from the
100 questions published by the United States Citizenship and Immigration
Services as preparation for the citizenship test.
Please be aware that this data
contains raw, unfiltered answers from Mechanical Turk Workers; as such
some may contain profanity or be otherwise offensive.
The corpus is in several files, each in tab-separated format (TSV),
where the first row contains the column headings. The files are as follows:
- questions_answer_key.tsv: question text and answer key entries for each question.
The first column is the question number, the second is the question,
and the remainder of the columns in each row contain entries from
the answer key for that question.
- studentanswers_grades_698.tsv: main set of 698
Turker responses to all 20 questions. Questions 1-8, 13, and 20 were
used in the experiments in the paper. The first column is the
student ID (anonymized from the Turker ID), the second is the
question number, and the third is the student's answer. Columns 4-6
hold hand labels of correctness for each answer from graders G1, G2,
and G3. A grade of 1 means correct and
0 means incorrect. A grade of -1 means a grade was not available,
which is the case for questions 9-12 and 14-19.
- studentanswers_grades_100.tsv: pilot set of 100
Turker responses to all 20 questions. Same format as
studentanswers_grades_698.tsv above, though no manual grades are
available for this set and as such all grade entries are -1.
- answer_groupings.tsv: grouping annotations of
answers judged to be semantically similar, used for training the
similarity metric. All answers come from the pilot set of 100
(studentanswers_grades_100.tsv). The first column is the question
number, the second column is the label for the group, and the
remainder of the columns contain the text of the answers judged to
be in that group (one answer per column). The groups with label
"None" contain all the answers for that question not contained in
any other group. Note that only groupings for questions 9-12 and
14-19 are available, since the other questions were used for the
main experiments. Even though the pilot set is separate from the
full set of 698, training a similarity metric on answers to the same
questions as used in the experiments would represent an unrealistic
advantage not available in a practical setting.
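As a rough illustration of the layout described above, the following Python sketch reads the two kinds of files: the grade files (three fixed columns followed by grades from G1, G2, and G3) and the files whose rows have two fixed columns followed by a variable number of entries (questions_answer_key.tsv and answer_groupings.tsv). The function names, the choice to disable quote handling, and the majority-vote helper are our own assumptions for illustration; they are not part of the corpus or of the method in the papers, and the sample rows in the demo are made up.

```python
import csv
import io
from typing import Iterable, List, NamedTuple, Optional, Tuple


class GradedAnswer(NamedTuple):
    student_id: str
    question: str            # kept as text; the description only says "question number"
    answer: str
    grades: Tuple[int, ...]  # (G1, G2, G3); -1 means no grade is available


def _tsv_rows(lines: Iterable[str]):
    # Assumption: quotes in raw Turker answers are literal, so quote handling is off.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # the first row contains the column headings
    return reader


def read_grades(lines: Iterable[str]) -> List[GradedAnswer]:
    """Parse a studentanswers_grades_*.tsv style file (hypothetical helper)."""
    rows = []
    for rec in _tsv_rows(lines):
        if len(rec) < 6:
            continue  # skip blank or short lines
        grades = tuple(int(g) for g in rec[3:6])
        rows.append(GradedAnswer(rec[0], rec[1], rec[2], grades))
    return rows


def majority_grade(grades: Tuple[int, ...]) -> Optional[int]:
    """Return 1 or 0 by majority vote, or None if any grade is -1.

    Majority voting is an illustrative choice, not the procedure from the papers.
    """
    if any(g == -1 for g in grades):
        return None
    return 1 if sum(grades) * 2 > len(grades) else 0


def read_variable_rows(lines: Iterable[str]) -> List[Tuple[str, str, List[str]]]:
    """Parse rows with two fixed columns plus a variable number of entries.

    Fits the described shape of questions_answer_key.tsv (number, question,
    key entries) and answer_groupings.tsv (number, group label, answers).
    """
    out = []
    for rec in _tsv_rows(lines):
        if len(rec) < 2:
            continue
        rest = [cell for cell in rec[2:] if cell.strip()]  # drop empty padding cells
        out.append((rec[0], rec[1], rest))
    return out


if __name__ == "__main__":
    # Made-up rows in the described layout, not actual corpus data.
    sample = (
        "id\tquestion\tanswer\tG1\tG2\tG3\n"
        "S1\t1\tthe senate\t1\t1\t0\n"
        "S2\t9\tnot sure\t-1\t-1\t-1\n"
    )
    for row in read_grades(io.StringIO(sample)):
        print(row.student_id, row.question, majority_grade(row.grades))
```

To read an actual file, pass an open file object instead of the in-memory sample, for example open("studentanswers_grades_698.tsv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and may need adjusting.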
Corpus Download and License
The corpus and associated licensing information
are available at
this link. If you do end up using the corpus in your research, please
refer to it as the "Powergrading-1.0" corpus
and cite the paper above, as we may have later
versions with more data.
Contributors and Contact Information
If you're interested in finding out more, please feel free to contact
(sumitb at microsoft dot com),
(cjacobs at microsoft
dot com), and
microsoft dot com) are all with Microsoft Research.
(mjbrooks at uw dot edu) was an intern at Microsoft Research from the
University of Washington, working with us in 2013.