The HLT 2007 Definition Scoring Gold Standard consists of 665 definition responses for 75 target words, each labeled by 3 human judges. The goal of creating this dataset was to provide a gold standard for judging whether a short snippet of text correctly matches a word's definition. The labels from the human coders have the following meanings:
  • 0: Completely incorrect match with the true definition.
  • 1: Some partial aspect is correct.
  • 2: One major aspect, or more than one minor aspect, is correct.
  • 3: Covers all aspects of the true definition correctly.

    The data file is a comma-separated values (CSV) text file encoded as UTF-8.

    The fields have the following meanings:

  • target: the word being defined
  • target-syn-lf1: a low-frequency synonym for the target
  • target-syn-hf1: a high-frequency synonym for the target
  • target-def: the text of the correct definition
  • subj-num: the unique ID of the experiment subject providing the response
  • Corrected-Coder-00: labels (0-3) from human coder 00
  • Corrected-Coder-8: labels (0-3) from human coder 8
  • Corrected-Coder-44: labels (0-3) from human coder 44

    If you use this dataset, please cite it as follows:

    K. Collins-Thompson and J. Callan. "Automatic and human scoring of word definition responses." Proceedings of the NAACL-HLT 2007 Conference. Rochester, U.S.A. pp. 476-483.

    Further details on the dataset and some baseline comparison results are available in the above paper.
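    The field list above suggests a straightforward way to read the file. The Python sketch below is illustrative only: it assumes the CSV has a single header row whose column names match the field names listed above exactly, and that every coder cell holds an integer from 0 to 3. It parses the three coder columns, averages them per response, and reports the fraction of responses on which each pair of coders gave identical labels. The helper names (load_responses, coder_labels, mean_score, pairwise_exact_agreement) are hypothetical, and neither the averaging nor the agreement measure is claimed to be the scoring procedure used in the paper.

```python
import csv
import io
from itertools import combinations
from typing import Dict, List, TextIO, Tuple

# Assumption: the header row uses exactly these coder column names,
# as given in the field descriptions.
CODER_COLUMNS = ["Corrected-Coder-00", "Corrected-Coder-8", "Corrected-Coder-44"]
VALID_LABELS = {0, 1, 2, 3}  # the 0-3 label scale described above

Row = Dict[str, str]


def load_responses(f: TextIO) -> List[Row]:
    """Read every data row as a dict keyed by the header names (hypothetical helper)."""
    return list(csv.DictReader(f))


def coder_labels(row: Row) -> List[int]:
    """Parse the three coder labels and check they lie on the 0-3 scale."""
    labels = [int(row[col]) for col in CODER_COLUMNS]
    for label in labels:
        if label not in VALID_LABELS:
            raise ValueError(f"label {label} is outside the 0-3 scale")
    return labels


def mean_score(row: Row) -> float:
    """Average of the three coders' labels for one response."""
    labels = coder_labels(row)
    return sum(labels) / len(labels)


def pairwise_exact_agreement(rows: List[Row]) -> Dict[Tuple[str, str], float]:
    """Fraction of responses on which each pair of coders gave the same label."""
    if not rows:
        raise ValueError("no rows to compare")
    parsed = [coder_labels(r) for r in rows]
    agreement = {}
    for i, j in combinations(range(len(CODER_COLUMNS)), 2):
        same = sum(labels[i] == labels[j] for labels in parsed)
        agreement[(CODER_COLUMNS[i], CODER_COLUMNS[j])] = same / len(parsed)
    return agreement


if __name__ == "__main__":
    # Hypothetical two-row sample for illustration; not taken from the dataset.
    sample = io.StringIO(
        "target,target-syn-lf1,target-syn-hf1,target-def,subj-num,"
        "Corrected-Coder-00,Corrected-Coder-8,Corrected-Coder-44\n"
        'word,syn-a,syn-b,"a made-up definition, with a comma",101,3,3,2\n'
        "word,syn-a,syn-b,a made-up definition,102,0,1,0\n"
    )
    rows = load_responses(sample)
    print([round(mean_score(r), 2) for r in rows])  # [2.67, 0.33]
    print(pairwise_exact_agreement(rows))
```

    To run it on the real data, open the downloaded file with open(path, encoding="utf-8", newline="") and pass the file object to load_responses; newline="" is what Python's csv documentation recommends for files read through the csv module. Exact agreement ignores how far apart two labels are, so on an ordinal scale like this one a distance-aware measure may be more informative.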

    The responses in this dataset were kindly provided by Charles Perfetti and D.J. Bolger of the University of Pittsburgh Psychology Department. The coding was performed by the University of Pittsburgh Qualitative Data Analysis Program (QDAP), supervised by Stuart Shulman. This project was partially funded by U.S. Department of Education grant R305G03123.
