Entailment: An Effective Metric for Comparing and Evaluating Hierarchical and Non-hierarchical Annotation Schemes

Rohan Ramanath, Monojit Choudhury, and Kalika Bali

Abstract

Hierarchical or nested annotation of linguistic data often co-exists with simpler non-hierarchical or flat counterparts, a classic example being that of annotations used for parsing and chunking. In this work, we propose a general strategy for comparing across these two schemes of annotation using the concept of entailment that formalizes a correspondence between them. We use crowdsourcing to obtain query and sentence chunking and show that entailment can not only be used as an effective evaluation metric to assess the quality of annotations, but it can also be employed to filter out noisy annotations.
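The abstract only sketches the idea at a high level, so here is a minimal, illustrative sketch of how a flat chunking might be checked against a hierarchical (nested) annotation. It assumes a simplified reading in which a flat chunking is "entailed" by a hierarchy when every flat chunk coincides with some constituent span of the nested bracketing; this is an assumption for illustration, not necessarily the paper's exact definition, and all function names and data conventions below are hypothetical.

```python
# Illustrative sketch only: spans are (start, end) token index pairs (inclusive),
# a hierarchical annotation is a nested list of such spans, and a flat chunking
# is a plain list of spans. These conventions are assumptions, not the paper's.

def hierarchy_spans(tree):
    """Collect the (start, end) span of every node in a nested bracketing."""
    spans = set()

    def walk(node):
        if isinstance(node, tuple):              # leaf: a token span
            spans.add(node)
            return node
        child_spans = [walk(child) for child in node]
        span = (child_spans[0][0], child_spans[-1][1])  # node covers its children
        spans.add(span)
        return span

    walk(tree)
    return spans


def is_entailed(flat_chunks, tree):
    """True if every flat chunk matches some constituent of the hierarchy."""
    spans = hierarchy_spans(tree)
    return all(chunk in spans for chunk in flat_chunks)


if __name__ == "__main__":
    # Nested annotation over 5 tokens: ( (0,1) ( (2,2) (3,4) ) )
    tree = [(0, 1), [(2, 2), (3, 4)]]
    print(is_entailed([(0, 1), (2, 4)], tree))  # True: both chunks are constituents
    print(is_entailed([(0, 2), (3, 4)], tree))  # False: (0, 2) crosses a bracket boundary
```

Under this simplified reading, a flat annotation that respects every bracket boundary of the hierarchy is counted as consistent with it, which mirrors the kind of correspondence the abstract describes between the two schemes.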

Details

Publication type: Inproceedings
Published in: Proceedings of LAW VII and ID
Publisher: Association for Computational Linguistics

Previous versions

Rohan Ramanath, Monojit Choudhury, Kalika Bali, and Rishiraj Saha Roy. Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation, Association for Computational Linguistics, July 2013.
