A Unified Framework for Automatic Evaluation using N-gram Co-Occurrence Statistics

  • R. Soricut
  • Eric Brill

Proceedings of ACL 2004

In this paper we propose a unified framework for automatic evaluation of NLP applications using N-gram co-occurrence statistics. The automatic evaluation metrics proposed to date for Machine Translation and Automatic Summarization are particular instances of the family of metrics we propose. We show that different members of this family best explain the variations observed in human evaluations, depending on the application being evaluated (Machine Translation, Automatic Summarization, or Automatic Question Answering) and on the guidelines human judges follow when evaluating such applications.
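As a rough illustration of what an N-gram co-occurrence statistic looks like, the sketch below computes clipped N-gram precision and recall between a candidate text and a single reference. It is not the paper's formulation: the function names `ngrams` and `ngram_overlap`, the whitespace tokenization, and the single-reference setup are all assumptions made for the example. Precision-oriented scores of this kind are associated with Machine Translation evaluation, and recall-oriented ones with summarization evaluation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(candidate, reference, n):
    """Return (precision, recall) of n-gram co-occurrence.

    Hypothetical helper: tokenizes on whitespace and compares
    against one reference only.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count per n-gram,
    # so a repeated n-gram is credited at most as often as it
    # appears in the other text (clipping).
    matched = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = matched / cand_total if cand_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    return precision, recall
```

Varying `n` and choosing between precision and recall gives different members of such a family of scores; which member tracks human judgments best is the question the abstract raises.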