The Pythy Summarization System: Microsoft Research at DUC 2007

Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi, Hisami Suzuki, and Lucy Vanderwende


PYTHY is a trainable extractive summarization engine that learns a log-linear sentence ranking model by maximizing three metrics of sentence goodness. Two of the metrics are based on ROUGE scores against model summaries; the third is based on the Semantic Content Unit (SCU) weights, obtained during the Pyramid evaluations, of sentences selected by past peers. In addition to sentences from the document set, our system considers simplified sentences for inclusion in the generated summaries. The feature weights of the model are optimized on the DUC 2005 data, and the final feature set for the submitted system was determined by ROUGE-2 scores against the DUC 2006 model summaries. For the DUC update task, the model was augmented with a novelty detection classifier.
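
As a rough illustration of what a log-linear sentence ranker of the kind described above might look like, here is a minimal Python sketch. It is not the PYTHY implementation: the feature names (`is_first_sentence`, `topic_word_fraction`), the weight values, the example sentences, and the greedy word-budget selection are all assumptions made for illustration. The abstract does not specify the features or the selection procedure, and this sketch does not handle redundancy between an original sentence and its simplified variant.

```python
import math

def linear_score(features, weights):
    """Weighted sum w . f(s); a log-linear model exponentiates and normalizes this."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def rank_probabilities(candidates, weights):
    """Softmax over the linear scores of all candidate sentences."""
    scores = [linear_score(features, weights) for _, features in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_summary(candidates, weights, max_words):
    """Greedily add the highest-ranked sentences that still fit the word budget."""
    probs = rank_probabilities(candidates, weights)
    ranked = sorted(zip(probs, candidates), key=lambda pair: pair[0], reverse=True)
    summary, used = [], 0
    for _, (text, _) in ranked:
        length = len(text.split())
        if used + length <= max_words:
            summary.append(text)
            used += length
    return summary

# Hypothetical candidates: original sentences plus one simplified variant.
candidates = [
    ("The storm closed three airports on Monday.",
     {"is_first_sentence": 1.0, "topic_word_fraction": 0.5}),
    ("Officials said more details would follow.",
     {"is_first_sentence": 0.0, "topic_word_fraction": 0.1}),
    ("Airports closed Monday.",  # stands in for a simplified sentence
     {"is_first_sentence": 0.0, "topic_word_fraction": 0.9}),
]

# Hypothetical weights; in a trained system these would be learned from data.
weights = {"is_first_sentence": 1.0, "topic_word_fraction": 2.0}

probs = rank_probabilities(candidates, weights)
summary = select_summary(candidates, weights, max_words=10)
```

In this sketch, simplified sentences simply compete with the originals as additional candidates, and training (fitting `weights` against ROUGE- or SCU-based targets) is left out entirely.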


Publication type: Inproceedings
Published in: Proceedings of DUC-2007
Publisher: Association for Computational Linguistics