
Topic:
Paraphrase and Textual Entailment
Abstract:
This lecture will give an overview of the related fields of paraphrase and textual entailment. Many natural language processing applications require the ability to recognize when two text segments – however superficially distinct – overlap semantically. Question-Answering (QA), Information Extraction (IE), command-and-control, and multi-document summarization are examples of applications that need precise information about the relationship between different text segments. The concepts of entailment and paraphrase are useful in characterizing these relationships. For instance, does the meaning of one text entail all or part of the other, as in the following example?
I bought a science fiction novel, I bought a book
Or are the two texts so close in meaning that they can be considered paraphrases, linked by many bidirectional entailments?
On its way to an extended mission at Saturn, the Cassini probe on Friday makes its closest rendezvous with Saturn's dark moon Phoebe.
The Cassini spacecraft, which is en route to Saturn, is about to make a close pass of the ringed planet's mysterious moon Phoebe.
Quantifying semantic overlap is a fundamental challenge that encompasses issues of lexical choice, syntactic alternation, and reference/discourse structure. The last few years have seen a surge in interest in modeling techniques aimed at measuring semantic equivalence and entailment. I will discuss some of these techniques, go into some detail on the PASCAL Recognizing Textual Entailment Challenges (http://www.pascal-network.org/Challenges/RTE), and explore some of the challenges that the field faces in finding suitable training data.
Recommended Reading:
Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini and Idan Szpektor. 2006. The Second PASCAL Recognising Textual Entailment Challenge. In Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment.
http://www.cs.biu.ac.il/~dagan/RTE2/Proceedings/01.pdf
Regina Barzilay and Lillian Lee. 2003. Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 16-23, May 27-June 1, 2003, Edmonton, Canada.
http://portal.acm.org/citation.cfm?id=1073448&dl=ACM&coll=ACM&CFID=15151515&CFTOKEN=6184618
Dekang Lin and Patrick Pantel. 2001. DIRT: Discovery of Inference Rules from Text. In Knowledge Discovery and Data Mining.
Quirk, C., Brockett, C., and Dolan, W. B. (2004). Monolingual machine translation for paraphrase generation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 142-149, Barcelona, Spain.
http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Quirk.pdf
Shinyama, Y. and Sekine, S. (2003). Paraphrase acquisition for information extraction. In IWP2003.
http://citeseer.ist.psu.edu/643288.html
Bio of the Speaker:
Bill Dolan is a Senior Researcher and Manager of the Natural Language Processing Group at Microsoft Research. He has been at Microsoft Research since 1992 and became the manager of the Natural Language Processing Group in 2000. He received his B.A. in Linguistics from Berkeley and a Ph.D. in Computational Linguistics from UCLA with a dissertation on the formal properties of lexical systems. While a graduate student, he worked for three years at the IBM LA Scientific Center on different natural language processing projects, including machine translation and an automated help system. His work has focused primarily on semantic processing, including word sense disambiguation and MindNet, a large-scale lexical knowledge base built automatically from free text. His current interests include paraphrase recognition/generation and machine translation. He is also a member of the NAACL Executive Board.
Homepage: http://research.microsoft.com/~billdol/
E-mail: billdol@microsoft.com
Additional Material (References, Slides & Lecture Notes):