Learning and Exploiting Non-Consecutive String Patterns for Information Extraction

This paper addresses the problem of learning and exploiting string patterns in natural language processing, particularly in information extraction. We propose a new algorithm for learning such patterns. The algorithm is novel in that it can learn non-consecutive patterns with constraints, which are necessary for information extraction. Specifically, it employs an extended version of the Apriori algorithm at the pattern generation step. Our experimental results indicate that, in information extraction, using non-consecutive patterns with constraints is significantly better than using only consecutive patterns.
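To make the idea of Apriori-style generation of non-consecutive patterns concrete, here is a minimal sketch. It is not the algorithm from the report: the function names (`matches`, `mine_patterns`), the `max_gap` constraint (the maximum number of tokens that may be skipped between adjacent pattern elements), and the toy sentences are all assumptions chosen for illustration. Candidates of length k+1 are built by joining two frequent patterns of length k that overlap on k-1 elements, which is safe here because every prefix and suffix of a matching pattern also matches under the same gap constraint.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def matches(pattern: Pattern, tokens: Sequence[str], max_gap: int) -> bool:
    """True if `pattern` occurs in `tokens` in order, skipping at most
    `max_gap` tokens between adjacent pattern elements (assumed constraint)."""

    def search(p_idx: int, start: int, limit: Optional[int]) -> bool:
        if p_idx == len(pattern):
            return True
        end = len(tokens) if limit is None else min(len(tokens), limit)
        for i in range(start, end):
            # After matching at i, the next element may sit at i+1 .. i+1+max_gap.
            if tokens[i] == pattern[p_idx] and search(p_idx + 1, i + 1, i + 2 + max_gap):
                return True
        return False

    return search(0, 0, None)


def mine_patterns(
    sentences: List[Sequence[str]], min_support: int, max_gap: int, max_len: int
) -> Dict[Pattern, int]:
    """Level-wise (Apriori-style) mining of frequent non-consecutive patterns.
    Support is the number of sentences containing the pattern."""

    def keep_frequent(candidates) -> Dict[Pattern, int]:
        kept = {}
        for cand in candidates:
            count = sum(1 for s in sentences if matches(cand, s, max_gap))
            if count >= min_support:
                kept[cand] = count
        return kept

    vocabulary = {tok for s in sentences for tok in s}
    current = keep_frequent((tok,) for tok in vocabulary)
    frequent: Dict[Pattern, int] = {}
    length = 1
    while current and length < max_len:
        frequent.update(current)
        # Join step: p and q overlap on all but p's first and q's last element.
        candidates = {p + (q[-1],) for p in current for q in current if p[1:] == q[:-1]}
        current = keep_frequent(candidates)
        length += 1
    frequent.update(current)
    return frequent


sentences = [
    "alice joined acme as engineer".split(),
    "bob joined the firm initech as manager".split(),
    "carol joined globex as analyst".split(),
]
patterns = mine_patterns(sentences, min_support=2, max_gap=1, max_len=3)
print(patterns)
```

In this toy corpus the pattern `("joined", "as")` never occurs consecutively, so a miner restricted to consecutive patterns (`max_gap=0`) cannot find it, while allowing one skipped token recovers it in two of the three sentences.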

tr-2003-33.pdf (PDF file)
tr-2003-33.doc (Word document)
tr-2003-33.ps (PostScript file)

Details

Type: TechReport
Number: MSR-TR-2003-33
Pages: 8
Institution: Microsoft Research