Learning and Exploiting Non-Consecutive String Patterns for Information Extraction

Yunbo Cao, Hang Li, and Shenjie Li

Abstract

This paper is concerned with the problem of learning and exploiting string patterns in natural language processing, particularly information extraction. We propose a new algorithm for learning such patterns. Our algorithm is novel in that it can learn non-consecutive patterns with constraints, which are necessary for information extraction. Specifically, it employs an extended version of the so-called apriori algorithm at the pattern generation step. Our experimental results indicate that in information extraction the use of non-consecutive patterns with constraints is significantly better than the use of only consecutive patterns.
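As a rough illustration of the idea, and not the authors' algorithm, the following Python sketch mines frequent token patterns level by level in the apriori style, allowing gaps between pattern tokens. The abstract does not spell out the constraints used in the report; the `max_gap` limit here is an assumed stand-in, and the names `occurs` and `mine_patterns` are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def occurs(pattern: Pattern, sentence: List[str], max_gap: int) -> bool:
    """True if the pattern tokens appear in order in the sentence with at
    most `max_gap` skipped tokens between neighbouring pattern tokens."""
    def match(p_idx: int, start: int, first: bool) -> bool:
        if p_idx == len(pattern):
            return True
        # The first token may start anywhere; later ones must respect the gap.
        end = len(sentence) if first else min(len(sentence), start + max_gap + 1)
        for i in range(start, end):
            if sentence[i] == pattern[p_idx] and match(p_idx + 1, i + 1, False):
                return True
        return False
    return match(0, 0, True)


def mine_patterns(corpus: List[List[str]], min_support: int,
                  max_gap: int, max_len: int = 4) -> Dict[Pattern, int]:
    """Level-wise (apriori-style) search for patterns that occur in at
    least `min_support` sentences. `max_gap` is an assumed constraint."""
    def support(p: Pattern) -> int:
        return sum(occurs(p, s, max_gap) for s in corpus)

    # Level 1: frequent single tokens.
    level: Dict[Pattern, int] = {}
    for tok in {t for s in corpus for t in s}:
        sup = support((tok,))
        if sup >= min_support:
            level[(tok,)] = sup

    frequent: Dict[Pattern, int] = {}
    while level:
        frequent.update(level)
        if len(next(iter(level))) >= max_len:
            break
        # Join step: a candidate is kept only if both its prefix and its
        # suffix (the two contiguous sub-patterns) are already frequent.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[1:] == q[:-1]}
        level = {}
        for cand in candidates:
            sup = support(cand)
            if sup >= min_support:
                level[cand] = sup
    return frequent


corpus = [
    "Mr. Smith joined Acme as president".split(),
    "Ms. Jones joined Initech last year as director".split(),
    "Dr. Wu joined Globex as chairman".split(),
]
print(mine_patterns(corpus, min_support=3, max_gap=3))
```

With `max_gap=0` the search is restricted to consecutive patterns, so the pair `("joined", "as")` is not found in this toy corpus; allowing gaps of up to three tokens recovers it. The join uses only the prefix and suffix of each candidate because, under a gap limit, a pattern's non-contiguous sub-patterns are not guaranteed to be frequent.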

Details

Publication type: TechReport
Number: MSR-TR-2003-33
Pages: 8
Institution: Microsoft Research