Scalable Adhoc Entity Extraction from Text Collections

Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, and Venkatesh Ganti


Supporting entity extraction from large document collections is important for enabling a variety of data analysis tasks. In this paper, we introduce the "ad-hoc" entity extraction task, where the entities of interest are constrained to come from a list of entities that is specific to the task. In such scenarios, traditional entity extraction techniques that process all the documents for each ad-hoc entity extraction task can be very expensive. We propose an efficient approach that leverages the inverted index on the documents to identify the subset of documents relevant to the task and processes only those documents. We demonstrate the efficiency of our techniques on real datasets.
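The general idea of using an inverted index to skip irrelevant documents can be illustrated with a small sketch. This is not the paper's algorithm; it is a minimal illustration under simple assumptions: whitespace-and-punctuation tokenization, exact contiguous matching, and an in-memory index. All names (`tokenize`, `build_inverted_index`, `candidate_docs`, `extract`) and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase word tokens are a good enough unit for matching.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def candidate_docs(index, entity):
    # A document can mention the entity only if it contains every entity token,
    # so intersect the posting sets of those tokens.
    tokens = tokenize(entity)
    if not tokens:
        return set()
    postings = [index.get(tok, set()) for tok in tokens]
    return set.intersection(*postings)


def extract(docs, index, entities):
    # Verify each entity only against its candidate documents,
    # instead of scanning the whole collection per task.
    mentions = defaultdict(set)
    for entity in entities:
        needle = tokenize(entity)
        n = len(needle)
        for doc_id in candidate_docs(index, entity):
            toks = tokenize(docs[doc_id])
            if any(toks[i:i + n] == needle for i in range(len(toks) - n + 1)):
                mentions[doc_id].add(entity)
    return dict(mentions)


docs = {
    1: "The Canon EOS 5D is a camera.",
    2: "EOS 5D review by a Canon fan.",
    3: "Nothing relevant here.",
}
index = build_inverted_index(docs)
result = extract(docs, index, ["Canon EOS 5D", "Nikon D90"])
print(result)
```

In this sketch, document 3 is never opened during extraction, and document 2 passes the index filter (it contains all three tokens) but is rejected by the contiguous-match check. The index is built once and can be reused across many task-specific entity lists.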


Publication type: Inproceedings
Published in: VLDB Conference