Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, and Venkatesh Ganti
2008
Supporting entity extraction from large document collections is important for enabling a variety of data analysis tasks. In this paper, we introduce the "ad-hoc" entity extraction task, in which the entities of interest are constrained to come from a list of entities specific to the task. In such scenarios, traditional entity extraction techniques, which process all the documents for each ad-hoc entity extraction task, can be very expensive. We propose an efficient approach that leverages the inverted index on the documents to identify the subset of documents relevant to the task and processes only those documents. We demonstrate the efficiency of our techniques on real datasets.
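The central idea, using an inverted index to narrow extraction to documents that could possibly mention a listed entity, can be illustrated with a minimal Python sketch. This is not the paper's algorithm. The tokenizer, the "all entity tokens present" filter, the contiguous-token verifier, and every function name here are hypothetical simplifications chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplistic lowercase word tokenizer (an assumption for this sketch).
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def candidate_documents(entities: list[str], index: dict[str, set[int]]) -> set[int]:
    """Ids of documents that contain every token of at least one entity."""
    candidates: set[int] = set()
    for entity in entities:
        toks = tokenize(entity)
        if not toks:
            continue
        # Intersect the posting lists of the entity's tokens.
        postings = [index.get(t, set()) for t in toks]
        candidates |= set.intersection(*postings)
    return candidates


def extract_entities(entities: list[str], docs: dict[int, str],
                     index: dict[str, set[int]]) -> dict[int, list[str]]:
    """Run the (comparatively expensive) matcher on candidate documents only."""
    results: dict[int, list[str]] = {}
    for doc_id in sorted(candidate_documents(entities, index)):
        doc_toks = tokenize(docs[doc_id])
        found = []
        for entity in entities:
            ent_toks = tokenize(entity)
            n = len(ent_toks)
            # A contiguous token match confirms an actual mention.
            if n and any(doc_toks[i:i + n] == ent_toks
                         for i in range(len(doc_toks) - n + 1)):
                found.append(entity)
        if found:
            results[doc_id] = found
    return results


# Hypothetical example data.
docs = {
    1: "Microsoft announced a new version of SQL Server.",
    2: "The weather in Seattle was sunny.",
    3: "SQL tutorials rarely mention Server hardware.",
    4: "Databases store tables.",
}
entities = ["SQL Server", "Seattle"]
index = build_inverted_index(docs)
print(sorted(candidate_documents(entities, index)))  # [1, 2, 3]
print(extract_entities(entities, docs, index))       # {1: ['SQL Server'], 2: ['Seattle']}
```

Document 4 is never scanned because its tokens match no entity in the index. Document 3 passes the index filter but is rejected during verification because "SQL" and "Server" are not adjacent. This illustrates why filtering only yields candidates that still require a final check.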
In: VLDB Conference
Type: Inproceedings