Less is More: Eliminating index terms for subordinate clauses

We perform a linguistic analysis of documents during indexing for information retrieval. Eliminating index terms that occur only in subordinate clauses reduces index size by approximately 30% without adversely affecting precision or recall. These results hold for two corpora: a sample of the World Wide Web and an electronic encyclopedia.
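The pruning rule can be illustrated with a small inverted index. This is a minimal sketch, not the report's implementation: it assumes each document has already been segmented into clauses labeled main or subordinate (the report obtains this from a linguistic analysis, which is not reproduced here), and the names `Clause`, `tokenize`, `build_index` and `posting_count`, as well as the regex tokenizer, are hypothetical. A term is posted for a document only if it appears in at least one of that document's main clauses.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Clause:
    text: str
    subordinate: bool  # assumed to come from an upstream clause parser


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs.
    return [w.lower() for w in re.findall(r"[a-z0-9]+", text, re.I)]


def build_index(docs, prune=True):
    """Map term -> set of doc ids. With prune=True, a term is indexed for a
    document only if it occurs in at least one main clause of that document;
    terms that occur only in subordinate clauses are dropped."""
    index = defaultdict(set)
    for doc_id, clauses in docs.items():
        main_terms, all_terms = set(), set()
        for clause in clauses:
            terms = set(tokenize(clause.text))
            all_terms |= terms
            if not clause.subordinate:
                main_terms |= terms
        for term in (main_terms if prune else all_terms):
            index[term].add(doc_id)
    return dict(index)


def posting_count(index):
    # Total (term, document) postings: a rough proxy for index size.
    return sum(len(postings) for postings in index.values())


docs = {
    "d1": [Clause("The encyclopedia describes volcanoes", False),
           Clause("which erupt in Iceland", True)],
    "d2": [Clause("Volcanoes erupt", False)],
}
full = build_index(docs, prune=False)
pruned = build_index(docs, prune=True)
print(posting_count(full), posting_count(pruned))  # 10 6
```

In this toy example "iceland" disappears from the index and "erupt" is posted only for `d2`. The reduction here says nothing about real corpora; the approximately 30% figure is the report's own measurement.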


Details

Type: Technical report
Number: MSR-TR-99-51
Pages: 8
Institution: Microsoft Research