Simon Corston-Oliver and William B. Dolan
July 1999
We perform a linguistic analysis of documents during indexing for information retrieval. Eliminating index terms that occur only in subordinate clauses reduces index size by approximately 30% without adversely affecting precision or recall. These results hold for two corpora: a sample of the World Wide Web and an electronic encyclopedia.
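
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the clause labels are assigned by hand here instead of coming from a linguistic analysis, tokenization is a plain whitespace split, and the name `build_index` and its `drop_subordinate_only` flag are hypothetical. It also assumes the decision is made per document, so a term is left out of a document's entries when every occurrence in that document falls in a subordinate clause.

```python
from collections import defaultdict

# Hypothetical input format: each document is a list of (clause_type, text)
# pairs. The labels are hand-assigned for illustration only.
docs = {
    "d1": [
        ("main", "the glacier carved the valley"),
        ("subordinate", "which geologists surveyed later"),
    ],
    "d2": [
        ("main", "geologists surveyed the coast"),
        ("subordinate", "because the valley flooded"),
    ],
}


def build_index(docs, drop_subordinate_only=True):
    """Build an inverted index mapping each term to a set of document ids.

    With drop_subordinate_only=True, a term is indexed for a document only
    if it occurs in at least one main clause of that document.
    """
    index = defaultdict(set)
    for doc_id, clauses in docs.items():
        main_terms, sub_terms = set(), set()
        for clause_type, text in clauses:
            terms = set(text.lower().split())
            if clause_type == "main":
                main_terms |= terms
            else:
                sub_terms |= terms
        # Terms seen only in subordinate clauses are (sub_terms - main_terms);
        # removing them from the full term set leaves exactly main_terms.
        kept = main_terms if drop_subordinate_only else main_terms | sub_terms
        for term in kept:
            index[term].add(doc_id)
    return dict(index)


def posting_count(index):
    """Total number of (term, document) entries, a rough proxy for index size."""
    return sum(len(doc_ids) for doc_ids in index.values())


full = build_index(docs, drop_subordinate_only=False)
reduced = build_index(docs, drop_subordinate_only=True)
```

Comparing `posting_count(full)` with `posting_count(reduced)` shows the size effect on this toy data. The numbers say nothing about the reduction reported above, which was measured on the two corpora named in the abstract.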
| Field | Value |
| --- | --- |
| Type | TechReport |
| Number | MSR-TR-99-51 |
| Pages | 8 |
| Institution | Microsoft Research |