Okapi at TREC-6: Automatic adhoc, VLC, routing, filtering and QSDR

  • S. Walker,
  • Stephen Robertson,
  • M. Boughanem,
  • G. J. F. Jones,
  • K. Sparck Jones

The Sixth Text REtrieval Conference (TREC-6)

Published by NIST, Gaithersburg, MD

Publication

The Okapi Basic Search System (BSS) is a set-oriented ranked output system designed primarily for probabilistic-type retrieval of textual material using inverted indexes. There is a family of built-in weighting functions. In addition to weighting and ranking facilities, it has the usual Boolean and quasi-Boolean (positional) operations and a number of non-standard set operations. Indexes are of a fairly conventional inverted type.

For VLC, we were interested to find out whether the Okapi BSS could handle more than 20 gigabytes of text and 8 million documents without major modification. There was no problem with data structures, but one or two system parameters had to be altered. In the interests of speed and because of limited disk space, indexes without full positional information were used. This meant that it was not possible to use passage searching. Apart from this, the runs were done in the same way as the ad hoc runs, but with parameters intended to maximize precision at 20 documents. Several pairs of runs were done, but only one (based on the full topic statements) was submitted.

For QSDR, some small-scale experiments were run at Cambridge, using Okapi-type methods, with the QSDR data. These tests gave some indication (albeit qualified by the size of the experiment) that the methods are sufficiently robust to give satisfactory performance with appropriate tuning.
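As an illustration of the kind of ranking described above, the following is a minimal sketch of probabilistic term weighting over a non-positional inverted index, followed by precision at a cutoff. It uses the well-known BM25 form associated with Okapi, but the abstract does not say which weighting function or parameter values were used for these runs. The function names, the whitespace tokenizer, and the values k1=1.2 and b=0.75 are therefore assumptions for illustration only, not the BSS implementation.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build a non-positional inverted index: term -> {doc_id: term frequency}.

    Whitespace tokenization is an assumption made to keep the sketch short.
    """
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_rank(query, index, doc_len, k1=1.2, b=0.75):
    """Rank documents by a BM25-style score (k1 and b are assumed values)."""
    n_docs = len(doc_len)
    avdl = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        n = len(postings)
        if n == 0:
            continue
        # Collection-frequency weight with no relevance information.
        # Note: this is negative for terms in more than half the documents.
        idf = math.log((n_docs - n + 0.5) / (n + 0.5))
        for doc_id, tf in postings.items():
            # Document-length normalization of the term-frequency component.
            norm = k1 * ((1 - b) + b * doc_len[doc_id] / avdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (norm + tf)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at_k(ranking, relevant, k=20):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k
```

Because the index stores only term frequencies per document, it supports this kind of whole-document ranking but carries no positions, which is consistent with the abstract's remark that passage searching was not possible with such indexes.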