Improving the Effectiveness of Information Retrieval with Clustering and Fusion

Computational Linguistics and Chinese Language Processing, pp. 109-125

Fusion and clustering are two approaches to improving the effectiveness of information retrieval (IR). In fusion, ranked lists are combined by various means. The motivation is that different IR systems usually emphasize different query features when determining relevance and therefore retrieve different sets of documents. In clustering, documents are grouped either before or after retrieval. The motivation is that closely associated documents tend to be relevant to the same query, so clustering is likely to help retrieve more relevant documents. In this paper, we present a novel fusion technique that can be combined with clustering to achieve consistent improvements. Our method involves three steps: (1) clustering, (2) re-ranking, and (3) fusion. Experiments show that our approach is more efficient than conventional approaches.
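To make the idea of combining ranked lists concrete, the following is a minimal sketch of CombSUM, a widely used baseline fusion rule, applied to min-max normalized scores. It is not the fusion technique proposed in this paper, and it does not cover the clustering or re-ranking steps. The function names (`min_max_normalize`, `comb_sum`), the document identifiers, and the scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one system's scores into [0, 1] so that runs are comparable."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores are equal; treat every document as equally strong.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """Fuse several runs by summing each document's normalized scores.

    A document missing from a run contributes 0 for that run.
    Returns (doc, fused_score) pairs, best first; ties break on doc id.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical runs from two systems with different score scales.
run_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
run_b = {"d2": 0.8, "d4": 0.6, "d1": 0.2}

fused = comb_sum([run_a, run_b])
ranking = [doc for doc, _ in fused]
print(ranking)
```

In this toy input, d2 is ranked first after fusion because both systems score it highly, which reflects the motivation stated above: documents supported by systems that emphasize different query features are more likely to be relevant.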