Revisiting the Divergence Minimization Feedback Model

  • Yuanhua Lv,
  • ChengXiang Zhai

Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

Published by ACM


Pseudo-relevance feedback (PRF) has proven to be an effective strategy for improving retrieval accuracy. In this paper, we revisit a PRF method based on statistical language models, namely the divergence minimization model (DMM). DMM not only has an apparently sound theoretical foundation, but has also been shown to satisfy most retrieval constraints. However, it turns out to perform surprisingly poorly in many previous experiments. We investigate the cause and reveal that DMM handles the entropy of the feedback model inappropriately, which produces a highly skewed feedback model. To address this problem, we propose a maximum-entropy divergence minimization model (MEDMM), which introduces an entropy term to regularize DMM. Our experiments on various TREC collections demonstrate that MEDMM not only works much better than DMM but also outperforms several other state-of-the-art PRF methods, especially on web collections. Moreover, unlike existing PRF models, which have to be combined with the original query to perform well, MEDMM works effectively even without being combined with the original query.
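
To make the role of the entropy term concrete, the following is a brief sketch in standard language-modeling notation, not copied from the paper: the classic divergence minimization objective, its closed-form solution, and an entropy-regularized variant. The symbols θ_F (feedback model), θ_{d_i} (smoothed feedback-document models), θ_C (collection model), n = |F|, λ, and β are notational assumptions here, and the exact MEDMM weighting in the paper may differ from this sketch.

% Illustrative sketch only: divergence minimization feedback plus an
% explicit entropy regularizer, in common LM-IR notation.
\begin{align*}
% DMM: pull theta toward the feedback documents, push it away from the
% collection background model.
\hat{\theta}_F &= \arg\min_{\theta}\;
   \frac{1}{n}\sum_{i=1}^{n} D\big(\theta \,\big\|\, \theta_{d_i}\big)
   \;-\; \lambda\, D\big(\theta \,\big\|\, \theta_C\big),
   \qquad 0 \le \lambda < 1.
\\[4pt]
% Expanding the KL divergences shows that the entropy of theta enters this
% objective with weight (1 - lambda), so the closed-form solution is
p\big(w \mid \hat{\theta}_F\big) &\propto
   \exp\!\Big(\tfrac{1}{1-\lambda}\big(
     \tfrac{1}{n}\textstyle\sum_{i=1}^{n} \log p(w \mid \theta_{d_i})
     - \lambda \log p(w \mid \theta_C)\big)\Big),
\\[4pt]
% which becomes extremely peaked (skewed) as lambda -> 1. Adding an entropy
% term with its own weight beta decouples the two roles:
\hat{\theta}_F &= \arg\min_{\theta}\;
   -\beta H(\theta)
   \;-\; \frac{1}{n}\sum_{i=1}^{n}\sum_{w} p(w \mid \theta)\log p(w \mid \theta_{d_i})
   \;+\; \lambda \sum_{w} p(w \mid \theta)\log p(w \mid \theta_C),
\\[4pt]
% whose solution replaces 1/(1 - lambda) with 1/beta:
p\big(w \mid \hat{\theta}_F\big) &\propto
   \exp\!\Big(\tfrac{1}{\beta}\big(
     \tfrac{1}{n}\textstyle\sum_{i=1}^{n} \log p(w \mid \theta_{d_i})
     - \lambda \log p(w \mid \theta_C)\big)\Big).
\end{align*}

The point of the sketch is that in plain DMM the entropy weight is tied to 1 − λ, so a strong collection-discrimination weight λ leaves almost no entropy regularization and the feedback model collapses onto a few terms; a separate weight β restores the regularization independently of λ.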