We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expansion problems, by allowing multiple sources of domain knowledge or evidence to be encoded as simultaneous optimization constraints. Our robust optimization approach provides a clean theoretical way to model not only expansion benefit, but also expansion risk, by optimizing over uncertainty sets for the data. In addition, we introduce risk-reward curves to visualize expansion algorithm performance and analyze parameter sensitivity. We show that a robust approach significantly reduces the number and magnitude of expansion failures for a strong baseline algorithm, with no loss in average gain. Our approach is implemented as a highly efficient post-processing step that assumes little about the baseline expansion method used as input, making it easy to apply to existing expansion methods. We provide analysis showing that this approach is a natural and effective way to do selective expansion, automatically reducing or avoiding expansion in risky scenarios, and successfully attenuating noise in poor baseline methods.
|Published in||Proceedings of CIKM 2009|
|Publisher||Association for Computing Machinery, Inc.|
Copyright © 2009 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or firstname.lastname@example.org. The definitive version of this paper can be found at ACM’s Digital Library --http://www.acm.org/dl/.