Xiaolong Li, Jianfeng Gao, and Kuansan Wang
23 July 2010
Prior studies on multi-style language models reveal that, when elaborate modeling techniques are employed to properly account for the various language styles used in documents, the resulting language modeling scores appear to be a reasonable proxy for human relevance judgments. As the need for manually labeled data remains a formidable impediment to large-scale settings such as web applications, the notion that manual labels can be approximated by some automatic means has profound implications. To investigate the issue further, we conduct a series of experiments to assess the impact of language style on retrieval performance, with an emphasis on eliminating the need for manual intervention. The results confirm that a retrieval system indirectly optimized for query likelihood can achieve performance comparable to that of systems directly optimized for retrieval relevance. Furthermore, when the resulting language model scores are analyzed against the judgment labels, positive correlations emerge. These findings support the feasibility of creating or even evaluating an IR system without expensive human effort in relevance judgments.
In Proceedings of the 33rd Annual ACM SIGIR Conference
Publisher: Association for Computing Machinery, Inc.
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or firstname.lastname@example.org. The definitive version of this paper can be found at ACM’s Digital Library --http://www.acm.org/dl/.