Further Studies on Multi-Style Language Model for Web Information Retrieval

Xiaolong Li, Jianfeng Gao, and Kuansan Wang

Abstract

Prior studies on the multi-style language model reveal that, when elaborate modeling techniques are employed to properly account for the various language styles used in documents, the resultant language modeling scores appear to be a reasonable proxy for human relevance judgments. As the need for manually labeled data remains a formidable impediment to large-scale applications such as the web, the notion that manual labels can be approximated by some automatic means has profound implications. To investigate the issue further, we conduct a series of experiments to assess the impact of language style on retrieval performance, with an emphasis on eliminating the need for manual intervention. The results confirm that a retrieval system indirectly optimized for query likelihood can achieve performance comparable to that of systems directly optimized for retrieval relevance. Furthermore, when the resultant language model scores are analyzed against the judgment labels, positive correlations emerge. These findings support the feasibility of creating or even evaluating an IR system without expensive human effort in relevance judgments.

Details

Publication type: Inproceedings
Published in: Proceedings of the 33rd Annual ACM SIGIR Conference
Publisher: Association for Computing Machinery, Inc.