Modeling and Solving Term Mismatch for Full-Text Retrieval

Even though modern retrieval systems typically use a multitude of features to rank documents, the backbone of search ranking is usually a standard tf.idf retrieval model.

We address a limitation of the fundamental retrieval models: how they model the vocabulary mismatch between query terms and relevant documents. Vocabulary mismatch happens when query terms fail to appear in the documents that are relevant to the query, causing suboptimal retrieval. Mismatch is a well-known and long-standing problem in retrieval. However, it is not well understood how often query terms mismatch relevant documents, nor how mismatch affects retrieval performance. We formally define term mismatch. We show that mismatch is a very common problem in search, that it allows us to understand the common failures of current retrieval models and the behavior of many retrieval techniques, and that a large potential gain is possible by simply making the retrieval models mismatch-aware. We also demonstrate several initial successes in addressing term mismatch in retrieval using novel mismatch prediction methods and theoretically motivated retrieval techniques, which could lead to even larger gains in retrieval.
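To make the idea of mismatch concrete, here is a minimal sketch that measures, for a single query term, the fraction of relevant documents that do not contain it. This is only one plausible way to quantify the description above; it is not taken from the talk, and the function name `mismatch_rate`, the whitespace tokenization, and the toy documents are all assumptions made for illustration.

```python
def mismatch_rate(term, relevant_docs):
    """Fraction of relevant documents that do NOT contain `term`.

    Hypothetical helper: matching is exact, case-insensitive, and on
    whitespace-separated tokens (no stemming), which is an assumption.
    """
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    term = term.lower()
    missing = sum(1 for doc in relevant_docs if term not in doc.lower().split())
    return missing / len(relevant_docs)


# Toy relevance judgments for a made-up query (illustrative only).
relevant = [
    "patent retrieval for chemical compounds",
    "searching chemical patents with structured queries",
    "prior art search in the biomedical domain",
]

rates = {t: mismatch_rate(t, relevant) for t in ["patent", "chemical", "retrieval"]}
for term, rate in rates.items():
    print(f"{term}: {rate:.2f}")
```

In this toy data, "patent" is missing from the second document because only the plural "patents" appears there, which shows how small vocabulary differences can count as mismatch when terms are matched exactly.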

Speaker Details

Le Zhao is a PhD candidate at the Language Technologies Institute, Carnegie Mellon University, expected to graduate in August 2012. Le does research on computer-facilitated human problem solving with a focus on search technology, is the owner of www.wikiquery.org, and is constantly defining and solving problems. Le’s work on information retrieval covers structured documents, structured queries, and retrieval models. Le has worked on legal discovery and bio/medical/chemical patent retrieval, and has contributed to the open-source search engine Lemur/Indri, the crawling of the billion-page ClueWeb09 dataset, and related large-scale processing and mining efforts using MapReduce. Le has also worked on search engine support for human language technology applications such as intelligent tutoring, question answering, and knowledge extraction from the Web. Le received a BE and an ME from the Department of Computer Science and Technology, Tsinghua University.

Date: April 16, 2012
Speakers: Le Zhao
Affiliation: Carnegie Mellon University

- Jeff Running
