In this paper we explore the use of Amazon's Mechanical Turk (MTurk) service to aid in the creation of a test collection for the evaluation of information retrieval systems on a large collection of digitized books. The context of our work is the INEX Book Track, which aims to evaluate approaches for supporting users in reading, searching, and navigating the full texts of digitized books. Our specific focus is the evaluation of book search systems based on the Cranfield paradigm, which requires the construction of a test collection comprising a set of digitized books, a set of user queries (or topics), and relevance assessments. We review the Book Track's efforts over the past three years to create such a collection with the help of its participants, and we explore a new approach that uses crowdsourcing, employing MTurk workers to create the topics of the test collection. Our results show crowdsourcing to be a viable option that can easily generate a high volume of test topics; topic quality varies greatly, however, leading to a rejection rate of 37%.
Published in: Third Workshop on Very Large Digital Libraries (VLDL 2010)