Large-scale Retrieval with Ivory and MapReduce

It is commonly acknowledged that web-scale collections have outgrown the capabilities of individual machines, necessitating the use of clusters to tackle many problems in information retrieval. The release of the 25-terabyte, billion-page ClueWeb09 collection in 2009 and the increasing popularity of Hadoop, the open-source implementation of the MapReduce distributed framework, have motivated academic researchers to think more seriously about cluster-based distributed retrieval solutions.
In this talk, we will first introduce Ivory, an end-to-end open-source distributed retrieval system built at the University of Maryland, College Park; Ivory takes full advantage of Hadoop and its underlying distributed file system for both indexing and retrieval. We will then present an overview of several research projects that have evolved around Ivory, such as approximate positional indexing for efficient ranked retrieval, scalable monolingual and cross-lingual pairwise document similarity, and automatically extracted pseudo test collections for learning ranking functions for web search.

Speaker Details

Tamer Elsayed received his B.Sc. and M.Sc. degrees in Computer Science from Alexandria University in Egypt, and his Ph.D. degree in Computer Science from the University of Maryland, College Park (UMD) in the US in 2009. His main research interest is in information retrieval with an emphasis on large-scale text analysis. His Ph.D. work focused on identity resolution in email collections in cases where the searcher is unfamiliar with the people involved in the collection. He spent one year as a post-doctoral researcher at the Cloud Computing Center at UMD, where he participated in the design, development, and evaluation of an open-source retrieval engine called Ivory. In 2010, he joined the Advanced Systems Lab at King Abdullah University of Science and Technology (KAUST) as a post-doctoral fellow in the Division of Mathematics and Computer Science, where he taught two graduate courses and worked on two research projects: asynchronous iterations support for MapReduce and real-time search in Twitter. He joined Cairo Microsoft Innovation Lab (CMIC) as a researcher two weeks ago.

Date:
Speakers:
Tamer Elsayed
Affiliation:
CMIC
    • Jeff Running
    • Tamer Elsayed