MSR India & IRSI Pre-FIRE Workshop on Multilingual Information Retrieval

Goal:

The goal of this three-day workshop, organised by Microsoft Research India and the Information Retrieval Society of India (IRSI), is to provide a basic introduction to Information Retrieval along with substantial hands-on experience in using existing tools and resources to build a complete end-to-end IR system. The specific focus of the workshop is to equip participants with the technical know-how and theoretical background needed to participate in the FIRE 2013 shared tasks on Multilingual and Cross-lingual IR, especially for Indian languages.

Workshop Schedule:

Time             Topic

15th June (Saturday)

08:30 – 09:00    Collection of Registration Kit
09:00 – 09:30    Welcome address and Orientation
09:30 – 11:00    Lecture 1: Introduction to IR and Basic Retrieval Models
11:00 – 11:30    Coffee Break
11:30 – 12:30    Lecture 2: Evaluation of IR Systems
12:30 – 14:00    Lunch Break
14:00 – 15:00    Lecture 3: Experiment Design and Shared Tasks
15:00 – 16:00    Lab I: Building an end-to-end IR system from scratch
16:00 – 16:30    Coffee Break
16:30 – 18:00    Lab II: Building an end-to-end IR system from scratch

16th June (Sunday)

09:00 – 10:00    Lecture 4: Natural Language Processing for IR
10:00 – 11:00    Lecture 5: Cross-lingual IR
11:00 – 11:30    Coffee Break
11:30 – 12:30    Lecture 6: Ranking and Re-ranking
12:30 – 14:00    Lunch Break
14:00 – 16:00    Lab III: Working with existing IR systems
16:00 – 16:30    Coffee Break
16:30 – 18:00    Lab IV: Understanding Indian language IR

17th June (Monday)

09:00 – 10:00    Lecture 7: Query and Document Understanding
10:00 – 11:00    Invited Talk by Sree Hari Nagaralu, Bing, IDC Microsoft Hyderabad
11:00 – 11:30    Coffee Break
11:30 – 12:30    FIRE Shared Task Planning, Open Discussion & Closing

About FIRE 2013:

The importance of reusable, large-scale standard test collections in Information Access research has been widely recognized. The success of TREC, CLEF, and NTCIR has clearly established the importance of an evaluation workshop that facilitates research by providing the data and a common forum for comparing models and techniques. The Forum for Information Retrieval Evaluation (FIRE) follows in the footsteps of TREC, CLEF and NTCIR with the following aims:

  • Encourage research in South Asian language Information Access technologies by providing reusable large-scale test collections for IR experiments.
  • Explore new Information Retrieval / Access tasks that arise as our information needs evolve, and new needs emerge.
  • Provide a common evaluation infrastructure for comparing the performance of different IR systems.
  • Investigate evaluation methods for Information Access techniques and methods for constructing a reusable large-scale data set for IR experiments.

FIRE is one of the prime IR and NLP events in India, organized by the Information Retrieval Society of India [www.irsi.res.in/]. Proceedings of FIRE are published in ACM TALIP (two special issues) and Springer LNCS. The associate organizing institutes are the Indian Statistical Institute and the Dhirubhai Ambani Institute of Information and Communication Technology. This year, FIRE will be held in New Delhi from 4th to 6th December 2013. For more information, please visit the FIRE website: http://www.isical.ac.in/~fire/.

This year, along with the regular ad hoc retrieval task, Microsoft Research India will organize a second shared task on ad hoc retrieval for transliterated queries and documents. The task will involve a few subtasks such as retrieval of Bollywood songs, where the query can be a Hindi song title transliterated into Roman script (e.g., babu ji dheere chalnaa) and the documents can be either in Roman transliterated form with spelling variations (e.g., … babooji dhire calna …) or in Devanagari (e.g., … बाबूजी धीरे चलना …). The details of these tasks will be discussed during the workshop and will also be published on the FIRE 2013 website in due course.
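To give a feel for the spelling-variation problem described above, here is a minimal sketch in Python. It is not the task's official method or any FIRE resource: the rewrite rules, the function names (translit_key, build_index, search) and the second song title are assumptions made purely for illustration. The idea is to map each Roman-transliterated string to a crude canonical key so that variants such as "babu ji dheere chalnaa" and "babooji dhire calna" collide. Devanagari documents are not handled here; they would first need a transliteration step.

```python
import re
from collections import defaultdict

# Hypothetical, hand-picked rewrite rules for illustration only.
# A real system would need a far richer model of spelling variation.
RULES = [
    (r"oo", "u"),  # babooji -> babuji
    (r"ee", "i"),  # dheere  -> dhire
    (r"aa", "a"),  # chalnaa -> chalna
    (r"ch", "c"),  # chalna  -> calna
]


def translit_key(text):
    """Map a Roman-transliterated string to a crude canonical key."""
    key = text.lower()
    for pattern, replacement in RULES:
        key = re.sub(pattern, replacement, key)
    # Drop spaces and punctuation so "babu ji" and "babuji" agree.
    key = re.sub(r"[^a-z]", "", key)
    # Collapse any remaining runs of a repeated letter.
    return re.sub(r"(.)\1+", r"\1", key)


def build_index(titles):
    """Group titles by their canonical key."""
    index = defaultdict(list)
    for title in titles:
        index[translit_key(title)].append(title)
    return index


def search(index, query):
    """Return every indexed title whose key equals the query's key."""
    return index.get(translit_key(query), [])


# The second title is a made-up placeholder, not a real song.
titles = ["babooji dhire calna", "placeholder title"]
index = build_index(titles)
print(search(index, "babu ji dheere chalnaa"))  # ['babooji dhire calna']
```

Exact key matching is deliberately strict: any variation not covered by a rule causes a miss, which is one reason approximate matching and learned transliteration models are worth exploring for this task.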

Call for Participation

There is no registration fee for the workshop. Participants will be selected through a competitive programming task. Applications are invited from teams of one to three members, who are typically, but not necessarily, students (undergraduate, post-graduate, PhD) and/or faculty members with a background in Computer Science or Information Technology and basic programming skills. Participants from other disciplines can also apply, provided they have sufficient background in computer science and programming.

The application process (one per team) must be completed by 15th May. On 17th May, the details of the task will be posted on the website and sent to the applicants by email. The solution code must be uploaded by 22nd May 2013. Since seats are limited, only the top teams will be selected, based solely on their performance on the task(s); selected teams will be notified by 27th May. Note: Participants should bring their own laptops for the lab sessions. If this is not possible, participants must arrange for at least one laptop per team.

Interested participants may apply at https://cmt.research.microsoft.com/PREFIRE.

Incentives and Facilities:

Please note that participants must arrange for their own accommodation and travel expenses. Only lunch will be provided by the organizers. A certificate of participation will be provided to all participants. The main objective of the workshop, however, is to encourage participants to take part in the FIRE 2013 shared tasks and to provide hands-on training for them. Individuals and teams who want to participate in the shared tasks will be supported beyond the workshop. The winning team of the FIRE shared task organized by MSR India on ad hoc retrieval for transliterated queries and documents might be considered for internships at MSR India or Bing IDC.

Tentative List of Topics to be covered

Basics of Information Retrieval, tools and resources for IR, document indexing and ranking, evaluation of IR systems, natural language processing for IR, multilingual and cross-lingual IR techniques, etc.

Guest lecture: Sree Hari Nagaralu, Bing, IDC Hyderabad.

 

When: 15-17 June 2013

Where: Microsoft Research India Lab

            Vigyan, #9 Lavelle Road  

            Bangalore 560001  India

Workshop Reference material - MUST READ

Terrier-QuickStart

Lab II Handout

Lab III Handout

Lab IV Handout

Co-chairs:

Monojit Choudhury, Microsoft Research India

Prasenjit Majumder, DA-IICT Gandhinagar

Mandar Mitra, Indian Statistical Institute Kolkata

Student Organizers:

Harsha Kokel, DA-IICT Gandhinagar

Parth Mehta, DA-IICT Gandhinagar

Rohan Ramanath, RV College of Engineering Bangalore

Rishiraj Saha Roy, Indian Institute of Technology Kharagpur

Supported by the Microsoft Research Connections India Team

Contacts:

monojitc@microsoft.com

rishiraj.saharoy@gmail.com

 

Important Dates

Application submission:       May 15, 2013

Release of test problems:     May 17, 2013

Solution submission:          May 22, 2013

Notification of selection:    May 27, 2013

Workshop:                     June 15-17, 2013

Wi-Fi Connectivity during the workshop

Please refer to instructions in this ppt