Fourth International Workshop On Cross Lingual Information Access.

COLING 2010, Beijing

28th August, 2010

The fourth workshop on Cross Lingual Information Access aims to bring together researchers from a variety of fields and practitioners from government and industry to address the information needs of multi-lingual societies. This workshop will also highlight the contributions of NLP and computational linguistics to CLIA, in addition to the previously better represented viewpoint from Information Retrieval.

Workshop Program
Invited Talks

1. Word Sense Disambiguation and IR
Pushpak Bhattacharyya, IIT Bombay

Abstract: In this talk we will examine the rather controversial question of whether WSD helps or harms IR. After building a perspective through some classic papers like Voorhees's, we describe the findings of CLEF tasks on WSD. Following this, we narrate our experience of Cross Lingual IR in Indian languages, where WSD seems to be required. We describe some of our work on domain and language adapted WSD. We end the presentation with our proposal for and work on multilingual pseudo relevance feedback, which does simultaneous query expansion and query disambiguation.

Dr. Pushpak Bhattacharyya is a Professor of Computer Science and Engineering at IIT Bombay. He received his B.Tech from IIT Kharagpur, M.Tech from IIT Kanpur and PhD from IIT Bombay. He has held visiting positions at MIT, Cambridge, USA; Stanford University, USA; and University Joseph Fourier, Grenoble, France. Dr. Bhattacharyya's research interests include Natural Language Processing, Machine Translation and Machine Learning. He has had more than 130 publications in top conferences and journals and has served as program chair, area chair, workshop chair and PC member of top fora like ACL, COLING, LREC, SIGIR, CIKM, NAACL, GWC and others. He has guided 7 PhDs and over 100 master's and undergraduate students in their thesis work. Dr. Bhattacharyya leads India's large scale projects on Machine Translation, Cross Lingual Search, and Wordnet and Dictionary Development. Dr. Bhattacharyya has received a number of prestigious awards, including the IBM Innovation Award, United Nations Research Grant, Microsoft Research Grant, IIT Bombay's Patwardhan Award for Technology Development and Ministry of IT and Digital India Foundation's Manthan Award. Recently he has been appointed Associate Editor of the prestigious journal ACM Transactions on Asian Language Information Processing.

2. Multilinguality at NTCIR, and moving on
Tetsuya Sakai, Microsoft Research Asia

Abstract: NTCIR, often referred to as the Asian TREC, is eleven years old now. From NTCIR-1 (1999) to NTCIR-6 (2007), I was a task participant. From NTCIR-7 (2008), I started to serve as an organiser. From NTCIR-9 (2011), I will be serving as an NTCIR evaluation co-chair. In this talk, I will first look back on the past NTCIR rounds with a focus on crosslingual and multilingual tasks, e.g. Advanced Crosslingual Information Access (ACLIA). Then I will briefly discuss future plans for NTCIR, which is currently going through drastic structural changes.

Tetsuya Sakai received a Master's degree from Waseda University in 1993 and joined the Toshiba Corporate R&D Center in the same year. He received a Ph.D from Waseda University in 2000 for his work on information retrieval and filtering systems. From 2000 to 2001, he was a visiting researcher at the University of Cambridge Computer Laboratory. In 2007, he became Director of the Natural Language Processing Laboratory at NewsWatch, Inc. In 2009, he joined Microsoft Research Asia. He is Chair of IPSJ SIG-IFAT, Evaluation Co-chair of NTCIR, and Regional Representative to the ACM SIGIR Executive Committee (Asia/Pacific). He has served as a Senior PC member for ACM SIGIR, CIKM and AIRS. He is on the editorial board of Information Processing and Management and that of Information Retrieval the Journal. He has received several awards in Japan, mostly from IPSJ.

Accepted Papers

Multi-Word Expression-Sensitive Word Alignment
Tsuyoshi Okita, Alfredo Maldonado Guerra, Yvette Graham and Andy Way

Co-occurrence Graph Based Iterative Bilingual Lexicon Extraction From Comparable Corpora
Diptesh Chatterjee, Sudeshna Sarkar and Arpit Mishra

Filtering news for epidemic surveillance: towards processing more languages with fewer resources
Gael Lejeune, Antoine Doucet, Roman Yangarber and Nadine Lucas

The Noisier the Better: Identifying Multilingual Word Translations Using a Single Monolingual Corpus
Reinhard Rapp and Michael Zock

Ontology driven content extraction using interlingual annotation of texts in the OMNIA project
Achille Falaise, David Rouquet, Didier Schwab, Herve Blanchon and Christian Boitet

Towards multi-lingual summarization: A comparative analysis of sentence extraction methods on English and Hebrew corpora
Marina Litvak, Mark Last, Slava Kisilevich, Daniel Keim, Hagay Lipman and Assaf Ben Gur

Multilinguization and Personalization of NL-based Systems
Najeh Hajlaoui and Christian Boitet

A Voting Mechanism for Named Entity Translation in English-Chinese Question Answering
Ling-Xiang Tang, Shlomo Geva, Andrew Trotman and Yue Xu

More Languages, More MAP?: A Study of Multiple Assisting Languages in Multilingual PRF
Vishal Vachhani, Manoj Chinnakotla, Mitesh Khapra and Pushpak Bhattacharyya

How to Get the Same News from Different Language News Papers
T Pattabhi R K Rao and Sobha Lalitha Devi


Important Dates

  • 30th May, 2010 Paper submission due
  • 30th June, 2010 Paper notification of acceptance
  • 10th July, 2010 Paper Camera-ready due
  • 28th August, 2010 CLIA 2010 Workshop


Cross-lingual information access (CLIA) is concerned with technologies and applications that enable people to freely access information expressed in a language that may differ from the query language. With the rapid development of globalization and of digital online information on the Internet, a growing demand for CLIA has emerged. Ordinary netizens who surf the Internet for special information and communicate in social networks, global companies that provide multilingual services to their multinational customers, and governments concerned with lowering the barriers to international commerce and collaboration and with homeland security all need cross-lingual access. This has triggered vigorous research and development in CLIA. This workshop is the fourth in a series and aims to address the need for cross-lingual information access. The previous three workshops were held during IJCAI 2007 in Hyderabad, IJCNLP 2008 in Hyderabad, and NAACL 2009 in Colorado.

In this workshop, in addition to Cross-lingual Information Retrieval (CLIR), the focus is on multi-lingual information extraction, information integration, summarization and other key technologies that are useful for CLIA. The workshop aims to bring together researchers from a variety of fields such as information retrieval, computational linguistics, machine translation, and digital libraries, and practitioners from government and industry to address the information needs of multi-lingual societies. This workshop also aims to highlight the contributions of NLP and computational linguistics to CLIA, in addition to the previously better represented viewpoint from Information Retrieval. We thus solicit submissions on the following and related topics:

Multi-lingual knowledge acquisition

• Acquisition of multi-lingual parallel/comparable/non-comparable corpora

• Multi-lingual document/sentence/word alignment

• Multi-lingual lexicon/term extraction

• Multi-lingual new words / named entity detection and translation

Machine translation in CLIA

• Interaction between cross-lingual information retrieval and machine translation

• Query translation and document translation

• Developing statistical machine translation systems from multi-lingual corpora

• Domain adaptation in machine translation

• Multi-lingual / Cross-lingual named entity recognition


• Multi-lingual summarization

• Multi-lingual information extraction

• Multi-lingual question answering

• Multi-lingual text categorization and clustering

• Multi-lingual opinion study and sentiment analysis

• Mono-lingual processing leveraging on multi-lingual resources

General CLIA

• Approaches to cross-lingual/multi-lingual information access

• Domain specific cross-lingual/multi-lingual information access

• Cross-lingual cross media search (speech, video, audio)

• Machine Learning for multi-lingual information access

• Scalability issues in cross-lingual/multi-lingual information access

• System evaluation

• Web-scale cross-lingual search

• User studies / interactive CLIA


Paper submissions to CLIA 2010 should follow the COLING 2010 paper submission policy, including the paper format, blind review policy, and title and author format conventions. Workshop papers are in two-column format, with at least two (2) and up to eight (8) pages of content plus one extra page for references. A detailed two-page (2) abstract describing ongoing work is also welcome.

Program Committee

  • Eneko Agirre (University of the Basque Country)
  • Ai Ti Aw (Institute for Infocomm Research)
  • Sivaji Bandyopadhyay (Jadavpur University)
  • Pushpak Bhattacharyya (IIT Bombay)
  • Nicola Cancedda (Xerox Research Centre Europe)
  • Patrick Saint-Dizier (IRIT, Universite Paul Sabatier)
  • Nicola Ferro (University of Padua)
  • Guohong Fu (Heilongjiang University)
  • Cyril Goutte (National Research Council of Canada)
  • A Kumaran (Microsoft Research India)
  • Gareth Jones (Dublin City University)
  • Joemon Jose (University of Glasgow)
  • Gina-Anne Levow (National Centre for Text Mining, UK)
  • Haizhou Li (Institute for Infocomm Research)
  • Qun Liu (ICT/CAS)
  • Ting Liu (Harbin Institute of Technology)
  • Paul McNamee (Johns Hopkins University)
  • Yao Meng (Fujitsu R&D Center Co. Ltd., China)
  • Mandar Mitra (ISI Kolkata)
  • Doug Oard (University of Maryland, College Park)
  • Carol Peters (Istituto di Scienza e Tecnologie dell'Informazione and CLEF campaign)
  • Maarten de Rijke (University of Amsterdam)
  • Paolo Rosso (Technical University of Valencia)
  • Hendra Setiawan (University of Maryland)
  • L Sobha (AU-KBC, Chennai)
  • Rohini Srihari (University at Buffalo, SUNY)
  • Ralf Steinberger (European Commission - Joint Research Centre, Italy)
  • Le Sun (Institute of Software, CAS)
  • Chew Lim Tan (National University of Singapore)
  • Vasudeva Varma (IIIT Hyderabad)
  • Thuy Vu (Institute for Infocomm Research)
  • Haifeng Wang (Baidu, China)
  • Yunqing Xia (Tsinghua University)
  • Deyi Xiong (Institute for Infocomm Research)
  • Guodong Zhou (Soochow University)
  • Chengqing Zong (Institute of Automation, CAS)