COLING 2010, Beijing
28th August, 2010
The fourth workshop on Cross Lingual Information Access aims to bring together researchers from a variety of fields and practitioners from government and industry to address the information needs of multi-lingual societies. This workshop will also highlight the contributions of NLP and computational linguistics to CLIA, in addition to the previously better represented viewpoint from Information Retrieval.
Workshop Program
Will be announced on 5th August, 2010.
Invited Talks
1. Word Sense Disambiguation and IR
Pushpak Bhattacharyya, IIT Bombay
Abstract: In this talk we will examine the rather controversial question of whether WSD helps or harms IR. After building a perspective through some classic papers like Voorhees's, we describe the findings of CLEF tasks on WSD. Following this we narrate our experience of Cross Lingual IR in Indian languages, where WSD seems to be required. We describe some of our work on domain- and language-adapted WSD. We end the presentation with our proposal for and work on multilingual pseudo relevance feedback, which performs simultaneous query expansion and query disambiguation.
Dr. Pushpak Bhattacharyya is a Professor of Computer Science and Engineering at IIT Bombay. He received his B.Tech from IIT Kharagpur, M.Tech from IIT Kanpur and PhD from IIT Bombay. He has held visiting positions at MIT, Cambridge, USA; Stanford University, USA; and University Joseph Fourier, Grenoble, France. Dr. Bhattacharyya's research interests include Natural Language Processing, Machine Translation and Machine Learning. He has more than 130 publications in top conferences and journals and has served as program chair, area chair, workshop chair and PC member of top fora like ACL, COLING, LREC, SIGIR, CIKM, NAACL, GWC and others. He has guided 7 PhDs and over 100 masters and undergraduate students in their thesis work. Dr. Bhattacharyya leads India's large scale projects on Machine Translation, Cross Lingual Search, and Wordnet and Dictionary Development. He has received a number of prestigious awards, including the IBM Innovation Award, a United Nations Research Grant, a Microsoft Research Grant, IIT Bombay's Patwardhan Award for Technology Development, and the Ministry of IT and Digital India Foundation's Manthan Award. He has recently been appointed Associate Editor of the prestigious journal ACM Transactions on Asian Language Information Processing.
2. Multilinguality at NTCIR, and moving on
Tetsuya Sakai, Microsoft Research Asia
Abstract: NTCIR, often referred to as the Asian TREC, is eleven years old now. From NTCIR-1 (1999) to NTCIR-6 (2007), I was a task participant. From NTCIR-7 (2008), I started to serve as an organiser. From NTCIR-9 (2011), I will be serving as an NTCIR evaluation co-chair. In this talk, I will first look back on the past NTCIR rounds with a focus on crosslingual and multilingual tasks, e.g. Advanced Crosslingual Information Access (ACLIA). Then I will briefly discuss future plans for NTCIR, which is currently going through drastic structural changes.
Tetsuya Sakai received a Master's degree from Waseda University in 1993 and joined the Toshiba Corporate R&D Center in the same year. He received a Ph.D. from Waseda University in 2000 for his work on information retrieval and filtering systems. From 2000 to 2001, he was a visiting researcher at the University of Cambridge Computer Laboratory. In 2007, he became Director of the Natural Language Processing Laboratory at NewsWatch, Inc. In 2009, he joined Microsoft Research Asia. He is Chair of IPSJ SIG-IFAT, Evaluation Co-chair of NTCIR, and Regional Representative to the ACM SIGIR Executive Committee (Asia/Pacific). He has served as a Senior PC member for ACM SIGIR, CIKM and AIRS. He is on the editorial boards of Information Processing and Management and of the journal Information Retrieval. He has received several awards in Japan, mostly from IPSJ.
Accepted Papers
Multi-Word Expression-Sensitive Word Alignment
Tsuyoshi Okita, Alfredo Maldonado Guerra, Yvette Graham and Andy Way
Co-occurrence Graph Based Iterative Bilingual Lexicon Extraction From Comparable Corpora
Diptesh Chatterjee, Sudeshna Sarkar and Arpit Mishra
Filtering news for epidemic surveillance: towards processing more languages with fewer resources
Gael Lejeune, Antoine Doucet, Roman Yangarber and Nadine Lucas
The Noisier the Better: Identifying Multilingual Word Translations Using a Single Monolingual Corpus
Reinhard Rapp and Michael Zock
Ontology driven content extraction using interlingual annotation of texts in the OMNIA project
Achille Falaise, David Rouquet, Didier Schwab, Herve Blanchon and Christian Boitet
Towards multi-lingual summarization: A comparative analysis of sentence extraction methods on English and Hebrew corpora
Marina Litvak, Mark Last, Slava Kisilevich, Daniel Keim, Hagay Lipman and Assaf Ben Gur
Multilinguization and Personalization of NL-based Systems
Najeh Hajlaoui and Christian Boitet
A Voting Mechanism for Named Entity Translation in English-Chinese Question Answering
Ling-Xiang Tang, Shlomo Geva, Andrew Trotman and Yue Xu
More Languages, More MAP?: A Study of Multiple Assisting Languages in Multilingual PRF
Vishal Vachhani, Manoj Chinnakotla, Mitesh Khapra and Pushpak Bhattacharyya
How to Get the Same News from Different Language News Papers
T Pattabhi R K Rao and Sobha Lalitha Devi
IMPORTANT DATES
- 30th May, 2010 Paper submission due
- 30th June, 2010 Paper notification of acceptance
- 10th July, 2010 Paper Camera-ready due
- 28th August, 2010 CLIA 2010 Workshop
CALL FOR PAPERS
Cross-lingual information access (CLIA) is concerned with technologies and applications that enable people to freely access information expressed in any language, which may differ from the query language. With the rapid development of globalization and of digital online information on the Internet, a growing demand for CLIA has emerged. Ordinary netizens who surf the Internet for specific information and communicate in social networks, global companies that provide multilingual services to their multinational customers, and governments concerned with homeland security and with lowering the barriers to international commerce and collaboration all need cross-lingual access. This has triggered vigorous research and development in CLIA. This workshop is the fourth in a series of workshops and aims to address the need for cross-lingual information access. The previous three workshops were held during IJCAI 2007 in Hyderabad, IJCNLP 2008 in Hyderabad, and NAACL 2009 in Colorado.
In this workshop, in addition to Cross-lingual Information Retrieval (CLIR), the focus is on multi-lingual information extraction, information integration, summarization and other key technologies that are useful for CLIA. The workshop aims to bring together researchers from a variety of fields such as information retrieval, computational linguistics, machine translation, and digital libraries, and practitioners from government and industry to address the information needs of multi-lingual societies. This workshop also aims to highlight the contributions of NLP and computational linguistics to CLIA, in addition to the previously better represented viewpoint from Information Retrieval. We thus solicit submissions on the following and related topics:
Multi-lingual knowledge acquisition
- Acquisition of multi-lingual parallel/comparable/non-comparable corpora
- Multi-lingual document/sentence/word alignment
- Multi-lingual lexicon/term extraction
- Multi-lingual new words / named entity detection and translation
Machine translation in CLIA
- Interaction between cross-lingual information retrieval and machine translation
- Query translation and document translation
- Developing statistical machine translation systems from multi-lingual corpora
- Domain adaptation in machine translation
- Multi-lingual / Cross-lingual named entity recognition
NLP/CL/IR for CLIA
- Multi-lingual summarization
- Multi-lingual information extraction
- Multi-lingual question answering
- Multi-lingual text categorization and clustering
- Multi-lingual opinion study and sentiment analysis
- Mono-lingual processing leveraging on multi-lingual resources
General CLIA
- Approaches to cross-lingual/multi-lingual information access
- Domain specific cross-lingual/multi-lingual information access
- Cross-lingual cross media search (speech, video, audio)
- Machine Learning for multi-lingual information access
- Scalability issues in cross-lingual/multi-lingual information access
- System evaluation
- Web-scale cross-lingual search
- User studies / interactive CLIA
PAPER SUBMISSION
Paper submissions to CLIA 2010 should follow the COLING 2010 paper submission policy, including the paper format, the blind review policy, and the title and author format conventions. Workshop papers are in two-column format, with at least two (2) and up to eight (8) pages of content plus one extra page for references. A detailed two-page (2) abstract describing on-going work is also welcome.
Submissions must conform to the official COLING 2010 style guidelines. Submission is electronic, through the paper submission software at https://www.softconf.com/coling2010/CLIA2010/. For details, please refer to http://www.coling-2010.org.
NEWS AND UPDATES
- 5th August, 2010 Workshop Program
- 25th December, 2009 Call for Papers
ORGANIZING COMMITTEE
- Sudeshna Sarkar (Indian Institute of Technology Kharagpur)
- Min Zhang (Institute for InfoComm Research)
- Adam Lopez (The University of Edinburgh)
- Raghavendra Udupa (Microsoft Research)
PROGRAM COMMITTEE
- Eneko Agirre (University of the Basque Country)
- Ai Ti Aw (Institute for Infocomm Research)
- Sivaji Bandyopadhyay (Jadavpur University)
- Pushpak Bhattacharyya (IIT Bombay)
- Nicola Cancedda (Xerox Research Centre Europe)
- Patrick Saint Dizier (IRIT, Universite Paul Sabatier)
- Nicola Ferro (University of Padua)
- Guohong Fu (Heilongjiang University)
- Cyril Goutte (National Research Council of Canada)
- A Kumaran (Microsoft Research India)
- Gareth Jones (Dublin City University)
- Joemon Jose (University of Glasgow)
- Gina-Anne Levow (National Centre for Text Mining, UK)
- Haizhou Li (Institute for Infocomm Research)
- Qun Liu (ICT/CAS)
- Ting Liu (Harbin Institute of Technology)
- Paul McNamee (Johns Hopkins University)
- Yao Meng (Fujitsu R&D Center Co. Ltd., China)
- Mandar Mitra (ISI Kolkata)
- Doug Oard (University of Maryland, College Park)
- Carol Peters (Istituto di Scienza e Tecnologie dell'Informazione and CLEF campaign)
- Maarten de Rijke (University of Amsterdam)
- Paolo Rosso (Technical University of Valencia)
- Hendra Setiawan (University of Maryland)
- L Sobha (AU-KBC, Chennai)
- Rohini Srihari (University at Buffalo, SUNY)
- Ralf Steinberger (European Commission - Joint Research Centre, Italy)
- Le Sun (Institute of Software, CAS)
- Chew Lim Tan (National University of Singapore)
- Vasudeva Varma (IIIT Hyderabad)
- Thuy Vu (Institute for Infocomm Research)
- Haifeng Wang (Baidu, China)
- Yunqing Xia (Tsinghua University)
- Deyi Xiong (Institute for Infocomm Research)
- Guodong Zhou (Soochow University)
- Chengqing Zong (Institute of Automation, CAS)