Director - Applied Sciences & Head - Multilingual Systems Research
As the head of the Applied Sciences group in Microsoft Research India, I coordinate all collaborative research activities between the Online Services Division's Ad Centre and Microsoft Research India. We are actively engaged in several research projects - Keyword Suggestions Platform, Bid Traffic Estimation, Smart Pricing, Privacy Preserving Environments, etc. - that have potential product impact on the Online Advertising area of Ad Centre's product suite.
As a part of my research in the area of Multilingual Systems, I am interested in all multilingual technologies that involve transparent handling of information in multiple languages simultaneously. My research interests include Machine Translation and Transliteration, Crosslingual Information Access, and the creation of multilingual data for research.
I am actively involved in exploring crowdsourcing as a methodology for the creation of language data through a collaborative research project - WikiBABEL - with the Wikimedia Foundation. WikiBABEL explored the creation of multilingual Wikipedia content, producing parallel data for Machine Translation research as a by-product. In Oct 2010, we released an open source content creation tool for Wikipedia - WikiBhasha - as a MediaWiki Extension for open source developers, and as a user-gadget for Wikipedia content creators. We are actively engaged with Wikipedia communities around the world, studying the adoption of WikiBhasha and its potential for data creation.
As a part of the Multilingual Systems area in Microsoft Research India, we collaborate with many researchers in India and around the world.
Along with other researchers in the Multilingual Systems research area, I championed the creation of Pan Indian POS linguistic annotation standards that were adopted by the Bureau of Indian Standards. I have organized the Named Entities WorkShop (NEWS), focused on Named Entities in multilingual corpora, and the Crosslingual Information Access (CLIA) workshop series. The NEWS workshops, co-located with the ACL (in 2009, 2010 and 2012) and IJCNLP (in 2011) conferences, baseline the state of the art in Machine Transliteration & Mining in about a dozen languages from around the world (العربية, বাংলা, 中文, English, עברית, हिन्दी, 日本語, ಕನ್ನಡ, 한국어, Русский, தமிழ் and ไทย). The CLIA workshops, co-located with the IJCNLP/NAACL/COLING conferences (in 2008, 2009, 2010 and 2011), focus primarily on the emerging area of cross-language technologies and systems.
2012
- A Kumaran, Sujay Kumar Jauhar, and Sumit Basu, Doodling: A Gaming Paradigm for Generating Language Data, in proceedings of the Human Computation Workshop 2012, American Association for Artificial Intelligence, July 2012
- Min Zhang, Haizhou Li, A Kumaran, and Ming Liu, Report of NEWS 2012 Machine Transliteration Shared Task, in proceedings of the ACL 2012 Named Entities WorkShop (NEWS), Jeju Island, South Korea, Association for Computational Linguistics, June 2012
- K Saravanan, Monojit Choudhury, Raghavendra Udupa, and A Kumaran, An Empirical Study of the Occurrence and Co-Occurrence of Named Entities in Natural Language Corpora, in Proceedings of LREC 2012, European Language Resources Association, May 2012
- K Saravanan, Raghavendra Udupa, and A Kumaran, Improving Cross-Language Information Retrieval by Transliteration Generation and Mining, to be published in an LNCS volume of the FIRE-2010 proceedings, Springer, January 2012
2011
- A Kumaran, Naren Datha, Vikram Dendi, and Ashwani Sharma, WikiBhasha: Our Experiences with Multilingual Content Creation Tool for Wikipedia, in Proceedings of the Wikipedia India Conference 2011, Wikimedia Foundation, December 2011
- A Kumaran and K Saravanan, Improving Tamil-English Cross-Language Information Retrieval by Transliteration Generation and Mining, in proceedings of Tamil Internet Conference 2011, Philadelphia, PA, INFITT, July 2011
- A Kumaran, Mitesh Khapra, and Pushpak Bhattacharyya, Compositional Machine Transliteration, in ACM Transactions on Asian Language Information Processing (TALIP) Journal, Association for Computing Machinery, Inc., January 2011
2010
- A Kumaran, Naren Datha, B Ashok, K Saravanan, Anil Ande, Ashwani Sharma, Sridhar Vedantham, Vidya Natampally, Vikram Dendi, and Sandor Maurice, WikiBABEL: A System for Multilingual Wikipedia Content, in Proceedings of the 'Collaborative Translation: technology, crowdsourcing, and the translator perspective' Workshop (co-located with AMTA 2010 Conference), Denver, Colorado, Association for Machine Translation in the Americas, 31 October 2010
- A Kumaran, Mitesh Khapra, and Haizhou Li, Report of NEWS 2010 Transliteration Mining Shared Task, in the ACL 2010 Named Entities WorkShop (NEWS-2010), Uppsala, Sweden, Association for Computational Linguistics, July 2010
- Haizhou Li, A Kumaran, Vladimir Pervouchine, and Min Zhang, Report of NEWS 2010 Machine Transliteration Shared Task, in the ACL 2010 Named Entities WorkShop (NEWS-2010), Uppsala, Sweden, Association for Computational Linguistics, July 2010
- Mitesh Khapra, A Kumaran, and Pushpak Bhattacharyya, Everybody loves a rich cousin: An empirical study of Transliteration through Bridge Languages, in the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2010), Los Angeles, USA, Association for Computational Linguistics, June 2010
- A Kumaran, கன்னித்தமிழிலிருந்து கணினித்தமிழுக்கு... : ஒரு கண்ணோட்டம், in Proceedings of World Classical Tamil Conference, Coimbatore, India, June 2010
- K Saravanan, Raghavendra Udupa, and A Kumaran, Crosslingual Information Retrieval System Enhanced with Transliteration Generation and Mining, in the Forum for Information Retrieval Evaluation (FIRE-2010) Workshop, Kolkata, India, February 2010
- Mitesh Khapra, Raghavendra Udupa, A Kumaran, and Pushpak Bhattacharyya, "$PR + RQ \approx PQ$: Transliteration Mining Using Bridge Language", in the Proceedings of American Association of Artificial Intelligence (AAAI 2010) Conference, Atlanta, USA, American Association for Artificial Intelligence, 2010
2009
- Haizhou Li, A Kumaran, Vladimir Pervouchine, and Min Zhang, Report of NEWS 2009 Machine Transliteration Shared Task, in the ACL/IJCNLP-2009 Named Entities WorkShop (NEWS-2009), Singapore, Singapore, Association for Computational Linguistics, August 2009
- A Kumaran, Naren Datha, K Saravanan, Vikram Dendi, and Sandor Maurice, WikiBABEL: A Wiki-style Platform for Creation of Parallel Data, in the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL/IJCNLP-2009), Singapore, Singapore, Association for Computational Linguistics, August 2009
- Raghavendra Udupa, K Saravanan, A Kumaran, and Jagadeesh Jagarlamudi, MINT: A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora, in 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2009), Athens, Greece, Association for Computational Linguistics, March 2009
2008
- Kalika Bali, Sankaran Baskaran, and A Kumaran, Dependency Treelet-based Phrasal SMT: Evaluation and Issues in English-Hindi Language Pair, in the 6th International Conference on Natural Language Processing (ICON-2008), Pune, India, December 2008
- A Kumaran, Ranbeer Makin, Vijay Pattisapu, Shaik Sharif, and Lucy Vanderwende, Evaluating the Quality of Automatically Extracted Synonymy Information, in Journal for Language Technology and Computational Linguistics (JLDV), December 2008
- Raghavendra Udupa, K Saravanan, A Kumaran, and Jagadeesh Jagarlamudi, Mining Named Entity Transliteration Equivalents from Comparable Corpora, in the 17th ACM conference on Information and knowledge management (CIKM 2008), Napa Valley, USA, Association for Computing Machinery, Inc., October 2008
- A Kumaran, K Saravanan, and Sandor Maurice, WikiBABEL: Community Creation of Multilingual Data, in the WikiSYM 2008 Conference, Porto, Portugal, Association for Computing Machinery, Inc., September 2008
- Tanuja Joshi, Joseph Joy, Tobias Kellner, Udayan Khurana, A Kumaran, and Vibhuti Sengar, Crosslingual Location Search, in the 31st annual international ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR 2008), Singapore, Singapore, Association for Computing Machinery, Inc., July 2008
- K Saravanan and A Kumaran, Some Experiments in Mining Named Entity Transliteration Pairs from Comparable Corpora, in the 2nd International Workshop on Crosslingual Information Access, Hyderabad, India, January 2008
2007
- Jagadeesh Jagarlamudi and A Kumaran, Crosslingual Information Retrieval System for Indian Languages, in the 8th Workshop of the Cross-Language Evaluation Forum (CLEF 2007), Budapest, Hungary, Springer Verlag, September 2007
- A Kumaran and Tobias Kellner, A Generic Framework for Machine Transliteration, in the 30th annual international ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR 2007), Amsterdam, Netherlands, Association for Computing Machinery, Inc., July 2007
- A Kumaran and Tobias Kellner, Babel: A Machine Transliteration Workbench, in the 30th annual international ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR 2007), Amsterdam, Netherlands, Association for Computing Machinery, Inc., July 2007
- A Kumaran and Peter Carlin, Multilingual Semantic Matching with OrdPath in Relational Systems, in IEEE Data Engineering Bulletin: Multi-lingual Information Systems, IEEE, March 2007
2006
- A Kumaran, Ranbeer Makin, Vijay Pattisapu, Shaik Sharif, Gary Kacmarcik, and Lucy Vanderwende, Automatic Extraction of Synonymy Information, in the Ontologies in Text Technology Workshop, Osnabruck, Germany, December 2006
- A Kumaran, Pavan K. Chowdary, and Jayant R. Haritsa, On Pushing Multilingual Query Operators into Relational Engines, in the 22nd IEEE International Conference on Data Engineering (ICDE 2006), Atlanta, USA, IEEE, March 2006
2005
- A Kumaran, Multilingual Information Processing on Relational Database Architectures, Indian Institute of Science (PhD Thesis), December 2005

I joined Microsoft Research India in July 2005, and am currently the Director of the Applied Sciences group and head of the Multilingual Systems Research group.
I did my PhD at the Indian Institute of Science, Bangalore, India. I have a Bachelor's degree from College of Engineering, Chennai, India and a Master's degree from
