I have been a researcher at Microsoft Research Lab India since 2007. My research interests cut across the areas of Linguistics, Cognition and Computation. Currently, I am working on script-mixing and code-mixing, especially in social media and web search. We have introduced the notion of Mixed-Script Information Retrieval, where the query and the documents are in the same language but can be written in different, and possibly more than one, scripts; the task is to retrieve the relevant documents across scripts. Such situations arise quite commonly for Indian languages, where the documents (say, song lyrics or posts on discussion forums) can be written either in the native script or in Romanized form. In fact, a large amount of Indian language (and also Greek, Arabic, etc.) content on the Web is available in Romanized form. Mixed-script IR entails challenges such as cross-script indexing, handling transliteration-induced spelling variations in queries and documents, code-mixed query understanding and query completion.
Code-mixing, or the use of more than one language in a single conversation or utterance, is a phenomenon observed in all multilingual societies. Due to social media and online forums, code-mixing is now rampant on the Internet. I am interested in developing core NLP techniques for identifying and processing code-mixed text. I am also interested in studying the extent and distribution of code-mixing and the socio-linguistic factors that influence it.
I also work on computational musicology. I would like to understand how the (computationally defined) structure of music correlates with and causes certain emotional responses and preferences in individuals and cultures. In particular, I am studying the usage of musical scales and their evolution across the musical cultures of the world, and the cognitive models of scale perception.
I also work on various NLP and Information Retrieval techniques for Indian languages. In the past, I have worked on language evolution, the evolution of the structure of Web search queries, and complex networks.
Rafiya Begum, Kalika Bali, Monojit Choudhury, Koustav Rudra, and Niloy Ganguly, Functions of Code-Switching in Tweets: An Annotation Scheme and Some Initial Experiments, LREC, May 2016.
Rishiraj Saha Roy, Anusha Suresh, Niloy Ganguly, and Monojit Choudhury, Improving Document Ranking for Long Queries with Nested Query Segmentation, ECIR, March 2016.
Rishiraj Saha Roy, Anusha Suresh, Niloy Ganguly, Monojit Choudhury, Deepak Shankar, and Tanwita Nimiar, Improving Document Ranking for Long Queries with Nested Query Segmentation, no. MSR-TR-2015-91, 31 December 2015.
Royal Sequiera, Monojit Choudhury, Parth Gupta, Paolo Rosso, Shubham Kumar, Somnath Banerjee, Sudip Kumar Naskar, Sivaji Bandyopadhyay, Gokul Chittaranjan, Amitava Das, and Kunal Chakma, Overview of FIRE-2015 Shared Task on Mixed Script Information Retrieval, FIRE, December 2015.
Royal Sequiera, Monojit Choudhury, and Kalika Bali, POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments, NLPAI, December 2015.
Spandana Gella, Kalika Bali, and Monojit Choudhury, "ye word kis lang ka hai bhai?" Testing the Limits of Word level Language Identification, NLPAI, December 2014.
Yogarshi Vyas, Spandana Gella, Jatin Sharma, Kalika Bali, and Monojit Choudhury, POS Tagging of English-Hindi Code-Mixed Social Media Content, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, October 2014.
Kalika Bali, Jatin Sharma, Monojit Choudhury, and Yogarshi Vyas, "I am borrowing ya mixing ?" An Analysis of English-Hindi Code Mixing in Facebook, in Proceedings of the First Workshop on Computational Approaches to Code Switching, Association for Computational Linguistics, Doha, Qatar, October 2014.
Gokul Chittaranjan, Yogarshi Vyas, Kalika Bali, and Monojit Choudhury, Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System, in Proceedings of the First Workshop on Computational Approaches to Code Switching, Association for Computational Linguistics, Doha, Qatar, October 2014.
Rishiraj Saha Roy, Rahul Katare, Niloy Ganguly, and Monojit Choudhury, Automatic Discovery of Adposition Typology, in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Coling 2014, August 2014.
The Panini Linguistics Olympiad 2015 camp starts on 24th May. It will be held at Microsoft Research Lab India. The camp is open only to students. However, if you are a teacher, linguist or language enthusiast and want to attend the inaugural session (24th May) or the closing session (31st May) of the camp, please drop me an email.