Monojit Choudhury

RESEARCHER

I have been a researcher at Microsoft Research Lab India since 2007. My research interests cut across the areas of Linguistics, Cognition and Computation. Currently, I am working on script- and code-mixing, especially in social media and web search. We have introduced the notion of Mixed-Script Information Retrieval, where the query and the documents can be in different, and possibly more than one, scripts but in the same language; the task is to retrieve the relevant documents across scripts. Such situations arise quite commonly for Indian languages, where the documents (say, song lyrics or posts on discussion forums) can be written either in the native script or in Romanized form. In fact, a large amount of Indian language (and also Greek, Arabic, etc.) content on the Web is available in Romanized form. Mixed-script IR entails challenges such as cross-script indexing, handling transliteration-induced spelling variations in queries and documents, code-mixed query understanding and query completion.

Code-mixing, or the use of more than one language in a single conversation or utterance, is a phenomenon observed in all multilingual societies. Due to social media and online forums, code-mixing is now rampant on the Internet. I am interested in developing core NLP techniques for identifying and processing code-mixed text. I am also interested in studying the extent, distribution and socio-linguistic factors influencing code-mixing.

I also work on computational musicology. I would like to understand how the (computationally defined) structure of music correlates with and causes certain emotional responses and preferences in individuals and cultures. In particular, I am studying the usage of musical scales and their evolution across the musical cultures of the world, and the cognitive models of scale perception.

I also work on various NLP and Information Retrieval techniques for Indian languages. In the past, I have worked on language evolution, the evolution of the structure of Web search queries, and complex networks.

I am actively involved in the organization of the Panini Linguistics Olympiad - the Indian national version of the International Linguistics Olympiad.

Publications

2014

Rishiraj Saha Roy, Rahul Katare, Niloy Ganguly, and Monojit Choudhury, Automatic Discovery of Adposition Typology, in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, August 2014

Rishiraj Saha Roy, Rahul Katare, Niloy Ganguly, Srivatsan Laxman, and Monojit Choudhury, Discovering and understanding word level user intent in Web search queries, in Web Semantics: Science, Services and Agents on the World Wide Web, Elsevier, August 2014

Parth Gupta, Kalika Bali, Rafael E. Banchs, Monojit Choudhury, and Paolo Rosso, Query Expansion for Mixed-Script Information Retrieval, ACM – Association for Computing Machinery, July 2014

Rishiraj Saha Roy, Yogarshi Vyas, Niloy Ganguly, and Monojit Choudhury, Improving Unsupervised Query Segmentation using Parts-of-Speech Sequence Information, in Proceedings of the 37th Annual ACM SIGIR Conference on Research and Development on Information Retrieval (SIGIR '14), ACM – Association for Computing Machinery, July 2014

Rishiraj Saha Roy, M. Dastagiri Reddy, Niloy Ganguly, and Monojit Choudhury, Understanding the Linguistic Structure and Evolution of Web Search Queries, EVOLANG, April 2014

2013

Monojit Choudhury, Ranjita Bhagwan, and Kalika Bali, The use of Melodic Scales in Bollywood Music: An Empirical Study, in Proceedings of the 14th International Society for Music Information Retrieval Conference, International Society for Music Information Retrieval, November 2013

Sai Sumanth Miryala, Ranjita Bhagwan, Monojit Choudhury, and Kalika Bali, Automatically Identifying Vocal Expressions for Music Transcription, in 2013 International Society of Music Information Retrieval, November 2013

Rohan Ramanath, Monojit Choudhury, and Kalika Bali, Entailment: An Effective Metric for Comparing and Evaluating Hierarchical and Non-hierarchical Annotation Schemes, in Proceedings of LAW VII and ID, Association for Computational Linguistics, July 2013

Rohan Ramanath, Monojit Choudhury, Kalika Bali, and Rishiraj Saha Roy, Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation, in Proceedings of ACL, Association for Computational Linguistics, July 2013

Rishiraj Saha Roy, Anusha Suresh, Niloy Ganguly, and Monojit Choudhury, Place value: Word position shifts vital to search dynamics, in Proceedings of WWW (Companion Volume), WWW Conference 2013, 2013

All publications...

Current Activities:

Please consider submitting your work to the First Workshop on Language Technologies for Indian Social Media Text, to be held in conjunction with ICON 2014 (deadline: 7th Nov 2014). As a part of this workshop, I am offering a tutorial on Code-mixing in Social Media.

We are organizing the FIRE Shared Task on Transliterated Search. The deadline for task registration has passed.