MULTILINGUAL SYSTEMS 


 

 

overview  

The Multilingual Systems (MLS) group at Microsoft Research India pursues research toward a truly natural-language-neutral approach to all aspects of linguistic computing. Specifically, the group works on language computing systems issues, such as technologies for multilingual information interfaces, organization, access and retrieval, and on computational linguistics issues, such as language understanding, summarization, translation and cross-lingual search.

In addition, our goal is to help enable deep support for Indian languages in Microsoft products and platforms, and to play a significant and symbiotic role with the research community in the Indian-language computing arena.


people

Core Team:

  • A Kumaran (kumaran), Group Lead
  • S Baskaran (baskaran), Assistant Researcher
  • Kalika Bali (kalikab), Associate Researcher
  • Raghavendra Udupa (raghavu), Associate Researcher
  • Jagadeesh Jagarlamudi (jags), Assistant Researcher

 

Affiliate Members:

           
  • Ganesh Ananthanarayan (ganeshan), Assistant Researcher
  • Archana Prasad (archanap), Assistant Designer
  • Joseph Joy (josephj), Development Manager

 

Developers and Interns:

         
  • K Saravanan (v-sarak)
  • Tejaswi Tenneti (t-ttejas)
  • Lucia Specia (t-lucias)
  • Sandeep Sripada (t-sansri)
  • Karthik Raghunathan (t-kraghu)
Anna University IIIT-Allahabad Sao Paulo Austin BITS-Pilani   NIT-Calicut

 

Alumni

   
  • Shaik Sharif, BITS-Pilani
  • V Surendhaaren, BITS-Pilani
  • Priyanka Biswas, Delhi U
  • Vijay Pattisapu, U of Texas
  • Ranbeer Makin, IIIT-Hyderabad
  • Shravan Kumar, BITS-Pilani

 

 

 


projects

:: Project MIME: The goal of the Multilingual Information Management Environment (MIME) project is to explore natural-language-neutral technologies to collate, index, access and retrieve information from a collection of multilingual text documents. Applications for such an environment could range from document management on a desktop computer (desktop or email search) to web search (over a large, heterogeneous collection of multilingual pages).

The research issues that need to be addressed are:

  • Alternate Semantics for Search and Query Processing (Unilingual vs Multilingual vs Crosslingual)
  • Effective and efficient access structures, leveraging the linguistic resources
  • User-customizable interface for multiple languages
> People involved: Kumaran

 

:: Project Machine Translation: This project explores issues in the automatic translation of text between English and Indian languages, using statistical machine translation technologies. Information and models inferred from large monolingual, comparable and parallel multilingual corpora are used to translate new text accurately and intuitively, with the aim of building practical and scalable systems. Currently, we have adapted a generic statistical machine translation framework and have a functioning machine translation system between English and Hindi. In the future, we plan to experiment with comparable corpora instead of parallel corpora for more natural translation tasks. An instant messenger platform that leverages such an MT service is also being developed at Microsoft Research India.
> People involved: Kumaran, Baskaran, Ganesh

 

:: Project Tools: Linguistic Tools in Indian Languages: This project aims to develop basic tools, such as morphological analyzers, part-of-speech taggers, parsers and named-entity recognizers, in Indian languages. The approach is primarily based on language-neutral machine learning techniques. We collaborate with our research partners to make such tools available in Indian languages.
> People involved: Baskaran, Raghavendra, Saravanan, Kumaran ...
     

:: Project Corpora collection: Machine translation using statistical machine learning techniques requires huge parallel corpora for learning and training. This project explores the collection of parallel data over the web by encouraging community participation in creating news articles in multiple languages. In addition, to encourage data sharing across institutions researching natural languages, MSR India is driving an effort to create public Indian-language corpora as a service to the computational linguistics community. We are very keen on driving this effort as a community affair, involving all players, including academia, government and industry. Check back soon for more updates on this exciting project.
> People involved: Kumaran, Joseph, Archana, Saravanan …

 

 

:: Project Ontologies: Automatic Generation of Linguistic Ontologies: This project explores the automatic generation of ontologies and evaluates their quality and coverage vis-à-vis hand-crafted ones. The purpose of this project is to see whether automatically assembling the bits of linguistic information gleaned from a large corpus for a specific domain may effectively substitute for the expensive hand-crafting of ontologies in that domain. If successful, a large number of the world's resource-poor languages may benefit from automatic ontology generation. Currently, we are exploring this strategy in English, using English WordNet and an ontology automatically generated using MSR’s MindNet.
> People involved: Kumaran

 


publications

  • On Pushing Multilingual Query Operators into Relational Engines
    A Kumaran, P K Chowdary and J R Haritsa.
    Published in the 22nd IEEE International Conference on Data Engineering (ICDE-2006), April 2006, in Atlanta, USA.
  • Statistical POS-Tagging and Chunking in Indian Languages
    S Baskaran.
    Workshop on Machine Learning in Natural Language Processing, July 2006, in Mumbai, India.
  • Automatic Extraction of Synonymy Information [Extended Abstract]
    A Kumaran, V Pattisapu, R Makin, S Sharif, L Vanderwende and G Kacmarcik.
    Ontologies in Text Technology: Approaches to Extract Semantic Knowledge from Structured Information, January 2007, in Osnabruck, Germany.
  • Multilingual Semantic Matching with OrdPath in Relational Systems
    A Kumaran and P Carlin.
    Bulletin for the Technical Committee on Data Engineering (Vol. 30, No 1), March 2007.
  • The Pythy Summarization System: Microsoft Research at DUC2007
    K Toutanova, C Brockett, M Gamon, J Jagarlamudi, H Suzuki and L Vanderwende.
    Document Understanding Conference (DUC-2007), an HLT-NAACL Workshop, April 2007, in Rochester, USA.
  • A Generic Framework for Machine Transliteration (Poster)
    A Kumaran and T Kellner.
    30th Annual ACM SIGIR Conference, July 2007, in Amsterdam, Netherlands. 
  • A Machine Transliteration Workbench (Demo)
    A Kumaran and T Kellner.
    30th Annual ACM SIGIR Conference, July 2007, in Amsterdam, Netherlands.
  • Automatic Extraction of Synonymy Information
    A Kumaran, V Pattisapu, R Makin, S K Vuggrala and L Vanderwende.
    To appear in the GLDV-Journal for Computational Linguistics and Language Technology, 2007.

careers  

The Multilingual Systems Group at MSR India strives to do world-class research that develops a truly natural-language-neutral approach to all aspects of language-based computing, and that enables Indic-language functionality in Microsoft products.

Researchers work independently or with a team to conduct high-quality academic research in their field. They work to enhance their presence in their field of research outside Microsoft through paper publication, conference attendance, and other interaction with the international academic community. As members of the worldwide MSR family of researchers, they collaborate with researchers in all our labs and with universities around the world. In addition, in the Multilingual Systems Group, they need to interface with other public and private research institutions and government agencies for coordination and standardization activities.

Qualified candidates will have a PhD (a Master's degree for the Assistant Researcher position), a strong record of publications appropriate for their experience, and a desire to grow their career in research. Strong oral and written communication skills are essential. Entrepreneurial experience or experience in industry is helpful. If you are interested in a position in the Multilingual Systems Group, please mail a note with your resume to a.kumaran@microsoft.com. For specific current openings in our group (and other groups at MSR India), please visit this link: https://career-intl/international/default.asp?loc=MSRI.

 



©2008 Microsoft Corporation. All rights reserved.