WikiBABEL
What is the WikiBABEL Project about?
The WikiBABEL project explores how specific language communities can collaboratively create linguistic data for research.
Approach
Our current focus is on collecting parallel data, which is vitally needed for Machine Translation research, using Wikipedia, the largest community participatory site. WikiBABEL leverages the existing information arbitrage between the different language editions of Wikipedia to provide rough initial content in a given target language, which the community may then correct to create high-quality content in the target-language Wikipedia. Given the large disparities in content between different Wikipedias, and the aspiration of many Wikipedia communities to improve their presence in Wikipedia, there may be sufficient interest in using such a methodology to create new content.

As shown in the above figure, WikiBABEL sits as a thin, transparent edit layer (WikiBABEL CORE) on top of any wiki site, in particular Wikipedia. This layer integrates cloud-based services for the discovery, linguistic, and collaborative features that WikiBABEL supports. Specific modules may be designed for specific wiki systems, such as Wikipedia.
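
As a rough illustration of how community corrections could yield parallel data, the sketch below pairs each source-language sentence with its community-corrected translation. It is not WikiBABEL's actual implementation: the names `SentencePair` and `collect_parallel_data`, the index-based correction map, and the rule that uncorrected sentences are skipped are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SentencePair:
    """One unit of parallel data (hypothetical structure for this sketch)."""
    source: str     # sentence from the source-language article
    machine: str    # rough initial translation shown to the contributor
    corrected: str  # translation after the community's correction


def collect_parallel_data(
    sources: List[str],
    machine_outputs: List[str],
    corrections: Dict[int, str],
) -> List[SentencePair]:
    """Pair source sentences with their community-corrected translations.

    `corrections` maps a sentence index to the corrected text. Assumption:
    a sentence with no (or an empty) correction was never reviewed, so it
    is left out of the parallel data.
    """
    pairs = []
    for index, (src, mt) in enumerate(zip(sources, machine_outputs)):
        fixed = corrections.get(index)
        if fixed is None or not fixed.strip():
            continue  # not reviewed by the community: skip
        pairs.append(SentencePair(source=src, machine=mt, corrected=fixed.strip()))
    return pairs


# Example usage with made-up English-to-French sentences.
sources = ["The cat sleeps.", "It rains."]
machine_outputs = ["Le chat dort.", "Il pleut pluie."]
corrections = {1: " Il pleut. "}  # only the second sentence was corrected

pairs = collect_parallel_data(sources, machine_outputs, corrections)
for pair in pairs:
    print(pair.source, "->", pair.corrected)
```

Keeping the rough translation alongside the corrected one is a design choice of this sketch: it would let researchers study what contributors changed, in addition to using the source and corrected sentences as parallel data.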
Current Status
Among the first deployments of WikiBABEL is MSDNwiki, a Microsoft site that hosts user-generated information for Microsoft developer communities, for creating information for specific demographics. WikiBhasha beta is released as an open source MediaWiki extension at http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/WikiBhasha/, with the JavaScript files under the Apache 2 license and the PHP files under the GPL2 license. For immediate use, WikiBhasha beta is available as an installable bookmarklet from the WikiBhasha homepage, and also as a user-script (WikiBhasha) from the WikiBhasha.MSR user.
Publications
- A Kumaran, Naren Datha, Vikram Dendi, and Ashwani Sharma, WikiBhasha: Our Experiences with Multilingual Content Creation Tool for Wikipedia, in Proceedings of the Wikipedia India Conference 2011, Wikimedia Foundation, December 2011
- kumarana, narend, Ashwani Sharma, and Vikram Dendi, WikiBhasha: Our Experiences with Multilingual Content Creation Tool for Wikipedia, in Proceedings of Wikipedia Conference India, Wikimedia Foundation, November 2011
- A Kumaran, Naren Datha, B Ashok, K Saravanan, Anil Ande, Ashwani Sharma, Sridhar Vedantham, Vidya Natampally, Vikram Dendi, and Sandor Maurice, WikiBABEL: A System for Multilingual Wikipedia Content, in Proceedings of the 'Collaborative Translation: technology, crowdsourcing, and the translator perspective' Workshop (co-located with AMTA 2010 Conference), Denver, Colorado, Association for Machine Translation in the Americas, 31 October 2010
- A Kumaran, Naren Datha, K Saravanan, Vikram Dendi, and Sandor Maurice, WikiBABEL: A Wiki-style Platform for Creation of Parallel Data, in the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL/IJCNLP-2009), Singapore, Singapore, Association for Computational Linguistics, August 2009
- A Kumaran, K Saravanan, and Sandor Maurice, WikiBABEL: Community Creation of Multilingual Data, in the WikiSYM 2008 Conference, Porto, Portugal, Association for Computing Machinery, Inc., September 2008
