How difficult is it to develop a perfect spell-checker? A cross-linguistic analysis through complex network approach

Monojit Choudhury, Markose Thomas, Animesh Mukherjee, Anupam Basu, and Niloy Ganguly

Abstract

This work investigates the difficulties involved in spelling error detection and correction in a language through the conceptualization of SpellNet, the weighted network of words in which edges indicate orthographic proximity between two words. We construct SpellNets for three languages: Bengali, English and Hindi. Through appropriate mathematical analysis and/or intuitive justification, we interpret the different topological metrics of SpellNet from the perspective of the issues related to spell-checking. We make many interesting observations, the most significant being that the probability of making a real-word error in a language is proportional to the average weighted degree of SpellNet, which is found to be highest for Hindi, followed by Bengali and English.
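
To make the SpellNet idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that orthographic proximity is measured by Levenshtein edit distance, that two words are linked when their distance is at most a hypothetical `threshold`, and that the edge weight is the distance itself. The function names (`edit_distance`, `build_spellnet`, `weighted_degree`) are illustrative only, and the abstract does not specify any of these choices.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def build_spellnet(words, threshold=1):
    """Return an adjacency map {word: {neighbour: weight}}.

    Assumption: an edge exists when the edit distance is <= threshold,
    and the edge weight is that distance.
    """
    net = {w: {} for w in words}
    for u, v in combinations(words, 2):
        d = edit_distance(u, v)
        if d <= threshold:
            net[u][v] = d
            net[v][u] = d
    return net


def weighted_degree(net, word):
    """Sum of the weights of all edges incident on `word` (assumed definition)."""
    return sum(net[word].values())


lexicon = ["cat", "cot", "cut", "dog", "dot"]
net = build_spellnet(lexicon, threshold=1)
avg_weighted_degree = sum(weighted_degree(net, w) for w in lexicon) / len(lexicon)
```

In this toy lexicon, "cot" has three neighbours at distance 1, so a single mistyped letter is more likely to produce another valid word than it would be for "dog", which has only one. That is the intuition behind relating real-word errors to weighted degree. The pairwise loop is quadratic in the lexicon size, so a real lexicon would need an indexed neighbour search instead.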

Details

Publication type: Inproceedings
Published in: Proceedings of HLT-NAACL Workshop - TextGraphs 2
URL: http://aclweb.org/anthology-new/W/W07/W07-0212.pdf
Pages: 81-88
Publisher: Association for Computational Linguistics