Social Text Normalization using Contextual Graph Random Walks

Hany Hassan; Arul Menezes

The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013) | August 2013

Published by Association for Computational Linguistics

We introduce a social media text normalization system that can be deployed as a preprocessing step for machine translation and other NLP applications that handle social media text. The system learns normalization equivalences from unlabeled text without supervision. It uses random walks on a contextual similarity bipartite graph constructed from n-gram sequences in a large unlabeled text corpus. We show that the approach achieves a very high precision of 92.43 and a reasonable recall of 56.4. When used as a preprocessing step for a state-of-the-art machine translation system, it improved translation quality on social media text by 6%. The approach is domain and language independent and can be deployed as a preprocessing step for any NLP application that handles social media text.
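
To make the graph idea concrete, the following is a minimal sketch and not the system described in the paper. It assumes that words and their surrounding n-gram contexts form the two sides of the bipartite graph, and it replaces the random walk with an exact two-step transition (word to context to word) restricted to in-vocabulary targets. The function names `build_bipartite_graph` and `two_step_walk`, the `window` parameter, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def build_bipartite_graph(sentences, window=2):
    """Link each center word to the (left, right) contexts it occurs in.

    Assumption: a context is the `window` tokens on each side of the word.
    """
    word_to_ctx = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(window, len(tokens) - window):
            left = tuple(tokens[i - window:i])
            right = tuple(tokens[i + 1:i + 1 + window])
            word_to_ctx[tokens[i]][(left, right)] += 1
    return word_to_ctx


def two_step_walk(word_to_ctx, start, vocabulary):
    """Probability of reaching each in-vocabulary word in two steps.

    Step 1 moves from `start` to one of its contexts; step 2 moves from
    that context to a word sharing it. Edge counts act as weights.
    """
    # Invert the graph so we can step from contexts back to words.
    ctx_to_word = defaultdict(Counter)
    for word, ctxs in word_to_ctx.items():
        for ctx, count in ctxs.items():
            ctx_to_word[ctx][word] += count

    start_ctxs = word_to_ctx.get(start, Counter())
    start_total = sum(start_ctxs.values())
    scores = Counter()
    for ctx, count in start_ctxs.items():
        p_ctx = count / start_total
        ctx_total = sum(ctx_to_word[ctx].values())
        for word, c in ctx_to_word[ctx].items():
            if word in vocabulary and word != start:
                scores[word] += p_ctx * c / ctx_total
    return scores


# Toy corpus (invented for illustration only).
corpus = [
    "see you tomorrow at the game",
    "see you tmrw at the game",
    "see you tomorrow at the park",
]
vocab = {"see", "you", "tomorrow", "at", "the", "game", "park"}
graph = build_bipartite_graph(corpus)
candidates = two_step_walk(graph, "tmrw", vocab)
print(candidates.most_common(1))
```

In this toy setting the noisy token "tmrw" shares its only context with "tomorrow", so "tomorrow" is the top candidate. A real system would need longer walks, a large corpus, and additional filtering of candidates, none of which are shown here.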