Unsupervised Parts-of-Speech Induction for Bengali

We present a study of the word interaction networks of Bengali in the framework of complex networks. The topological properties of these networks reveal interesting insights into the morpho-syntax of the language, while clustering helps induce the natural word classes, leading to a principled way of designing POS tagsets. We compare different network construction techniques and clustering algorithms based on the cohesiveness of the word clusters they produce. Cohesiveness is measured against two gold-standard tagsets by means of a novel metric, tag-entropy. The approach is generic and can be easily extended to any language.
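The abstract does not define tag-entropy, so the following is only a minimal sketch of how cluster cohesiveness could be scored against a gold-standard tagset. It assumes, purely for illustration, that the score is the Shannon entropy of the gold-tag distribution inside each cluster, averaged over clusters and weighted by cluster size; the paper's actual definition may differ. The function names, the placeholder word IDs, and the tag labels are all hypothetical.

```python
from collections import Counter
from math import log2


def cluster_tag_entropy(tags):
    """Shannon entropy (in bits) of the gold tags found in one cluster.

    Assumed formulation, not taken from the paper: 0.0 means every word
    in the cluster carries the same gold tag (a fully cohesive cluster).
    """
    counts = Counter(tags)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mean_tag_entropy(clusters, gold_tags):
    """Size-weighted mean of per-cluster entropies (lower is more cohesive).

    clusters  -- list of lists of word IDs
    gold_tags -- dict mapping each word ID to its gold-standard tag
    """
    total_words = sum(len(cluster) for cluster in clusters)
    weighted = sum(
        len(cluster) * cluster_tag_entropy([gold_tags[w] for w in cluster])
        for cluster in clusters
    )
    return weighted / total_words


# Hypothetical toy data: four placeholder words, two clusters.
gold = {"w1": "NOUN", "w2": "NOUN", "w3": "NOUN", "w4": "VERB"}
clusters = [["w1", "w2"], ["w3", "w4"]]
score = mean_tag_entropy(clusters, gold)
```

Under this assumed formulation, the first cluster is pure and contributes zero entropy, while the second mixes two tags evenly and contributes one bit, so comparing the score across clusterings or across the two gold tagsets would indicate which grouping is more cohesive.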

In: Proceedings of LREC 2008

Publisher: European Language Resources Association
Printed / Distributed with the permission of ELRA. This paper was published within the proceedings of the LREC'2008 Conference. © 2007 ELRA - European Language Resources Association. All rights reserved.

Details

Type: Inproceedings
URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/309_paper.pdf