Word clustering with parallel spoken language corpora

We introduce a word clustering algorithm that uses a bilingual, parallel corpus to group together words in the source and target languages. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments have shown that the algorithm can effectively employ the constraints implicit in bilingual data to extract classes that are well suited to machine translation tasks.
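
For readers unfamiliar with the monolingual starting point, the following is a minimal sketch of the kind of objective that mutual information clustering typically maximizes: the average mutual information between the classes of adjacent words. It is not the bilingual algorithm from the paper and includes no translation model; the function name `class_bigram_mutual_information` and the toy data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def class_bigram_mutual_information(tokens, word_to_class):
    """Average mutual information (in bits) between adjacent word classes.

    Hypothetical helper for illustration only: a monolingual objective,
    with no bilingual or translation-model component.
    """
    classes = [word_to_class[w] for w in tokens]
    pairs = list(zip(classes, classes[1:]))  # adjacent class bigrams
    if not pairs:
        return 0.0

    total = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(c1 for c1, _ in pairs)
    right_counts = Counter(c2 for _, c2 in pairs)

    mi = 0.0
    for (c1, c2), n in pair_counts.items():
        p_joint = n / total
        p_left = left_counts[c1] / total
        p_right = right_counts[c2] / total
        mi += p_joint * math.log2(p_joint / (p_left * p_right))
    return mi


# Toy corpus: two classes that strictly alternate carry 1 bit of information
# about each other; a single class carries none.
tokens = "a b a b a b a".split()
print(class_bigram_mutual_information(tokens, {"a": 0, "b": 1}))  # 1.0
print(class_bigram_mutual_information(tokens, {"a": 0, "b": 0}))  # 0.0
```

A clustering procedure of this family would search over class assignments for one that keeps this quantity high; how the paper extends the objective with a translation model is described in the PDF rather than reproduced here.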

1996-yeyiwang-icslp.pdf
PDF file

In: Fourth International Conference on Spoken Language Processing

Publisher: International Speech Communication Association
© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.

Details

Type: Inproceedings
Pages: 2364 - 2367
Volume: 4
Address: Philadelphia, PA, USA