Empirical Properties of Multilingual Phone-to-Word Transduction

G. Zweig and J. Nedel


This paper explores the error-robustness of phone-to-word transduction across a variety of languages. We implement a noisy channel model in which a phonetic input stream is corrupted by an error model, and then transduced back to words using the inverse error model and linguistic constraints. By controlling the error level, we are able to measure the sensitivity of different languages to degradation in the phonetic input stream. This analysis is carried further to measure the importance of each phone in each language individually. We study Arabic, Chinese, English, German and Spanish, and find that they behave similarly in this paradigm: in each case, a phone error produces about 1.4 word errors, and frequently incorrect phones matter slightly less than others. In the absence of phone errors, transduced word errors are still present, and we use the conditional entropy of words given phones to explain the observed behavior.
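
As an illustration of the last point, the conditional entropy of words given phones, H(W | P), can be estimated from counts of (phone string, word) pairs. The sketch below is not the authors' implementation: the function name `conditional_entropy`, the toy observations, and the ARPAbet-style phone strings are all assumptions made for this example. It shows only why homophones leave residual uncertainty about the word even when the phone stream is error-free.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Estimate H(W | P) in bits from (phone_string, word) observations.

    `pairs` is a hypothetical list of (phones, word) tuples; probabilities
    are simple relative frequencies with no smoothing.
    """
    joint = Counter(pairs)
    phone_totals = Counter()
    for (phones, _word), n in joint.items():
        phone_totals[phones] += n
    total = sum(joint.values())

    h = 0.0
    for (phones, _word), n in joint.items():
        p_joint = n / total                      # p(phones, word)
        p_word_given_phones = n / phone_totals[phones]  # p(word | phones)
        h -= p_joint * math.log2(p_word_given_phones)
    return h

# Toy data (assumed, not from the paper): "two" and "too" share one
# pronunciation, while "cat" is unambiguous.
observations = [
    ("T UW", "two"), ("T UW", "too"),
    ("K AE T", "cat"), ("K AE T", "cat"),
]
h = conditional_entropy(observations)
print(f"H(W | P) = {h:.2f} bits")
```

In this toy setting the ambiguous pronunciation contributes all of the entropy; a lexicon with no homophones would give zero.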


Publication type: Inproceedings
Published in: Proceedings of ICASSP