Empirical Properties of Multilingual Phone-to-Word Transduction

G. Zweig and J. Nedel

Abstract

This paper explores the error-robustness of phone-to-word transduction across a variety of languages. We implement a noisy channel model in which a phonetic input stream is corrupted by an error model, and then transduced back to words using the inverse error model and linguistic constraints. By controlling the error level, we are able to measure the sensitivity of different languages to degradation in the phonetic input stream. This analysis is carried further to measure the importance of each phone in each language individually. We study Arabic, Chinese, English, German and Spanish, and find that they behave similarly in this paradigm: in each case, a phone error produces about 1.4 word errors, and frequently incorrect phones matter slightly less than others. In the absence of phone errors, transduced word errors are still present, and we use the conditional entropy of words given phones to explain the observed behavior.
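
As an illustration of why word errors can remain even with error-free phones, the following minimal sketch computes the conditional entropy of words given their pronunciations, H(W | P), for a toy lexicon. It is not the authors' computation: the function name `conditional_entropy`, the toy counts, and the phone symbols are all hypothetical, and the sketch treats isolated words with a single pronunciation each, ignoring the sequence-level linguistic constraints mentioned in the abstract.

```python
import math
from collections import defaultdict


def conditional_entropy(word_counts, pronunciations):
    """Return H(W | P) in bits for isolated words.

    word_counts: dict mapping word -> count (hypothetical toy data).
    pronunciations: dict mapping word -> tuple of phones (one per word).
    """
    total = sum(word_counts.values())

    # Group word counts by shared pronunciation; homophones land together.
    by_phones = defaultdict(list)
    for word, count in word_counts.items():
        by_phones[pronunciations[word]].append(count)

    # H(W | P) = -sum over (w, p) of P(w, p) * log2 P(w | p)
    entropy = 0.0
    for counts in by_phones.values():
        group_total = sum(counts)
        for count in counts:
            if count > 0:
                entropy -= (count / total) * math.log2(count / group_total)
    return entropy


# Toy lexicon (assumed, for illustration): three homophones and one unique word.
toy_counts = {"two": 50, "too": 30, "to": 20, "cat": 100}
toy_prons = {
    "two": ("t", "uw"),
    "too": ("t", "uw"),
    "to": ("t", "uw"),
    "cat": ("k", "ae", "t"),
}

h_with_homophones = conditional_entropy(toy_counts, toy_prons)

# With no shared pronunciations, the phones determine the word exactly.
unique_prons = {
    "two": ("a",),
    "too": ("b",),
    "to": ("c",),
    "cat": ("d",),
}
h_without_homophones = conditional_entropy(toy_counts, unique_prons)
```

In this toy setting the entropy is positive only when several words share a pronunciation, which is one way residual word ambiguity can persist when the phone stream itself is perfect.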

Details

Publication type: Inproceedings
Published in: Proceedings of ICASSP