Xiaodong He, Li Deng, Dilek Hakkani-Tur, and Gokhan Tur
Given the increasingly available machine translation (MT) services nowadays, one efficient strategy for cross-lingual spoken language understanding (SLU) is to first translate the input utterance from the second language into the primary language, and then call the primary language SLU system to decode the semantic knowledge. However, errors introduced in the MT process create a condition similar to the “mismatch” condition encountered in robust speech recognition. Such mismatch makes the performance of cross-lingual SLU far from acceptable. Motivated by successful solutions developed in robust speech recognition, we in this paper propose a multi-style adaptive training method to improve the robustness of the SLU system for cross-lingual SLU tasks. For evaluation, we created an English-Chinese bilingual ATIS database, and then carried out a series of experiments on that database to experimentally assess the proposed methods. Experimental results show that, without relying on any data in the second language, the proposed method significantly improves the performance on a cross-lingual SLU task while producing no degradation for input in the primary language. This greatly facilitates porting SLU to as many languages as there are MT systems without any human effort. We further study the robustness of this approach to another type of mismatch condition, caused by speech recognition errors, and demonstrate its success also.
Publisher IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)