Did You Say U2 or Youtube? Inferring Implicit Transcripts from Voice Search Logs

Proceedings of the 25th International World Wide Web Conference (WWW2016)

Published by ACM - Association for Computing Machinery

Web search via voice is becoming increasingly popular, taking advantage of recent advances in automatic speech recognition. Speech recognition systems are trained using audio transcripts, which can be generated by a paid annotator who listens to the audio and transcribes it manually. This paper considers an alternative source of training data for speech recognition, called implicit transcription. Implicit transcription is based on Web search clicks and reformulations, which can be interpreted as validating or correcting the recognition done during a real Web search. This approach can provide a large amount of free training data that matches the exact characteristics of real incoming voice searches, and the implicit transcriptions can better reflect the needs of real users because they come from the user who generated the audio. Overall, we demonstrate that the new training data has value in improving speech recognition. We further show that the in-context feedback from real users can allow the speech recognizer to exploit contextual signals and reduce the recognition error rate further by up to 23%.