End-to-End Crowdsourcing Approach for Unbiased High-Quality Transcription of Speech

HCOMP, Proc. HCOMP

Published by AAAI - Association for the Advancement of Artificial Intelligence

We present an end-to-end implementation of a crowdsourced speech transcription pipeline that pursues several goals at once: high transcription fidelity, minimal bias towards machine-generated recognition hypotheses, and low cost. Our approach consists of two stages: unassisted transcription and variant selection. Each stage is an iterative process in which opinions are solicited from judges until a reliable decision on the final transcription of the utterance can be made. Because the hypothesis space may be ambiguous, the final consensus hypothesis for an utterance can comprise several alternative transcriptions, merged into a single word confusion network. Using a lexicographic transcription task for Microsoft Cortana, we show that our approach produces low-cost transcriptions that are superior even to professional transcriptions in terms of exposure bias, accuracy, and latency.
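
As a rough illustration of the iterative solicitation loop described above, the following Python sketch keeps asking judges for a transcription until one hypothesis looks reliable or a judge budget runs out. The function name `collect_until_reliable`, the parameters `min_votes`, `agreement`, and `max_votes`, and the simple vote-share stopping rule are all assumptions made for this example; the abstract does not specify the actual decision criterion. Building the word confusion network is also not shown, so the alternatives are returned as a plain list.

```python
from collections import Counter
from typing import Callable, List, Tuple


def collect_until_reliable(
    ask_judge: Callable[[], str],
    min_votes: int = 3,       # hypothetical: fewest judgments before deciding
    agreement: float = 0.7,   # hypothetical: required vote share of the top variant
    max_votes: int = 10,      # hypothetical: judge budget per utterance
) -> Tuple[List[str], int]:
    """Solicit transcriptions one at a time until an assumed agreement rule holds.

    Returns the accepted transcriptions (possibly several) and the number of
    judgments that were collected.
    """
    votes: Counter = Counter()
    total = 0
    while total < max_votes:
        votes[ask_judge()] += 1
        total += 1
        top, top_count = votes.most_common(1)[0]
        # Stop as soon as enough judges agree on a single variant.
        if total >= min_votes and top_count / total >= agreement:
            return [top], total
    # Budget exhausted without a clear winner: keep every variant that at
    # least two judges produced, falling back to the most frequent one.
    alternatives = [text for text, count in votes.items() if count >= 2]
    return alternatives or [votes.most_common(1)[0][0]], total
```

In this sketch, three identical answers in a row end the loop after three judgments, while an utterance on which judges stay evenly split uses the whole budget and returns both variants as alternatives.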