Flávio Ribeiro, Dinei Florencio, Cha Zhang, and Michael Seltzer
MOS (mean opinion score) subjective quality studies are used to evaluate many signal processing methods. Since laboratory quality studies are time consuming and expensive, researchers often run small studies with less statistical significance or use objective measures which only approximate human perception. We propose a cost-effective and convenient measure called crowdMOS, obtained by having internet users participate in a MOS-like listening study. Workers listen and rate sentences at their leisure, using their own hardware, in an environment of their choice. Since these individuals cannot be supervised, we propose methods for detecting and discarding inaccurate scores. To automate crowdMOS testing, we offer a set of freely distributable, open-source tools for Amazon Mechanical Turk, a platform designed to facilitate crowdsourcing. These tools implement the MOS testing methodology described in this paper, providing researchers with a user-friendly means of performing subjective quality evaluations without the overhead associated with laboratory studies. Finally, we demonstrate the use of crowdMOS using data from the Blizzard text-to-speech competition, showing that it delivers accurate and repeatable results.