Nihar B. Shah and Dengyong Zhou
Many fields of science and engineering, ranging from predicting protein structures to building machine translation systems, require large amounts of labeled data. These labeling tasks have traditionally been performed by experts; the limited pool of experts limits the size of the datasets and makes the process slow and expensive. In recent years, there has been rapidly increasing interest in using crowds of semi-skilled workers recruited through the Internet. While this 'crowdsourcing' can cheaply produce large amounts of labeled data in a short time, it is typically plagued by the problem of low quality. To address this fundamental challenge in crowdsourcing, we design a novel reward mechanism for acquiring high-quality data, which incentivizes workers to censor their own low-quality data. Our main results are mathematical proofs showing that, surprisingly, under a natural and desirable 'no-free-lunch' requirement, this is the one and only mechanism that is incentive-compatible. The simplicity of the mechanism is an additional attractive property. In preliminary experiments involving over 900 worker-tasks, we observe up to a three-fold drop in the error rates under this unique incentive mechanism.