Algorithmic Crowdsourcing

Established: February 1, 2012

To build a machine-learning-based intelligent system, we often need to collect training labels and feed them into the system. A useful lesson in machine learning is that “more data beats a clever algorithm”. Today, through commercial crowdsourcing platforms, we can easily collect a large number of labels at a cost of pennies per label.

However, labels obtained from crowdsourcing may be highly noisy, and training a machine learning model on highly noisy labels can be misleading. This is widely known as “garbage in, garbage out”. There are two main causes of label noise: crowdsourcing workers may lack expertise in a labeling task, and they may have no incentive to produce high-quality labels.

Our goal in this project is to develop principled inference algorithms and incentive mechanisms that guarantee high-quality labels from crowdsourcing in practice.
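To make the inference side of this concrete, the sketch below aggregates redundant, noisy crowd labels into one estimated label per item, first by plain majority voting and then by a simple worker-reliability re-weighting. It is only an illustration of the general idea: the data layout, function names, and the heuristic re-weighting are assumptions made here for exposition, not the project's actual algorithms.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Aggregate crowd labels by majority vote.

    labels: list of (worker_id, item_id, label) triples.
    Returns a dict mapping item_id -> most frequent label.
    """
    votes = defaultdict(Counter)
    for worker, item, label in labels:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}

def weighted_vote(labels, num_rounds=10):
    """Refine majority vote by iteratively re-weighting workers.

    A worker's weight is the fraction of items on which they agree with the
    current consensus; items are then re-labeled by weighted vote. This is a
    simple EM-like heuristic, not a full probabilistic worker model.
    """
    consensus = majority_vote(labels)
    for _ in range(num_rounds):
        # Estimate each worker's reliability against the current consensus.
        agree, total = Counter(), Counter()
        for worker, item, label in labels:
            total[worker] += 1
            if label == consensus[item]:
                agree[worker] += 1
        weight = {w: agree[w] / total[w] for w in total}
        # Re-estimate item labels with reliability-weighted votes.
        scores = defaultdict(Counter)
        for worker, item, label in labels:
            scores[item][label] += weight[worker]
        consensus = {item: s.most_common(1)[0][0] for item, s in scores.items()}
    return consensus

if __name__ == "__main__":
    # Toy example: three workers label two items; worker "w3" is unreliable.
    crowd_labels = [
        ("w1", "item1", "cat"), ("w2", "item1", "cat"), ("w3", "item1", "dog"),
        ("w1", "item2", "dog"), ("w2", "item2", "dog"), ("w3", "item2", "cat"),
    ]
    print(weighted_vote(crowd_labels))  # {'item1': 'cat', 'item2': 'dog'}
```

Principled approaches along these lines, such as the classical Dawid–Skene model, replace the heuristic re-weighting with an explicit probabilistic model of each worker's labeling behavior, typically fit with EM.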

Contact person: Denny Zhou

People

John Platt, Principal Scientist, Google

Xi Chen, Intern, CMU

Nihar Shah, Intern, UC Berkeley

Qiang Liu, Visiting Scholar, Dartmouth

Chao Gao, Intern, Yale

Tengyu Ma, Visiting Scholar, Princeton