Ece Kamar, Severin Hacker, Chris Lintott, and Eric Horvitz
4 June 2012
Researchers in the maturing subdiscipline of "human computation" have been exploring opportunities to harness programmatic access to human abilities and intelligence via crowdsourcing platforms. We present principles and algorithms for weaving together contributions from human and machine intelligence, including the ideal fusion of efforts from multiple workers, the guiding of hiring decisions, the routing of tasks, and decisions about when to halt efforts on a task at some degree of refinement, all based on considerations of expected utility. We focus on consensus tasks and show how machine learning and inference can be harnessed to leverage the complementary strengths of humans and computational components in solving them. We discuss the training of probabilistic models from data and the use of these models both to predict the contributions of workers and to fuse information about the target task or world state of interest. We show how the models can be used to learn about the reliability and competency of individual contributors, and how they can then inform the routing of tasks and the fusing of human and machine contributions. We experiment with multiple planning procedures for guiding decisions on hiring and on routing tasks to workers, with the aim of maximizing the expected utility, and thus the efficiency, of large-scale crowdsourcing processes. In response to the challenges of computing expected value over long evidential sequences, we introduce a new Monte Carlo procedure for computing the expected value of hiring that employs multiple inferences in concert from the predictive models. The procedure exploits the special structure of consensus tasks to cut through intractability and provides an efficient handle on a large search space. We put all of the pieces together to describe an operating platform for crowdsourcing that we call CrowdSynth.
We demonstrate the operation of this framework, with its machine-learned models, on a large-scale real-world citizen science effort called Galaxy Zoo. The machine-learned models for Galaxy Zoo combine the efforts of people and machine vision on the project's task of classifying celestial bodies. We describe multiple experimental studies with Galaxy Zoo. The evaluations show how the proposed methodology optimizes the allocation of workers to crowdsourcing tasks for different worker costs and can significantly reduce the resources needed to solve crowdsourcing tasks accurately.
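To make the idea of hiring and halting by expected utility concrete, the following is a minimal sketch and not the procedure used in CrowdSynth. It assumes a binary consensus task, independent workers who share a single known accuracy, a unit reward for a correct final answer, and a fixed cost per hired worker. All function names and parameter values are hypothetical. It fuses votes with Bayes' rule and uses a simple fixed-horizon Monte Carlo estimate of the value of hiring additional workers, which is then compared with the value of stopping now.

```python
import random


def posterior_positive(votes, prior=0.5, accuracy=0.7):
    """P(label = 1 | votes), assuming independent workers with equal accuracy."""
    like_pos = prior
    like_neg = 1.0 - prior
    for v in votes:
        like_pos *= accuracy if v == 1 else 1.0 - accuracy
        like_neg *= accuracy if v == 0 else 1.0 - accuracy
    return like_pos / (like_pos + like_neg)


def value_of_stopping(votes, prior=0.5, accuracy=0.7):
    """Expected reward (unit reward if correct) of reporting the likelier label now."""
    p = posterior_positive(votes, prior, accuracy)
    return max(p, 1.0 - p)


def value_of_hiring(votes, n_more, cost, prior=0.5, accuracy=0.7,
                    samples=2000, seed=0):
    """Monte Carlo estimate of the net utility of hiring n_more workers, then answering.

    Each rollout samples a true label from the current posterior, simulates
    n_more future votes, fuses them with the observed votes, and scores the
    resulting answer minus the hiring cost.
    """
    rng = random.Random(seed)
    p = posterior_positive(votes, prior, accuracy)
    total = 0.0
    for _ in range(samples):
        truth = 1 if rng.random() < p else 0
        future = [truth if rng.random() < accuracy else 1 - truth
                  for _ in range(n_more)]
        p_final = posterior_positive(list(votes) + future, prior, accuracy)
        guess = 1 if p_final >= 0.5 else 0
        total += (1.0 if guess == truth else 0.0) - cost * n_more
    return total / samples


def should_hire(votes, n_more, cost, **kwargs):
    """Hire only if the estimated value of hiring exceeds the value of stopping."""
    model = {k: kwargs[k] for k in ("prior", "accuracy") if k in kwargs}
    return value_of_hiring(votes, n_more, cost, **kwargs) > value_of_stopping(votes, **model)
```

For example, `should_hire([1, 0], n_more=3, cost=0.01)` compares stopping on a split vote with paying for three more workers. A fuller treatment would replace the shared accuracy with learned per-worker models and would reconsider the decision after every new vote rather than committing to a fixed horizon.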