Saving Money While Polling with InterPoll using Power Analysis

Crowd-sourcing is increasingly used for large-scale polling and surveys. Companies such as SurveyMonkey have made crowd-sourced surveys commonplace by making the crowd accessible through an easy-to-use UI and by making results easy to retrieve. Further, they do so with relatively low latency by having dedicated crowds at their disposal.

We argue that the ease with which polls can be created conceals an inherent difficulty: the survey maker does not know how many workers to hire for their survey. Asking too few may lead to sample sizes that ``do not look impressive enough.'' Asking too many means spending extra money, which can quickly become costly. Existing crowd-sourcing platforms do not help with this decision; nor, one can argue, do they have any incentive to do so.

In this paper, we present a systematic approach to determining how many samples (i.e., workers) are required to achieve a given level of statistical significance, by showing how to automatically perform power analysis on questions of interest. Using a range of queries, we demonstrate that power analysis can save significant amounts of money and time, often concluding that only a handful of results are required to arrive at a decision.
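
As a minimal illustration of the kind of calculation power analysis involves (a textbook normal approximation under assumptions chosen for exposition, not necessarily the procedure \tool applies), consider comparing the means of two equally sized groups with a two-sided test. The significance level $\alpha$, the desired power $1-\beta$, the common standard deviation $\sigma$, and the smallest difference of interest $\delta$ are all illustrative symbols. The number of responses needed per group is approximately
\[
  n \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}}{\delta^{2}},
\]
where $z_{p}$ denotes the $p$-quantile of the standard normal distribution. For $\alpha = 0.05$ and power $0.8$, $z_{0.975} \approx 1.96$ and $z_{0.8} \approx 0.84$, so a difference of one standard deviation ($\delta = \sigma$) requires $n \approx 2 \times 2.80^{2} \approx 16$ responses per group, whereas halving the difference ($\delta = \sigma/2$) roughly quadruples this to about 63.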

We have implemented our approach within \tool, a programmable developer-driven polling system that uses a generic crowd (Mechanical Turk) as a back-end. \tool automatically performs power analysis by analyzing both the structure of the \emph{query} and the \emph{data} that it dynamically polls from the crowd. In all of our studies we obtain statistically significant results for under~\$30, with most costing less than~\$10. Our approach saves both time and money for the survey maker.