Ulrich Paquet, Jurgen Van Gael, David Stern, Gjergji Kasneci, Ralf Herbrich, and Thore Graepel
Many online service systems leverage user-generated content from Web 2.0-style platforms such as Wikipedia, Twitter, Facebook, and many more. Often, the value lies in the freshness of this information (e.g., tweets, event-based articles, and blog posts). This freshness poses a challenge for supervised learning models, as they frequently have to deal with previously unseen features.
In this paper we address the problem of online classification for tweets, namely, how can a classifier be updated in an online manner, so that it can correctly classify the latest “hype” on Twitter? We propose a two-step strategy to solve this problem. The first step follows an active learning strategy that enables the selection of tweets for which a label would be most useful; the selected tweet is then forwarded to Amazon Mechanical Turk where it is labeled by multiple users. The second step builds on a Bayesian corroboration model that aggregates the noisy labels provided by the users by taking their reliabilities into account.
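The two steps can be illustrated with a minimal sketch. This is not the model from the paper: it assumes a sparse logistic scorer, uses plain uncertainty sampling as a stand-in for the active learning criterion, and replaces the Bayesian corroboration model with a simple naive-Bayes vote that treats each worker's reliability as a known probability of answering correctly. All function names, feature names, and numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, features):
    # Sparse logistic score; features the model has never seen contribute 0,
    # so a tweet made only of new terms is scored at exactly 0.5.
    z = sum(weights.get(f, 0.0) for f in features)
    return sigmoid(z)


def select_most_uncertain(weights, tweets):
    # Step 1 (assumed criterion: uncertainty sampling). Pick the tweet whose
    # predicted probability is closest to 0.5 and send it out for labeling.
    return min(tweets, key=lambda t: abs(predict_proba(weights, t) - 0.5))


def aggregate_labels(votes, reliability, prior=0.5):
    # Step 2 (assumed model: independent workers with known accuracy).
    # votes: worker -> 0 or 1; reliability: worker -> P(worker is correct).
    # Returns the posterior probability that the true label is 1.
    log_odds = math.log(prior / (1.0 - prior))
    for worker, vote in votes.items():
        r = reliability[worker]
        llr = math.log(r / (1.0 - r))
        log_odds += llr if vote == 1 else -llr
    return sigmoid(log_odds)


# Hypothetical data: the third tweet contains only unseen features.
weights = {"concert": 2.0, "rain": -2.0}
tweets = [["concert", "tonight"], ["rain"], ["newhype", "trend"]]
picked = select_most_uncertain(weights, tweets)

# Three hypothetical workers label the selected tweet.
votes = {"a": 1, "b": 1, "c": 0}
reliability = {"a": 0.9, "b": 0.6, "c": 0.6}
posterior = aggregate_labels(votes, reliability)
```

In this sketch the two equally reliable workers who disagree cancel out, so the posterior is driven by the most reliable worker. A full Bayesian treatment would also infer the reliabilities from the labels rather than assume them.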
In Computational Social Science and the Wisdom of Crowds Workshop (colocated with NIPS 2010)