4 June 2012
Many studies have shown that social data such as tweets are a rich source of information about the real-world including, for example, insights into health trends. A key limitation when analyzing twitter data, however, is that it depends on people self-reporting their own behaviors and observations. In this paper, we present a large-scale quantitative analysis of some of the factors that influence self-reporting bias. In our study, we compare a year of tweets about weather events to ground-truth knowledge about actual weather occurrences. For each weather event we calculate how extreme, how expected, and how big a change the event represents. We calculate the extent to which these factors can explain the daily variations in tweet rates about weather events. We find that we can build global models that take into account basic weather information, together with extremeness, expectation and change calculations to account for over 40% of the variability in tweet rates. We build location-specific (i.e., a model per each metropolitan area) models that account for an average of 70% of the variability in tweet rates.
In The 6th Intl. Conference on Weblogs and Social Media (ICWSM)
Publisher American Association for Artificial Intelligence