Munmun De Choudhury, Scott Counts, and Mary Czerwinski
July 2011
Social media and networking websites, such as Twitter and
Facebook, generate large quantities of information and have
become mechanisms for real-time content dissemination to
users. An important question that arises is: how do we sample
such social media information spaces in order to deliver relevant
content on a topic to end users? Notice that these large-scale
information spaces are inherently ‘diverse’, featuring a
wide array of attributes such as location, recency, degree of
diffusion effects in the network and so on. Naturally, for the
end user, different levels of diversity in social media content
can significantly impact the information consumption experience:
low diversity can provide focused content that may
be simpler to understand, while high diversity can increase
breadth in the exposure to multiple opinions and perspectives.
Hence, to address our research question, we turn to diversity
as a core concept in our proposed sampling methodology.
Here we are motivated by ideas in the “compressive sensing”
literature and utilize the notion of sparsity in social media
information to represent such large spaces via a small number
of basis components. Thereafter we use a greedy iterative
clustering technique on this transformed space to construct
samples matching a desired level of diversity. Based
on Twitter Firehose data, we demonstrate quantitatively that
our method is robust, and performs better than other baseline
techniques over a variety of trending topics. In a user
study, we further show that users find samples generated by
our method to be more interesting and subjectively engaging
compared to techniques inspired by state-of
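
The greedy, diversity-matched sampling idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the compressive-sensing transform is omitted entirely, and the diversity measure (normalized Shannon entropy over summed attribute weights), the function names `diversity` and `greedy_sample`, and the toy matrix `X` are all assumptions made for illustration.

```python
import numpy as np

def diversity(rows):
    """Normalized Shannon entropy of the summed attribute weights (assumed measure).

    rows: 2-D array, one non-negative attribute vector per item.
    Returns a value in [0, 1]; 0 means all weight sits on one attribute.
    """
    n_attrs = rows.shape[1]
    total = rows.sum(axis=0)
    if n_attrs < 2 or total.sum() == 0:
        return 0.0
    p = total / total.sum()
    p = p[p > 0]  # skip zero-weight attributes so log() is defined
    return float(-(p * np.log(p)).sum() / np.log(n_attrs))

def greedy_sample(X, size, target):
    """Greedily pick `size` item indices whose diversity tracks `target`.

    At each step, add the remaining item that brings the diversity of the
    sample built so far closest to the target. Ties go to the lowest index.
    """
    chosen = []
    remaining = list(range(len(X)))
    while len(chosen) < size and remaining:
        best = min(remaining,
                   key=lambda i: abs(diversity(X[chosen + [i]]) - target))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data (hypothetical): four items, three attributes, one-hot weights.
X = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

focused = greedy_sample(X, size=2, target=0.0)  # low-diversity sample
broad = greedy_sample(X, size=3, target=1.0)    # high-diversity sample
```

With a low target the sketch gathers items that share an attribute, and with a high target it spreads the sample across attributes, mirroring the focused versus broad trade-off described in the abstract.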
| Publisher | Int'l AAAI Conference on Weblogs and Social Media |
| Type | Proceedings |