Find Me the Right Content! Diversity-based Sampling of Social Media Content for Topic-centric Search.

Munmun De Choudhury; Scott Counts; Mary Czerwinski

Int'l AAAI Conference on Weblogs and Social Media | July 2011

Social media and networking websites, such as Twitter and Facebook, generate large quantities of information and have become mechanisms for real-time content dissemination to users. An important question that arises is: how do we sample such social media information spaces in order to deliver relevant content on a topic to end users? Notice that these large-scale information spaces are inherently ‘diverse’, featuring a wide array of attributes such as location, recency, degree of diffusion in the network, and so on. Naturally, for the end user, different levels of diversity in social media content can significantly impact the information consumption experience: low diversity can provide focused content that may be simpler to understand, while high diversity can increase breadth in the exposure to multiple opinions and perspectives. Hence, to address our research question, we turn to diversity as a core concept in our proposed sampling methodology. Here we are motivated by ideas in the “compressive sensing” literature and utilize the notion of sparsity in social media information to represent such large spaces via a small number of basis components. Thereafter we use a greedy iterative clustering technique on this transformed space to construct samples matching a desired level of diversity. Based on Twitter Firehose data, we demonstrate quantitatively that our method is robust and performs better than other baseline techniques over a variety of trending topics. In a user study, we further show that users find samples generated by our method to be more interesting and subjectively engaging compared to techniques inspired by state-of-the-art systems, with improvements in the range of 15–45%.
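
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea of greedily building a sample that matches a desired level of diversity. It rests on several assumptions that are not taken from the paper: each item is a non-negative feature vector, diversity is measured as the normalized Shannon entropy of the sample's summed feature vector, and the names `normalized_entropy` and `greedy_sample` are hypothetical. The compressive-sensing step that reduces the space to a small number of basis components is omitted here.

```python
import math


def normalized_entropy(vec):
    """Shannon entropy of a non-negative vector, scaled to [0, 1].

    Assumption: this stands in for a diversity measure; the source
    abstract does not define one.
    """
    total = sum(vec)
    if total <= 0 or len(vec) < 2:
        return 0.0
    probs = [v / total for v in vec if v > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(vec))


def greedy_sample(items, target_diversity, sample_size):
    """Greedily pick item indices so sample diversity tracks the target.

    At each step, add the remaining item whose inclusion brings the
    sample's diversity closest to target_diversity. Ties are broken
    in favor of the lowest index.
    """
    chosen = []
    running = [0.0] * len(items[0])  # summed feature vector of the sample
    remaining = set(range(len(items)))

    while remaining and len(chosen) < sample_size:
        def gap(i):
            merged = [r + x for r, x in zip(running, items[i])]
            return abs(normalized_entropy(merged) - target_diversity)

        best = min(sorted(remaining), key=gap)
        running = [r + x for r, x in zip(running, items[best])]
        chosen.append(best)
        remaining.remove(best)

    return chosen


if __name__ == "__main__":
    # Toy items: each vector marks which of three attributes an item has.
    toy_items = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(greedy_sample(toy_items, target_diversity=0.0, sample_size=2))
    print(greedy_sample(toy_items, target_diversity=1.0, sample_size=3))
```

With a low target, the sketch tends to pick items that share the same attributes (focused content); with a high target, it tends to spread the sample across attributes (broader exposure). A greedy choice of this kind is simple and fast but is not guaranteed to find the sample whose diversity is closest to the target overall.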