Exploring the Community Structure of Newsgroups

Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)

We propose to use the community structure of Usenet for organizing and retrieving the information stored in newsgroups. In particular, we study the network formed by cross-posts, messages that are posted to two or more newsgroups simultaneously. We present what is, to our knowledge, by far the most detailed data that has been collected on Usenet cross-posts. We analyze this network to show that it is a small-world network with significant clustering. We also present a spectral algorithm which clusters newsgroups based on the cross-post matrix. The result of our clustering provides a topical classification of newsgroups. Our clustering gives many examples of significant relationships that would be missed by semantic clustering methods.
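To make the idea of clustering on a cross-post matrix concrete, here is a minimal sketch of a generic spectral bipartition. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a symmetric matrix of cross-post counts and splits the newsgroups by the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian), found here by shifted power iteration. The function name `fiedler_bipartition`, the newsgroup names, and the counts are all illustrative and invented for this example.

```python
def fiedler_bipartition(weights, iterations=500):
    """Split nodes into two groups by the sign of the Fiedler vector.

    `weights` is a square matrix (list of lists) of cross-post counts.
    This is a generic textbook technique, used here only as an illustration.
    """
    n = len(weights)
    if n < 2:
        return [True] * n

    # Symmetrize the counts and ignore self-loops.
    w = [[0.0 if i == j else (weights[i][j] + weights[j][i]) / 2.0
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]

    # The largest Laplacian eigenvalue is at most 2 * max degree, so
    # shift * I - L is positive definite and its dominant eigenvector,
    # once the constant vector is removed, is the Fiedler vector.
    shift = 2.0 * max(degree) + 1.0

    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # project out the constant eigenvector
        # y = (shift * I - L) x, where L = D - W
        y = [(shift - degree[i]) * x[i] + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]

    mean = sum(x) / n
    return [v - mean >= 0 for v in x]


# Toy data (invented): two topical groups joined by a few stray cross-posts.
names = ["comp.lang.c", "comp.lang.python", "comp.programming",
         "rec.food.cooking", "rec.food.recipes", "rec.food.baking"]
counts = [
    [0, 30, 25, 1, 0, 0],
    [30, 0, 20, 0, 0, 0],
    [25, 20, 0, 0, 1, 0],
    [1, 0, 0, 0, 40, 35],
    [0, 0, 1, 40, 0, 30],
    [0, 0, 0, 35, 30, 0],
]
labels = fiedler_bipartition(counts)
for name, label in zip(names, labels):
    print(name, "group A" if label else "group B")
```

On this toy matrix the split follows the cross-posting pattern rather than the group names. That is the kind of relationship a purely semantic method could miss when related groups share little vocabulary. Applying the split recursively to each half would give a hierarchy of clusters.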