Share on Facebook Tweet on Twitter Share on LinkedIn Share by email
Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce

Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, and Yi-Min Wang

Abstract

The Web abounds with dyadic data that keeps increasing by every single second. Previous work has repeatedly shown the usefulness of extracting the interaction structure inside dyadic data [21, 9, 8]. A commonly used tool in extracting the underlying structure is the matrix factorization, whose fame was further boosted in the Netflix challenge [26]. When we were trying to replicate the same success on real-world Web dyadic data, we were seriously challenged by the scalability of available tools. We therefore in this paper report our efforts on scaling up the nonnegative matrix factorization (NMF) technique. We show that by carefully partitioning the data and arranging the computations to maximize data locality and parallelism, factorizing a tens of millions by hundreds of millions matrix with billions of nonzero cells can be accomplished within tens of hours. This result effectively assures practitioners of the scalability of NMF on Web-scale dyadic data.

Details

Publication typeInproceedings
Published inWWW'10: Proceedings of the 19th International World Wide Web Conference
PublisherAssociation for Computing Machinery, Inc.
> Publications > Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce