Zhi-Jie Yan, Teng Gao, and Qiang Huo
31 March 2012
This paper presents the design of an MPI-based parallel and distributed machine learning platform on large-scale HPC clusters. Researchers and practitioners can easily implement a class of parallelizable machine learning algorithms on the platform, or quickly port an existing non-parallel implementation of such an algorithm to it with only minor modifications. Complicated aspects of parallel programming, such as scheduling, caching, and load balancing, are handled automatically by the platform. The platform's performance was evaluated in a series of stress tests using a k-means clustering task on 7,500 hours of speech data (about 2.7 billion 52-dimensional feature vectors). Good scalability is demonstrated on an HPC cluster with thousands of CPU cores.
In International Workshop on Statistical Machine Learning for Speech Processing, IWSML 2012
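The abstract does not say how the platform parallelizes k-means, so the following is only a hypothetical sketch of one common data-parallel formulation, not the authors' implementation. Each worker holds a shard of the feature vectors, assigns its vectors to the nearest centroid, and returns per-cluster sums and counts. Summing these statistics across workers (in an MPI program this would typically be an `MPI_Allreduce`; here workers are simulated in-process) gives exactly the same centroid update as a single-machine pass. The helper names `partial_stats` and `kmeans_step` are made up for illustration. A back-of-envelope check of the data size is also included, assuming a 10 ms frame shift and 32-bit floats, neither of which is stated in the abstract.

```python
import numpy as np

def partial_stats(shard, centroids):
    """Hypothetical per-worker pass: assign each vector in the shard to its
    nearest centroid and return per-cluster sums and counts."""
    # Squared Euclidean distance from every vector to every centroid.
    d2 = ((shard[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    np.add.at(sums, labels, shard)          # accumulate rows per cluster
    counts = np.bincount(labels, minlength=k)
    return sums, counts

def kmeans_step(shards, centroids):
    """Hypothetical reduce step: combine all workers' statistics and
    recompute the centroids (empty clusters keep their old centroid)."""
    total_sums = np.zeros(centroids.shape)
    total_counts = np.zeros(len(centroids), dtype=np.int64)
    for shard in shards:                     # stands in for an allreduce
        s, c = partial_stats(shard, centroids)
        total_sums += s
        total_counts += c
    new = centroids.astype(float)
    nonempty = total_counts > 0
    new[nonempty] = total_sums[nonempty] / total_counts[nonempty][:, None]
    return new

# Back-of-envelope data size, ASSUMING a 10 ms frame shift (100 frames/s)
# and 4-byte floats; the abstract gives only the totals.
hours = 7_500
frames = hours * 3600 * 100                  # 2,700,000,000 frames
bytes_needed = frames * 52 * 4               # 561,600,000,000 bytes (~562 GB)
```

For example, splitting a toy two-cluster dataset into four shards with `np.array_split(data, 4)` and calling `kmeans_step` yields the same centroids as calling it on `[data]`, because the per-cluster sums and counts are additive across shards. The frame count under the stated assumptions agrees with the abstract's figure of about 2.7 billion vectors, and at roughly 562 GB the raw features would not fit comfortably in a single node's memory, which is consistent with the motivation for a distributed platform.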