SIGMA: Large-Scale Machine Learning Toolkit

The goal of this project is to provide a set of parallel machine learning tools that meet the needs of research and applications involving large-scale data and feature sets. The toolkit includes, but is not limited to, classification, clustering, ranking, and statistical analysis, and runs them in parallel across hundreds of machines and thousands of CPU cores. We also provide an SDK so that researchers and developers can implement their own algorithms and contribute them to the toolkit.

If you are inside Microsoft, you can get the latest information from: http://msraml/projects/Wiki%20Pages/(SIGMA)%20Large%20Scale%20Machine%20Learning%20Toolkit.aspx


If you are accessing this page from the public internet, you can download the toolkit from the SIGMA download URL.

The algorithms supported by the project are listed below. Ten of the most popular ones are shared in the downloadable toolkit mentioned above.

  • Parallel Classification
    • Logistic Regression
    • Boosting
    • SVM
      • PSVM
      • PPegasos
    • Neural Network
  • Parallel Ranking
    • LambdaRank
    • RankBoost
  • Parallel Clustering
    • Kmeans
    • Random Walk
  • Parallel Regression
    • Linear Regression
    • Regression Tree
  • Others
    • Parallel-Regularized-SVD
    • Parallel-LDA
  • Optimization Library
    • OWL-QN
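
As a rough illustration of the data-parallel pattern that algorithms such as Kmeans commonly follow at scale, here is a minimal Python sketch. It is not SIGMA's API or implementation: the function names `partial_stats` and `kmeans_step` are hypothetical, and a local thread pool stands in for the many machines a real deployment would use. Each worker computes per-shard sums and counts, and a reduce step merges them into new centroids.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(shard, centroids):
    """Map step (hypothetical helper): per-shard sums and counts per centroid."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in shard:
        # Assign the point to its nearest centroid (squared Euclidean distance).
        best = min(
            range(k),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
        )
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += point[d]
    return sums, counts


def kmeans_step(shards, centroids, workers=4):
    """Reduce step (hypothetical helper): merge shard statistics, update centroids."""
    k, dim = len(centroids), len(centroids[0])
    # The thread pool is only a stand-in for a cluster of worker machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: partial_stats(s, centroids), shards))
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in results:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # A centroid with no assigned points keeps its previous position.
    return [
        [s / total_counts[j] for s in total_sums[j]]
        if total_counts[j]
        else list(centroids[j])
        for j in range(k)
    ]


if __name__ == "__main__":
    shards = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans_step(shards, [(1.0, 1.0), (9.0, 9.0)]))
```

Only the small per-shard summaries cross worker boundaries, never the raw points, which is why this pattern tends to scale with the number of shards. Repeating `kmeans_step` until the centroids stop moving gives the usual iterative procedure.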