Automatic choice of dimensionality for PCA

Tom Minka
NIPS 2000
Longer version available as MIT Media Lab tech report 514

A central issue in principal component analysis (PCA) is choosing the number of principal components to be retained. By interpreting PCA as density estimation, this paper shows how to use Bayesian model selection to determine the true dimensionality of the data. The resulting estimate is simple to compute yet guaranteed to pick the correct dimensionality, given enough data. In simulations, it is more accurate than cross-validation and other proposed algorithms, and it runs much faster.
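The model-selection step can be sketched as follows: score each candidate dimensionality k by an approximate log evidence log p(D | k), computed only from the eigenvalues of the sample covariance and the number of samples N, then keep the k with the highest score. This is a minimal sketch of the Laplace-approximation form as it is commonly reimplemented (scikit-learn's PCA(n_components='mle') uses a version of Minka's estimate). It is not the code in laplace_pca.zip. It omits hyperparameter-prior terms and assumes the eigenvalues are strictly positive and sorted in decreasing order. The function names laplace_log_evidence and choose_dim are hypothetical.

```python
import math


def laplace_log_evidence(spectrum, k, n_samples):
    """Approximate log p(D | k) for probabilistic PCA with k components.

    Sketch only: spectrum holds the sample-covariance eigenvalues, strictly
    positive and sorted in decreasing order; hyperparameter priors are omitted.
    """
    d = len(spectrum)
    if not 1 <= k < d:
        raise ValueError("k must satisfy 1 <= k < number of features")
    n = n_samples

    # log p(U): uniform prior over orthonormal k-frames (inverse manifold volume)
    log_pu = -k * math.log(2.0)
    for i in range(1, k + 1):
        a = (d - i + 1) / 2.0
        log_pu += math.lgamma(a) - a * math.log(math.pi)

    # Likelihood at the maximum: product of the retained eigenvalues ...
    log_pl = -n / 2.0 * sum(math.log(lam) for lam in spectrum[:k])
    # ... and the ML noise variance, i.e. the mean of the discarded eigenvalues
    v = sum(spectrum[k:]) / (d - k)
    log_pv = -n * (d - k) / 2.0 * math.log(v)

    # (2*pi)^((m + k) / 2), where m = d*k - k*(k+1)/2 counts the free parameters of U
    m = d * k - k * (k + 1) / 2.0
    log_pp = (m + k) / 2.0 * math.log(2.0 * math.pi)

    # log |A_Z|: Hessian determinant, with discarded eigenvalues replaced by v
    lam_hat = list(spectrum[:k]) + [v] * (d - k)
    log_det_a = 0.0
    for i in range(k):
        for j in range(i + 1, d):
            term = (spectrum[i] - spectrum[j]) * (1.0 / lam_hat[j] - 1.0 / lam_hat[i])
            if term <= 0.0:  # tied eigenvalues: the approximation breaks down
                return -math.inf
            log_det_a += math.log(n * term)

    return log_pu + log_pl + log_pv + log_pp - log_det_a / 2.0 - k / 2.0 * math.log(n)


def choose_dim(spectrum, n_samples):
    """Return the k in [1, d - 1] with the largest approximate log evidence."""
    return max(range(1, len(spectrum)),
               key=lambda k: laplace_log_evidence(spectrum, k, n_samples))
```

With NumPy available, a suitable spectrum could come from numpy.linalg.eigvalsh(numpy.cov(X, rowvar=False, bias=True))[::-1] for an N-by-d data matrix X; tiny negative eigenvalues from round-off would need clipping first. Working from the spectrum alone keeps each evaluation cheap, which is the source of the speed advantage over cross-validation.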

Paper (long version with corrections, Sept. 2, 2008); MIT Media Lab tech report (2000)

Software: laplace_pca.zip


Last modified: Wed Sep 21 18:02:54 GMT 2005