Thore Graepel, Matthias Burger, and Klaus Obermayer
We describe the development of neighborhood-preserving stochastic maps in terms of a probabilistic clustering problem. Starting from a cost function for central clustering that incorporates distortions from channel noise we derive a soft topographic vector quantization algorithm (STVQ) which is based on the maximum entropy principle and which maximizes the corresponding likelihood in an expectation-maximization (EM) fashion. Among other algorithms a probabilistic version of Kohonen's self-organizing map (SOM) is derived from STVQ as a computationally efficient approximation of the E-step. The foundation of STVQ in statistical physics motivates a deterministic annealing scheme in the temperature parameter β, and leads to a robust minimization algorithm of the clustering cost function. In particular, this scheme offers an alternative to the common stepwise shrinking of the neighborhood width in the SOM and makes it possible to use its neighborhood function solely to encode the desired neighborhood relations between the clusters. The annealing in β, which corresponds to a stepwise refinement of the resolution of representation in data space, leads to the splitting of an existing cluster representation during the "cooling" process. We describe this phase transition in terms of the covariance matrix C of the data and the transition matrix H of the channel noise and calculate the critical temperatures and modes as functions the eigenvalues and eigenvectors of C and H. The analysis is extended to the phenomenon of the automatic selection of feature dimensions in dimension-reducing maps, thus leading to a "batch"-alternative to the Fokker-Planck formalism for on-line learning. The results provide insights into the relation between the width of the neighborhood and the temperature parameter beta: It is shown that the phase transition which leads to the representation of the excess-dimensions can be triggered not only by a change in the statistics of the input data but also by an increase of beta, which corresponds to a decrease in noise level. The theoretical results are validated by numerical methods. In particular, a quantity equivalent to the heat capacity in thermodynamics is introduced to visualize the properties of the annealing process.
|Published in||Physical Review E|