This thesis presents a principled approach to clustering by formulating the problem in a probabilistic autoencoder framework which is based on folded Markov chains. As a suitable optimization technique deterministic annealing is introduced, which performs robust optimization based on an analogy to the cooling of a system in statistical physics. Application of deterministic annealing to the derived clustering cost functions leads to three algorithms: soft topographic vector quantization (STVQ), which performs topographic clustering on Euclidean feature vectors, kernel-based soft topographic mapping (STMK), which allows to do the clustering in a high dimensional Euclidean feature space by the application of the kernel trick, and soft topographic mapping for proximity data (STMP), which generalizes STVQ to arbitrary pairwise dissimilarity data in a mean field fashion. All three algorithms are analysed w.r.t. the annealing process and their application is demonstrated on both artificial and real world data.