The Symmetric Subspace Gaussian Mixture Model

  • Daniel Povey

MSR-TR-2010-138

This document describes an extension of the Subspace Gaussian Mixture Model (SGMM). The extension symmetrizes the model so that the speaker subspace and the speech-state subspace are treated in the same way. The change concerns how the Gaussian weights within the substates are handled: they now depend on the speaker vector as well as the speech-state vector. This requires slightly more per-speaker computation (to compute certain per-speech-state normalizing factors), but the main cost is additional memory: the memory consumed by the model almost doubles, because a new precomputed quantity must be stored. Nevertheless, the method gives respectable WER improvements, and it seems likely that the gains would be even greater in situations where the number of Gaussians per speech-state is larger (i.e., with more data).
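
To illustrate the idea of weights that depend on both vectors, the following is a minimal sketch, not the report's implementation. It assumes the weights take a log-linear (softmax) form in which each Gaussian's score is the sum of a speech-state term and a speaker term. The names `w`, `u`, `v_state`, `v_spk` and the function `log_gaussian_weights` are hypothetical and chosen only for this example.

```python
import math


def log_gaussian_weights(w, u, v_state, v_spk):
    """Return (log-weights, log-normalizer) for one substate and one speaker.

    Assumed form (an illustration, not taken from the report):
        score_i = w[i] . v_state + u[i] . v_spk
        weight_i = exp(score_i) / sum_k exp(score_k)

    w, u    : lists of per-Gaussian projection vectors (hypothetical names)
    v_state : speech-state vector
    v_spk   : speaker vector
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = [dot(wi, v_state) + dot(ui, v_spk) for wi, ui in zip(w, u)]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))

    return [s - log_norm for s in scores], log_norm
```

Under this assumed form the normalizer depends on the speaker vector as well as the speech-state vector, so it has to be recomputed for every speech-state whenever the speaker changes. This is consistent with the extra per-speaker computation mentioned above.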