Kemal Sonmez, Larry Heck, and Mitchel Weintraub
We describe a speaker tracking and detection system for Switchboard conversations. The system uses a two-speaker-plus-silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment speakers for detection, which is carried out by averaging the frame scores along the Viterbi path and applying HNORM via a novel parameter-interpolation extension of HNORM for use with files of arbitrary lengths. We also introduce the use of duration statistics to augment the acoustic scores via a nonlinear combination function. Results are reported on the NIST 1998 Multi-speaker development evaluation dataset.
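The detection step summarized above (averaging frame scores along the Viterbi path, then handset-dependent normalization with length-dependent parameters) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the HNORM parameters are interpolated, so the sketch assumes linear interpolation of an imposter score mean and standard deviation between a few tabulated segment durations, clamped at the ends. All function names, the handset label, and the numbers in the table are hypothetical.

```python
from bisect import bisect_right


def average_frame_score(frame_scores, path, target_state):
    """Average the per-frame scores over frames that the Viterbi path
    assigns to target_state. Returns (mean score, number of frames)."""
    selected = [s for s, st in zip(frame_scores, path) if st == target_state]
    if not selected:
        raise ValueError("no frames assigned to the target state")
    return sum(selected) / len(selected), len(selected)


def interpolate_params(n_frames, table):
    """ASSUMPTION: linear interpolation of (mean, std) between tabulated
    durations. table is a list of (duration_in_frames, mean, std) sorted
    by duration; values outside the tabulated range are clamped."""
    durations = [d for d, _, _ in table]
    if n_frames <= durations[0]:
        return table[0][1], table[0][2]
    if n_frames >= durations[-1]:
        return table[-1][1], table[-1][2]
    hi = bisect_right(durations, n_frames)
    d0, m0, s0 = table[hi - 1]
    d1, m1, s1 = table[hi]
    w = (n_frames - d0) / (d1 - d0)
    return m0 + w * (m1 - m0), s0 + w * (s1 - s0)


def hnorm_score(raw_score, n_frames, handset, params):
    """Normalize a raw score with the imposter mean and std for the given
    handset type, evaluated at the segment length n_frames."""
    mean, std = interpolate_params(n_frames, params[handset])
    return (raw_score - mean) / std


# Hypothetical imposter statistics for one handset type at two durations.
params = {"electret": [(100, 0.0, 2.0), (300, 1.0, 1.0)]}

# Toy Viterbi path: state 1 is the hypothesized target speaker.
raw, n_target = average_frame_score([1.0, 2.0, 3.0, 4.0], [0, 1, 1, 2], 1)
normalized = hnorm_score(2.0, 200, "electret", params)
```

Interpolating the normalization parameters over duration lets a single table serve segments of any length, which is the motivation the abstract gives for extending HNORM; the linear form used here is only one plausible choice.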