Automatic Speech Recognition of Co-Channel Speech: Integrated Speaker and Speech Recognition Approach

  • Larry Heck
  • Mark Z. Mao

Interspeech

This paper presents a novel Bayesian approach to the problem of speech recognition of co-channel speech, also known as the “cocktail party” problem. The problem is formulated as the joint maximization of the a posteriori probability of the word sequence and the target speaker given the observed speech signal. It is shown that the joint probability can be expressed as the product of six terms: a likelihood score from a speaker-independent speech recognizer, the (normalized) likelihood score of a speaker recognizer, the likelihood of a sequence of prosodic events, the likelihood of a speaker-dependent statistical language model, a prior representing the channel usage patterns of a speaker, and the prior probability of the speaker. An efficient single-pass Viterbi search strategy is presented. Experimental results on over-the-telephone recognition of co-channel speech show a 45% reduction in word error rate on a 10-digit telephone number task.
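The six-term product described above can be illustrated with a minimal Python sketch that works in the log domain, where the product becomes a sum. This is not the paper's single-pass Viterbi search: it simply rescores a small, explicit list of (word sequence, speaker) hypotheses and picks the best one. The class name `Hypothesis`, its field names, the function `best_hypothesis`, and all numeric scores are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate (word sequence, speaker) pair with its six log-scores.

    All field names are illustrative assumptions, not taken from the paper.
    """
    words: str
    speaker: str
    asr_loglik: float        # speaker-independent speech recognizer
    spk_loglik: float        # (normalized) speaker recognizer
    prosody_loglik: float    # sequence of prosodic events
    lm_loglik: float         # speaker-dependent language model
    channel_logprior: float  # channel usage pattern of the speaker
    speaker_logprior: float  # prior probability of the speaker

    def joint_log_score(self) -> float:
        # A product of six terms becomes a sum of six log-terms.
        return (self.asr_loglik + self.spk_loglik + self.prosody_loglik
                + self.lm_loglik + self.channel_logprior
                + self.speaker_logprior)


def best_hypothesis(hypotheses):
    """Return the hypothesis with the highest joint log-score."""
    return max(hypotheses, key=lambda h: h.joint_log_score())


# Made-up scores for two competing hypotheses (illustration only).
candidates = [
    Hypothesis("five five five", "target", -10.0, -1.0, -2.0, -3.0,
               -0.5, math.log(0.5)),
    Hypothesis("nine five nine", "interferer", -9.0, -4.0, -2.0, -3.5,
               -1.0, math.log(0.5)),
]
winner = best_hypothesis(candidates)
print(winner.words, winner.speaker)
```

In this toy example the second hypothesis has the better acoustic score on its own, but the speaker-related terms shift the joint decision toward the first, which is the kind of interaction the joint formulation is meant to capture.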