Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition

  • John Hershey
  • Trausti Kristjansson
  • Zheng Zhang

Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing

We present a probabilistic framework that uses a bone sensor and an air microphone to perform speech enhancement for robust speech recognition. The system exploits the advantages of both sensors: the noise resistance of the bone sensor and the linearity of the air microphone. In this paper we describe the general properties of the bone sensor relative to conventional air sensors. We propose a model capable of adapting to the noise conditions, and evaluate its performance using a commercial speech recognition system. We demonstrate considerable improvements in recognition – from a baseline of 57% up to nearly 80% word accuracy – for four subjects under a difficult condition with background speaker interference.