Nonlinear Information Fusion in Multi-Sensor Processing - Extracting and Exploiting Hidden Dynamics of Speech Captured by a Bone-Conductive Microphone

Li Deng, Zicheng Liu, Zhengyou Zhang, and Alex Acero

Abstract

One well-known difficulty in creating an effective human-machine interface via speech input is the adverse effect of concurrent acoustic noise. To overcome this challenge, we have developed a joint hardware and software solution. A novel bone-conductive microphone is integrated with a regular air-conductive one in a single headset. These two simultaneous sensors capture distinct signal properties in the speech embedded in acoustic noise. The focus of this paper is the exploration of the type of dynamic properties that are relatively invariant between the bone-conductive sensor’s signal and the clean speech signal; the latter would not be available to the recognizer. Our approach is based on a nonlinear processing technique that estimates the unobserved (hidden) vocal tract resonances, as a representation of such invariant hidden dynamics, from the available bone-sensor signal. The information about these dynamic aspects of the clean speech is then fused with other noisy measurements, with the aim of improving the recognition system’s robustness to acoustic distortion. The fusion technique is based on a combination of three sets of signals, including the speech signal synthesized from the vocal tract resonance dynamics extracted nonlinearly from the bone-sensor signal.

Details

Publication type: Inproceedings
Published in: Proc. of the IEEE Workshop on Multimedia Signal Processing
Publisher: Institute of Electrical and Electronics Engineers, Inc.