Cha Zhang, Dinei Florencio, and Zhengyou Zhang
In distributed meeting applications, microphone arrays have been widely used to capture superior speech sound and perform speaker localization through sound source localization (SSL) and beamforming. This paper presents a unified maximum likelihood framework of these two techniques, and demonstrates how such a framework can be adapted to create efficient SSL and beamforming algorithms for reverberant rooms and unknown directional patterns of microphones. The proposed method is closely related to steered response power-based algorithms, which are known to work extremely well in real-world environments. We demonstrate the effectiveness of the proposed method on challenging synthetic and real-world datasets, including over 6 hours of recorded meetings.
In IEEE Transactions on Multimedia
© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/