Data Acquisition and Reconstruction for 3D Talking Head

Binocular Photometric Stereo Acquisition and Reconstruction for 3D Talking Head Applications

(Interspeech 2013 submission)

Chaoyang Wang, Lijuan Wang, Yasuyuki Matsushita, Frank K. Soong


To render a high-quality, versatile 3D talking head, we constructed a stable, high-frame-rate audio-visual (AV) data acquisition system. It captures the 3D position, surface orientation, and albedo texture of a talking head from video images, along with the corresponding speech signals. The system consists of a computer-controlled LED lighting subsystem, high-speed stereo cameras, a microphone, and a computer for synchronous recording of the multi-stream AV data. The collected image data is processed through a binocular photometric stereo 3D reconstruction pipeline. The pipeline automatically segments the face, computes the depth map with binocular stereo, computes the normal map with photometric stereo, generates the albedo texture, and finally constructs a highly detailed 3D model using the depth and normal cues as constraints. With the data collected by this system, we can construct high-quality, dynamic 3D talking head model sequences that faithfully represent the actual talking head, synchronized with the subject’s uttered speech. Furthermore, given enough 3D talking head sequence data collected in this way, statistical talking head hidden Markov models (HMMs) can be trained to render a high-quality 3D talking head for any given text or speech input.
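
To make the normal-map and albedo steps concrete, the following is a minimal sketch of classical Lambertian photometric stereo solved by least squares. It is an illustration only, not the authors' implementation: the function name `photometric_stereo`, the array layouts, and the assumptions of known, calibrated light directions and no shadows or specular highlights are all hypothetical choices made for this example.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Estimate per-pixel normals and albedo under a Lambertian model.

    images:     (k, h, w) array, one grayscale image per light (assumed layout)
    light_dirs: (k, 3) array of unit light directions (assumed to be calibrated)
    """
    k, h, w = images.shape
    intensities = images.reshape(k, -1)  # (k, h*w), one column per pixel

    # Lambertian model: intensities = light_dirs @ (albedo * normal).
    # Solve for the scaled normal g = albedo * normal at every pixel at once.
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, h*w)

    albedo = np.linalg.norm(g, axis=0)       # length of g is the albedo
    normals = g / np.maximum(albedo, 1e-12)  # guard against division by zero
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```

At least three non-coplanar light directions are needed for the system to be well determined; with more lights, the least-squares solution averages out noise. A full pipeline such as the one described above would additionally combine these normals with the stereo depth map, which this sketch does not attempt.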

