Text-to-Audiovisual Speech Synthesizer
Abstract:
This paper describes a text-to-audiovisual speech synthesizer that
incorporates head and eye movements. The face is modeled using a set of
images of a human subject. Visemes, the lip images corresponding to
phonemes, are extracted from a recorded video. Smooth transitions between
visemes are achieved by morphing along the correspondences between
visemes, which are obtained from optical flow. This paper also describes methods for
introducing nonverbal mechanisms, such as eye blinks and head nods, into
visual speech communication. For eye movements, a simple mask-based
approach is used. View morphing is used to generate head movements. A complete
audiovisual sequence is constructed by concatenating the viseme
transitions and synchronizing the visual stream with the audio stream. An
effort has been made to integrate all these features into a single system
that takes text together with head and eye movement parameters as input
and produces the audiovisual stream.
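
As an illustration of the concatenation and synchronization step, the
following is a minimal sketch, not the system's actual implementation. It
assumes that phoneme durations are already available from the speech
synthesizer, that each phoneme's interval is spent morphing from the
previous viseme to the current one, and that video runs at a fixed frame
rate. The names PHONEME_TO_VISEME, Frame, and build_frames, the viseme
labels, and the 25 fps default are all hypothetical and chosen only for
this example.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table (assumption for illustration only);
# a real system would map its full phoneme set to recorded viseme images.
PHONEME_TO_VISEME = {
    "sil": "rest",
    "p": "closed", "b": "closed", "m": "closed",
    "aa": "open", "iy": "spread", "uw": "round",
}


@dataclass
class Frame:
    time: float   # seconds from the start of the audio stream
    src: str      # viseme the mouth is moving away from
    dst: str      # viseme the mouth is moving toward
    alpha: float  # morph weight: 0.0 shows src, 1.0 shows dst


def build_frames(phonemes, fps=25, rest="rest"):
    """Lay viseme transitions on the audio timeline.

    phonemes: list of (phoneme, duration_in_seconds) pairs.
    Returns one Frame per video frame, so the visual stream has
    (up to rounding) the same duration as the audio.
    """
    # One transition segment per phoneme: (start, duration, src, dst).
    segments = []
    start, prev = 0.0, rest
    for phoneme, duration in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, rest)
        segments.append((start, duration, prev, viseme))
        start += duration
        prev = viseme
    total = start

    frames = []
    seg = 0
    for i in range(round(total * fps)):
        t = i / fps
        # Advance to the segment whose time interval contains t.
        while seg + 1 < len(segments) and t >= segments[seg][0] + segments[seg][1]:
            seg += 1
        s0, dur, src, dst = segments[seg]
        alpha = min(1.0, (t - s0) / dur) if dur > 0 else 1.0
        frames.append(Frame(t, src, dst, alpha))
    return frames


if __name__ == "__main__":
    for f in build_frames([("m", 0.2), ("aa", 0.4)]):
        print(f"{f.time:.2f}s  {f.src} -> {f.dst}  alpha={f.alpha:.2f}")
```

In such a sketch, each Frame would then be rendered by morphing the src
viseme image toward the dst viseme image by the weight alpha, and the
resulting frames would be played back against the audio at the same frame
rate.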