Most people think of speech synthesis as having your computer speak to you. Proofing data entry, reading files and speaking prompts have been typical applications for speech synthesizers. Although synthesis technology is well suited for these traditional operations, here at Microsoft Research we are continually exploring new and exciting applications of our base technologies.
In addition to speaking, another popular use of human speech is singing. Over the past 50 years, music synthesizers have developed to the point where they can imitate almost any acoustic instrument. Any acoustic instrument, that is, except the most popular one: human vocals. "And there's a good reason for this: singing is the most complex and dynamic of all musical instruments," says Mark Cecys, the researcher who worked on this project. With recent advances in computing and speech technology, we are finally moving beyond this limitation. Besides playing the instrumental parts, music synthesizers can now begin to sing the lyrics.
The Whistler Music Synthesizer
To demonstrate the potential of Microsoft's Whistler speech technology for musical applications, a novel music synthesizer was designed. Running in real time on Win32, it combines the Whistler speech engine with a software wavetable synthesizer. The wavetable synthesizer plays the instrumental accompaniment while Whistler sings the lyrics. The notes and lyrics are entered using a commercial MIDI editor, then exported as a Standard MIDI File to the synthesizer for fine-tuning and musical playback.
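To make the note-and-lyric pairing concrete, here is a minimal Python sketch of how a MIDI note number maps to the pitch a singing voice would need to hit, and how one syllable might be attached to each note. The frequency formula is the standard equal-temperament mapping (A4 = MIDI note 69 = 440 Hz). The `SungNote` type and `align_lyrics` helper are hypothetical names introduced only for illustration. They assume one syllable per note and do not describe how Whistler itself represents or aligns lyrics.

```python
from dataclasses import dataclass

A4_MIDI_NOTE = 69   # MIDI note number of the A above middle C
A4_HZ = 440.0       # standard concert pitch


def midi_note_to_hz(note: int) -> float:
    """Convert a MIDI note number (0-127) to frequency in equal temperament."""
    if not 0 <= note <= 127:
        raise ValueError(f"MIDI note out of range: {note}")
    return A4_HZ * 2.0 ** ((note - A4_MIDI_NOTE) / 12.0)


@dataclass
class SungNote:
    """Hypothetical record pairing one lyric syllable with one note."""
    syllable: str
    midi_note: int
    duration_beats: float

    @property
    def pitch_hz(self) -> float:
        return midi_note_to_hz(self.midi_note)


def align_lyrics(syllables, notes):
    """Pair syllables with (midi_note, duration_beats) tuples, one to one.

    Assumption for this sketch: exactly one syllable per note.
    """
    if len(syllables) != len(notes):
        raise ValueError("need exactly one syllable per note")
    return [SungNote(s, n, d) for s, (n, d) in zip(syllables, notes)]


if __name__ == "__main__":
    melody = [(60, 1.0), (62, 1.0), (64, 2.0)]  # C4, D4, E4
    for sung in align_lyrics(["do", "re", "mi"], melody):
        print(f"{sung.syllable}: {sung.pitch_hz:.2f} Hz for {sung.duration_beats} beats")
```

Real vocal lines often hold one syllable across several notes, so a fuller model would need a many-to-one alignment. The sketch keeps the simplest case.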
The following examples use Whistler’s stock "Mark" and "Melanie" voices for all the vocals. A key feature of Whistler technology is modeling the particular characteristics of real human speakers. In other words, after analyzing a specific speaker’s voice, Whistler can faithfully reproduce the voice characteristics, sounding very close to the original speaker (or singer!).
Although the synthesizer output is 16-bit, 44.1 kHz stereo, all examples have been scaled back to 8-bit, 22 kHz mono to reduce download time.
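The size saving from this reduction can be worked out directly. The Python sketch below computes the uncompressed PCM data rate for both formats and shows one crude way to perform the conversion. It assumes "22 kHz" means 22.05 kHz (exactly half of 44.1 kHz) and that the 8-bit samples are unsigned, as in WAV files. The averaging step is only a rough stand-in for a proper low-pass filter, and nothing here is claimed to be the method actually used to prepare the examples.

```python
def bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def downconvert(frames):
    """Reduce 16-bit stereo frames to unsigned 8-bit mono at half the sample rate.

    frames: list of (left, right) signed 16-bit integers.
    Each output sample averages two consecutive stereo frames (four values),
    which mixes to mono and halves the rate in one step. A trailing unpaired
    frame is dropped.
    """
    out = []
    for i in range(0, len(frames) - 1, 2):
        l0, r0 = frames[i]
        l1, r1 = frames[i + 1]
        mono16 = (l0 + r0 + l1 + r1) // 4   # still in the signed 16-bit range
        out.append((mono16 >> 8) + 128)     # keep the top 8 bits, shift to 0..255
    return out


if __name__ == "__main__":
    full = bytes_per_second(44100, 16, 2)
    reduced = bytes_per_second(22050, 8, 1)
    print(f"full quality: {full} bytes/s")
    print(f"reduced:      {reduced} bytes/s")
    print(f"reduction:    {full // reduced}x")
```

Under these assumptions the reduced format needs one eighth of the data: half the sample rate, half the bit depth, and half the channels.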