Decomposition of Speech and Sound into Modulators and Carriers

Speaker  Les Atlas

Affiliation  University of Washington

Host  Arjmand Samuel

Duration  00:58:22

Date recorded  24 September 2012

…musical tones are the simpler and more regular elements of the sensations of hearing, and that we have consequently first to study the laws and peculiarities of this class of sensations.

— Hermann von Helmholtz, On the Sensations of Tone as a Physiological Basis for the Theory of Music, 2nd English Edition (A. Ellis), translated from the 4th German Edition of 1877, Longman Green, London, 1885, Page 7.

It has been 135 years since this passage was written, yet we still have no formal foundation for going beyond the building blocks that Helmholtz brilliantly identified and called “musical tones,” which we now simply call “frequency.” Helmholtz also saw that “beats of simple tones” and “beats due to combinational tones” or “differential tones” [op. cit., Page 159.] formed sum and difference beats. We now call the generalization of this effect “modulations” or “envelopes.” Since the time of Helmholtz, science and technology have developed radio and then very high-speed digital communications, revolutionizing the way we live. Concepts from 1920s and 1930s AM and FM radio communications still provide the foundation, though perhaps an outdated one. Researchers conventionally model the above modulations as “envelopes” that multiply “carriers” or, equivalently, “temporal fine structure.” These envelopes, as typically derived after subband filtering, are Hilbert envelopes, squared and lowpass-filtered real envelopes, or, with perhaps the closest connection to physiology, rectified and lowpass-filtered real envelopes. Yet, as will be argued, science’s current foundation and methods for envelopes and temporal fine structure are still not as advanced as Helmholtz’s treatment of single tones and harmonics.

Our talk will begin with demonstrations of simple two-tone complexes that have identical envelopes yet sound obviously different. We will show, assuming sufficiently low-rate envelopes, how important it is to remove this ambiguity, especially for speech. We will then suggest how a novel modulator/carrier decomposition, which takes into account the common types of dynamic content seen in speech and sound, counteracts this ambiguity. New conceptual results, in conjunction with auditory filters playing the role of frequency multiplexing as in orthogonal frequency-division multiplexing (OFDM) in modern high-speed data communications, raise new questions about potent roles of temporal fine structure in everyday audio and speech.
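The abstract does not give the talk's demonstration signals, so the following is only an illustrative sketch of the envelope ambiguity, using a hypothetical signal pair of our own choosing. A two-tone complex cos(2π(fc−fm)t) + cos(2π(fc+fm)t) equals 2cos(2πfm t)cos(2πfc t); a second signal, 2|cos(2πfm t)|cos(2πfc t), has the same envelope magnitude but a carrier that never flips sign. The sketch computes the three conventional envelopes named above (Hilbert, squared then lowpass, rectified then lowpass) for both signals, treating each whole signal as if it were already one subband output. The helper names, the 16 kHz sample rate, the 1 kHz carrier, the 50 Hz modulator, and the 500 Hz brick-wall lowpass cutoff are all assumptions for illustration, not the speaker's method.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal built in the frequency domain (zero negative frequencies, double positive)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def fft_lowpass(x, fs, cutoff_hz):
    """Zero-phase brick-wall lowpass; crude, but adequate for these periodic test signals."""
    X = np.fft.rfft(x)
    X[np.fft.rfftfreq(len(x), d=1.0 / fs) > cutoff_hz] = 0.0
    return np.fft.irfft(X, n=len(x))

def hilbert_envelope(x):
    return np.abs(analytic_signal(x))

def squared_lowpass_envelope(x, fs, cutoff_hz=500.0):
    # Power-like envelope: square, then lowpass (take a sqrt for amplitude units).
    return fft_lowpass(x ** 2, fs, cutoff_hz)

def rectified_lowpass_envelope(x, fs, cutoff_hz=500.0):
    # Full-wave rectify, then lowpass.
    return fft_lowpass(np.abs(x), fs, cutoff_hz)

fs, fc, fm = 16000, 1000.0, 50.0  # assumed sample rate, carrier, and modulator (Hz)
t = np.arange(fs) / fs            # exactly 1 s, so every component lands on an FFT bin

# Two-tone complex: equals 2cos(2*pi*fm*t)cos(2*pi*fc*t), whose carrier flips sign.
x_two_tone = np.cos(2 * np.pi * (fc - fm) * t) + np.cos(2 * np.pi * (fc + fm) * t)
# Same |envelope|, but the modulator is rectified, so the carrier never flips sign.
x_rectified_mod = 2 * np.abs(np.cos(2 * np.pi * fm * t)) * np.cos(2 * np.pi * fc * t)

def carrier_amplitude(x):
    """Amplitude of the spectral component at fc (bin spacing is 1 Hz here)."""
    return 2 * np.abs(np.fft.rfft(x)[int(fc)]) / len(x)

if __name__ == "__main__":
    estimators = [
        ("Hilbert", hilbert_envelope),
        ("squared+LPF", lambda x: squared_lowpass_envelope(x, fs)),
        ("rectified+LPF", lambda x: rectified_lowpass_envelope(x, fs)),
    ]
    for name, env in estimators:
        diff = np.max(np.abs(env(x_two_tone) - env(x_rectified_mod)))
        print(f"{name:14s} max envelope difference: {diff:.4f}")
    print("component at fc:", carrier_amplitude(x_two_tone), carrier_amplitude(x_rectified_mod))
```

In this construction the rectified and squared envelopes agree to rounding error, because |x| (and hence x²) is identical for the two signals. The Hilbert envelopes differ only slightly, since |cos| has harmonics above fc that violate the usual condition for an exact Hilbert envelope. The spectra, however, differ sharply: the two-tone complex has no energy at fc, while the rectified-modulator signal has about 4/π of amplitude there, so the two need not sound alike.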
These results suggest novel features for recognition of speech in noise, reverberation, and/or multiple simultaneous talkers.

©2012 Microsoft Corporation. All rights reserved.