Applying Physiologically-Motivated Models of Auditory Processing to Automatic Speech Recognition: Promise, Progress, and Problems

For many years the human auditory system has been an inspiration for developers of automatic speech recognition systems because of its ability to interpret speech accurately in a wide variety of difficult acoustical environments. This talk will discuss the application of physiologically-motivated and psychophysically-motivated approaches to signal processing that facilitate robust automatic speech recognition. The talk will begin by reviewing selected aspects of auditory processing that are believed to be especially relevant to speech perception, and that were components of signal processing schemes proposed in the 1980s. We will review the motivation for, and the structure of, classical and contemporary computational models of auditory processing that have been applied to speech recognition, and we will evaluate and compare their impact on speech recognition accuracy. Finally, we will discuss some of the reasons why we believe that progress to date has been limited, and share insights about auditory processing that we have gleaned from recent work at Carnegie Mellon.

Speaker Details

Richard M. Stern received the S.B. degree from the Massachusetts Institute of Technology in 1970, the M.S. from the University of California, Berkeley, in 1972, and the Ph.D. from MIT in 1977, all in electrical engineering. He has been on the faculty of Carnegie Mellon University since 1977, where he is currently a Professor in the Electrical and Computer Engineering, Computer Science, and Biomedical Engineering Departments and the Language Technologies Institute, and a Lecturer in the School of Music. Much of Dr. Stern’s current research is in spoken language systems, where he is particularly concerned with developing techniques that make automatic speech recognition more robust to changes in environment and acoustical ambience. He has also developed sentence parsing and speaker adaptation algorithms for earlier CMU speech systems. In addition to his work in speech recognition, Dr. Stern has worked extensively in psychoacoustics, where he is best known for theoretical work in binaural perception. Dr. Stern is a Fellow of the Acoustical Society of America, the 2008-2009 Distinguished Lecturer of the International Speech Communication Association, and a recipient of the Allen Newell Award for Research Excellence in 1992; he also served as General Chair of Interspeech 2006. He is also a member of the IEEE and the Audio Engineering Society.

Speakers: Richard M. Stern
Affiliation: Carnegie Mellon University