Jasha Droppo

RESEARCHER

I hope you've found this page because we are interested in the same things. This is a picture of me when I lived in a tiny condo during graduate school. That's where I learned to enjoy linear algebra, information theory, and machine learning, which led directly to a career in digital signal processing and automatic speech recognition. More information about my professional life appears below. For those interested in my personal life, navigate to my Droppo family web site.

main(k){for(k=22;k--;putchar(k["\nnqf2tgqvsrdkpDoqrrvdk"]-k%5));}

Research interests

  • Feature transformation
  • Speech and acoustic digital signal processing
  • Noise-robust speech recognition
  • Low-bandwidth robust cepstral transport
  • Time-frequency representations
  • Nonstationary signal modeling and classification

Biography

I have been with Microsoft Research since July 2000. My primary task has been to explore techniques that make automatic speech recognition (ASR) more robust to additive and channel noise. Other projects I've worked on include general speech signal enhancement, pitch tracking, multiple-stream ASR, novel speech recognition features, the MiPad multimodal interface, cepstral compression and transport, and the WITTY microphone.

The SPLICE project was successful in building a more robust speech recognition system. Working jointly with Alex Acero and Li Deng, I was able to get amazing results on the noisy Aurora 2 corpus. But the approach had fundamental problems, including its requirement for stereo training data.

The model-based feature enhancement project was meant to address the stereo-data requirement for SPLICE. The model describes how speech and noise (and noisy channels) interact to corrupt the speech features. It can be used either to enhance the features before recognition or to adapt the recognizer's model at run time.

Lately, I've moved to more general non-parametric warpings of the feature space. We're trying to learn transformations that improve recognition performance in both noisy and clean conditions.

I earned my Ph.D. in Electrical Engineering at the University of Washington's Interactive Systems Design Laboratory in June of 2000. Early in my studies, I helped to develop a discrete theory for time-frequency representations of non-stationary audio signals. The application of this theory to speech recognition was the core of my thesis, "Time-Frequency Representations for Speech Recognition." Other projects I worked on during this time included a GMM-based speaker verification system, subliminal audio message encoding, and non-linear signal morphing.

I also earned my MSEE at the University of Washington, in 1996. During this time, I worked on a project to develop and build an acoustic pyrometer. The device probes the fireball within a coal-fired electrical plant with several sound pressure waves, and determines a temperature profile based on acoustic time-of-flight measurements. My thesis described the algorithms and techniques developed to make such a device feasible.

I earned my BSEE from Gonzaga University in Spokane, in 1994. My final project consisted of building a control system for a high-speed dot-matrix printer. Fuzzy logic was popular at the time, so we used it as the basis of our system. I learned two important lessons from the project, both of which were probably unintentional. First lesson: stay away from fads. Whereas a conventional design would have been well understood and easy to implement with a guaranteed minimum level of performance, our fuzzy controller needed a lot of background work and experimentation to get working correctly. Second lesson: embrace fads. I wrote a paper comparing and contrasting the behavior of fuzzy controllers and linear controllers, and received first prize in the region's IEEE paper contest.

Downloads

Switching Linear Dynamic Model

This code enables training and evaluation of a switching linear dynamic model for enhancing cepstral streams for automatic speech recognition, as described in our ICASSP 2004 paper, "Noise Robust Speech Recognition with a Switching Linear Dynamic Model."

Pitch and Voicing Estimates for Aurora 2

This archive consists of a set of pitch period and voicing estimates for utterances found in the Aurora 2 corpus [1], computed using the algorithm described in [2]. Currently, pitch estimates are available for test sets A and B, as well as the clean training data.

[1] H. G. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions," in ISCA ITRW ASR2000 "Automatic Speech Recognition: Challenges for the Next Millennium," Paris, France, September 2000.
[2] J. Droppo and A. Acero, "Maximum a Posteriori Pitch Tracking," in Proc. of the Int. Conf. on Spoken Language Processing, Sydney, Australia, December 1998.

Publications

Last modified: Thursday, February 24, 2005

E-mail: jdroppo at microsoft dot com
U.S. Mail: Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, USA
Tel: (425) 703-7114
Fax: (425) 706-7329