Jasha Droppo
Researcher
Speech Technology Group
I hope you've found this page because we are interested in the
same things. This is a picture of me when I lived in a tiny condo
during graduate school. That's where I learned to enjoy linear
algebra, information theory, and machine learning, which led directly
to a career in digital signal processing and automatic speech
recognition. More information about my professional life appears
below. For those interested in my personal life, navigate to my Droppo family web site.
main(k){for(k=22;k--;putchar(k["\nnqf2tgqvsrdkpDoqrrvdk"]-k%5));}
Research interests
- Feature transformation
- Speech and acoustic digital signal processing
- Noise-robust speech recognition
- Low-bandwidth robust cepstral transport
- Time-frequency representations
- Nonstationary signal modeling and classification
Biography
I have been with Microsoft Research since July 2000. My primary task
has been to explore different techniques to make automatic speech recognition (ASR) more robust to
additive and channel noise. Other projects I've worked on include
general speech signal enhancement, pitch tracking, multi-stream ASR, novel speech recognition features, the MiPad multimodal interface, cepstral
compression and transport, and the WITTY microphone.
The SPLICE project succeeded in building a more robust speech recognition system. Working jointly
with Alex Acero and Li Deng, we obtained excellent results on
the noisy Aurora 2 corpus. However, the approach had fundamental
limitations.
The model-based feature enhancement project was meant to address the
stereo-data requirement of SPLICE. The model describes how speech, additive
noise, and channel distortion interact to corrupt the speech features. It
can be used either to enhance the features before recognition or to
adapt the recognizer's models at run time.
Lately, I've moved to more general non-parametric warpings of the
feature space. We're trying to learn transformations that improve
recognition performance in both noisy and clean conditions.
I earned my Ph.D. in Electrical Engineering at the University of
Washington's Interactive Systems Design Laboratory in June of 2000.
Early in my studies, I helped to develop a discrete theory for
time-frequency representations of non-stationary audio signals. The
application of this theory to speech recognition was the core of my
thesis, "Time-Frequency Representations for Speech
Recognition." Other projects I worked on during this time
included a GMM-based speaker verification system, subliminal audio
message encoding, and non-linear signal morphing.
I also earned my MSEE at the University of Washington, in
1996. During this time, I worked on a project to develop and build an
acoustic pyrometer. The device probes the fireball within a coal-fired
power plant with several sound pressure waves and determines a
temperature profile from acoustic time-of-flight measurements. My
thesis described the algorithms and techniques developed to make such
a device feasible.
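The relation behind acoustic pyrometry is that the speed of sound in a gas rises with temperature. For an ideal gas, c = sqrt(γRT/M), which can be written c = B·sqrt(T); B is about 20.05 m/s per √K for dry air, while for furnace gas it depends on composition, so here B is treated as an assumed calibration constant. Inverting the relation for one path of known length L and measured time of flight t gives an effective path-average temperature T = (L / (B·t))². The C sketch below does only that; recovering a full temperature profile additionally requires combining many crossing paths (for example by tomographic reconstruction), which is not shown, and the function names are hypothetical rather than taken from the thesis.

```c
#include <assert.h>
#include <math.h>

/* Ideal gas: c = sqrt(gamma * R * T / M) = B * sqrt(T).
   B is about 20.05 m/s per sqrt(K) for dry air; for furnace gas it is an
   assumed calibration constant that depends on gas composition. */
#define B_DRY_AIR 20.05

/* Effective path-average temperature (K) from one path's length (m) and
   measured time of flight (s), by inverting c = B * sqrt(T). */
static double path_temperature(double length_m, double tof_s, double b) {
    assert(length_m > 0.0 && tof_s > 0.0 && b > 0.0);
    double ratio = (length_m / tof_s) / b;  /* mean speed of sound / B */
    return ratio * ratio;
}

/* One estimate per acoustic path; a profile would be reconstructed
   from many such crossing paths (not shown). */
static void path_temperatures(const double *length_m, const double *tof_s,
                              int paths, double b, double *temp_k) {
    for (int i = 0; i < paths; i++)
        temp_k[i] = path_temperature(length_m[i], tof_s[i], b);
}
```

For example, with B = 20.05, a 10 m path crossed in about 12.5 ms corresponds to roughly 1600 K.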
I earned my BSEE from Gonzaga University in Spokane, in 1994. My final
project consisted of building a control system for a high speed
dot-matrix printer. Fuzzy logic was popular at the time, so we used it
as the basis of our system. I learned two important lessons from the
project, both of which were probably unintentional. First lesson: stay
away from fads. Whereas a conventional design would have been well
understood and easy to implement with a guaranteed minimum level of
performance, our fuzzy controller needed a lot of background work and
experimentation to get working correctly. Second lesson: embrace
fads. I wrote a paper comparing and contrasting the behavior of fuzzy
controllers and linear controllers, and received first prize in the
region's IEEE paper contest.
Downloads
Switching Linear Dynamic Model
This code enables training and evaluation of a switching linear dynamic model for enhancing cepstral streams for automatic speech recognition, as described in our ICASSP 2004 paper, "Noise Robust Speech Recognition with a Switching Linear Dynamic Model."
Pitch and Voicing Estimates for Aurora 2
This archive consists of a set of pitch period and voicing estimates for utterances in the Aurora 2 corpus [1], computed with the algorithm described in [2]. Currently, pitch estimates are available for test sets A and B, as well as for the clean training data.
[1] H. G. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in ISCA ITRW ASR2000 "Automatic Speech Recognition: Challenges for the Next Millennium", Paris, France, September 2000.
[2] J. Droppo and A. Acero, "Maximum a Posteriori Pitch Tracking," in Proc. of the Int. Conf. on Spoken Language Processing. Sydney, Australia. Dec 1998.
Publications
- J. Droppo and A. Acero.
A Fine Pitch Model for Speech,
in Proc. of the Interspeech Conference. Sep, 2007, to appear.
- C. White, J. Droppo, A. Acero and J. Odell.
Maximum Entropy Confidence Estimation for Speech Recognition,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Hawaii, April, 2007.
- J. Droppo and A. Acero.
Joint Discriminative Front End and Back End Training for Improved Speech Recognition Accuracy,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Toulouse, May, 2006.
- I. Tashev, J. Droppo, and A. Acero.
Suppression Rule for Speech Recognition Friendly Noise Suppressors,
in Proc. of the 2006 Int. Conference Digital Signal Processing and Applications (DSPA). Moscow, Russia, Mar. 2006.
- J. Droppo, M. Mahajan, A. Gunawardana and A. Acero.
How to Train a Discriminative Front End with Stochastic Gradient Descent and Maximum Mutual Information,
in Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding. Puerto Rico, Dec, 2005.
- J. Droppo and A. Acero.
Maximum Mutual Information SPLICE Transform for Seen and Unseen Conditions,
in Proc. of the Interspeech Conference. Lisbon, Portugal, Sep, 2005.
- A. Subramanya, Z. Zhang, Z. Liu, J. Droppo, and A. Acero.
A Graphical Model for Multi-Sensory Speech Processing in Air-and-Bone Conductive Microphones,
in Proc. of the Interspeech Conference. Lisbon, Portugal, Sep, 2005.
- M. Seltzer, A. Acero, and J. Droppo.
Robust Bandwidth Extension of Noise-corrupted Narrowband Speech,
in Proc. of the Interspeech Conference. Lisbon, Portugal, Sep, 2005.
- L. Deng, J. Wu, J. Droppo, and A. Acero.
Analysis and Comparison of Two Speech Feature Extraction/Compensation Algorithms,
in IEEE Signal Processing Letters. Volume: 12 Issue: 6, Jun 2005. pp. 477-480.
- L. Deng, J. Droppo, and A. Acero.
Dynamic Compensation of HMM Variances Using the Feature Enhancement Uncertainty Computed From a Parametric Model of Speech Distortion,
in IEEE Transactions on Speech and Audio Processing. Volume: 13 Issue: 3, May 2005. pp. 412-421.
- Z. Liu, A. Subramanya, Z. Zhang, J. Droppo, and A. Acero.
Leakage Model and Teeth Clack Removal for Air- and Bone-Conductive Integrated Microphones,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Philadelphia, Mar, 2005.
- Z. Liu, Z. Zhang, A. Acero, J. Droppo and X. Huang.
Direct Filtering for Air- and Bone-Conductive Microphones,
in Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding. Siena, Italy. Sep, 2004.
- Z. Zhang, Z. Liu, M. Sinclair, A. Acero, L. Deng, J. Droppo, X. Huang, and Yanli Zheng.
Multi-Sensory Microphones for Robust Speech Detection, Enhancement and Recognition,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Montreal, May, 2004.
- J. Droppo and A. Acero.
Noise Robust Speech Recognition with a Switching Linear Dynamic Model,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Montreal, May, 2004.
- L. Deng, J. Droppo, and A. Acero.
Estimating Cepstrum of Speech Under the Presence of Noise Using a Joint Prior of Static and Dynamic Features,
in IEEE Transactions on Speech and Audio Processing. Volume: 12 Issue: 3, May 2004. pp. 218-233.
- L. Deng, J. Droppo, and A. Acero.
Enhancement of log Mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise,
in IEEE Transactions on Speech and Audio Processing. Volume: 12 Issue: 2, Mar 2004. pp. 133-143.
- Y. Zheng, Z. Liu, Z. Zhang, M. Sinclair, J. Droppo, L. Deng, A. Acero and X. Huang.
Air and Bone-Conductive Integrated Microphones for Robust Speech Detection and Enhancement,
in Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding. Virgin Islands, Dec, 2003.
- J. Wu, J. Droppo, L. Deng and A. Acero.
A Noise-Robust ASR Front-End Using Wiener Filter Constructed from MMSE Estimation of Clean Speech and Noise,
in Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding. Virgin Islands, Dec, 2003.
- L. Deng, J. Droppo, and A. Acero.
Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition,
in IEEE Transactions on Speech and Audio Processing. Volume: 11 Issue: 6, Nov 2003. pp. 568-580.
- M. Seltzer, J. Droppo, and A. Acero.
A Harmonic-Model-Based Front End for Robust Speech Recognition,
in Proc. of the Eurospeech Conference. Geneva, Switzerland, Sep, 2003.
- J. Droppo, L. Deng and A. Acero.
A Comparison of Three Non-Linear Observation Models for Noisy Speech Features,
in Proc. of the Eurospeech Conference. Geneva, Switzerland, Sep, 2003.
- L. Deng, J. Droppo and A. Acero.
Incremental Bayes Learning with Prior Evolution for Tracking Non-Stationary Noise Statistics from Noisy Speech Data,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Hong Kong, April 2003.
- J. Droppo, L. Deng, and A. Acero.
Evaluation of SPLICE on the Aurora 2 and 3 Tasks,
in Proc. Int. Conf. on Spoken Language Processing. Denver, Colorado, Sep, 2002.
- J. Droppo, A. Acero, and L. Deng.
A Nonlinear Observation Model for Removing Noise from Corrupted Speech Log Mel-Spectral Energies,
in Proc. Int. Conf. on Spoken Language Processing. Denver, Colorado, Sep, 2002.
- L. Deng, J. Droppo, and A. Acero.
Exploiting Variances in Robust Feature Extraction Based on a Parametric Model of Speech Distortion,
in Proc. Int. Conf. on Spoken Language Processing. Denver, Colorado, Sep, 2002.
- L. Deng, J. Droppo, and A. Acero.
Log-Domain Speech Feature Enhancement Using Sequential MAP Noise Estimation and a Phase-sensitive Model of the Acoustic Environment,
in Proc. Int. Conf. on Spoken Language Processing. Denver, Colorado, Sep, 2002.
- J. Droppo, L. Deng and A. Acero.
Uncertainty Decoding with SPLICE for Noise Robust Speech Recognition,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Orlando, Florida, May, 2002.
- L. Deng, J. Droppo and A. Acero.
A Bayesian Approach to Speech Feature Enhancement using the Dynamic Cepstral Prior,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Orlando, Florida, May, 2002.
- L. Deng, J. Droppo and A. Acero.
Recursive Noise Estimation Using Iterative Stochastic Approximation for Stereo-based Robust Speech Recognition,
in Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding. Madonna di Campiglio, Italy, Dec, 2001.
- J. Droppo, A. Acero and L. Deng.
Evaluation of the SPLICE Algorithm on the Aurora2 Database,
in Proc. of the Eurospeech Conference. Aalborg, Denmark, Sep, 2001.
- L. Deng, A. Acero, L. Jiang, J. Droppo and X. Huang.
High-Performance Robust Speech Recognition Using Stereo Training Data,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Salt Lake City, Utah, May, 2001.
- J. Droppo, A. Acero and L. Deng.
Efficient Online Acoustic Environment Estimation for FCDCN in a Continuous Speech Recognition System,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Salt Lake City, Utah, May, 2001.
- L. Deng, Y. Wang, K. Wang, A. Acero, H. Hon, J. Droppo, C. Boulis, D. Jacoby, M. Mahajan, C. Chelba, and X. Huang.
Speech and language processing for multimodal human-computer interaction (invited),
in Journal of VLSI Signal Processing Systems (Special issue on Real-World Speech Processing),
Vol. 36, No. 2, February 2004, pp. 161-187.
- L. Deng, K. Wang, A. Acero, H. Hon, J. Droppo, C. Boulis, Y. Wang, D. Jacoby, M. Mahajan, C. Chelba, and X. D. Huang.
Distributed Speech Processing in MiPad's Multimodal User Interface,
in IEEE Transactions on Speech and Audio Processing. Volume: 10 Issue: 8, Nov 2002, pp. 605-619.
- X. Huang, A. Acero, C. Chelba, L. Deng, J. Droppo, D. Duchene, J. Goodman, H. Hon, D. Jacoby, L. Jiang, R. Loynd, M. Mahajan, P. Mau, S. Meredith, S. Mughal, S. Neto, M. Plumpe, K. Stery, G. Venolia, K. Wang, Y. Wang.
MIPAD: A Multimodal Interaction Prototype,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Salt Lake City, Utah, May, 2001.
- J. Droppo and L. Atlas.
Distance Metrics for Discrete Time-Frequency Representations,
in Proc. of the Int. Workshop on DSP. Hunt, Texas, Oct, 2000.
- J. Droppo.
Time-Frequency Features for Speech Recognition,
Doctoral Dissertation, University of Washington, Seattle, WA, May 2000.
- J. Droppo and A. Acero.
Maximum a Posteriori Pitch Tracking,
in Proc. of the Int. Conf. on Spoken Language Processing. Sydney, Australia. Dec 1998.
- J. Droppo and L. Atlas.
Application of Classifier-Optimal Time-Frequency Distributions to Speech Analysis,
in Proc. International Symposium on Time-Frequency and Time-Scale Analysis, 1998.
- L. Atlas, J. Droppo and J. McLaughlin.
Optimizing Time-Frequency Distributions for Automatic Classification,
in Proc. SPIE, 1997.
- J. McLaughlin, J. Droppo and L. Atlas.
Class-Dependent Time-Frequency Distributions via Operator Theory,
in Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing. Munich, Germany. 1997.
- S.B. Narayanan, J. McLaughlin, L. Atlas and J. Droppo.
Operator Theory Approach to Discrete Time-Frequency Distributions,
in Proc. International Symposium on Time-Frequency and Time-Scale Analysis, pp. 521-524, June 1996.
- J. Droppo.
Fuzzy controllers compared to Linear Digital Controllers,
in Proc. NORTHCON 1994, pp. 108-112, October 1994.
Last modified: Thursday, February 24, 2005
E-mail: jdroppo at microsoft dot com
U.S.Mail: Microsoft Corporation, One Microsoft Way, Redmond WA, 98052, USA
Tel: (425) 703-7114
Fax: (425) 706-7329