Kaisheng Yao

Kaisheng Yao is a researcher at Microsoft Research. He received his Ph.D. in Electrical Engineering in 2000 through a joint program of Tsinghua University, China, and the Hong Kong University of Science and Technology. From 2000 to 2002 he was an invited researcher at the Advanced Telecommunication Research Lab in Japan, and from 2002 to 2004 a postdoctoral researcher at the Institute for Neural Computation, University of California, San Diego. From 2004 to 2008 he was with Texas Instruments. He joined Microsoft in 2008, first in a speech product group, and moved to Microsoft Research in 2013.

He is active in research and development spanning natural language understanding, speech recognition, machine learning, and speech signal processing. He has published more than 50 papers in these areas and is the inventor or co-inventor of more than 20 granted or pending patents. At Microsoft, he has helped ship products including smart-watch gesture control, Bing query understanding, Xbox features, and voice search. His current research and development interests are deep learning with recurrent neural networks and its applications to natural language processing, document understanding, speech recognition, and speech processing. He is a Senior Member of the IEEE and a member of the ACL.

Open-Source Tools

  1. C++NN, a C++ neural network library
  2. Computational Network Toolkit (CNTK)

Workshops & Tutorials

  1. Second Frederick Jelinek Summer Workshop: Continuous Wide-Band Machine Translation.
  2. ICASSP 2015 tutorial on CNTK

Selected publications by topics

Natural Language Processing

  1. K. Yao, G. Zweig, and B. Peng, "Attention with Intention for a Neural Network Conversation Model", arXiv:1510.08565; also presented at a NIPS 2015 workshop.

  2. Y. Shi, K. Yao, H. Chen, Y.-C. Pan, M.-Y. Hwang, and B. Peng, "Contextual Spoken Language Understanding using Recurrent Neural Networks", in ICASSP 2015.

  3. G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng, D. Hakkani-Tur, X. He, L. Heck, G. Tur, D. Yu, and G. Zweig, "Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding", in IEEE Trans. on Audio, Speech, and Language Processing, 2015.

  4. K. Yao, B. Peng, Y. Zhang, D. Yu, G. Zweig, and Y. Shi, "Spoken Language Understanding using Long Short-Term Memory Neural Networks", in IEEE SLT 2014.

  5. K. Yao, B. Peng, G. Zweig, D. Yu, X. Li, and F. Gao, "Recurrent Conditional Random Field for Language Understanding", in ICASSP 2014.

  6. K. Yao, G. Zweig, M. Hwang, Y. Shi, and D. Yu, "Recurrent Neural Networks for Language Understanding", in INTERSPEECH 2013.

Machine Learning

  1. K. Yao, T. Cohn, K. Vylomova, K. Duh, and C. Dyer, "Depth-Gated LSTM", arXiv:1508.03790 [cs.NE], 2015.

  2. D. Yu, K. Yao, and Y. Zhang, "The Computational Network Toolkit", to appear in IEEE Signal Processing Magazine, Nov. 2015.

  3. B. Peng and K. Yao, "Recurrent Neural Networks with External Memory for Language Understanding", in NLPCC 2015. Best paper award.

  4. Dong Yu, Adam Eversole, Mike Seltzer, Kaisheng Yao, Zhiheng Huang, Brian Guenter, Oleksii Kuchaiev, Yu Zhang, Frank Seide, Huaming Wang, Jasha Droppo, Geoffrey Zweig, Chris Rossbach, Jon Currey, Jie Gao, Avner May, Baolin Peng, Andreas Stolcke, Malcolm Slaney, "An Introduction to Computational Networks and the Computational Network Toolkit", Microsoft Technical Report MSR-TR-2014-112, 2014.

  5. K. Yao, B. Peng, G. Zweig, D. Yu, X. Li, and F. Gao, “Recurrent conditional random fields”, in Deep Learning Workshop, NIPS, 2013.
  6. K. Yao, S. Nakamura, “Sequential noise compensation by sequential Monte Carlo method”, in Advances in Neural Information Processing Systems 14, pp. 1205-1212, edited by T. G. Dietterich, S. Becker, and Z. Ghahramani, MIT press, 2001.

Wearable Computing

  1. Y. Li, K. Yao, and G. Zweig, "Feedback-based handwriting recognition from inertial sensor data for wearable devices", in ICASSP 2015

Speech Recognition

  1. K. Yao and G. Zweig, "Sequence-to-sequence Neural Net Models for Grapheme-to-phoneme Conversion", in Interspeech 2015
  2. S. Zhang, C. Liu, K. Yao, and Y. Gong, "Deep neural support vector machines for speech recognition", in ICASSP 2015 (presentation slides available).
  3. K. Kalgaonkar, C. Liu, Y. Gong, and K. Yao, "Estimating confidence scores on ASR results using recurrent neural networks", in ICASSP 2015
  4. K. Yao, D. Yu, L. Deng, and Y. Gong, “A fast maximum likelihood nonlinear feature transformation method for GMM-HMM speaker adaptation”, in Neurocomputing, 2013
  5. D. Yu, K. Yao, H. Su, G. Li, and F. Seide, “KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013.

  6. D. Povey and K. Yao, “A Basis Representation of Constrained MLLR Transforms for Robust Adaptation”, Computer Speech and Language, vol. 26, no. 1, 2012.

  7. L. Deng, J.-T. Huang, J. Li, K. Yao, et al., “Recent advances in deep learning for speech research at Microsoft”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013.
  8. K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong, “Adaptation of context-dependent deep neural networks for automatic speech recognition”, in IEEE Spoken Language Technology workshop, 2012.
  9. K. Yao, Y. Gong and C. Liu, “A feature space transformation method for personalization using generalized i-vector clustering”, in INTERSPEECH 2012.
  10. D. Povey and K. Yao, “A basis method for robust estimation of constrained MLLR”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2011.
  11. K. K. Paliwal and K. Yao, “Robust Speech Recognition under Noisy Ambient Conditions”, in Human Centric Interface for Ambient Intelligence, edited by R. Delgado, H. Aghajan, and J. Augusto, Elsevier, 2010. 
  12. K. Yao and L. Netsch, “An approach to low footprint pronunciation models for embedded speaker independent name recognition”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Honolulu, Hawaii, 2007.
  13. K. Yao, L. Netsch, and V. Viswanathan, “Speaker-independent name recognition using improved compensation and acoustic modeling methods for mobile applications”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 173-176, Toulouse, France, 2006.
  14. K. Yao, K. K. Paliwal, and T.-W. Lee, “Generative factor analyzed HMM for automatic speech recognition”, Speech Communication, vol. 45, no. 4, pp. 435-454, 2005. 
  15. K. Yao and T.-W. Lee, “Sequential Monte Carlo Method of Time-Varying Noise Estimation for Speech Enhancement and Recognition”, EURASIP Journal on Applied Signal Processing, vol. 15, pp. 2366-2384, 2004.
  16. K. Yao, K. K. Paliwal, and S. Nakamura, “Noise Adaptive Speech Recognition based on Sequential Noise Parameter Estimation”, Speech Communication, vol. 42, no. 1, pp. 5-23, 2004. 
  17. K. Yao, E. Visser, O.-W. Kwon, and T.-W. Lee, “A speech processing front-end with eigenspace normalization for robust speech recognition in noisy automobile environments”, in EUROSPEECH, pp. 9-12, 2003.
  18. K. Yao, K. K. Paliwal, S. Nakamura, “Noise adaptive speech recognition with acoustic models trained from noisy speech evaluated on Aurora-2 database”, in International Conference on Spoken Language Processing, vol. 4, pp. 2437-2440, Sept., 2002.
  19. K. Yao, K. K. Paliwal, S. Nakamura, “Noise adaptive speech recognition in time-varying noise based on sequential Kullback proximal algorithm”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 189-192, Florida, U.S.A., May, 2002.
  20. K. Yao, J. Chen, K. K. Paliwal, and S. Nakamura, “Feature Extraction and Model-based Noise Compensation for Noisy Speech Recognition evaluated on AURORA 2 Task”, in EUROSPEECH, vol. 1, pp. 233-236, Denmark, Sept. 2001. Ranked 3rd in the international competition.

Speech Signal Processing

  1. K. Yao, “A noise robust algorithm for underdetermined source separation”, in IEEE Workshop on Statistical Signal Processing, 2009.
  2. T.-W. Lee and K. Yao, “Speech enhancement by perceptual filter with sequential noise parameter estimation”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 693-696, Montreal, Canada, 2004.
  3. K. Yao and T.-W. Lee, “Speech enhancement with noise parameter estimated by a sequential Monte Carlo method”, in IEEE workshop on Statistical Signal Processing, pp. 609-612, St. Louis, U.S.A., 2003.

Recent Patents (selected from 15 granted patents)

  1. K. Yao, G. Zweig, and D. Yu. Recurrent Conditional Random Fields
  2. K. Yao and Y. Gong, MS 337930.01 Feature Space Transformation for Personalization using Generalized i-Vector Clustering
  3. D. Yu, K. Yao, F. Seide, and G. Li, MS 338392.01 Deep Neural Network Adaptation with KL-Divergence Regularization