Frank Seide

PRINCIPAL RESEARCHER

Research Interests

  • Current area of interest: Speech as Content
  • Stuff I worked on in the past
    • Using GPGPU for speech recognition
    • Music processing: Query by Humming, Music Steering (demo video)
    • Computer Auditory Scene Analysis (CASA)
    • Dialogue Systems

Bio

Frank Seide is a Research Manager at Microsoft Research Asia, Beijing, responsible for research efforts on transcription of phone calls and voicemail and on content-based indexing of video and audio.

Frank was born in Hamburg, Germany. In 1993, he received a Master's degree in electrical engineering from the University of Technology of Hamburg-Harburg. His research interests are in the area of automatic speech recognition and audio analysis, with current focus on modeling and algorithms for large-vocabulary conversational speech recognition, spoken-dialogue systems, and audio search.

From 1993 to 1997, Frank worked in the speech research group of Philips Research in Aachen, Germany, on spoken-dialogue systems. He then transferred to Taiwan as one of the founding members of Philips Research East-Asia, Taipei, to lead a research project on Mandarin speech recognition.

In June 2001, he joined the speech group at Microsoft Research Asia, initially as a Researcher, since 2003 as Project Leader for offline speech applications, and since October 2006 as Research Manager.

Publications

Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu, 1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs, in Interspeech 2014, September 2014

Dong Yu, Adam Eversole, Mike Seltzer, Kaisheng Yao, Zhiheng Huang, Brian Guenter, Oleksii Kuchaiev, Yu Zhang, Frank Seide, Huaming Wang, Jasha Droppo, Geoffrey Zweig, Chris Rossbach, Jon Currey, Jie Gao, Avner May, Baolin Peng, Andreas Stolcke, and Malcolm Slaney, An Introduction to Computational Networks and the Computational Network Toolkit, no. MSR-TR-2014-112, August 2014

Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu, On Parallelizability of Stochastic Gradient Descent for Speech DNNs, in ICASSP, IEEE SPS, May 2014

Anthony Aue, Qin Gao, Hany Hassan, Xiaodong He, Gang Li, Nicholas Ruiz, and Frank Seide, MSR-FBK IWSLT 2013 SLT System Description, in International Workshop on Spoken Language Translation (IWSLT), December 2013

Li Deng, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Dong Yu, Frank Seide, Michael Seltzer, Geoff Zweig, Xiaodong He, Jason Williams, Yifan Gong, and Alex Acero, Recent Advances in Deep Learning for Speech Research at Microsoft, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013

Dong Yu, Mike Seltzer, Jinyu Li, Jui-Ting Huang, and Frank Seide, Feature Learning in Deep Neural Networks - Studies on Speech Recognition, in International Conference on Learning Representations, May 2013

Hang Su, Gang Li, Dong Yu, and Frank Seide, Error Back Propagation For Sequence Training Of Context-Dependent Deep Networks For Conversational Speech Transcription, in ICASSP 2013, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013

Dong Yu, Kaisheng Yao, Hang Su, Gang Li, and Frank Seide, KL-Divergence Regularized Deep Neural Network Adaptation For Improved Large Vocabulary Speech Recognition, in ICASSP 2013, 2013

Dong Yu, Li Deng, and Frank Seide, The Deep Tensor Neural Network with Applications to Large Vocabulary Speech Recognition, in IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 2, pp. 388-396, IEEE, 2013

Gang Li, Huifeng Zhu, Gong Cheng, Kit Thambiratnam, Behrooz Chitsaz, Dong Yu, and Frank Seide, Context-Dependent Deep Neural Networks For Audio Indexing Of Real-Life Data, in SLT 2012, December 2012

Kaisheng Yao, Dong Yu, Frank Seide, Hang Su, Li Deng, and Yifan Gong, Adaptation Of Context-Dependent Deep Neural Networks For Automatic Speech Recognition, in SLT 2012, December 2012

Dong Yu, Li Deng, and Frank Seide, Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks, in Interspeech, ISCA, September 2012

Xie Chen, Adam Eversole, Gang Li, Dong Yu, and Frank Seide, Pipelined Back-Propagation for Context-Dependent Deep Neural Networks, in Interspeech, ISCA, September 2012

Dong Yu, Frank Seide, and Gang Li, Conversational Speech Transcription Using Context-Dependent Deep Neural Networks, in ICML 2012, June 2012

Dong Yu, Frank Seide, Gang Li, and Li Deng, Exploiting Sparseness In Deep Neural Networks For Large Vocabulary Speech Recognition, in ICASSP 2012, IEEE SPS, March 2012

Frank Seide, Gang Li, Xie Chen, and Dong Yu, Feature engineering in context-dependent deep neural networks for conversational speech transcription, in ASRU 2011, IEEE, December 2011

Frank Seide, Gang Li, and Dong Yu, Conversational Speech Transcription Using Context-Dependent Deep Neural Networks, in Interspeech 2011, International Speech Communication Association, August 2011

Sha Meng, L.-F. Wang, Y.-M. Lin, G. Li, Kit Thambiratnam, and Frank Seide, Vocabulary and Language Model Adaptation Using just One File, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Dallas, 2010

Qian Lin, Lie Lu, Christopher Weare, and Frank Seide, Music Rhythm Characterization with Application to Workout-Mix Generation, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, 2010

Frank Seide, Kit Thambiratnam, Lie Lu, and Peng Yu, Multimedia Retrieval Through Indexing Speech: An Enterprise Perspective, in Proc. ACM-Multimedia Third Workshop on Searching Spontaneous Conversational Speech (SSCS), Beijing, 2009

Linxing Xiao, Lie Lu, Frank Seide, and Jie Zhou, Learning a Music Similarity Measure on Automatic Annotations with Application to Playlist Generation, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, 2009

Kit Thambiratnam and Frank Seide, Unsupervised Lattice-based Acoustic Model Adaptation for Speaker-Dependent Conversational Telephone Speech Transcription, in Proc. Interspeech, Brighton, 2009

Roy Wallace, Kit Thambiratnam, and Frank Seide, Unsupervised Speaker Adaptation for Telephone Call Transcription, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Taipei, 2009

Wenzhu Shen, Peng Yu, Frank Seide, and Ji Wu, Automatic Punctuation Generation For Speech, in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Merano, 2009

Xing Xie, Lie Lu, Menglei Jia, Hua Li, Frank Seide, and Wei-Ying Ma, Mobile Search with Multimodal Queries, in Proc. of the IEEE, vol. 96, no. 4, April 2008

Yu Shi, Frank Seide, and Frank K. Soong, GPU-Accelerated Gaussian Clustering for fMPE Discriminative Training, in Proc. Interspeech, Brisbane, 2008

Sha Meng, Peng Yu, Jia Liu, and Frank Seide, Fusing Multiple Systems Into a Compact Lattice Index for Chinese Spoken Term Detection, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, 2008

Sha Meng, Jian Shao, Peng Yu, Jia Liu, and Frank Seide, Addressing the Out-Of-Vocabulary Problem for Large-Scale Chinese Spoken-Term Detection, in Proc. Interspeech, Brisbane, 2008

Peng Yu, Yu Shi, and Frank Seide, Approximate Word-Lattice Indexing with Text Indexers: Time-Anchored Lattice Expansion, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, 2008

Lie Lu and Frank Seide, Mobile Ringtone Search Through Query By Humming, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, 2008

Kit Thambiratnam and Frank Seide, Fragmented Context-Dependent Syllable Acoustic Models, in Proc. Interspeech, Brisbane, 2008

Jian Shao, Peng Yu, Qingwei Zhao, Yonghong Yan, and Frank Seide, Towards Vocabulary-Independent Speech Indexing for Large-Scale Repositories, in Proc. Interspeech, Brisbane, 2008

Frank Seide, Kit Thambiratnam, and Peng Yu, Word-Lattice Based Spoken-Document Indexing with Standard Text Indexers, in Proc. IEEE Workshop on Spoken Language Technology (SLT), IEEE, Goa, 2008

Peng Yu, Kit Thambiratnam, and Frank Seide, Word-Lattice Based Spoken-Document Indexing with Standard Text Indexers, in Proc. ACM-SIGIR Second Workshop on Searching Spontaneous Conversational Speech (SSCS), Association for Computing Machinery, Inc., Singapore, 2008

Kit Thambiratnam and Frank Seide, Learning Spoken Document Similarity and Recommendation Using Supervised Probabilistic Latent Semantic Analysis, in Proc. Interspeech, Antwerp, 2007

Peng Yu, Jie Xu, Guo-Liang Zhang, Yu-Chou Chang, and Frank Seide, A Hidden-State Maximum Entropy Model for Word Confidence Estimation, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hawaii, 2007

Chang-E Liu, Kit Thambiratnam, and Frank Seide, Online Vocabulary Adaptation using Limited Adaptation Data, in Proc. Interspeech, IEEE, Antwerp, 2007

Frank Seide, Peng Yu, and Yu Shi, Towards Spoken-Document Retrieval for the Enterprise: Approximate Word-Lattice Indexing with Text Indexers, in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Kyoto, 2007

Sha Meng, Peng Yu, Frank Seide, and Jia Liu, A Study of Lattice-Based Spoken Term Detection for Chinese Spontaneous Speech, in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Kyoto, 2007

Zheng-Yu Zhou, Peng Yu, Ciprian Chelba, and Frank Seide, Towards Spoken-Document Retrieval for the Internet: Lattice Indexing for Large-Scale Web-Search Architectures, in Proc. Human Language Technology Conference (HLT/NAACL), New York City, 2006

Neema Moraveji, Kit Thambiratnam, Liu Jun, Roger Yu, and Frank Seide, DynaLine: A Non-Disruptive TV User Interface for Passive Browsing of Internet Video, in Microsoft Research Technical Report, Microsoft Research, 2006

Kit Thambiratnam, Frank Seide, and Peng Yu, Discriminatively Trained Spoken Document Similarity Models and Their Application to Probabilistic Latent Semantic Analysis, in Proc. IEEE Workshop on Spoken Language Technology (SLT), IEEE, Aruba, 2006

Peng Yu, Duo Zhang, and Frank Seide, Maximum Entropy Based Normalization of Word Posteriors for Phonetic and LVCSR Lattice Search, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, 2006

Peng Yu, Kaijiang Chen, Chengyuan Ma, and Frank Seide, Vocabulary-Independent Indexing of Spontaneous Speech, in IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, September 2005

Frank Seide, The Use of Virtual Hypothesis Copies in Decoding of Large-Vocabulary Continuous Speech Recognition, in IEEE Transactions on Speech and Audio Processing, vol. 13, no. 4, July 2005

Peng Yu, Kaijiang Chen, Lie Lu, and Frank Seide, Searching the Audio Notebook: Keyword Search in Recorded Conversations, in Proc. Human Language Technology Conference (HLT/EMNLP), Vancouver, 2005

Peng Yu and Frank Seide, Fast Two-Stage Vocabulary-Independent Search in Spontaneous Speech, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia, 2005

Frank Seide, Peng Yu, Chengyuan Ma, and Eric Chang, Vocabulary-Independent Search in Spontaneous Speech, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, 2004

Peng Yu and Frank Seide, A Hybrid Word/Phoneme-Based Approach for Improved Vocabulary-Independent Search in Spontaneous Speech, in Proc. International Conference on Spoken Language Processing (ICSLP), Jeju Island, 2004

Kuan-Ting Chen, Shui-Lung Chuang, Frank Seide, Hsin-Min Wang, Lee-Feng Chien, and Eric Chang, New Word Learning for Spoken Document Processing Through Discovery of Comparable Texts from External Resources, in Proc. ISCA Workshop on Multilingual Spoken Document Retrieval, Hong Kong, 2003

Jianlai Zhou, Frank Seide, and Li Deng, Coarticulation Modeling by Embedding a Target-Directed Hidden Trajectory Model into HMM–Model and Training, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, 2003

Frank Seide, Jian-Lai Zhou, and Li Deng, Coarticulation Modeling by Embedding a Target-Directed Hidden Trajectory Model into HMM–MAP Decoding and Evaluation, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, 2003

Peng Yu, Frank Seide, Chengyuan Ma, and Eric Chang, An Improved Model-Based Speaker Segmentation System, in Proc. Eurospeech, Geneva, 2003

Eric Chang, Frank Seide, Helen M. Meng, Zhuoran Chen, Yu Shi, and Yuk-Chi Li, A System for Spoken Query Information Retrieval on Mobile Devices, in IEEE Transactions on Speech and Audio Processing, vol. 10, no. 8, November 2002

Nick J.-C. Wang, Sammy Lee, Frank Seide, and Lin-Shan Lee, Rapid Speaker Adaptation Using A Priori Knowledge by Eigenspace Analysis of MLLR Parameters, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Salt Lake City, 2001

Hsiao-Chuan Wang, Frank Seide, Chiu-Yu Tseng, and Lin-Shan Lee, MAT-2000–Design, Collection, and Validation of a Mandarin 2000-Speaker Telephone Speech Database, in Proc. International Conference on Spoken Language Processing (ICSLP), Beijing, 2000

Frank Seide and Nick J.-C. Wang, Two-Stream Modeling of Mandarin Tones, in Proc. International Conference on Spoken Language Processing (ICSLP), Beijing, 2000

Yuan-Fu Liao, Nick J.-C. Wang, Max Huang, Hank C.-H. Huang, and Frank Seide, Improvements of the Philips 2000 Taiwan Mandarin Benchmark System, in Proc. International Conference on Spoken Language Processing (ICSLP), Beijing, 2000

Bernd Souvignier, Andreas Kellner, Bernhard Rueber, Hauke Schramm, and Frank Seide, The Thoughtful Elephant: Strategies for Spoken Dialog Systems, in IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, January 2000

Hank C.-H. Huang and Frank Seide, Pitch Tracking and Tone Features for Mandarin Speech Recognition, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, 2000

Chiwei Che, Nick J.-C. Wang, Max Huang, Hank C.-H. Huang, and Frank Seide, Development of the Philips 1999 Taiwan Mandarin Benchmark System, in Proc. Eurospeech, Budapest, 1999

Frank Seide, Max Huang, Hank C.-H. Huang, Chiwei Che, and Nick J.-C. Wang, SAMPA-C–A phonetic representation of Chinese for Speech Recognition, in Proc. Workshop on East-Asian Language Resources and Evaluation (Oriental COCOSDA), Taipei, 1999

Frank Seide and Nick J.-C. Wang, Phonetic Modeling in the Philips Chinese Continuous-Speech Recognition System, in Proc. Intl. Symposium on Chinese Spoken Language Processing, Singapore, 1998

Andreas Kellner, Bernhard Rueber, Frank Seide, and Bach-Hiep Tran, PADIS–A voice-controlled automatic telephone switchboard and directory information system, in Speech Communication, pp. 95-111, Elsevier, 1997

Andreas Kellner, Frank Seide, and Bernhard Rueber, With A Little Help From The Database–Developing Voice-Controlled Directory Information Systems, in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Santa Barbara, 1997

Frank Seide and Andreas Kellner, Towards an Automated Directory Information System, in Proc. Eurospeech, Rhodes, 1997

S. Ortmanns, H. Ney, Frank Seide, and I. Lindam, A Comparison Of Time Conditioned And Word Conditioned Search Techniques For Large Vocabulary Speech Recognition, in Proc. International Conference on Spoken Language Processing (ICSLP), Philadelphia, 1996

Andreas Kellner, Bernhard Rueber, and Frank Seide, A Voice-controlled Automatic Switchboard and Directory Information System, in Proc. IEEE 3rd Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA), Basking Ridge, 1996

Frank Seide, Bernhard Rueber, and Andreas Kellner, Improving Speech Understanding by Incorporating Database Constraints and Dialogue History, in Proc. International Conference on Spoken Language Processing (ICSLP), Philadelphia, 1996

Bach-Hiep Tran, Frank Seide, and Volker Steinbiss, A Word Graph Based N-Best Search in Continuous Speech Recognition, in Proc. International Conference on Spoken Language Processing (ICSLP), Philadelphia, 1996

Harald Aust, Martin Oerder, Frank Seide, and Volker Steinbiss, The Philips Automatic Train Timetable Information System, in Speech Communication, vol. 17, Elsevier, November 1995

Harald Aust, Martin Oerder, Frank Seide, and Volker Steinbiss, A Spoken Language Inquiry System for Automatic Train Timetable Information, in Philips Journal of Research, Elsevier, 1995

Frank Seide, Fast Likelihood Computation for Continuous-Mixture Densities Using a Tree-based Nearest Neighbor Search, in Proc. Eurospeech, Madrid, 1995

Frank Seide and Alfred Mertins, Non-linear Regression Based Feature Extraction for Connected-word Recognition in Noise, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Adelaide, 1994

Harald Aust, Martin Oerder, Frank Seide, and Volker Steinbiss, Experience with the Philips automatic train timetable information system, in Proc. of the 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA), Kyoto, 1994