
December 15: Microsoft releases a preview version of Skype Translator for
English and Spanish speakers.

November 12: Elementary school students in Tacoma, Washington, and Mexico City participate in the first Skype Mystery Call that uses a test version of Skype Translator. 

 

November 3: Microsoft launches the Skype Translator Preview program for Windows 8.1 computers and tablets.

 

July: Microsoft demonstrates Skype Translator at the Worldwide Partner Conference 2014, featuring near-real-time English-German translation.

 

Video: Skype Mystery Call

 

May: Microsoft announces and publicly demonstrates the Skype Translator, jointly developed by Microsoft researchers and Skype engineers:

Video: Skype Translator in action



 

Microsoft’s speech product group quickly productizes the company’s research breakthroughs in speech to deliver best-in-class speech recognition for Cortana and other speech-powered experiences within Microsoft products. With recognition accuracies closing in on human capabilities, the close partnership between Skype, Microsoft Research, and Microsoft’s Information Platform Group is critical in delivering this technology to Skype users worldwide.

 


Skype celebrates its 10th anniversary and reaches more than 1.4 trillion minutes of voice and video calls.



Microsoft’s deep neural network (DNN) research improves Bing Voice Search for Windows Phone. Additionally, Microsoft’s investments in machine translation research, combined with Bing’s information platform and web-scale architecture, power translations across a host of experiences, including features within Bing, Office, SharePoint, and Yammer.



Microsoft Translator Hub is released, implementing a self-service model for building a highly customized automatic translation service between any two languages. This Azure-based service empowers language communities, service providers, and corporations to create automatic translation systems, allowing speakers of one language to share knowledge with, and access knowledge from, speakers of any other language. By enabling translation for languages that many mainstream translation engines don't support, the Hub also helps keep less widely spoken languages vibrant and in use for future generations.

 
Video: Machine learning drives the hub


 

Eight sentences is all it takes for Rick Rashid, the founder of Microsoft Research, to electrify a crowd of 2,000 students and faculty in Tianjin, China. Decades of DNN and speech research culminate in a stunning live translation: Rashid speaks in English while the Chinese audience hears his own voice in Mandarin. The speech recognition system in the demo rehearsal achieves an error rate of less than 7%, roughly what a person achieves when taking word-for-word notes.

 
Video: A breakthrough in translation


 

A seminal paper on speech transcription is authored by Microsoft researchers and presented at Interspeech 2011. They show that the new methods improve accuracy by over 30% compared with previous approaches: rather than one word in 4 or 5 being incorrect, the error rate drops to one word in 7 or 8. While still far from perfect, this is the most dramatic change in accuracy in the last decade.
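That relative improvement can be sanity-checked with simple arithmetic. The sketch below uses illustrative word error rates (WERs) of one word in four before and one in seven after; these are round numbers for illustration, not figures taken from the paper:

```python
# Illustrative word-error-rate (WER) arithmetic: going from roughly one
# word wrong in 4-5 to one wrong in 7-8 is a relative reduction on the
# order of 30-45%, consistent with the "over 30%" claim.
def relative_wer_reduction(old_wer: float, new_wer: float) -> float:
    """Relative reduction in word error rate, as a fraction of the old rate."""
    return (old_wer - new_wer) / old_wer

before = 1 / 4  # ~25% WER: one word in four misrecognized
after = 1 / 7   # ~14% WER: one word in seven misrecognized

print(f"{relative_wer_reduction(before, after):.0%}")  # → 43%
```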



Microsoft researchers in Asia become intrigued with the notion of translating the spoken word in the speaker’s own voice.

 

The Translating! Telephone demo is shown publicly for the first time at TechFest 2010, providing real-time German-to-English translation in the voice of each speaker.



Microsoft researchers pioneer industrial-scale deep learning, first applying it to large-scale voice search tasks and combining the strength of DNNs with the industry's need for speech recognizers that are not only highly accurate but also highly efficient. The seminal journal paper on this work subsequently receives a 2013 Best Paper Award from IEEE.



Before 2009, nearly all speech recognition systems are based on Gaussian mixture models (GMMs), with disappointing results. Beginning in the latter part of 2009, things begin to change: the DNN model and a deep model developed earlier by Microsoft researcher Li Deng and colleagues exhibit interesting and distinct recognition error patterns. This discovery and the subsequent collaboration motivate them to invest research time heavily in DNNs.



The Microsoft Machine Translation Service is released, enabling large scale translation of web content.



Geoff Hinton begins using DNNs for machine learning at the University of Toronto and publishes two seminal papers: "A Fast Learning Algorithm for Deep Belief Nets," Hinton et al., Neural Computation, July 2006, and "Reducing the Dimensionality of Data with Neural Networks," Hinton and R.R. Salakhutdinov, Science, July 2006.



Microsoft researchers Chris Quirk and Arul Menezes and University of Alberta researcher Colin Cherry develop the syntactic statistical machine translation approach that informs the future Microsoft machine translation system.



Skype is released. For the first time, millions of users worldwide are able to communicate by voice and video, free of charge, over the Internet, enabling unprecedented person-to-person communication.



Zens, Och & Ney’s paper “Phrase-Based Statistical Machine Translation” simplifies and improves translation over earlier approaches.



Attacks on the World Trade Center initiate large-scale DARPA funding for speech recognition, machine translation, and language processing. The Global Autonomous Language Exploitation (GALE) program combines speech recognition, machine translation, and information extraction. The DARPA TRANSTAC program demonstrates speech-to-speech translation of short phrases on a handheld device.



Tokuda et al. derive a speech parameter generation algorithm for HMM-based speech synthesis in "Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis." This method is later perfected by Frank Soong at Microsoft Research Asia.



Dragon Systems and IBM release the first commercial software for large-vocabulary continuous speech recognition, running on a PC with Microsoft Windows. Speech recognition becomes available to a mass audience.



Hunt and Black propose concatenative speech synthesis to create realistic sounding audio, in “Unit Selection in a Concatenative Speech Synthesis System Using a Large Speech Database.”



Early work on the core approaches for deep learning occurs when government-funded efforts experiment with DNNs. In particular, the Defense Advanced Research Projects Agency (DARPA) funds numerous large-scale research efforts in speech recognition. SRI International achieves success with DNNs in speaker recognition.



Brown et al. publish a seminal paper “A Statistical Approach to Machine Translation,” which suggests building machine translation systems using statistical methods based on the analysis of large amounts of data, rather than earlier approaches based on syntactic analysis and manipulation. The modern era of machine translation begins.



Neural network research surges in popularity as the back-propagation algorithm is proposed and becomes widely adopted.



Lalit Bahl, Frederick Jelinek, and Jim Baker propose a noisy-channel model for speech recognition, later known as Hidden Markov Models, which becomes the basis for current speech recognition systems. Work on automatic speech recognition begins at IBM and Carnegie Mellon University.
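The noisy-channel idea can be stated compactly; the notation below is the standard modern formulation rather than the original paper's. Given acoustic evidence $A$, the recognizer picks the word sequence $\hat{W}$ that maximizes the posterior probability, which Bayes' rule factors into an acoustic model $P(A \mid W)$ and a language model $P(W)$:

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)
```

The denominator $P(A)$ is constant over candidate word sequences and can be dropped; Hidden Markov Models later supplied a tractable form for $P(A \mid W)$.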



The US Department of Defense, National Science Foundation, and Central Intelligence Agency form the Automatic Language Processing Advisory Committee (ALPAC) to study machine translation efforts. Funding for machine translation systems is curtailed after the ALPAC report finds that there are a sufficient number of human translators for current needs, and questions the ability to make high-quality automated systems. The report notes that "early machine translations of simple or selected text... were as deceptively encouraging as 'machine translations' of general scientific text have been uniformly discouraging." Efforts in machine translation become relatively dormant.



IBM and Georgetown University demonstrate a computerized Russian/English translation system based on six grammar rules and a 250-word vocabulary. It translates sentences such as “Mi pyeryedayem mislyi posryedstvom ryechyi.” into “We transmit thoughts by means of speech.” Government funding for machine translation begins.



Machine translation pioneer Warren Weaver publishes his memorandum, “Translation,” describing computerized approaches for performing translation.



Success in breaking wartime cryptographic codes leads to the belief that similar methods might be successful in translating from one human language to another.