The Microsoft Audio Video Indexing Service (MAVIS) is a Windows Azure application that uses state-of-the-art speech recognition technology developed at Microsoft Research to enable searching of digitized spoken content, whether it comes from meetings, conference calls, voicemails, presentations, online lectures, or even Internet video. A side benefit of MAVIS is the ability to generate automatic closed captions and keywords, which can increase the accessibility and discoverability of audio and video files with speech content.
At this time MAVIS supports English speech content.
MAVIS is now available as a commercial service through a subscription to GreenButton inCus.
Search audio for spoken words - MAVIS generates a binary file that can be searched in Microsoft SQL Server using full-text search. The user experience is much like searching for text in documents and on the web, as demonstrated on the MAVIS trial site. Users type in search terms and get back a set of links; clicking a link starts playing the audio from the point where those terms were spoken.
Highly accurate audio search results - MAVIS uses state-of-the-art Deep Neural Net (DNN) based speech recognition technology developed at Microsoft Research to convert audio signals into words. Furthermore, MAVIS reduces errors in speech recognition by automatically expanding its vocabulary and by storing word alternatives, using a technique referred to as Probabilistic Word-Lattice Indexing, explained in the technical background. These techniques help provide highly accurate search results.
Closed captions and keyword generation - Closed captions can make audio and video content accessible to the hearing impaired, and they can be translated so that the content reaches a broader audience in different languages. MAVIS can generate closed captions in the SAMI and TTML formats. The accuracy of closed captions generated by MAVIS depends mainly on the clarity of the speech content. A number of subtitle editing tools on the web can be used to edit the closed captions generated by MAVIS for improved accuracy. Additionally, MAVIS can generate keywords that better expose media content to search engines such as Bing and Microsoft SharePoint; these keywords can also be used to categorize your content or to assist in delivering context-based ads.
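To give a feel for what a TTML caption file contains, here is a minimal Python sketch that turns timed text segments into a bare-bones TTML document. It is an illustration only, not MAVIS output or a MAVIS API: the function names `to_clock` and `build_ttml` and the sample segments are hypothetical, and a production caption file would normally also carry styling and layout information.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C TTML specification.
TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TTML elements without a prefix (as the default namespace).
ET.register_namespace("", TTML_NS)


def to_clock(seconds: float) -> str:
    """Format a time offset in seconds as an HH:MM:SS.mmm clock value."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_ttml(segments, lang="en") -> str:
    """Build a minimal TTML document from (start, end, text) tuples.

    Hypothetical helper for illustration; times are in seconds.
    """
    tt = ET.Element(f"{{{TTML_NS}}}tt")
    tt.set(f"{{{XML_NS}}}lang", lang)
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for start, end, text in segments:
        # Each caption is a <p> element with begin/end timing attributes.
        p = ET.SubElement(div, f"{{{TTML_NS}}}p")
        p.set("begin", to_clock(start))
        p.set("end", to_clock(end))
        p.text = text
    return ET.tostring(tt, encoding="unicode")


# Made-up caption segments, purely for demonstration.
sample = [(1.5, 4.0, "Welcome to the lecture."),
          (4.2, 7.8, "Today we talk about speech recognition.")]
print(build_ttml(sample))
```

Because each caption is just a timed text element, a subtitle editor can correct a misrecognized word without touching the timing.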
As the role of multimedia continues to grow in the enterprise, in government, and on the Internet, the need for technologies that better enable discovery and search of such content becomes all the more important.
Microsoft Research has been working in the area of speech recognition for over two decades, and speech-recognition technology is integrated in a number of Microsoft products, such as Windows 7, TellMe.com, Exchange 2010, and Office OneNote. Using integrated speech-recognition technology in the Windows 7 operating system, users can dictate into applications like Microsoft Word, or use speech to interact with their Windows system. The TellMe.com service allows mobile users to get directory services using speech while on the go. Exchange 2010 now provides a rough transcript of incoming voicemails, and in Office OneNote, users can search their speech recordings using keywords.
MAVIS adds to the list of Microsoft applications and services that use speech recognition. MAVIS is designed to enable searching of hundreds or even tens of thousands of hours of conversational speech with different speakers on different topics. As illustrated below, the user can type in a search term or phrase and get back links to where those words were spoken.
Searching media files requires installing the MAVIS SQL add-on on a machine running Microsoft SQL Server 2008 or later. The MAVIS SQL add-on includes the software components that perform full-text indexing on the binary files generated by MAVIS, an API for searching media files, and a sample web application to help develop a media search site, as illustrated by the MAVIS trial site.
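Independent of the SQL add-on, the general idea of searching over stored word alternatives and returning links to where a term was spoken can be illustrated with a toy in-memory index. The Python sketch below is not the MAVIS API and not the actual Probabilistic Word-Lattice Indexing algorithm: it keeps only a flat set of alternatives per time slot, whereas a real lattice also models how words connect into phrases. The names `Hit`, `ToyLatticeIndex`, and `playback_link`, the sample words and confidence values, and the `#t=` link format (borrowed from the W3C Media Fragments convention) are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    media: str        # identifier of the media file (hypothetical)
    start: float      # offset in seconds where the word was spoken
    posterior: float  # recognizer confidence for this alternative, 0..1


class ToyLatticeIndex:
    """Inverted index that stores every word alternative, not just the best one."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, media, start, alternatives):
        """Index all alternatives ({word: posterior}) for one time slot."""
        for word, posterior in alternatives.items():
            self._postings[word.lower()].append(Hit(media, start, posterior))

    def search(self, term, min_posterior=0.1):
        """Return hits for a term, most confident first."""
        hits = [h for h in self._postings.get(term.lower(), [])
                if h.posterior >= min_posterior]
        return sorted(hits, key=lambda h: h.posterior, reverse=True)


def playback_link(hit):
    """Build a link that asks a player to start at the hit's offset."""
    return f"{hit.media}#t={hit.start:g}"


# Made-up recognizer output: the top guess at 12.5 s is "crime", but the
# alternative "Crimea" is stored as well and therefore remains searchable.
index = ToyLatticeIndex()
index.add("lecture.mp4", 12.5, {"crime": 0.60, "Crimea": 0.35})
index.add("lecture.mp4", 90.0, {"climb": 0.90, "crime": 0.05})

for hit in index.search("crimea"):
    print(playback_link(hit), hit.posterior)
```

Keeping the lower-ranked alternative is what lets the query find the passage even though the recognizer's best guess was a different word; the confidence threshold trades recall against false hits.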
News about MAVIS
- Questions or comments? Send us email at email@example.com