MAVIS

The Microsoft Research Audio Video Indexing System (MAVIS) is a set of software components that use speech recognition technology to enable searching of digitized spoken content, whether it comes from meetings, conference calls, voice mails, presentations, online lectures, or even Internet video. A side benefit of MAVIS is the ability to generate automatic closed captions and keywords, which can increase the accessibility and discoverability of audio and video files with speech content.

MAVIS is now available as a commercial service through a subscription to Greenbutton inCus

MAVIS Features

Search audio for spoken words - MAVIS uses speech recognition to enable efficient searching for spoken words in audio and video files. The user experience is much like searching for text in documents and on the web: users type in search terms, and the result is a set of links which, when clicked, start playing the audio from the point where those terms were spoken.
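
To make the search experience concrete, here is a minimal sketch, assuming the recognizer has already produced a list of words with start times. The `SpokenWord` type, the `find_phrase` and `playback_link` helpers, and the `#t=` media-fragment suffix are illustrative assumptions, not part of MAVIS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpokenWord:
    text: str       # recognized word, lower-cased (hypothetical record type)
    start_s: float  # offset from the start of the recording, in seconds


def find_phrase(words, phrase):
    """Return the start time of every place the phrase occurs in the transcript."""
    terms = phrase.lower().split()
    if not terms:
        return []
    hits = []
    for i in range(len(words) - len(terms) + 1):
        # Compare a sliding window of recognized words against the query terms.
        if [w.text for w in words[i:i + len(terms)]] == terms:
            hits.append(words[i].start_s)
    return hits


def playback_link(media_url, start_s):
    """Build a link that asks the player to start at start_s (assumed #t= suffix)."""
    return f"{media_url}#t={start_s:g}"
```

Each hit is a time offset, so a results page could render one link per hit with `playback_link`, which gives the "click to play from where the term was spoken" behavior described above.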

Highly accurate audio search results - Speech recognition is prone to errors, which can affect the accuracy of audio search results. The MAVIS technology mitigates these errors by automatically expanding its vocabulary and by storing word alternatives using a technique referred to as Probabilistic Word-Lattice Indexing, explained in the technical background. Together, these techniques help improve the accuracy of search over recognized speech.
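
The idea of storing word alternatives can be illustrated with a toy index. This is a simplified sketch, not the MAVIS implementation: it assumes each recording has been reduced to time slots holding competing word hypotheses with a probability, which is far simpler than a real word lattice. The function names and the `min_prob` threshold are hypothetical.

```python
from collections import defaultdict


def build_index(recordings):
    """Map each word to (recording_id, start_s, probability) postings.

    recordings: {recording_id: [(start_s, {word: probability, ...}), ...]}
    Every alternative in a slot is indexed, not only the most likely one.
    """
    index = defaultdict(list)
    for rec_id, slots in recordings.items():
        for start_s, alternatives in slots:
            for word, prob in alternatives.items():
                index[word.lower()].append((rec_id, start_s, prob))
    return index


def search(index, term, min_prob=0.1):
    """Return postings for term, most probable first, dropping unlikely ones."""
    postings = [p for p in index.get(term.lower(), []) if p[2] >= min_prob]
    return sorted(postings, key=lambda p: p[2], reverse=True)
```

In this sketch, a word that was only the recognizer's second-best guess can still be found, because it is indexed with its probability instead of being discarded. An index built from the single best transcript alone would miss that hit.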

Closed captions and keyword generation - Closed captions can make audio and video content accessible to the hearing impaired, and they can be translated so that the content reaches a broader audience in different languages. MAVIS can generate closed captions in the SAMI format. The accuracy of closed captions generated by MAVIS depends mainly on the clarity of the speaker and the level of background noise. A number of subtitle editing tools available on the web can be used to edit the closed captions generated by MAVIS for improved accuracy. Additionally, MAVIS can generate keywords, which can better expose media content to search engines such as Bing and Google and can also be used to categorize your content or to assist in delivering contextual ads.
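
For readers unfamiliar with SAMI, the following sketch shows roughly what a caption file in that format looks like by rendering timed text into a minimal SAMI document. It is an illustration under stated assumptions, not the output MAVIS produces: the `to_sami` helper, the `ENUSCC` class name, and the styling are choices made for this example.

```python
from html import escape


def to_sami(captions, title="Captions"):
    """Render (start_ms, text) pairs as a minimal SAMI document."""
    lines = [
        "<SAMI>",
        "<HEAD>",
        f"<TITLE>{escape(title)}</TITLE>",
        '<STYLE TYPE="text/css"><!--',
        "P { font-family: Arial; color: white; }",
        # The class name below is an assumption chosen for this example.
        ".ENUSCC { Name: English; lang: en-US; SAMIType: CC; }",
        "--></STYLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for start_ms, text in sorted(captions):
        # An empty caption is written as a non-breaking space to clear the display.
        body = escape(text) if text else "&nbsp;"
        lines.append(f"<SYNC Start={int(start_ms)}><P Class=ENUSCC>{body}</P></SYNC>")
    lines += ["</BODY>", "</SAMI>"]
    return "\n".join(lines)
```

Each `SYNC` element carries a start time in milliseconds, so a caption generated from recognized speech can be corrected later in a subtitle editor without changing its timing.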

About MAVIS

As the role of multimedia continues to grow in the enterprise, in government, and on the Internet, the need for technologies that better enable discovery and search of such content becomes all the more important.

Microsoft Research has been working in the area of speech recognition for over two decades, and speech-recognition technology is integrated in a number of Microsoft products, such as Windows 7, TellMe.com, Exchange 2010, and Office OneNote. Using the integrated speech-recognition technology in the Windows 7 operating system, users can dictate into applications like Microsoft Word or use speech to interact with their Windows system. The TellMe.com service allows mobile users to get directory services using speech while on the go. Exchange 2010 now provides a rough transcript of incoming voicemails, and in Office OneNote, users can search their speech recordings using keywords.

MAVIS adds to the list of Microsoft applications and services that use speech recognition. MAVIS is designed to enable searching of hundreds or even tens of thousands of hours of conversational speech with different speakers on different topics. As illustrated below, the user can type in a search term or phrase and get back links to where those words were spoken.

MAVIS comprises speech recognition software components that run as a service in the Windows Azure Platform (MAVIS Azure service), full-text search components that run in SQL Server 2008, and a sample IIS .NET Web application for the UI.

 

MAVIS Architecture

People
Gong Cheng

Behrooz Chitsaz

Frank Seide

Kit Thambiratnam