RARE: Robust Audio Recognition Engine

Chris Burges, John Platt, Erin Renshaw

Text Mining, Navigation and Search and Knowledge Tools

Jonathan Goldstein, Database

Microsoft Research

 

Links:

Papers:

An IEEE paper describing technical details and tests on a large database (10 pages, postscript)

A shorter paper describing the feature extraction algorithms used (4 pages, pdf)

A tech report on a new bitvector filtering algorithm for fast lookup (12 pages, pdf)

Using audio fingerprinting for duplicate detection and thumbnail generation (4 pages, pdf)

 

Talks:

Robust Audio Feature Extraction, 5.17.02

RARE - Research Presentations, 8.22.02

High level Overview of RARE, 4.10.03

Overview of Bit Vectors, 4.10.03

 

2003 TechFest Posters:

Fast, Robust Audio Fingerprinting (Distortion Discriminant Analysis for Robustness to Noise)

Redundant Bit Vectors for Fast High-Dimensional Database Lookup

 

Brief Description:

Audio fingerprinting (AF) attempts to identify audio clips, either in files or in audio streams. The fingerprints are computed in advance from clean copies of the clips; in this work, fingerprints of length 256 bytes are used, although the length is easy to change. Our audio fingerprinting works with any kind of audio.
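The identification step described above can be illustrated with a minimal sketch. This is not the RARE implementation: the feature extraction and the bit vector lookup are described in the papers listed above and are not reproduced here. The sketch assumes each fingerprint is simply a 256-byte string, uses random bytes as stand-ins for real fingerprints, and uses a brute-force nearest-neighbor search with squared Euclidean distance. The names `identify`, `add_noise` and `THRESHOLD`, and the threshold value itself, are hypothetical and chosen only for illustration.

```python
import random

FINGERPRINT_BYTES = 256  # fingerprint length quoted in the text


def distance(a: bytes, b: bytes) -> int:
    """Squared Euclidean distance between two equal-length byte strings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(query: bytes, database: dict, threshold: int):
    """Return the label of the closest stored fingerprint,
    or None if no stored fingerprint is within `threshold`."""
    best_label, best_dist = None, None
    for label, stored in database.items():
        d = distance(query, stored)
        if best_dist is None or d < best_dist:
            best_label, best_dist = label, d
    if best_dist is not None and best_dist <= threshold:
        return best_label
    return None


def add_noise(fp: bytes, rng: random.Random, amplitude: int = 3) -> bytes:
    """Perturb each byte by at most `amplitude`, clamped to 0..255.
    A stand-in for a distorted copy of a clip (assumption)."""
    return bytes(min(255, max(0, b + rng.randint(-amplitude, amplitude)))
                 for b in fp)


rng = random.Random(0)

# Stand-in database: random bytes play the role of fingerprints that
# would be computed in advance from clean copies of the clips.
database = {
    name: bytes(rng.randrange(256) for _ in range(FINGERPRINT_BYTES))
    for name in ("clip_a", "clip_b", "clip_c")
}

# Hypothetical acceptance threshold: an average error of 5 per byte.
THRESHOLD = FINGERPRINT_BYTES * 5 ** 2

noisy_copy = add_noise(database["clip_b"], rng)
unknown = bytes(rng.randrange(256) for _ in range(FINGERPRINT_BYTES))

match_for_noisy = identify(noisy_copy, database, THRESHOLD)
match_for_unknown = identify(unknown, database, THRESHOLD)
```

A linear scan like this costs time proportional to the database size for every query, which is presumably why a dedicated fast lookup method (the bit vector filtering listed above) is worthwhile for large databases. The threshold is what lets the system report "no match" for audio that is not in the database, rather than always returning the nearest clip.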

AF has many possible applications. Software music players can use AF to identify metadata such as artist, album and track when other means fail (as happens today for user-generated CDs). Stream AF can be used with portable devices such as PDAs to identify broadcast music. Companies can use stream AF to check whether the commercials they have paid for actually air in the market and at the time they expect, and to detect whether those commercials have been shortened. AF can be used to fully automate the detection of (noisy) copies of audio in a database, which can be very useful for large databases containing unlabeled data. We've also shown how it can be used to detect choruses in music. Stream AF could also be used to greatly increase the accuracy of the statistical sampling of broadcast music that is currently used to assess royalties for artists.