Christopher J.C. Burges, CCSP Group, Microsoft Research
John C. Platt, CCSP Group, Microsoft Research
Soumya Jana, CCSP Group, Microsoft Research (current affiliation: Electrical and Computer Engineering, University of Illinois at Urbana-Champaign)
IEEE Transactions on Speech and Audio Processing, Volume 11, Number 3, pp. 165-174, (2003)
Mapping audio data to feature vectors for classification, retrieval, or
identification tasks presents four principal challenges.
The dimensionality of the input must be significantly reduced; the resulting
features must be robust to likely distortions of the input; the features must
be informative for the task at hand; and the feature extraction operation must
be computationally efficient. In this paper, we propose Distortion Discriminant
Analysis (DDA), which fulfills all four of these requirements. DDA constructs a
linear, convolutional neural network out of layers, each of which performs an
oriented PCA dimensionality reduction. We demonstrate the effectiveness of DDA on
two audio fingerprinting tasks: searching for 500 audio clips in 36 hours of
audio test data; and playing over 10 days of audio against a database with
approximately 240,000 fingerprints. We show that the system is robust to kinds
of noise that are not present in the training procedure. In the large test, the
system gives a false positive rate of 1.5 × 10⁻⁸ per audio clip, per fingerprint, at
a false negative rate of 0.2% per clip.
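The abstract describes each DDA layer as an oriented PCA (OPCA) projection, with the layers stacked convolutionally over time. The following is a minimal NumPy sketch of that idea. It assumes the usual OPCA formulation: each layer keeps the directions n that maximize nᵀC_s n / nᵀC_n n, where C_s is the covariance of clean training vectors and C_n is the correlation matrix of distortion vectors (distorted minus clean). The functions opca, sliding_windows, train_two_layer, and extract, as well as the ridge term, layer sizes, window width, and synthetic data, are all hypothetical. The sketch also omits the input preprocessing and the other details of the published system.

```python
import numpy as np


def opca(signal, distortions, k, ridge=1e-6):
    """Oriented PCA: directions maximizing signal variance over distortion power.

    signal:      (n, d) clean feature vectors, one per row.
    distortions: (m, d) difference vectors (distorted minus clean).
    k:           number of output dimensions to keep.
    ridge:       hypothetical small diagonal term so the distortion matrix is
                 positive definite and the Cholesky factorization succeeds.
    Returns a (d, k) projection matrix, columns ordered by decreasing ratio.
    """
    d = signal.shape[1]
    centered = signal - signal.mean(axis=0)
    c_signal = centered.T @ centered / len(signal)            # covariance of clean data
    c_noise = distortions.T @ distortions / len(distortions)  # correlation of distortions
    c_noise += ridge * np.eye(d)
    # Reduce C_s n = lambda C_n n to a symmetric problem using C_n = L L^T.
    chol_inv = np.linalg.inv(np.linalg.cholesky(c_noise))    # L^{-1}
    whitened = chol_inv @ c_signal @ chol_inv.T               # L^{-1} C_s L^{-T}
    eigvals, eigvecs = np.linalg.eigh(whitened)               # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]                     # keep the k largest
    return chol_inv.T @ eigvecs[:, order]                     # map back: n = L^{-T} v


def sliding_windows(frames, width):
    """Concatenate each run of `width` consecutive rows into one row (stride 1)."""
    n = frames.shape[0] - width + 1
    return np.stack([frames[i:i + width].ravel() for i in range(n)])


def train_two_layer(clean, distorted, k1, width, k2):
    """Hypothetical two-layer stack trained on time-aligned clean/distorted frames."""
    # Layer 1: one OPCA projection shared by every frame (the convolutional part).
    p1 = opca(clean, distorted - clean, k1)
    h_clean, h_dist = clean @ p1, distorted @ p1
    # Layer 2: OPCA over windows of consecutive layer-1 outputs.
    w_clean = sliding_windows(h_clean, width)
    w_dist = sliding_windows(h_dist, width)
    p2 = opca(w_clean, w_dist - w_clean, k2)
    return p1, p2


def extract(frames, p1, p2, width):
    """Apply both layers to a sequence of frames; one output row per window."""
    return sliding_windows(frames @ p1, width) @ p2


# Usage on synthetic data (hypothetical 16-dimensional frames, additive noise).
rng = np.random.default_rng(0)
clean = rng.normal(size=(2000, 16))
distorted = clean + 0.1 * rng.normal(size=clean.shape)
p1, p2 = train_two_layer(clean, distorted, k1=8, width=4, k2=6)
features = extract(clean, p1, p2, width=4)
print(features.shape)  # (1997, 6)
```

Every layer in this sketch is linear, so the trained stack is equivalent to a single linear map over each window of input frames. The layering keeps each eigenproblem small: d×d for layer 1 and (width·k1)×(width·k1) for layer 2, rather than (width·d)×(width·d) for one projection over the raw window.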
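As a rough reading of the large-test figure, suppose the per-fingerprint false positive rate p_fp applies to each of the roughly N = 240,000 database entries. By linearity of expectation, the expected number of false matches per audio clip is then approximately as follows (a back-of-the-envelope estimate, not a figure stated in the abstract):

```latex
\mathbb{E}[\text{false positives per clip}] \approx N \, p_{\mathrm{fp}}
  = \left(2.4 \times 10^{5}\right)\left(1.5 \times 10^{-8}\right)
  = 3.6 \times 10^{-3}
```

This works out to roughly one false match for every 280 clips played against the full database.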
Keywords: Audio fingerprinting, robust feature
extraction, dimensionality reduction
© 2003 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
A previous conference paper on the same topic: C.J.C. Burges, J.C. Platt, S. Jana, "Extracting Noise-Robust Features from Audio Data," ICASSP, pp. I-1021–I-1024, 2002.
© 2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.