Scattering Invariants for Audio Classification

Efficient feature representations for audio classification should be invariant to time shifts and stable to time-warping. Mel-frequency cepstral coefficients (MFCCs) satisfy these criteria but are unsuitable for modeling large-scale temporal structure. The scattering transform extends this representation through a convolutional network of wavelet transforms and modulus operators, capturing structure at larger time scales. Additional invariance to frequency transposition, together with stability to frequency-warping, is obtained by applying a second scattering transform along the log-frequency axis. Using these representations, we obtain state-of-the-art results on tasks such as phone segment classification (TIMIT dataset) and musical genre classification (GTZAN dataset).
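
As a rough illustration of the cascade described above (wavelet transform, modulus, then averaging), the following NumPy sketch computes zeroth-, first- and second-order time scattering coefficients for a 1-D signal. It is a minimal sketch under stated assumptions, not the speaker's implementation: the Gaussian band-pass filters, the function names wavelet_bank and scattering, and the parameters num_filters and sigma_low are all hypothetical choices made for this example. The scattering along the log-frequency axis is not shown.

```python
import numpy as np

def wavelet_bank(n, num_filters):
    """Hypothetical constant-Q bank of analytic Gaussian band-pass filters,
    defined directly in the Fourier domain (an assumption of this sketch)."""
    freqs = np.fft.fftfreq(n)              # cycles per sample in [-0.5, 0.5)
    bank = []
    for j in range(num_filters):
        xi = 0.4 * 2.0 ** (-j)             # center frequency, one filter per octave
        sigma = xi / 4.0                   # bandwidth proportional to xi
        psi_hat = np.exp(-0.5 * ((freqs - xi) / sigma) ** 2)
        bank.append(psi_hat * (freqs > 0)) # keep positive frequencies only
    return np.array(bank)

def scattering(x, num_filters=6, sigma_low=0.01):
    """Return (S0, S1, S2): averaged signal, first- and second-order coefficients."""
    n = len(x)
    psi = wavelet_bank(n, num_filters)
    # Gaussian low-pass filter; a smaller sigma_low means a longer averaging window.
    phi = np.exp(-0.5 * (np.fft.fftfreq(n) / sigma_low) ** 2)

    def conv(sig, filt_hat):
        # Circular convolution computed as a product in the Fourier domain.
        return np.fft.ifft(np.fft.fft(sig) * filt_hat)

    def average(sig):
        return conv(sig, phi).real

    s0 = average(x)
    # First layer: wavelet modulus, then averaging.
    u1 = [np.abs(conv(x, p)) for p in psi]
    s1 = np.array([average(u) for u in u1])
    # Second layer: filter each envelope again, using only lower-frequency
    # wavelets (j2 > j1), since an envelope varies more slowly than its band.
    s2 = np.array([average(np.abs(conv(u1[j1], psi[j2])))
                   for j1 in range(num_filters)
                   for j2 in range(j1 + 1, num_filters)])
    return s0, s1, s2

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
s0, s1, s2 = scattering(x)
```

Because every step is either a circular convolution or a pointwise modulus, circularly shifting the input simply shifts the coefficients by the same amount; the final low-pass averaging is what makes them approximately invariant to shifts that are small relative to the averaging window. The second-order coefficients retain amplitude-modulation structure that the first-order averaging discards.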

Speaker Details

Joakim Anden is a Ph.D. candidate in applied mathematics at Ecole Polytechnique in Paris, France, under the supervision of Prof. Stephane Mallat. Previously, he studied engineering physics and mathematics at the Royal Institute of Technology in Stockholm, Sweden, and fundamental mathematics at Universite Pierre et Marie Curie in Paris, France, from which he received an M.Sc. in 2010. His research focuses on invariant signal representations and their applications to classification and similarity estimation for speech, music, and environmental sounds, as well as medical signals.

Date:
Speakers:
Joakim Anden
Affiliation:
Ecole Polytechnique
    • Jeff Running

Series: Microsoft Research Talks