Scattering Invariants for Audio Classification

Speaker  Joakim Andén

Affiliation  École Polytechnique

Host  Mike Seltzer

Duration  01:02:06

Date recorded  27 February 2014

To obtain efficient feature representations for audio classification, it is desirable to have invariance to time-shift and stability to time-warping. Mel-frequency cepstral coefficients (MFCCs) satisfy these criteria, but are unsuitable for modeling large-scale temporal structure. The scattering transform extends this representation through a convolutional network of wavelet transforms and modulus operators, capturing structures at larger time scales. Additional invariance to frequency transposition with stability to frequency-warping is obtained by applying a second scattering transform along the log-frequency axis. Using these representations, we obtain state-of-the-art results on tasks such as phone segment classification and musical genre classification on the TIMIT and GTZAN datasets, respectively.
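The cascade described above (wavelet transform, modulus, averaging, then a second wavelet transform on the resulting envelopes) can be illustrated with a minimal NumPy sketch. It is not the speaker's implementation, and it rests on several simplifying assumptions: the filters are plain Gaussian band-pass filters with one filter per octave rather than the wavelets used in the talk, convolution is circular, and averaging is taken over the whole signal instead of a finite window. The names `gaussian_bandpass_bank`, `wavelet_modulus`, and `scattering` are hypothetical.

```python
import numpy as np

def gaussian_bandpass_bank(n, num_filters):
    """Hypothetical filter bank (an assumption, not the talk's wavelets):
    Gaussian band-pass filters defined in the Fourier domain, with centre
    frequencies halving at each scale (one filter per octave)."""
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    bank = []
    for j in range(num_filters):
        centre = 0.25 / 2 ** j
        width = centre / 2
        bank.append(np.exp(-0.5 * ((freqs - centre) / width) ** 2))
    return np.array(bank)  # shape (num_filters, n)

def wavelet_modulus(x, bank):
    """Return |x * psi_j| for every filter in the bank, using circular
    convolution computed as a product in the Fourier domain."""
    return np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * bank, axis=-1))

def scattering(x, num_filters=6):
    """First- and second-order scattering coefficients of a 1-D signal,
    with global time averaging (a simplification of windowed averaging)."""
    bank = gaussian_bandpass_bank(len(x), num_filters)
    u1 = wavelet_modulus(x, bank)   # first-layer envelopes, one per filter
    s1 = u1.mean(axis=-1)           # first-order coefficients
    s2 = []
    for j1 in range(num_filters - 1):
        # Envelopes vary slowly, so only coarser (lower-frequency) filters
        # are applied in the second layer.
        u2 = wavelet_modulus(u1[j1], bank[j1 + 1:])
        s2.extend(u2.mean(axis=-1))  # second-order coefficients
    return np.concatenate([s1, np.array(s2)])

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
coeffs = scattering(x)
shifted_coeffs = scattering(np.roll(x, 37))
```

Because the averaging here is global and the convolution is circular, `coeffs` and `shifted_coeffs` agree up to floating-point error, which illustrates the time-shift invariance the abstract refers to. The second-order terms are what carry the larger-scale temporal structure (amplitude modulation of each band) that a single averaged spectrum discards.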

©2014 Microsoft Corporation. All rights reserved.