Speaker: Joakim Andén
Affiliation: École Polytechnique
Host: Mike Seltzer
Date recorded: 27 February 2014
To obtain efficient feature representations for audio classification, it is desirable to have invariance to time-shift and stability to time-warping. Mel-frequency cepstral coefficients (MFCCs) satisfy these criteria, but are unsuitable for modeling large-scale temporal structure. The scattering transform extends this representation through a convolutional network of wavelet transforms and modulus operators, capturing structures at larger time scales. Additional invariance to frequency transposition with stability to frequency-warping is obtained by applying a second scattering transform along the log-frequency axis. Using these representations, we obtain state-of-the-art results on tasks such as phone segment classification and musical genre classification on the TIMIT and GTZAN datasets, respectively.
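To make the cascade concrete, the following is a minimal sketch of a first- and second-order scattering transform for a 1-D signal, not the speaker's implementation. It assumes Gaussian-windowed analytic band-pass filters (Gabor-like wavelets, one per octave), FFT-based convolution, and a Gaussian low-pass for the final time averaging; all filter parameters are illustrative choices.

```python
import numpy as np

def filter_bank(n, num_scales, q=2.0):
    """Analytic band-pass filters in the frequency domain, one per octave.
    Center frequency halves at each scale; bandwidth scales with it."""
    freqs = np.fft.fftfreq(n)
    filters = []
    for j in range(num_scales):
        xi = 0.25 / 2 ** j            # illustrative center frequency
        sigma = xi / (2.0 * q)        # bandwidth proportional to xi
        psi_hat = np.exp(-((freqs - xi) ** 2) / (2 * sigma ** 2))
        psi_hat[freqs < 0] = 0.0      # analytic: positive frequencies only
        filters.append(psi_hat)
    return np.array(filters)

def low_pass(n, avg_scale):
    """Gaussian low-pass whose time support grows with avg_scale."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-(freqs ** 2) * (avg_scale ** 2) / 2)

def scattering(x, num_scales=6, avg_scale=64):
    """First- and second-order scattering coefficients of a 1-D signal.
    Layer 1: S1[j] = |x * psi_j| * phi (wavelet, modulus, average).
    Layer 2: S2[j1,j2] = ||x * psi_j1| * psi_j2| * phi, for j2 > j1,
    recovering modulation structure lost to the first averaging."""
    n = len(x)
    psi = filter_bank(n, num_scales)
    phi = low_pass(n, avg_scale)
    x_hat = np.fft.fft(x)
    u1 = np.abs(np.fft.ifft(x_hat[None, :] * psi, axis=1))
    s1 = np.real(np.fft.ifft(np.fft.fft(u1, axis=1) * phi[None, :], axis=1))
    s2 = []
    for j1 in range(num_scales):
        u1_hat = np.fft.fft(u1[j1])
        for j2 in range(j1 + 1, num_scales):
            u2 = np.abs(np.fft.ifft(u1_hat * psi[j2]))
            s2.append(np.real(np.fft.ifft(np.fft.fft(u2) * phi)))
    return s1, np.array(s2)

x = np.cos(2 * np.pi * 0.1 * np.arange(512))
s1, s2 = scattering(x)
```

Restricting the second layer to `j2 > j1` reflects the fact that the modulus envelope `|x * psi_j1|` varies more slowly than the band it came from, so only coarser wavelets carry energy there; this is what lets the transform capture large-scale temporal structure beyond the reach of MFCC-style averaging.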
©2014 Microsoft Corporation. All rights reserved.