Scattering Invariants for Audio Classification
To obtain efficient feature representations for audio classification, it is desirable to have invariance to time-shift and stability to time-warping. Mel-frequency cepstral coefficients (MFCCs) satisfy these criteria, but are unsuitable for modeling large-scale temporal structure. The scattering transform extends this representation through a convolutional network of wavelet transforms and modulus operators, capturing structures at larger time scales. Additional invariance to frequency transposition with stability to frequency-warping is obtained by applying a second scattering transform along the log-frequency axis. Using these representations, we obtain state-of-the-art results on tasks such as phone segment classification and musical genre classification on the TIMIT and GTZAN datasets, respectively.
Speaker Details
Joakim Anden is a Ph.D. candidate in applied mathematics at Ecole Polytechnique in Paris, France under the supervision of Prof. Stephane Mallat. Previously, he studied engineering physics and mathematics at the Royal Institute of Technology in Stockholm, Sweden and fundamental mathematics at Universite Pierre et Marie Curie in Paris, France, from which he received an M.Sc. in 2010. His research focuses on invariant signal representations and their applications to classification and similarity estimation for speech, music and environmental sounds as well as medical signals.
- Series:
- Microsoft Research Talks
- Date:
- Speakers:
- Joakim Anden
- Affiliation:
- Ecole Polytechnique
-
-
Jeff Running
-
Series: Microsoft Research Talks
-
-
-
-
Galea: The Bridge Between Mixed Reality and Neurotechnology
Speakers:- Eva Esteban,
- Conor Russomanno
-
Current and Future Application of BCIs
Speakers:- Christoph Guger
-
Challenges in Evolving a Successful Database Product (SQL Server) to a Cloud Service (SQL Azure)
Speakers:- Hanuma Kodavalla,
- Phil Bernstein
-
Improving text prediction accuracy using neurophysiology
Speakers:- Sophia Mehdizadeh
-
-
DIABLo: a Deep Individual-Agnostic Binaural Localizer
Speakers:- Shoken Kaneko
-
-
Recent Efforts Towards Efficient And Scalable Neural Waveform Coding
Speakers:- Kai Zhen
-
-
Audio-based Toxic Language Detection
Speakers:- Midia Yousefi
-
-
From SqueezeNet to SqueezeBERT: Developing Efficient Deep Neural Networks
Speakers:- Sujeeth Bharadwaj
-
Hope Speech and Help Speech: Surfacing Positivity Amidst Hate
Speakers:- Monojit Choudhury
-
-
-
-
-
'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
Speakers:- Peter Clark
-
Checkpointing the Un-checkpointable: the Split-Process Approach for MPI and Formal Verification
Speakers:- Gene Cooperman
-
Learning Structured Models for Safe Robot Control
Speakers:- Ashish Kapoor
-
-