Statistical Media Processing (SMP) is a research project within the Knowledge Tools Group. Work in this project develops both technology and applications that lie at the intersection of media and statistics.
The SMP project is no longer active (although many of us still work in this field). Two other groups in Redmond have continued to work on using machine learning to process signals:
We define media as the information that people use for communication, collaboration, entertainment, and memory archiving. Such media can include audio, still images, and video. Metadata referring to such information, such as descriptive metadata and user models, is also valid media. Statistics is defined as the automatic construction of intelligent systems by the examination of data. Statistics is a superset of machine learning. Statistical media processing includes media identification, classification, clustering, enhancement, recommendation, organization, and search.
The intersection between media and statistics is a very fruitful area of research, because each half stimulates new ideas in the other. By creating new machine learning and statistical algorithms that are appropriate for media processing, and by training these algorithms with real media gathered in real situations, we hope to push the state of the art of media processing and give Microsoft software the best media processing features. Conversely, the application to media will push machine learning in new directions, away from the classical classification and regression problems. These new directions include representations for media that are appropriate for machine learning, new algorithms to handle the small amounts of data typically available for user modeling, and new algorithms for modeling and enhancing media.
Primary contact: John Platt
Projects within SMP
- Audio Fingerprinting --- A system which automatically identifies a clip in an audio stream, even if the stream is distorted or noisy. The system includes technology for automatically extracting noise-robust features from signals, and a fast database lookup algorithm.
- AutoDJ --- A system for automatically generating music playlists, given one or more seed songs selected by a user. The system uses a machine learning algorithm that learns from previous experience.
- Statistical Acoustic Signal Processing --- Methods for enhancing audio capture on the PC, including echo cancellation, denoising, and dereverberation. These enhancements are based on adaptive filters and advanced statistical methods.
- AutoAlbum & PhotoTOC --- An interface that allows users to easily browse their digital photographs. The interface uses clustering and probabilistic methods to automatically create
Publications on Music synthesis and recommendation:
- Relationships Between Lyrics and Melody in Popular Music by E. Nichols, D. Morris, S. Basu, C. Raphael, Proc. ISMIR, (2009).
- Data-Driven Exploration of Musical Chord Sequences by E. Nichols, D. Morris, S. Basu, Proc. IUI, (2009).
- Exposing Parameters of a Trained Dynamic Model for Interactive Music Creation by D. Morris, I. Simon, S. Basu, Proc. AAAI, (2008).
- MySong: Automatic Accompaniment Generation for Vocal Melodies by I. Simon, D. Morris, S. Basu, Proc. CHI, (2008).
- Audio Analogies: Creating New Music from an Existing Performance by Concatenative Synthesis by I. Simon, S. Basu, D. Salesin, and M. Agrawala, Proc. Int'l Conf. on Computer Music, (2005).
- Inferring Similarity between Music Objects with Application to Playlist Generation by R. Ragno, C.J.C. Burges, C. Herley, Proc. ACM Int'l Workshop on Multimedia Information Retrieval, pp. 73-80, (2005).
- Fast Embedding of Sparse Music Similarity Graphs by J.C. Platt, NIPS 16, pp. 571-578, (2004).
- Mixing with Mozart by S. Basu, Proc. Int'l Conf. on Computer Music, (2004).
- Learning a Gaussian Process Prior for Automatically Generating Music Playlists by J.C. Platt, C.J.C. Burges, S. Swenson, C. Weare, A. Zheng, NIPS 14, pp. 1425-1432, (2002).
Publications on Music analysis and identification:
- ARGOS: Automatically Extracting Repeating Objects from Multimedia Streams by C. Herley, IEEE Trans. on Multimedia, Vol. 8, No. 1, pp. 115-129, (2006).
- Using Audio Fingerprinting for Duplicate Detection and Thumbnail Generation by C.J.C. Burges, D. Plastina, J.C. Platt, E. Renshaw, and H.S. Malvar, Proc. ICASSP, Vol. 3, pp. 9-12, (2005).
- Accurate Repeat Finding and Object Skipping using Fingerprints by C. Herley, Proc. ACM Multimedia, pp. 656-665, (2006).
- Redundant Bit Vectors for Quickly Searching High-Dimensional Regions by J. Goldstein, J.C. Platt, C.J.C. Burges, Proc. Sheffield Machine Learning Workshop, Springer Lecture Notes in Computer Science 3635, (2005).
- Extracting Repeats from Media Streams by C. Herley, Proc. ICASSP, Vol. 5, pp. 913-916, (2004).
- Distortion Discriminant Analysis for Audio Fingerprinting by C.J.C. Burges, J.C. Platt, S. Jana, IEEE Trans. on Speech and Audio Processing, Vol. 11, No. 3, pp. 165-174, (2003).
Publications on Audio and speech processing:
- Ensemble Deep Learning for Speech Recognition by L. Deng and J.C. Platt, Proc. Interspeech, (2014).
- HRTF Magnitude Synthesis via Sparse Representation of Anthropometric Features by P. Bilinski, J. Ahrens, M.R.P. Thomas, I.J. Tashev, and J.C. Platt, Proc. ICASSP, (2014).
- Multichannel Acoustic Echo Cancellation in Multiparty Spatial Audio Conferencing with Constrained Kalman Filtering by Z. Zhang, Q. Cai, J.W. Stokes, Proc. Int'l Workshop on Acoustic Echo and Noise Control, (2008).
- Normalized Double-Talk Detection Based on Microphone and AEC Error Cross-Correlation by M.A. Iqbal, J.W. Stokes, S.L. Grant, Proc. IEEE Int'l Conf. on Multimedia and Expo, (2007).
- Double-talk Detection using Real-time Recurrent Learning by M.A. Iqbal, J.W. Stokes, J.C. Platt, A.C. Surendran, S.L. Grant, Int'l Workshop on Acoustic Echo and Noise Control, (2006).
- Speaker Identification using a Microphone Array and a Joint HMM with Speech Spectrum and Angle of Arrival by J.W. Stokes, J.C. Platt, S. Basu, Proc. ICASSP, Vol. 3, pp. 736-739, (2006).
- Robust RLS with Round Robin Regularization including Application to Stereo Acoustic Echo Cancellation by J.W. Stokes, J.C. Platt, Proc. ICME, (2006).
- Acoustic Echo Cancellation for High Noise Environments by A.S. Chhetri, J.W. Stokes, Proc. ICME, (2006).
- Acoustic Echo Cancellation in a Channel with Rapidly Varying Gain by S. Basu, Proc. ICME, (2006).
- Hidden Conditional Random Fields for Phone Classification by A. Gunawardana, M. Mahajan, A. Acero, J.C. Platt, Proc. Interspeech, (2005).
- Regression-based Residual Acoustic Echo Suppression by A. Chhetri, A.C. Surendran, J.W. Stokes, J.C. Platt, International Workshop on Acoustic Echo and Noise Control, (2005).
- The Audio Epitome: A New Representation for Modeling and Classifying Auditory Phenomena by A. Kapoor, S. Basu, Proc. ICASSP, Vol. 5, pp. 189-192, (2004).
- Convolutional Networks for Speech Detection by S. Sukittanon, A.C. Surendran, J.C. Platt, and C.J.C. Burges, ICSLP, (2004).
- Logistic Discriminative Speech Detectors using Posterior SNRs by A.C. Surendran, S. Sukittanon, and J.C. Platt, ICASSP, (2004).
- Acoustic Echo Cancellation with Arbitrary Playback Sampling Rate by J.W. Stokes, H.S. Malvar, Proc. ICASSP, Vol. 4, pp. 153-156, (2004).
Publications on Image processing and display:
- Performance-Driven Hand-Drawn Animation by I. Buck, A. Finkelstein, C. Jacobs, A. Klein, D. Salesin, J. Seims, R. Szeliski, K. Toyama, Proc. SIGGRAPH, Article No. 25, (2006).
- Home Video Browsing and Consumption through Exploration of a Learned Generative Model by N. Jojic, S. Basu, and N. Petrovic, Proc. CVPR, (2006).
- Recursive Estimation of Generative Models of Video by N. Petrovic, A. Ivanovic, N. Jojic, S. Basu, T. Huang, Proc. CVPR, (2006).
- Multiple Instance Boosting for Object Detection by P. Viola, J.C. Platt, C. Zhang, NIPS, Vol. 18, pp. 1417-1426, (2006).
- Text Recognition of Low-resolution Document Images by C. Jacobs, P.Y. Simard, P. Viola, J. Rinker, Proc. ICDAR, pp. 695-699, (2005).
- Occlusion Removal from Minimum Number of Images by C. Herley, Proc. ICIP, Vol. 2, pp. 1046-1049, (2005).
- Learning Spatially-Variable Filters for Super-Resolution of Text by A. Corduneanu, J.C. Platt, Proc. ICIP, (2005).
- Efficient Inscribing of Noisy Rectangular Objects in Scanned Images by C. Herley, Proc. ICIP, Vol. 4, pp. 2399-2402, (2004).
- PhotoTOC: Automatic Clustering for Browsing Personal Photographs by J.C. Platt, M. Czerwinski, B. Field, Fourth IEEE Pacific Rim Conference on Multimedia, (2003).
- Recursive Method to Extract Rectangular Objects from Scans by C. Herley, Proc. ICIP, Vol. 3, pp. 989-992, (2003).
- Image Analogies by A. Hertzmann, C. Jacobs, N. Oliver, B. Curless, D. Salesin, Proc. SIGGRAPH, pp. 327-340, (2001).
- Document Capture Using a Digital Camera by C. Herley, Proc. International Conference on Image Processing, (2001).
- AutoAlbum: Clustering Digital Photographs Using Probabilistic Model Merging by J.C. Platt, Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries 2000, pp. 96-100, (2000).
- Optimal Filtering for Patterned Displays by J.C. Platt, IEEE Signal Processing Letters, Vol. 7, No. 7, pp. 179-181, (2000).
- Displaced Filtering for Patterned Displays by C. Betrisey, J.F. Blinn, B. Dresevic, B. Hill, G. Hitchcock, B. Keely, D.P. Mitchell, J.C. Platt, T. Whitted, Proc. Society for Information Display Symposium, pp. 296-299, (2000).