Statistical Media Processing

Statistical Media Processing (SMP) is a research project within the Knowledge Tools Group. The project develops both technology and applications at the intersection of media and statistics.


We define media as the information that people use for communication, collaboration, entertainment, and memory archiving. Such media can include audio, still images, and video. Metadata about such information, such as descriptive metadata and user models, also counts as media. We define statistics as the automatic construction of intelligent systems from the examination of data; under this definition, statistics is a superset of machine learning. Statistical media processing includes media identification, classification, clustering, enhancement, recommendation, organization, and search.

The intersection of media and statistics is a fruitful area of research, because each field stimulates new ideas in the other. By creating new machine learning and statistical algorithms that suit media processing, and by training these algorithms on real media gathered in real situations, we hope to push the state of the art in media processing and give Microsoft software the best media processing features. Conversely, applications to media will push machine learning in new directions, away from the classical classification and regression problems. These new directions include representations of media that are suitable for machine learning, new algorithms that handle the small amounts of data typically available for user modeling, and new algorithms for modeling and enhancing media.

Primary contact: John Platt

Projects within SMP

  • Audio Fingerprinting --- A system that automatically identifies a clip in an audio stream, even if the stream is distorted or noisy. The system includes technology for automatically extracting noise-robust features from signals, and a fast database lookup algorithm.
  • AutoDJ --- A system for automatically generating music playlists, given one or more seed songs selected by a user. The system uses a machine learning algorithm that learns from previous experience.
  • Statistical Acoustic Signal Processing --- Methods for enhancing audio capture on the PC, including echo cancellation, denoising, and dereverberation. These enhancements are based on adaptive filters and advanced statistical methods.
  • AutoAlbum & PhotoTOC --- An interface that allows users to easily browse their digital photographs. The interface uses clustering and probabilistic methods to automatically create
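The acoustic signal processing work above relies on adaptive filters for tasks such as echo cancellation. As a generic illustration only, and not the SMP group's actual method, the sketch below implements a normalized least-mean-squares (NLMS) adaptive filter, a standard textbook echo canceller. It assumes a single far-end (loudspeaker) signal and a microphone signal that contains only a linear echo of it; the function name `nlms_echo_cancel`, its parameters, and the simulated echo path are hypothetical.

```python
import random
from typing import List, Tuple


def nlms_echo_cancel(
    far_end: List[float],
    mic: List[float],
    num_taps: int = 16,
    step_size: float = 0.5,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Hypothetical NLMS echo canceller (illustrative sketch).

    Returns (residual, weights): residual[n] is the microphone sample minus
    the estimated echo, and weights are the final filter taps, which
    approximate the loudspeaker-to-microphone echo path.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # tap-delay line of far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        # Shift the newest far-end sample into the tap-delay line.
        history = [x] + history[:-1]
        # Estimate the echo as a linear filter of recent far-end samples.
        echo_est = sum(w * h for w, h in zip(weights, history))
        err = d - echo_est
        residual.append(err)
        # NLMS update: the step is normalized by the energy in the window.
        norm = eps + sum(h * h for h in history)
        scale = step_size * err / norm
        weights = [w + scale * h for w, h in zip(weights, history)]
    return residual, weights


if __name__ == "__main__":
    rng = random.Random(0)
    echo_path = [0.6, 0.0, -0.3, 0.1]  # hypothetical room impulse response
    far = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    # Simulated microphone signal: far-end audio convolved with the echo path.
    mic = [
        sum(echo_path[k] * far[n - k] for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(far))
    ]
    residual, taps = nlms_echo_cancel(far, mic, num_taps=8)
    print("estimated echo path:", [round(t, 3) for t in taps])
```

Normalizing the step by the window energy keeps the adaptation rate independent of loudspeaker volume. A practical echo canceller would also need double-talk detection and suppression of nonlinear residual echo, which this sketch omits.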


Publications on music synthesis and recommendation:

Publications on music analysis and identification:

Publications on Audio and speech processing:

Publications on Image processing and display: