User-Specific Training for Vocal Melody Transcription
Andrew Guillory, Sumit Basu, and Dan Morris
Overview
This page contains supplementary material for our AAAI 2010 paper: “User-Specific Learning for Recognizing a Singer’s Intended Pitch”.
The full citation for our paper follows, along with a link to the paper itself:
Guillory A, Basu S, and Morris D. User-Specific Learning for Recognizing a Singer’s Intended Pitch. Proceedings of AAAI 2010, July 2010.
For more information about this work, contact Dan Morris (dan@microsoft.com) and Sumit Basu (sumitb@microsoft.com).
Abstract
We consider the problem of automatic vocal melody transcription:
translating an audio recording of a sung melody
into a musical score. While previous work has focused on
finding the closest notes to the singer’s tracked pitch, we instead
seek to recover the melody the singer intended to sing.
Often, the melody a singer intended to sing differs from what
they actually sang; our hypothesis is that this occurs in a
singer-specific way. For example, a given singer may often
be flat in certain parts of her range, or another may have difficulty
with certain intervals. We thus pursue methods for
singer-specific training which use learning to combine different
methods for pitch prediction. In our experiments with human
subjects, we show that via a short training procedure we
can learn a singer-specific pitch predictor and significantly
improve transcription of intended pitch over other methods.
For an average user, our method gives a 20 to 30 percent reduction
in pitch classification errors with respect to a baseline
method comparable to commercial voice transcription tools.
For some users, we achieve even more dramatic reductions.
Our best results come from a combination of singer-specific learning
with non-singer-specific feature selection. We are
also making our experimental data available to allow others
to replicate or extend our results, and we discuss the implications
of our work for training more general control signals.
Supplementary Material
The primary purpose of this page is to host the data used in our experiments, which consist of ground truth melodies and participants' recordings of those melodies, both described below.
We hope these recordings can serve as the beginning of a larger data repository, and as a benchmark data set for user-specific training or vocal melody transcription for environments with fixed tempos (an important feature of these recordings is that they were created by asking users to sing along with a drum beat).
Our complete experimental procedure is described in detail in our paper, and the instructions displayed to participants are included at the end of this page. We note that recordings from only 22 participants are included here, which is fewer than the total number collected: not all participants consented to having their recordings publicly released. However, the data set posted on this page is not systematically biased and is appropriate for testing alternate methods and understanding our experiments.
Our data archive can be downloaded as a single zipfile:
public_pitch_data.zip (560MB)
The archive contains two directories:
input_data
This directory contains the ground truth files in both MIDI and audio format. Each file is numbered and named to correspond to the recordings for each participant (described below). In some cases files are labeled as “male” or “female”; these represent the same melody in slightly different ranges to allow for reasonable reproduction by participants of both genders. Each example is included in three forms: a MIDI sequence (.mid); an “example” sequence (.wma), which is what participants heard before they were asked to sing back each example; and an “accompaniment” sequence (also .wma), which is what participants heard while they were singing back each example: just a drum beat, an initial cue to set the key, and a count-in voiceover.
For example, melody 2 is represented in this directory as six files (all of which are live links on this page, as examples):
02TwinkleFemale.accompaniment.wma
02TwinkleFemale.example.wma
02TwinkleFemale.mid
02TwinkleMale.accompaniment.wma
02TwinkleMale.example.wma
02TwinkleMale.mid
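The naming scheme shown above can be split into its parts programmatically. The following is a minimal sketch, not part of the released archive: the `InputFile` type, the `parse_input_name` function, and the regular expression are our own illustrative names, and the sketch assumes every file in input_data follows the pattern seen in the six examples (a two-digit number, a melody name, an optional “Male” or “Female” label, and one of the three extensions).

```python
import re
from typing import NamedTuple, Optional


class InputFile(NamedTuple):
    """Parts of an input_data file name (hypothetical helper type)."""
    index: int              # two-digit melody number, e.g. 2
    melody: str             # melody name, e.g. "Twinkle"
    variant: Optional[str]  # "Male", "Female", or None if unlabeled
    kind: str               # "midi", "example", or "accompaniment"


# Assumed pattern, inferred from the example file names listed above.
_PATTERN = re.compile(
    r"^(\d{2})(.+?)(Male|Female)?\.(accompaniment\.wma|example\.wma|mid)$"
)


def parse_input_name(filename: str) -> InputFile:
    """Split a name such as '02TwinkleFemale.example.wma' into its parts."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized input_data file name: {filename!r}")
    index, melody, variant, ext = match.groups()
    # "mid" maps to "midi"; the audio kinds are the first part of the extension.
    kind = "midi" if ext == "mid" else ext.split(".")[0]
    return InputFile(int(index), melody, variant, kind)
```

For instance, `parse_input_name("02TwinkleMale.accompaniment.wma")` yields melody number 2, melody name "Twinkle", variant "Male", and kind "accompaniment". Files that do not match the assumed pattern raise a `ValueError` rather than being silently skipped.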
public
This directory contains recordings from 22 participants singing along with each of our 21 melodies. For example, the directory called “P004” contains participant 4’s recordings of all 21 melodies, so files are named:
00Easy1.training.wav
01Easy2.training.wav
02TwinkleMale.training.wav
03Diddle2Male.training.wav
...
21OdeToJoyMale.training.wav
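Because recordings and ground truth files share a numbered name, a recording can be matched to its reference melody by swapping the suffix. The sketch below is illustrative only: the function name `ground_truth_midi` is hypothetical, and it assumes that participant directories sit directly under public, that the archive is addressed with forward-slash paths, and that each recording's prefix matches a .mid file in input_data exactly.

```python
from pathlib import PurePosixPath

# Suffix used by the participant recordings listed above.
RECORDING_SUFFIX = ".training.wav"


def ground_truth_midi(recording_path: str) -> str:
    """Return the assumed path of the ground truth MIDI file for a recording.

    Example: 'public/P004/02TwinkleMale.training.wav'
             maps to 'input_data/02TwinkleMale.mid'.
    """
    name = PurePosixPath(recording_path).name
    if not name.endswith(RECORDING_SUFFIX):
        raise ValueError(f"not a recording file name: {name!r}")
    # Keep the numbered melody prefix and replace the suffix with .mid.
    prefix = name[: -len(RECORDING_SUFFIX)]
    return f"input_data/{prefix}.mid"
```

This only constructs a path; callers should check that the resulting file exists in their copy of the archive before relying on the match.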
The remainder of this page contains a brief video example of our data collection application, along with the instructions provided to participants, to demonstrate the process used for data collection. Please contact Dan Morris (dan@microsoft.com) and Sumit Basu (sumitb@microsoft.com) if you have questions about our procedure, are interested in implementing a competing method, or are interested in adding to our data repository!