C. Chelba and Alex Acero
The paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition (ASR) lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents. Although lossy, the PSPL is much more compact than the ASR 3-gram lattice from which it is computed, with virtually no degradation in word-error-rate performance. Since new paths are introduced in the lattice, the “oracle” accuracy increases over the original ASR lattice. In experiments performed on a collection of lecture recordings (the MIT iCampus database), the spoken document ranking accuracy improved by 20% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer. The Mean Average Precision (MAP) increased from 0.53 when using 1-best output to 0.62 when using the new lattice representation. The reference used for evaluation is the output of a standard retrieval engine working on the manual transcription of the speech collection.
Published in: Proc. of the Interspeech Conference