Indexing Uncertainty for Spoken Document Search

C. Chelba and Alex Acero


The paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition (ASR) lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents. Albeit lossy, the PSPL is much more compact than the ASR 3-gram lattice from which it is computed, with virtually no degradation in word error rate. Because the PSPL introduces paths that were not in the original lattice, its “oracle” accuracy is higher than that of the original ASR lattice.
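One way to compute position-specific posteriors is a forward pass that also tracks how many words precede each node, combined with an ordinary backward pass. The sketch below assumes a hypothetical lattice format: nodes are numbered in topological order (0 is the start, the last node is the end), every link carries one word and one combined acoustic/language-model probability, and there are no null links. The function name `pspl` and the toy lattice are illustrative, not taken from the paper.

```python
from collections import defaultdict


def pspl(num_nodes, links):
    """Position-specific posteriors for a lattice (minimal sketch).

    Assumed (hypothetical) format: nodes 0..num_nodes-1 are in topological
    order, node 0 is the start and node num_nodes-1 is the end; each link is
    (start, end, word, prob) where prob is a combined acoustic/LM score.
    Returns {position: {word: posterior}} with 0-based positions.
    """
    out_links = defaultdict(list)
    for s, e, _w, p in links:
        out_links[s].append((e, p))

    # Forward pass: alpha[n][l] = total score of partial paths from the
    # start node to node n that contain exactly l words.
    alpha = [defaultdict(float) for _ in range(num_nodes)]
    alpha[0][0] = 1.0
    for n in range(num_nodes):
        for e, p in out_links[n]:
            for l, a in alpha[n].items():
                alpha[e][l + 1] += a * p

    # Backward pass: beta[n] = total score of partial paths from n to the end.
    beta = [0.0] * num_nodes
    beta[-1] = 1.0
    for n in reversed(range(num_nodes - 1)):
        for e, p in out_links[n]:
            beta[n] += p * beta[e]

    total = beta[0]  # score of all complete start-to-end paths
    posteriors = defaultdict(lambda: defaultdict(float))
    for s, e, w, p in links:
        for l, a in alpha[s].items():
            # On every path that reaches s after l words, this link's
            # word sits at position l.
            posteriors[l][w] += a * p * beta[e] / total
    return {l: dict(words) for l, words in posteriors.items()}


# Toy lattice (hypothetical): paths "the cat", "a cat", "a hat".
toy_links = [
    (0, 1, "the", 0.6), (0, 2, "a", 0.4),
    (1, 3, "cat", 1.0), (2, 3, "cat", 0.5), (2, 3, "hat", 0.5),
]
for position, words in sorted(pspl(4, toy_links).items()):
    print(position, words)
```

In the toy lattice, position 0 holds "the" (0.6) and "a" (0.4), and position 1 holds "cat" (0.8) and "hat" (0.2). Reading positions independently also admits "the hat", a path absent from the original lattice, which illustrates one way new paths can arise.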

In experiments on a collection of lecture recordings (the MIT iCampus database), spoken document ranking accuracy improved by 20% relative over the commonly used baseline of indexing the 1-best output of an automatic speech recognizer. Mean Average Precision (MAP) increased from 0.53 with the 1-best output to 0.62 with the new lattice representation. The reference used for evaluation is the output of a standard retrieval engine run on the manual transcription of the speech collection.
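MAP averages, over queries, the average precision of each ranked list. A minimal sketch, assuming for illustration that the documents the reference engine returns from the manual transcripts form each query's relevant set; the function names and document ids are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k  # precision at the rank of each hit
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of per-query AP; runs is a list of (ranked_list, relevant_set)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example with hypothetical document ids.
runs = [
    (["d1", "d3", "d2"], {"d1", "d2"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),              # AP = (1/2) / 1
]
print(mean_average_precision(runs))
```

Dividing by the size of the relevant set, rather than by the number of hits, penalizes a ranking that never retrieves some relevant documents.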


Publication type: Inproceedings
Published in: Proc. of the Interspeech Conference