Ciprian Chelba and Alex Acero
September 2005
The paper presents the Position Specific Posterior Lattice (PSPL), a
novel lossy representation of automatic speech recognition (ASR) lattices
that naturally lends itself to efficient indexing and subsequent
relevance ranking of spoken documents. Although lossy, the
PSPL is much more compact than the ASR 3-gram lattice
from which it is computed, with virtually no degradation in
word-error-rate performance. Since new paths are introduced
in the lattice, the “oracle” accuracy increases over the original
ASR lattice.
In experiments performed on a collection of lecture recordings
(the MIT iCampus database), the spoken document ranking
accuracy was improved by 20% relative over the commonly
used baseline of indexing the 1-best output from an automatic
speech recognizer. The Mean Average Precision (MAP) increased
from 0.53 when using 1-best output to 0.62 when using
the new lattice representation. The reference used for evaluation
is the output of a standard retrieval engine working on the
manual transcription of the speech collection.
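
The MAP figure quoted above can be illustrated with a minimal sketch. It assumes that, for each query, the reference is the set of documents returned by the retrieval engine on the manual transcription, and that the system under test returns a ranked list of document ids. The function names and data layout are hypothetical, and the paper's evaluation may use a different variant of the metric.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]],      # query -> ranked ids from the system under test
    reference: Dict[str, Set[str]],  # query -> ids treated as relevant (assumed layout)
) -> float:
    """Mean of the per-query average precision over all reference queries."""
    if not reference:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in reference.items())
    return total / len(reference)
```

Under this sketch, a system is scored by ranking documents for every query and comparing each ranking against the reference set for that query; a query with no returned documents scores zero.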
In Proc. of the Interspeech Conference
Type: Inproceedings