SPEECH OGLE: Indexing Uncertainty for Spoken Document Search

The paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents.

In experiments performed on a collection of lecture recordings (MIT iCampus data), the spoken document ranking accuracy was improved by 20% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer.

The inverted index built from PSPL lattices is compact (about 20% of the size of 3-gram ASR lattices and 3% of the size of the uncompressed speech), and it allows for extremely fast retrieval. Furthermore, little degradation in performance is observed when pruning PSPL lattices, resulting in even smaller indexes: 5% of the size of 3-gram ASR lattices.
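
To make the idea of indexing recognition uncertainty more concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes each document is already given as a list of positions, each holding candidate words with posterior probabilities. The data layout, the function names `build_index` and `expected_counts`, the `min_posterior` pruning threshold, and the scoring by summed posteriors are all illustrative assumptions rather than details taken from the abstract.

```python
from collections import defaultdict


def build_index(docs, min_posterior=0.0):
    """Build a toy inverted index: word -> [(doc_id, position, posterior)].

    `docs` maps a document id to a list of positions; each position is a
    dict of candidate word -> posterior probability (hypothetical layout).
    Entries below `min_posterior` are dropped, mimicking pruning.
    """
    index = defaultdict(list)
    for doc_id, positions in docs.items():
        for pos, candidates in enumerate(positions):
            for word, posterior in candidates.items():
                if posterior >= min_posterior:
                    index[word].append((doc_id, pos, posterior))
    return index


def expected_counts(index, word):
    """Score each document by the summed posterior of `word` (assumed scoring)."""
    scores = defaultdict(float)
    for doc_id, _pos, posterior in index.get(word, []):
        scores[doc_id] += posterior
    return dict(scores)


# Made-up example data: two short "recordings" with competing hypotheses.
docs = {
    "lecture1": [{"speech": 0.9, "beach": 0.1}, {"search": 0.6, "church": 0.4}],
    "lecture2": [{"beach": 0.7, "speech": 0.3}],
}

full = build_index(docs)
pruned = build_index(docs, min_posterior=0.35)

print(expected_counts(full, "speech"))
print(expected_counts(pruned, "speech"))
print(sum(len(v) for v in full.values()), sum(len(v) for v in pruned.values()))
```

In this toy setup, a 1-best index would miss "speech" in `lecture2` entirely, while the soft index still retrieves it with a lower score. Raising `min_posterior` shrinks the index at the cost of dropping such low-confidence hits.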

2005-chelba-aclb.pdf
PDF file

In Proc. of the Association for Computational Linguistics

Details

Type: Inproceedings