Voice Search of Structured Media Data

This paper addresses the problem of using unstructured queries to search a structured database in voice search applications. Incorporating structural information from music metadata reduces the end-to-end search error by 15% on text queries and by up to 11% on spoken queries. Building on this, an HMM sequential rescoring model reduces the error rate by 28% on text queries and by up to 23% on spoken queries compared to the baseline system. Furthermore, a phonetic similarity model is introduced to compensate for speech recognition errors; it improves end-to-end search accuracy consistently across different levels of speech recognition accuracy.


In  International Conference on Acoustics, Speech and Signal Processing

Publisher  Institute of Electrical and Electronics Engineers, Inc.


Address  Taipei, Taiwan