K-Best Suffix Arrays

Suppose we have a large dictionary of strings. Each entry starts with a figure of merit (popularity). We wish to find the kbest matches for a substring, s, in a dictinoary, dict. That is, grep s dict | sort –n | head –k, but we would like to do this in sublinear time. Example applications: (1) web queries with popularities, (2) products with prices and (3) ads with click through rates. This paper proposes a novel index, k-best suffix arrays, based on ideas borrowed from suffix arrays and kdtrees. A standard suffix array sorts the suffixes by a single order (lexicographic) whereas k-best suffix arrays are sorted by two orders (lexicographic and popularity). Lookup time is between log N and sqrt N.

kbest.pdf
PDF file

In  Proceedings of NAACL HLT 2007, Companion Volume, pages 17–20. Association for Computational Linguistic

Publisher  Association for Computational Linguistics
All copyrights reserved by ACL 2007

Details

TypeInproceedings
URLhttp://www.aclweb.org/
Pages17–20
> Publications > K-Best Suffix Arrays