K-Best Suffix Arrays
- Ken Church ,
- Bo Thiesson ,
- Robert Ragno
Proceedings of NAACL HLT 2007, Companion Volume, pages 17-20. Association for Computational Linguistic |
Published by Association for Computational Linguistics
Suppose we have a large dictionary of strings. Each entry starts with a figure of merit (popularity). We wish to find the kbest matches for a substring, s, in a dictinoary, dict. That is, grep s dict | sort –n | head –k, but we would like to do this in sublinear time. Example applications: (1) web queries with popularities, (2) products with prices and (3) ads with click through rates. This paper proposes a novel index, k-best suffix arrays, based on ideas borrowed from suffix arrays and kdtrees. A standard suffix array sorts the suffixes by a single order (lexicographic) whereas k-best suffix arrays are sorted by two orders (lexicographic and popularity). Lookup time is between log N and sqrt N.