K-Best Suffix Arrays

Ken Church; Bo Thiesson; Robert Ragno

K-Best Suffix Arrays

Ken Church ,
Bo Thiesson ,
Robert Ragno

Proceedings of NAACL HLT 2007, Companion Volume, pages 17-20. Association for Computational Linguistic | April 2007

Published by Association for Computational Linguistics

Publication

Download BibTex

Suppose we have a large dictionary of strings. Each entry starts with a figure of merit (popularity). We wish to find the kbest matches for a substring, s, in a dictinoary, dict. That is, grep s dict | sort –n | head –k, but we would like to do this in sublinear time. Example applications: (1) web queries with popularities, (2) products with prices and (3) ads with click through rates. This paper proposes a novel index, k-best suffix arrays, based on ideas borrowed from suffix arrays and kdtrees. A standard suffix array sorts the suffixes by a single order (lexicographic) whereas k-best suffix arrays are sorted by two orders (lexicographic and popularity). Lookup time is between log N and sqrt N.