Locality in Search Engine Queries and Its Implications for Caching

Yinglian Xie and David O'Hallaron


Caching is a popular technique for reducing both server load

and user response time in distributed systems. In this paper, we consider the

question of whether caching might be effective for search engines as well.

We study two real search engine traces by examining query locality and its

implications for caching. Our trace analysis results show that: (1) Queries

have significant locality, with query frequency following a Zipf distribution.

Very popular queries are shared among different users and can be cached

at servers or proxies, while 16% to 22% of the queries are from the same

users and should be cached at the user side. Multiple-word queries are

shared less and should be cached mainly at the user side. (2) If caching is

to be done at the user side, short-term caching for hours will be enough to

cover query temporal locality, while server/proxy caching should use longer

periods, such as days. (3) Most users have small lexicons when submitting

queries. Frequent users who submit many search requests tend to reuse

a small subset of words to form queries. Thus, with proxy or user side

caching, prefetching based on user lexicon looks promising.


Publication typeProceedings
Published inProc. of the Infocom 2002
> Publications > Locality in Search Engine Queries and Its Implications for Caching