Yinglian Xie and David O'Hallaron
June 2002
Caching is a popular technique for reducing both server load
and user response time in distributed systems. In this paper, we consider the
question of whether caching might be effective for search engines as well.
We study two real search engine traces by examining query locality and its
implications for caching. Our trace analysis results show that: (1) Queries
have significant locality, with query frequency following a Zipf distribution.
Very popular queries are shared among different users and can be cached
at servers or proxies, while 16% to 22% of the queries are from the same
users and should be cached at the user side. Multiple-word queries are
shared less and should be cached mainly at the user side. (2) If caching is
to be done at the user side, short-term caching for hours will be enough to
cover query temporal locality, while server/proxy caching should use longer
periods, such as days. (3) Most users have small lexicons when submitting
queries. Frequent users who submit many search requests tend to reuse
a small subset of words to form queries. Thus, with proxy or user side
caching, prefetching based on user lexicon looks promising.
![]() PDF file |
In Proc. of the Infocom 2002
| Type | Proceedings |