Replicated Indexes for Distributed Data

  • David Lomet

Published by Institute of Electrical and Electronics Engineers, Inc.

Publication

We describe a distributed index structure, in which data is distributed among multiple sites and indexes to the data are replicated over multiple sites. This permits good scalability as storage and accessing load are distributed over the sites and each site with an index replica has fast local access to the index structure, making remote requests at most for data at the leaves of the index tree. We call this method the dPi-tree because it is based on the Pi-tree. We replicate the index without the need for coherence messages. This works whether the index replica is persistent or a transient cached copy. We generalize a technique first used to provide recovery for Pi-tree indexes to independently and lazily maintain the index replicas. A further result is that each index replica is fully recoverable, an area not treated previously in replication schemes. We also show how the data in the leaves of the index can be distributed and re-distributed at very low cost.