Biplob Debnath, Sudipta Sengupta, Jin Li, David J. Lilja, and David Du
The bloom filter is a probabilistic data structure that provides a compact representation of a set of elements. To keep false positive probabilities low, the size of the bloom filter must be dimensioned a priori to be linear in the maximum number of keys inserted, with the linearity constant ranging typically from one to few bytes. A bloom filter is most commonly used as an in-memory data structure, hence its size is limited by the availability of RAM space on the machine. However, if sufficient RAM space is not available, we may need to store bloom filters in the secondary storage devices. Since magnetic disks provide very slow performance for the random access operations, it is not suitable for a bloom filter as its operations require lot of random accesses.
In this paper, we propose to use flash memory based storage devices to store bloom filters. Flash memory provides faster performance for read operations irrespective of random and sequential access patterns, while it provides poor performance for random write operations. Here, we present BloomFlash, a bloom filter designed for flash memory based storage, that is especially designed to take advantage of the physical characteristics of flash memory. To reduce random write operations, BloomFlash exploits two key design decisions: (i) buffering bit updates in RAM and applying them in bulk to flash, and (ii) hierarchical bloom filter design consisting of component bloom filters. With the help of two real-world data traces taken from representative bloom filter applications, our experimental results show that BloomFlash achieves bloom filter access times in the range of few tens of microseconds, thus allowing up to approximately 28,500 operations per sec.
Microsoft Research, 2010.