Allocating inverted index into flash memory for search engines (abstract only)

  • Bojun Huang
  • Zenglin Xia

Proceedings of the 20th International Conference on World Wide Web (WWW)

Published by ACM


Although most large-scale web search engines adopt the standard DRAM-HDD storage hierarchy, the usefulness of the hard disk is greatly limited by its long read latency. NAND Flash memory, on the other hand, is 100x faster than hard disk and 10x cheaper than DRAM [2]. It is therefore possible to move a significant portion of the DRAM-resident data into Flash memory and reduce storage cost. This paper considers the optimal policy for allocating as much of the DRAM portion of the inverted index as possible into Flash memory. Note that the portion of index data originally stored on hard disk remains on hard disk in our scheme, which effectively results in a three-layer storage hierarchy. To the best of our knowledge, we are the first to show that substantially better system performance for web index serving can be achieved with Flash-aware storage management approaches, rather than by simply plugging in an SSD and treating it as a super hard disk. We limit our discussion to the static scenario, where posting lists are allocated atomically to either Flash memory or DRAM only when the index is updated, and no other data movement is performed at run time. The problem is very similar to static index caching/pruning [1][4], except that the caching here is exclusive and the target storage is Flash memory. Note that previous work suggested that static policies work well for inverted index caching, compared with their dynamic counterparts [1].
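As a rough illustration of the static scenario described in the abstract, the sketch below assigns whole posting lists to DRAM or Flash at index-update time using a greedy heuristic that ranks lists by estimated accesses per byte. This is only an assumption for illustration, not the optimal policy analyzed in the paper; the PostingList fields, the access-frequency statistic, and the DRAM budget are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PostingList:
    term: str
    size_bytes: int       # stored size of the posting list
    est_accesses: float   # estimated query accesses (hypothetical statistic)

def allocate_static(lists: List[PostingList],
                    dram_budget_bytes: int) -> Tuple[List[PostingList], List[PostingList]]:
    """Greedy static allocation sketch: keep the 'hottest' lists (accesses per
    byte) in DRAM until the budget is exhausted; the rest go to Flash.
    Allocation is atomic per posting list and decided only at index-update time."""
    ranked = sorted(lists,
                    key=lambda p: p.est_accesses / max(p.size_bytes, 1),
                    reverse=True)
    dram, flash = [], []
    used = 0
    for p in ranked:
        if used + p.size_bytes <= dram_budget_bytes:
            dram.append(p)
            used += p.size_bytes
        else:
            flash.append(p)
    return dram, flash

# Toy example with a 1 MB DRAM budget.
if __name__ == "__main__":
    toy_index = [
        PostingList("the", 800_000, 5_000.0),
        PostingList("flash", 200_000, 1_200.0),
        PostingList("zyzzyva", 50_000, 2.0),
    ]
    dram, flash = allocate_static(toy_index, dram_budget_bytes=1_000_000)
    print("DRAM:", [p.term for p in dram])
    print("Flash:", [p.term for p in flash])
```

The density ranking here simply mirrors common static caching heuristics [1][4]; in the exclusive setting described above, each posting list resides in exactly one of the two tiers rather than being duplicated.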