Bernard Wong, Aleksandrs Slivkins, and Emin Gün Sirer
December 2008
Keyword search is a critical component in most content retrieval systems. Despite the emergence of completely decentralized and efficient peer-to-peer techniques for content distribution, there have been no similarly efficient, accurate, and decentralized mechanisms for content discovery based on approximate search keys. In this paper, we present Cubit, a scalable and efficient peer-to-peer system with a new search primitive that finds the k data items whose keys are most similar to a given search key. The system works by creating a keyword metric space that encompasses both the nodes and the objects in the system, where the distance between two points is a measure of the similarity between the strings that the points represent. It provides a loosely structured overlay that can efficiently navigate this space. This overlay also enables multi-keyword searches and supports Boolean expressions over keywords. We evaluate Cubit through both a real deployment as a search plugin for a popular BitTorrent client and a large-scale simulation, and show that it provides an efficient, accurate, and robust method to handle imprecise string search in file-sharing applications.
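The search primitive described in the abstract can be illustrated with a small, centralized sketch. It assumes Levenshtein edit distance as the string similarity measure, which the abstract does not specify, and it scans a local list of keys rather than navigating a peer-to-peer overlay. The function names `edit_distance` and `k_nearest_keys` are hypothetical and are not taken from Cubit.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed metric, not confirmed by the abstract)."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def k_nearest_keys(query: str, keys: list[str], k: int) -> list[str]:
    """Return the k keys closest to query, ties broken alphabetically."""
    return heapq.nsmallest(k, keys, key=lambda s: (edit_distance(query, s), s))


if __name__ == "__main__":
    catalog = ["ubuntu", "debian", "kubuntu", "fedora"]
    # A misspelled query still ranks the intended key first.
    print(k_nearest_keys("ubunto", catalog, 2))
```

This brute-force scan only shows what "the k data items with keys most similar to a given search key" means under the assumed metric; the distributed routing that makes the lookup efficient is the subject of the report itself.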
| Type | TechReport |
| URL | http://www.cs.cornell.edu/~bwong/Cubit |
| Number | http://hdl.handle.net/1813/11651 |
| Institution | Cornell University CIS |