Clean Living: Eliminating Near-Duplicates in Lifetime Personal Storage

As lifetime personal storage becomes a reality, it is becoming increasingly difficult to search and navigate the content one accumulates. One of the most striking problems is the duplicates and near-duplicates that clutter search and navigation. We investigated different techniques for eliminating duplicate and near-duplicate objects in the MyLifeBits personal storage system. Our results show the effectiveness of near-duplicate detection on personal content such as emails, documents, and visited web pages. In one experiment, duplicate and near-duplicate detection reduced the number of documents a user must consider by 21% and the number of web pages by 43%.

tr-2006-30.doc (Word document)
tr-2006-30.pdf (PDF file)

Details

Type: TechReport
Number: MSR-TR-2006-30
Pages: 6
Institution: Microsoft Research