Find Your Lost Data

By Suzanne Ross, Writer, Microsoft Research

The more data you have, the more you know. The more you know, the more you forget. The more you forget, the less you know. So why have data?

Microsoft researchers have an answer for this old, slightly twisted riddle. They’ve put together a nifty interface that will find all the data you need on your PC, be it email, documents, tablet notes or spreadsheets. You can find all the data that people have sent to you, all the Web pages you’ve ever seen, and all the attachments you’ve ever forgotten to save.

You don’t have to remember where you put stuff, or even exactly what that stuff is. The program, called Stuff I’ve Seen (SIS), indexes everything you care about on your hard drive and in your email. It sorts results by date or by type, and lets you filter and refine your search.

“Several years ago Susan Dumais and I realized that the technology existed at Microsoft to do high quality search on all of your stuff, but no one’s done it. So we just did it as a proof of concept. We wanted to index all of your life on the computer,” said Ed Cutrell, one of the researchers on the project.

“When we first developed SIS, we used some classic techniques from information retrieval called ‘best match score.’ We ranked all of the results by that score.”

“It became clear that this just wasn’t enough. The reason why is, when it’s your own stuff you have all of these other, better cognitive associations that help you remember things. We found the date is far and away the most popular sort order. If you try to sort by date on the Web it’s going to be meaningless to you,” said Cutrell. In contrast, people often know lots of details about their own stuff and remember associations with other things in their lives.
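To make the difference between the two sort orders concrete, here is a minimal Python sketch. It is not the SIS implementation: the `Item` class, the `match_score` function (a toy count of query terms standing in for a real best-match score) and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    title: str
    text: str
    seen: date  # when the item was last seen (hypothetical field)


def match_score(query: str, item: Item) -> int:
    """Toy relevance score: how often the query terms occur in the item."""
    words = item.text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def search(query: str, items: list[Item], order: str = "date") -> list[Item]:
    """Return matching items, highest score first or newest first."""
    hits = [it for it in items if match_score(query, it) > 0]
    if order == "score":
        return sorted(hits, key=lambda it: match_score(query, it), reverse=True)
    return sorted(hits, key=lambda it: it.seen, reverse=True)


# Made-up sample data.
items = [
    Item("Budget notes", "budget budget budget forecast", date(2003, 1, 10)),
    Item("Mail from boss", "product line budget", date(2003, 6, 2)),
]

by_score = [it.title for it in search("budget", items, order="score")]
by_date = [it.title for it in search("budget", items, order="date")]
```

In this toy data, the score order puts the older, keyword-heavy note first, while the date order surfaces the recent mail that a person is more likely to be hunting for.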

For instance, if you’re looking for an email from your boss about the new product line, you might remember that he sent it sometime before you went on vacation five weeks ago. Using SIS, you could search on the name of your boss and quickly refine the search by the date or other memorable landmarks.

SIS is different from Web search in that it’s easy to filter or pivot after the initial search. The Web searches so many documents that search engines have to ask for lots of information up front. The problem with that is, you don’t always know exactly what you’re looking for. You may just have a vague idea that you need some information that was somewhere in an email or document.

With SIS, you can type in your best guess, such as ‘set up a blog,’ and then you can refine the search, filtering by the type of document, from an Excel spreadsheet to a PowerPoint file to a music file and more. You can further refine by date, rank, author or other properties that you remember about the document.
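The search-first, refine-later pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how SIS is built: the `Result` fields, the `refine` function and the sample results are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    title: str
    kind: str    # e.g. "email", "spreadsheet", "presentation"
    author: str
    seen: date   # when the item was created or last viewed


def refine(results, kind=None, author=None, before=None):
    """Keep only the results that pass every filter that was supplied."""
    kept = []
    for r in results:
        if kind is not None and r.kind != kind:
            continue
        if author is not None and r.author != author:
            continue
        if before is not None and not r.seen < before:
            continue
        kept.append(r)
    return kept


# Pretend these came back from a first, broad query such as "set up a blog".
results = [
    Result("Blog setup checklist", "spreadsheet", "Pat", date(2003, 3, 1)),
    Result("Re: set up a blog", "email", "Pat", date(2003, 4, 12)),
    Result("Blog launch deck", "presentation", "Sam", date(2003, 5, 20)),
    Result("Fwd: set up a blog", "email", "Sam", date(2003, 6, 3)),
]

emails = refine(results, kind="email")                   # narrow by type
early_emails = refine(emails, before=date(2003, 5, 1))   # then by date
```

Each call narrows the previous result set, so you can start with a vague guess and pivot on whatever detail you happen to remember next.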

SIS can also reduce the need for bookmarks and folder organization. Studies have found that 70% of the Web sites we go to are Web sites we’ve gone to before. Finding them again can be tricky and time-consuming. You have to either maintain an extensive file system, or hope that you can remember the exact search keyword you used before to find it again. SIS automatically saves the Web pages you’ve gone to and adds them to your index.

A few people have wondered if SIS exposes their documents to a ‘big brother.’ No, no need to worry about big bro’. SIS only finds documents that are already on your local hard drive and in your mail. The only way for someone to get to your data is if they hack your computer, steal your password, or you let them in.

Stuff I Should See

Cutrell and the team at Adaptive Systems and Interaction have added another feature to SIS that helps those of us who don’t know what we know. It’s called Implicit Query, or SIS IQ for short. SIS IQ finds things that we didn’t even realize we needed.

If you’re working on an email or a document, SIS IQ will search your index for information related to that document. You may be responding to a request from someone, and you forgot you had already sent over a thick set of attachments on this same subject to another friend three months ago. SIS IQ will find it for you, and display it unobtrusively on a sidebar next to your work area.
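One way to picture an implicit query is as a search that is built from the text you are currently writing. The Python sketch below is only an assumption about how such a feature could work, not a description of SIS IQ: the stopword list, `key_terms`, `implicit_query` and the tiny index are invented for the example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, just enough for the example.
STOPWORDS = {"the", "and", "for", "are", "here", "with", "that", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def key_terms(text, k=3):
    """Pick the k most frequent non-stopword terms of three letters or more."""
    counts = Counter(
        w for w in tokenize(text) if len(w) > 2 and w not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(k)]


def implicit_query(draft, index, k=3):
    """Return titles of indexed items that share key terms with the draft."""
    terms = set(key_terms(draft, k))
    scored = []
    for title, text in index.items():
        overlap = len(terms & set(tokenize(text)))
        if overlap:
            scored.append((overlap, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most overlap first
    return [title for _, title in scored]


# Made-up index: title -> text of something already seen.
index = {
    "Budget review attachments": "attached budget review slides",
    "Lunch plans": "pizza at noon",
}
draft = "Here are the charts for the budget review. The budget review is soon."
related = implicit_query(draft, index)
```

Nothing is typed into a search box here: the draft itself supplies the query terms, and the matching items could then be shown beside the work area.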

Some people are perfectly happy filing and categorizing their stuff. They have nested folders within nested folders. But for the rest of us, SIS offers a way to just throw everything in one big pile and forget about it. SIS will find it when you need it again.
