Challenges in using lifetime personal information stores

SIGIR 2004: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval |

Published by ACM

Publication

Within five years, our personal computers with terabyte disk drives will be able to store everything we read, write, hear, and many of the images we see including video. Vannevar Bush outlined such a system in his famous 1945 Memex article [1]. For the last four years we have worked on MyLifeBits www.MyLifeBits.com http://www.MyLifeBits.com, a system to digitally store everything from one’s life, including books, articles, personal financial records, memorabilia, email, written correspondence, photos(time, location taken), telephone calls, video, television programs, and web pages visited. We recently added content from personal devices that automatically record photos and audio.The project started with the capture of Bell’s content [2], followed by an effort to explore the use of the SQL database for storage and retrieval. Work has continued along these lines to extend content capture from every useful source e.g. a meeting capture system. The second phase of the project includes the design of tools and links for annotation, collections, cluster analysis, facets for characterizing the content, creation of timelines and stories, and other inherent database related capabilities, e.g. the ability to pivot on an event or photo or person to retrieve linked information [3]. Ideally we would like to have a system that would read every document, extract meta-data(e.g. Dublin Core) and classify it using multiple ontologies, faceted classifications, or the relevant.While such a system has implications for future computing devices and their users, these systems will only exist if we can effectively utilize the vast personal stores. Although our system is exploratory, the Stuff I’ve Seen system [4] demonstrates the utility and necessity of easy search and access to one’s own data. Other research efforts with similar goals relating to personal information include Haystack [5], LifeStreams [6], and the UK “Memories for Life” Grand Challenge.There are serious research issues beyond the problem of making the information useful through rapid and easy retrieval.The “Dear Appy” problem (“Dear Appy, My application, or platform, or media left me unreadable. Signed, Lost Data”) is unsettling to archivists and computer professionals -and must be solved.Just navigating the stored life of individual would at first glance appear to take almost a lifetime to sift through. While we are making progress in the capture of less traditionally archived content(e.g. meetings, phone calls & video), automatic interpretation and index of voice are illusive. MyLifeBits is currently focused on retrieval including the hopefully automatic, addition of meta-data e.g. document type identification, high level knowledge. While such data is essential for the archivist, it is unclear how useful such meta-data is to a one’s own information; without such higher level knowledge and concepts, the vast amount of raw bits may be completely unusable.The most cited problem of personal archives is the control of the content including personal security, together with joint ownership of content by other individuals and organizations. In many corporations, periodic expunging of documents is the standard. Similarly, the aspects of a person’s life not available in public documents is owned by the organization and all documents may have to be tagged in such a way that it can be expunged, if necessary, when an individual is no longer part of the organization. The HPPA law in the US and even more stringent privacy laws in other counties have major implications for personal stores.