Archival References to Web Pages

Leslie Lamport
Compaq Systems Research Center
lamport@pa.dec.com

The Problem

These days, scholarly articles in archival journals often need to cite web pages. Given the decreasing cost of storage, there is reason to hope that a page will survive somewhere on the web for as long as a journal will survive in a library--for example, on the journal's web site. However, finding the page poses a problem. Currently, pages are cited by simply giving their URLs. But URLs change frequently, and the same URL is unlikely to lead to the page five, ten, or a hundred years after the article appears. This is a real problem today for technical journals.

The Solution

There is a ridiculously simple solution to this problem that requires no new infrastructure and can be used by anyone right now. A unique id is put on the page, and readers can find the page by just searching the web for its id. For example, one web page that I have cited in publications ends with:

This page can be found by searching the web for the 21-letter string uidnotreallythestring. Please do not put this string in any document that could wind up on the web--including email messages and PostScript and Word documents. You can refer to it in web documents as the string obtained by removing the - from uid-notreallythestring.

For obvious reasons, I have not used my page's actual unique id, which is the 21-letter string [gif image of the unique id] formed by concatenating uid and lamporttlahomepage. As this example shows, there are several ways to mention a unique id in another web page so that the other page will not turn up in a search for the id. (The name displayed above is a gif image.)
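This page uses two ways of mentioning an id without spelling it out: inserting a hyphen that the reader must remove, and listing pieces that the reader must join. Both are mechanical enough to script. The following Python sketch is only an illustration. The function names `hyphenated_reference`, `concatenation_reference`, and `is_safe` are hypothetical. `is_safe` also rests on an assumption: a search engine matches the id case-insensitively and only as an unbroken run of letters. Under that assumption, it merely checks that the literal id does not occur anywhere in the generated text.

```python
def hyphenated_reference(uid: str, split_at: int) -> str:
    """Describe uid as a hyphenated string from which the reader removes the -."""
    if not 0 < split_at < len(uid):
        raise ValueError("split_at must fall strictly inside the id")
    masked = uid[:split_at] + "-" + uid[split_at:]
    return f"the string obtained by removing the - from {masked}"


def concatenation_reference(uid: str, pieces: int) -> str:
    """Describe uid as `pieces` near-equal chunks that the reader joins together."""
    if not 2 <= pieces <= len(uid):
        raise ValueError("pieces must be between 2 and len(uid)")
    size, extra = divmod(len(uid), pieces)
    parts, start = [], 0
    for i in range(pieces):
        # The first `extra` chunks get one extra letter so every letter is used.
        end = start + size + (1 if i < extra else 0)
        parts.append(uid[start:end])
        start = end
    return "the string obtained by joining together " + " and ".join(parts)


def is_safe(reference: str, uid: str) -> bool:
    """True if the literal id (ignoring case) does not occur in the reference text."""
    return uid.lower() not in reference.lower()


if __name__ == "__main__":
    uid = "uidnotreallythestring"  # the placeholder id from the example above
    for ref in (hyphenated_reference(uid, 3), concatenation_reference(uid, 3)):
        print(ref, "| safe:", is_safe(ref, uid))
```

Called with the placeholder id and `split_at=3`, `hyphenated_reference` reproduces the hyphenated wording quoted in the example above. The check is deliberately conservative: it only confirms that the exact id string is absent, and it says nothing about how a particular search engine tokenizes hyphens.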

Before putting a unique id on the web, one should search the web to be sure that the string does not already exist.

One could, of course, search for the page by using keywords or its title. However, that approach depends on the intelligence of both the search engine and the searcher, whereas a search on the unique id returns just the one page. If the method becomes widely used, browsers could add an option that finds and retrieves the page with a single click. Google already provides this with its "I'm Feeling Lucky" button. But there is no need to wait for that; anyone can start using the method right now.

This solution makes no attempt at security. Spoofing--luring viewers to a bogus page--requires only putting the unique id into that page, and unique ids are also an open target for spammers. There are technologically sophisticated solutions that provide security, such as the Digital Object Identifier, but they require additional infrastructure. The goal here is only to locate scholarly web pages, which receive little traffic and are of no interest to spoofers or spammers. Spamming might still be a problem if there were an easy way for a crawler to harvest large numbers of unique ids. Therefore, unique ids should not be chosen according to any pattern. In particular, one should not routinely use the string uid as part of a unique id, nor embed the id in any standard text.
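Because ids should follow no pattern, one simple way to choose them is to draw letters uniformly at random. The sketch below is a minimal illustration under stated assumptions, not part of the method as described. It uses Python's `secrets` module as the source of randomness. The default length of 21 letters is chosen only to match the examples on this page. The function name `new_unique_id` is hypothetical. A 21-letter random string carries roughly 99 bits of entropy, so an accidental clash with an existing string is very unlikely. Even so, the web search recommended above remains cheap insurance.

```python
import secrets
import string


def new_unique_id(length: int = 21) -> str:
    """Return `length` lowercase ASCII letters drawn uniformly at random.

    Each letter is chosen independently with the `secrets` module, so the
    result has no fixed prefix (such as "uid") and no other pattern that a
    crawler could exploit to harvest ids.
    """
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


if __name__ == "__main__":
    # Search the web for the candidate before putting it on a page.
    print(new_unique_id())
```

The price of randomness is that such an id is hard to remember or read aloud, unlike a mnemonic id such as the one this page uses. Which tradeoff is better is a judgment call.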

You can find this page by searching the web for the string lamportswwwnineposter. Don't put this sequence of letters in any other document. One way to tell others about it is to call it the string obtained by joining together lamports and wwwnine and poster.
