By Janie Chang
October 5, 2010 2:00 PM PT
During this year’s USENIX Symposium on Operating Systems Design and Implementation, three classic research papers written or co-written by authors associated with Microsoft Research received the Association for Computing Machinery’s Special Interest Group on Operating Systems (SIGOPS) Hall of Fame Award: Crash Recovery in a Distributed Data Storage System, by Butler Lampson and Howard Sturgis; Using Encryption for Authentication in Large Networks of Computers, by Roger Needham and Michael Schroeder; and The Recovery Manager of the System R Database Manager, whose lead author was Jim Gray.
The SIGOPS Hall of Fame Award was established in 2005 to recognize the most significant operating-systems research papers published at least 10 years earlier. Nominated papers must have stood the test of time and must represent influential work in the field of computer science. Nominations are solicited from the SIGOPS membership, and a Hall of Fame Committee chooses winners based on the impact a paper has had on the field of operating-systems research.
Written while Lampson and Sturgis were at Xerox’s Palo Alto Research Center (PARC), Crash Recovery in a Distributed Data Storage System has had considerable and far-reaching impact: although the report was only informally circulated, it received enough nominations and accolades to be selected for the SIGOPS Hall of Fame.
“We never published it as a paper,” explains Lampson, now a technical fellow at Microsoft Research New England, “because Howard and I are terrible perfectionists. We wanted a proof for the algorithm, and it just wasn’t possible until six or seven years later, when the necessary technology became available.”
Lampson and Sturgis’ now-famous “two-phase commit protocol” is used today in virtually all implementations of multimachine transactions, so its impact is hard to overstate. Textbooks on distributed computing regularly describe the two-phase commit protocol.
During the 1970s, though, when the two scientists were investigating this problem at Xerox PARC, commercial computers were not networked, and researchers were just starting to envision the need to ensure the integrity of database transactions over multiple systems.
“Back in the ’70s, Xerox PARC owned one of the few networked computer systems,” Lampson recalls, “so the problem was staring us in the face: How could we handle transactions where components are distributed over multiple systems, and, just as importantly, how could we make it a highly reliable scheme so that you could get the data onto a disk and recover it years later?”
Their work became an underground classic in its field, the cornerstone of early distributed database systems. The two-phase commit protocol was a key feature of transactional middleware products, and by the 1990s, it was considered essential to distributed computing and implemented in operating systems such as Digital’s VMS and Windows NT.
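The core idea of two-phase commit can be sketched in a few lines. This is a minimal illustration, not the algorithm from Lampson and Sturgis’ report: the class and method names are invented for the example, and it omits the durable logging and failure handling that make the real protocol crash-safe.

```python
# Toy sketch of two-phase commit. Names and structure are illustrative;
# the real protocol durably logs each step so it survives crashes.

class Participant:
    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit  # simulated vote for this demo
        self.state = "init"

    def prepare(self):
        # Phase 1: the participant votes yes only if it can guarantee
        # it will be able to commit later.
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def finish(self, commit):
        # Phase 2: the participant applies the coordinator's decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    decision = all(votes)  # commit only if every vote is "yes"
    # Phase 2 (decision): broadcast commit or abort to all.
    for p in participants:
        p.finish(decision)
    return decision

nodes = [Participant("disk-a"), Participant("disk-b"),
         Participant("disk-c", will_commit=False)]
print(two_phase_commit(nodes))   # False: a single "no" vote aborts all
print([p.state for p in nodes])  # every participant ends up "aborted"
```

The point of the two phases is that no participant commits until the coordinator knows every participant is able to commit, which is what keeps a multimachine transaction all-or-nothing.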
When asked how he feels about SIGOPS honoring this report, Lampson smiles.
“It is a pleasure,” he says, “to see our unpublished work get official recognition after all these years of unofficial recognition.”
Using Encryption for Authentication in Large Networks of Computers is the seminal paper on protocols for remote key distribution. Aimed at providing both authentication and secrecy, it inspired research that has generated a large body of work on the design and analysis of security protocols. Since its publication in 1978, the paper has been on the required-reading list for computer-security courses, in which the Needham-Schroeder protocol often is used to introduce the basic concepts of cryptographic protocols, including the notion of a trusted third party for authentication and key exchange.
The collaboration between Schroeder, now assistant director of Microsoft Research Silicon Valley, and the late Roger Needham, founding director of Microsoft Research Cambridge, began when the two met at the Massachusetts Institute of Technology, where Schroeder was a graduate student and Needham a visiting professor from the University of Cambridge. They stayed in touch, and in 1977, shortly after Schroeder joined Xerox PARC, Needham took a sabbatical to spend six months at PARC.
Schroeder recalls how the project began.
“Xerox PARC invented personal computers and also the Ethernet, which we used to connect workstations together,” he says, “so we had the opportunity to study how data packets between workstations could be tampered with, eavesdropped on, or modified in a way that made it difficult to know who you were really communicating with.
“Encryption techniques have been known since before World War II, but in order to use encryption, you had to share a secret key outside the system. In those days of smaller networks and low data-traffic volume, it was feasible to manually share secret pairwise keys. But we knew that with larger networks and higher volumes, manual exchange would be completely impractical.
“So at our first meeting to decide what Roger would do during his time at PARC, he said, ‘All right, what should we work on?’ And I said: ‘Well, there’s this problem with authentication. How do you share keys automatically in a way that both parties can know unequivocally who was talking to whom?’
“The research was mostly analytical thinking and constant iterations, applying Occam’s razor to strip out avoidable steps, and exploring modifications to the protocol, such as a variant that worked using public keys. As it turned out, this paper was the first place where such a protocol was written and analyzed, and where the case was made for why such services should be part of the Internet.”
Needham and Schroeder recognized there was a lot more work ahead following this initial foray into a complex discipline. The paper’s conclusion included a warning that protocols such as the ones they had developed were prone to subtle errors that would be difficult to detect in normal operation. They encouraged other researchers to develop techniques to verify such protocols.
As an example, it became apparent that the Needham-Schroeder protocol contained a slight error involving time stamps that made it possible to carry out a replay attack: to play a recording of the transaction at a later time and have it appear to be still timely. It was a famous error, although it didn’t alter the fundamental structure of the solution.
“But it led to a lot of researchers working on variants to fix different problems,” Schroeder says with a grin, “so I think a lot of people had a lot of fun.”
Andrew Herbert, Microsoft distinguished engineer and managing director of Microsoft Research Cambridge, is delighted that his predecessor is being honored.
“It is marvelous news,” he says, “to see the work by Needham and Schroeder recognized as a seminal contribution to the field of computer systems. Roger Needham was much admired, both as a man of great character and for his scientific reputation, significantly arising from his work on authentication protocols.”
The Recovery Manager of the System R Database Manager provided the first complete description of a recovery manager: its components and how they worked. Most importantly, the paper documented where the system went wrong and the lessons learned, providing future researchers valuable insight into right and wrong paths. The paper was written by a team of eight co-authors; its lead author was the late Jim Gray, who later became a technical fellow at Microsoft Research Silicon Valley.
“This was the first relational database system,” he recalls, “that included a robust recovery system, which demonstrated that relational databases were not only good for improved programmer productivity, but also that they could be used to run commercial-transaction workloads. Log-based recovery systems like System R’s are now present in virtually all database products that are used for transaction processing.
“What I remember most about the paper is the evaluation section, where they presented performance numbers and lessons learned. They didn’t mince words about what worked, what didn’t, and what they regretted leaving out. Most of that advice still holds true today.”
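The log-based recovery idea mentioned above can be sketched briefly. This is a minimal illustration of the general write-ahead-logging technique, not System R’s actual recovery manager: the record format and names are invented, and real systems also log undo information, take checkpoints, and handle crashes during recovery itself.

```python
# Toy sketch of log-based recovery: every change is logged before it is
# applied, and after a crash only transactions whose COMMIT record made
# it into the log are redone. Record layout is illustrative.

def recover(log):
    """Rebuild database state from the log after a crash."""
    committed = {tid for op, tid, *_ in log if op == "COMMIT"}
    db = {}
    for op, tid, *rest in log:
        if op == "WRITE" and tid in committed:  # redo committed writes
            key, value = rest
            db[key] = value
        # Writes by uncommitted transactions are ignored, so their
        # partial effects do not survive the crash.
    return db

# The crash happened after T2's writes but before its COMMIT was logged.
log = [
    ("WRITE", "T1", "x", 1),
    ("COMMIT", "T1"),
    ("WRITE", "T2", "x", 2),
    ("WRITE", "T2", "y", 9),
]
print(recover(log))  # {'x': 1} -- T2's effects do not survive
```

The invariant doing the work here is that a transaction is durable exactly when its commit record reaches the log, which is what lets a database survive a crash at any moment with transactions either fully applied or fully absent.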