Publications

SIGMETRICS 2000 Feasibility of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs
4th USENIX WSS Single Instance Storage in Windows® 2000
4th IEEE MASCOTS Large-Scale Simulation of Replica Placement Algorithms for a Serverless Distributed File System
9th ESA Modeling Replica Placement in a Distributed File System: Narrowing the Gap between Analysis and Simulation
15th DISC Competitive Hill-Climbing Strategies for Replica Placement in a Distributed File System
4th IEEE SRDS Optimizing File Availability in a Secure Serverless Distributed File System
1st IPTPS The Sybil Attack
USENIX 2002 Cooperative Task Management without Manual Stack Management
22nd ICDCS Reclaiming Space from Duplicate Files in a Serverless Distributed File System
5th OSDI FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment
18th ACSAC A Secure Directory Service based on Exclusive Encryption

 

Details

Conference

SIGMETRICS 2000

Title

Feasibility of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs

Authors

William J. Bolosky, John R. Douceur, David Ely, and Marvin Theimer

Abstract

We consider an architecture for a serverless distributed file system that does not assume mutual trust among the client computers. The system provides security, availability, and reliability by distributing multiple encrypted replicas of each file among the client machines. To assess the feasibility of deploying this system on an existing desktop infrastructure, we measure and analyze a large set of client machines in a commercial environment. In particular, we measure and report results on disk usage and content; file activity; and machine uptimes, lifetimes, and loads. We conclude that the measured desktop infrastructure would passably support our proposed system, providing availability on the order of one unfilled file request per user per thousand days.

Keywords

Serverless distributed file system architecture, personal computer usage data, availability, reliability, security, trust, workload characterization, analytical modeling, feasibility analysis.

Citation W. J. Bolosky, J. R. Douceur, D. Ely, M. Theimer; Proceedings of the international conference on Measurement and modeling of computer systems, 2000, pp. 34-43
Definitive Version Link to ACM Digital Library version of paper

Download

Document in PDF formatDocument in Postscript format

Notice This copy is posted by permission of the ACM and may not be redistributed.

 

Conference

4th USENIX Windows Systems Symposium

Title

Single Instance Storage in Windows® 2000

Authors

William J. Bolosky, Scott Corbin, David Goebel, and John R. Douceur

Abstract

Certain applications, such as Windows 2000's Remote Install service, can result in a set of files in which many different files have the same content. Using a traditional file system to store these files separately results in excessive use of disk and main memory file cache space. Using hard or symbolic links would eliminate the excess resource requirements, but changes the semantics of having separate files, in that updates to one "copy" of a file would be visible to users of another "copy." We describe the Single Instance Store (SIS), a component within Windows® 2000 that implements links with the semantics of copies for files stored on a Windows 2000 NTFS volume. SIS uses copy-on-close to implement the copy semantics of its links. SIS is structured as a file system filter driver that implements links and a user level service that detects duplicate files and reports them to the filter for conversion into links. Because SIS links are semantically identical to separate files, SIS creates them automatically when it detects files with duplicate contents. This paper describes the design and implementation of SIS in detail, briefly presents measurements of a remote install server showing a 58% disk space savings by using SIS, and discusses other possible uses of SIS.

Citation W. J. Bolosky, S. Corbin, D. Goebel, J. R. Douceur; Proceedings of the 4th USENIX Windows Systems Symposium, 2000, pp. 13-24

Download

Document in PDF formatDocument in Postscript format

Notice The original publication of this paper was granted to USENIX.  Copyright to this work is retained by the authors.  Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes.

 

Conference

9th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)

Title

Large-Scale Simulation of Replica Placement Algorithms for a Serverless Distributed File System

Authors

John R. Douceur and Roger P. Wattenhofer

Abstract

Farsite is a scalable, distributed file system that logically functions as a centralized file server but that is physically implemented on a set of client desktop computers. Farsite provides high degrees of reliability and availability by storing replicas of files on multiple machines. Replicas are placed to maximize the effective system availability, using a distributed, iterative, randomized placement algorithm. We perform a large-scale simulation of three candidate algorithms using machine availability data collected from over 50,000 desktop computers. We find that algorithmic efficiency and placement efficacy run counter to each other. We fit analytic functions to the improvement rates and provide explanations for the fitted curves. We explore the algorithms’ properties through study of their dynamic behavior. We visualize algorithmic placements and compare them to theoretical worst cases. We quantify the degree of machine failure correlation and develop a formula to approximate its effect.

Citation J. R. Douceur, R. P. Wattenhofer; Proceedings of 9th IEEE MASCOTS, 2001, pp. 311-319

Download

Document in PDF formatDocument in Postscript format

Notice © 2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

 

Conference

9th Annual European Symposium on Algorithms

Title

Modeling Replica Placement in a Distributed File System: Narrowing the Gap between Analysis and Simulation

Authors

John R. Douceur and Roger P. Wattenhofer

Abstract

We examine the replica placement aspect of a distributed peer-to-peer file system that replicates and stores files on ordinary desktop computers. It has been shown that some desktop machines are available for a greater fraction of time than others, and it is crucial not to place all replicas of any file on machines with low availability. In this paper we study the efficacy of three hill-climbing algorithms for file replica placement. Based on large-scale measurements, we assume that the distribution of machine availabilities be uniform. Among other results we show that the MinMax algorithm is competitive, and that for growing replication factor the MinMax and MinRand algorithms have the same asymptotic worst-case efficacy.

Citation J. R. Douceur, R. P. Wattenhofer; Proceedings of 9th ESA, 2001, pp. 356-367

Download

Document in PDF formatDocument in Postscript format

Tech Report
(Extended Version)
Document in PDF formatDocument in Postscript format
Notice © Springer-Verlag.

 

Conference

15th International Symposium on Distributed Computing

Title

Competitive Hill-Climbing Strategies for Replica Placement in a Distributed File System

Authors

John R. Douceur and Roger P. Wattenhofer

Abstract

The Farsite distributed file system stores multiple replicas of files on multiple machines, to provide file access even when some machines are unavailable. Farsite assigns file replicas to machines so as to maximally exploit the different degrees of availability of different machines, given an allowable replication factor R. We use competitive analysis and simulation to study the performance of three candidate hill-climbing replica placement strategies, MinMax, MinRand, and RandRand, each of which successively exchanges the locations of two file replicas. We show that the MinRand and RandRand strategies are perfectly competitive for R = 2 and 2/3-competitive for R = 3. For general R, MinRand is at least 1/2-competitive and RandRand is at least 10/17-competitive. The MinMax strategy is not competitive. Simulation results show better performance than the theoretic worst-case bounds.

Citation J. R. Douceur, R. P. Wattenhofer; Proceedings of 15th DISC, 2001, pp. 48-62

Download

Document in PDF formatDocument in Postscript format

Tech Report
(Reformatted)
Document in PDF formatDocument in Postscript format
Notice © Springer-Verlag.

 

Conference

20th Symposium on Reliable Distributed Systems

Title

Optimizing File Availability in a Secure Serverless Distributed File System

Authors

John R. Douceur and Roger P. Wattenhofer

Abstract

Farsite is a secure, scalable, distributed file system that logically functions as a centralized file server but that is physically realized on a set of client desktop computers. Farsite provides security, reliability, and availability by storing replicas of each file on multiple machines. It continuously monitors machine availability and relocates replicas as necessary to maximize the effective availability of the system. We evaluate several replica-placement methods using large-scale simulation with machine availability data from over 50,000 desktop computers. We find that initially placing replicas in an availability-sensitive fashion yields pathological results, whereas very good results are obtained by random initial placement followed by incremental improvement using a scalable, distributed, fault-tolerant, and attack-resistant hill-climbing algorithm. The algorithm is resilient to severe restrictions on communication and replica placement, and it does not excessively co-locate replicas of different files on the same set of machines.

Citation J. R. Douceur, R. P. Wattenhofer; Proceedings of 20th IEEE SRDS, 2001, pp. 4-13

Download

Document in PDF formatDocument in Postscript format

Notice © 2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

 

Conference

1st International Workshop on Peer-to-Peer Systems

Title

The Sybil Attack

Author

John R. Douceur

Abstract

Large-scale peer-to-peer systems face security threats from faulty or hostile remote computing elements. To resist these threats, many such systems employ redundancy. However, if a single faulty entity can present multiple identities, it can control a substantial fraction of the system, thereby undermining this redundancy. One approach to preventing these “Sybil attacks” is to have a trusted agency certify identities. This paper shows that, without a logically centralized authority, Sybil attacks are always possible except under extreme and unrealistic assumptions of resource parity and coordination among entities.

Citation J. R. Douceur; Proceedings of 1st IPTPS, 2002

Download

Document in PDF formatDocument in Postscript format

 

Conference

USENIX 2002 Annual Technical Conference

Title

Cooperative Task Management without Manual Stack Management

Authors

Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur

Abstract

Cooperative task management can provide program architects with ease of reasoning about concurrency issues. This property is often espoused by those who recommend “event-driven” programming over “multi-threaded” programming. Those terms conflate several issues. In this paper, we clarify the issues, and show how one can get the best of both worlds: reason more simply about concurrency in the way “event-driven” advocates recommend, while preserving the readability and maintainability of code associated with “multithreaded” programming.

We identify the source of confusion about the two programming styles as a conflation of two concepts: task management and stack management. Those two concerns define a two-axis space in which “multithreaded” and “event-driven” programming are diagonally opposite; there is a third “sweet spot” in the space that combines the advantages of both programming styles. We point out pitfalls in both alternative forms of stack management, manual and automatic, and we supply techniques that mitigate the danger in the automatic case. Finally, we exhibit adaptors that enable automatic stack management code and manual stack management code to interoperate in the same code base.

Citation A. Adya, J. Howell, M. Theimer, W. J. Bolosky, J. R. Douceur; Proceedings of USENIX 2002 Annual Technical Conference

Download

Document in PDF formatDocument in Postscript format

Notice The original publication of this paper was granted to USENIX.  Copyright to this work is retained by the authors.  Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes.

 

Conference 22nd International Conference on Distributed Computing Systems
Title Reclaiming Space from Duplicate Files in a Serverless Distributed File System
Authors John R. Douceur, Atul Adya, William J. Bolosky, Dan Simon, and Marvin Theimer
Abstract The Farsite distributed file system provides availability by replicating each file onto multiple desktop computers. Since this replication consumes significant storage space, it is important to reclaim used space where possible. Measurement of over 500 desktop file systems shows that nearly half of all consumed space is occupied by duplicate files. We present a mechanism to reclaim space from this incidental duplication to make it available for controlled file replication. Our mechanism includes 1) convergent encryption, which enables duplicate files to coalesced into the space of a single file, even if the files are encrypted with different users’ keys, and 2) SALAD, a Self-Arranging, Lossy, Associative Database for aggregating file content and location information in a decentralized, scalable, fault-tolerant manner. Large-scale simulation experiments show that the duplicate-file coalescing system is scalable, highly effective, and fault-tolerant.
Citation J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, M. Theimer; Proceedings of 22nd IEEE ICDCS, 2002.
Download Document in PDF formatDocument in Postscript format
Tech Report
(Extended Version)
Document in PDF formatDocument in Postscript format
Notice © 2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

 

Conference

5th Symposium on Operating Systems Design and Implementation

Title

FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment

Authors

Atul Adya, William J. Bolosky, Miguel Castro, Gerald Cermak, Ronnie Chaiken, John R. Douceur, Jon Howell, Jacob R. Lorch, Marvin Theimer, Roger P. Wattenhofer

Abstract

Farsite is a secure, scalable file system that logically functions as a centralized file server but is physically distributed among a set of untrusted computers. Farsite provides file availability and reliability through randomized replicated storage; it ensures the secrecy of file contents with cryptographic techniques; it maintains the integrity of file and directory data with a Byzantine-fault-tolerant protocol; it is designed to be scalable by using a distributed hint mechanism and delegation certificates for pathname translations; and it achieves good performance by locally caching file data, lazily propagating file updates, and varying the duration and granularity of content leases. We report on the design of Farsite and the lessons we have learned by implementing much of that design.

Citation A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, R. P. Wattenhofer; Proceedings of the 5th OSDI, December 2002.

Download

Document in PDF formatDocument in Postscript format

Notice The original publication of this paper was granted to USENIX.  Copyright to this work is retained by the authors.  Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes.

 

Conference

18th Annual Computer Security Applications Conference

Title

A Secure Directory Service based on Exclusive Encryption

Authors

John R. Douceur, Atul Adya, Josh Benaloh, William J. Bolosky, Gideon Yuval

Abstract

We describe the design of a Windows file-system directory service that ensures the persistence, integrity, privacy, syntactic legality, and case-insensitive uniqueness of the names it indexes. Byzantine state replication provides persistence and integrity, and encryption imparts privacy. To enforce Windows’ baroque name syntax – including restrictions on allowable characters, on the terminal character, and on several specific names – we develop a cryptographic process, called “exclusive encryption,” that inherently excludes syntactically illegal names and that enables the exclusion of case-insensitively duplicate names without access to their plaintext. This process excludes entire names by mapping the set of allowed strings to the set of all strings, excludes certain characters through an amended prefix encoding, excludes terminal characters through varying the prefix coding by character index, and supports case-insensitive comparison of names by extracting and encrypting case information separately. We also address the issues of hiding name-length information and access-authorization information, and we report a newly discovered problem with enforcing case-insensitive uniqueness for Unicode names.

Citation J. R. Douceur, A. Adya, J. Benaloh, W. J. Bolosky, G. Yuval; Proceedings of the 18th ACSAC, December 2002.

Download

Document in PDF formatDocument in Postscript format

Notice © 2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.