
We are building a symbiotic, serverless, distributed file system. As a file system, its purpose is to store files. It's distributed in that it runs on multiple machines, not just a single machine. It's serverless, meaning that it does not make use of a central server or a cluster of servers; it runs entirely on client machines. And it's symbiotic, meaning that it works among cooperating but not completely trusting clients.
Logically, the system functions as a central file server, but physically, there is no central server machine. Instead, a group of desktop client computers collaboratively establish a virtual file server that can be accessed by any of the clients.
There are three major goals of the Farsite project:
The first goal is to provide highly available and reliable file service while running on ordinary desktop computers, given that they have unpredictable lifetimes, low availability relative to server machines, and heterogeneous hardware and software.
The second goal is to ensure user privacy and data integrity in an environment with no centrally trusted authority and in which the operating-system software of a fraction of the machines in the system may be compromised.
The third goal is to construct a system that is automatically configuring and adaptively self-tuning, so it can react appropriately to component failures, usage variations, or environmental changes, with neither a central trusted administrator to make decisions nor a central, up-to-date repository of information upon which to base decisions.
Our prototypical target is a large company or university, meaning an organization with around 10^5 machines, storing around 10^10 files, containing around 10^16 bytes of data. We assume that the machines are interconnected by a high-bandwidth, low-latency, switched network. Also, at least for our initial version, we are assuming no significant geographical differences among machines.
There are several advantages. The system will provide a global name space for files, location-transparent access to both private files and shared public files, and improved reliability relative to storing files on a desktop workstation.
The main concern with storing files on your desktop is susceptibility to failures. For example, if you are away from your desk, and you try to access a file remotely, your desktop system must be running, it must not be too heavily loaded, and there must be an unbroken network connection between your access machine and your desktop machine. A router error, a congested network segment, or a hiccup on your desktop machine is all it takes for your remote file access to fail.
Also, if your desktop machine has a disk head crash, you might lose all of your data. Head crashes are not extremely common, but they are not all that uncommon either. Chances are, you know someone who knows someone who has suffered through the consequences of a head crash. They're a serious enough issue that RAID technology was invented primarily to deal with this problem.
Central servers have several disadvantages:
Servers are expensive, since they commonly employ specialized hardware: High-performance I/O is needed since the ratio of clients to servers is large, and the server has to be able to support the peak load placed on it by all clients. RAID disks are often used to increase the reliability of data storage and reduce the system's susceptibility to a disk head crash.
Servers are vulnerable to geographically localized faults. Clustering technology can eliminate a single point of failure, but server clusters still represent a single site of failure that can be compromised by a bad UPS, a failed router, or a broken network link.
Servers rely on trusted system administrators. They are needed to authorize access, since there has to be someone to decide who can use the server's resources and how much of those resources each person is allowed to use. They are also needed to perform reliability functions, such as regular backups.
No. No central administrator is needed to authorize access, since the owner of each machine decides who can use the resources of that machine. In principle, each owner can enter into contracts with other machine owners to negotiate the mutual use of their machines' resources. In practice, we expect that we will have the system enforce a simple resource-sharing policy, namely that the amount of space each client can use is proportional to the amount of space that client contributes to the system. So, if you need more storage space, you don't get it by calling the sys-admin and begging for a larger quota; you do it by buying a larger hard drive and plugging it into your desktop computer.
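The proportional policy described above can be sketched in a few lines. This is only an illustration, not Farsite's enforcement mechanism: the function name `proportional_quotas` and the `usage_ratio` parameter (the fraction of contributed space a client gets back for its own files, for example to leave room for replicas) are assumptions introduced here.

```python
def proportional_quotas(contributed_gb, usage_ratio):
    """Return each client's quota as a fixed fraction of the space it contributes.

    contributed_gb: dict mapping a client name to the gigabytes it contributes.
    usage_ratio: hypothetical system-wide constant of proportionality.
    """
    if usage_ratio <= 0:
        raise ValueError("usage_ratio must be positive")
    return {client: space * usage_ratio for client, space in contributed_gb.items()}


# A client that contributes a larger disk automatically gets a larger quota.
quotas = proportional_quotas({"alice": 100, "bob": 40}, usage_ratio=0.25)
print(quotas)
```

Under this sketch, the only way for a client to raise its quota is to contribute more space, which matches the "buy a larger hard drive" remark above.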
No. One of the main goals of Farsite is to enable cooperating but not-completely-trusting clients to collaboratively store files. Our architecture allows for a portion of the network and/or a fraction of the client machines to be compromised without compromising the privacy, integrity, or reliability of the files stored by the system.
It will look like a file server. At least in the initial incarnation, the top-level directory is accessed via a drive letter (e.g., "F:"). This virtual drive contains a directory hierarchy which is used to access all files stored in the Farsite system. Each user can access only those files and directories for which he/she has access permission.
The basic strategy is first to encrypt the contents of each file, then to make multiple replicas of the encrypted file, and finally to distribute the encrypted replicas to several client machines.
Encryption prevents an unauthorized user from reading a file, even if the file happens to be stored on that user's desktop machine. Farsite uses digital signatures for authentication, which prevents an unauthorized user from writing a file. By making multiple replicas, and by distributing those replicas in a secure fashion, Farsite makes it difficult for a malicious user to destroy every copy of any given file.
A second benefit of replication is improving file availability. When you attempt to access a file, there is some probability that one (or more) of the machines that store that file will be down, heavily loaded, or inaccessible at that moment. However, with multiple distributed file replicas, there is a greater likelihood that at least one machine that holds a copy of the target file is up, unloaded, and accessible.
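The availability argument can be made concrete with a small calculation. The sketch below assumes that machines are up or down independently of one another and that every machine has the same availability; both are simplifying assumptions made here, not claims from the Farsite design. The function names are hypothetical.

```python
def file_availability(machine_availability, replicas):
    """Probability that at least one of `replicas` independent machines is reachable."""
    return 1.0 - (1.0 - machine_availability) ** replicas


def replicas_needed(machine_availability, target):
    """Smallest replica count whose file availability meets `target`."""
    if not 0.0 < machine_availability <= 1.0:
        raise ValueError("machine_availability must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    count = 1
    while file_availability(machine_availability, count) < target:
        count += 1
    return count


# With desktops that are reachable 80% of the time, how many replicas
# are needed for a file to be reachable 99.9% of the time?
print(replicas_needed(0.8, 0.999))
```

If failures are correlated (for instance, machines on the same network segment), the real availability will be lower than this estimate.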
A third benefit of replication is improving file reliability. If the disk on your desktop machine has a head crash, you will lose all of the files that are stored only on that disk. However, with Farsite, your files will be replicated to disks on other machines, so the loss of a single disk is not critical.
Farsite includes a directory service that is implemented in a distributed fashion, such that the data for each directory is replicated among several client machines. Whereas the integrity of file data is guaranteed by digital signatures (so we only need enough file replicas to ensure a high degree of availability), the integrity of file meta-data depends on the integrity of the parent directory, which can be undetectably compromised by the machines that house the directory replicas. Therefore, the number of directory replicas is significantly higher than that of file replicas, and the directory replicas communicate using a Byzantine agreement algorithm, which protects them from attacks by a fraction of the machines holding the replicas.
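The text does not say what fraction of faulty machines the directory replicas can withstand. Classical Byzantine agreement protocols tolerate up to f faulty participants out of 3f + 1. Assuming that bound applies here, the relationship between replica count and fault tolerance can be sketched as follows; the function names are hypothetical.

```python
def max_faulty_tolerated(replicas):
    """Largest f such that replicas >= 3f + 1 (classical Byzantine bound, assumed)."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return (replicas - 1) // 3


def min_replicas(faulty):
    """Smallest replica group that tolerates `faulty` Byzantine members."""
    if faulty < 0:
        raise ValueError("faulty must be non-negative")
    return 3 * faulty + 1


print(max_faulty_tolerated(7))  # a group of 7 tolerates 2 faulty members
print(min_replicas(2))          # tolerating 2 faulty members needs 7 replicas
```

This is one reason a directory needs more replicas than a file: a signed file needs only enough copies for one to be reachable, whereas agreement needs enough honest replicas to outvote the faulty ones.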
They shouldn't. Each machine has a local on-disk cache that contains the files most recently accessed by the machine. According to our feasibility study, this cache can reduce the number of remote file accesses to around ten per day for the average user. The vast majority of accesses will be to files contained in the local cache.
The local cache does not have a fixed size; instead, it stores all files last accessed within a specified period of time that we call the "cache retention period," which makes the cache size effectively self-tuning: Machines with little local file activity will have relatively small caches, and machines with a lot of local file activity will have relatively large caches. According to our feasibility study, the average cache size will be about one quarter of the local disk space.
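A retention-period cache of this kind can be sketched as a table of last-access times with periodic eviction. This is a minimal illustration under assumed names (`RetentionCache`, `touch`, `evict_expired`); it tracks only which files are cached, not their contents, and is not taken from the Farsite implementation.

```python
import time


class RetentionCache:
    """Keeps every file accessed within the last `retention_seconds`; no size limit."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._last_access = {}  # file path -> time of most recent access

    def touch(self, path, now=None):
        """Record an access to `path` (the `now` argument exists to ease testing)."""
        self._last_access[path] = time.time() if now is None else now

    def evict_expired(self, now=None):
        """Drop files not accessed within the retention period; return their paths."""
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        expired = [p for p, t in self._last_access.items() if t < cutoff]
        for path in expired:
            del self._last_access[path]
        return expired

    def __len__(self):
        return len(self._last_access)


cache = RetentionCache(retention_seconds=100)
cache.touch("report.doc", now=0)
cache.touch("budget.xls", now=50)
print(cache.evict_expired(now=120))  # only the file idle for more than 100 s is dropped
```

Because eviction depends only on access times, a machine with many recent accesses keeps many files and a mostly idle machine keeps few, which is the self-tuning behavior described above.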
Not necessarily. Replicating files will reduce the available disk space, but Farsite also employs a technique that increases the available disk space. If two or more files stored in the system happen to have identical contents, Farsite coalesces the files into the space needed to store a single file. In part, it does this using the Single Instance Store technology in Windows 2000, which automatically detects and coalesces identical files on a single disk volume, and which automatically separates coalesced files when one of them is modified, in order to maintain the semantics of separate files.
There is a great degree of file duplication among people's machines, since many of us use the same application programs and since we often share documents with our immediate co-workers. By eliminating this incidental and haphazard content duplication, Farsite can reclaim space that it can then use for storing a controlled number of file replicas. The amount of duplication is a function of the population size. According to our feasibility study, a population of around 250 randomly selected users is sufficient to reclaim half of the disk space in use.
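The idea of detecting identical files and counting the reclaimable space can be sketched by grouping files on a content hash. This is an illustration only: it is not how the Single Instance Store works internally, and the choice of SHA-256 and the function name `coalesce` are assumptions made here.

```python
import hashlib
from collections import defaultdict


def coalesce(files):
    """Group files with identical contents and report the space that could be reclaimed.

    files: dict mapping a file name to its contents as bytes.
    Returns (duplicate_groups, reclaimed_bytes).
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).digest()].append(name)

    # Each group needs the space of one file; the other copies are reclaimable.
    reclaimed = sum(len(files[names[0]]) * (len(names) - 1) for names in groups.values())
    duplicates = [sorted(names) for names in groups.values() if len(names) > 1]
    return duplicates, reclaimed


files = {
    "alice/editor.exe": b"same program image",
    "bob/editor.exe": b"same program image",
    "bob/notes.txt": b"something unique",
}
print(coalesce(files))
```

A production system would also compare file sizes or contents before coalescing rather than rely on the hash alone, and would separate the copies again when one of them is modified.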
Farsite encrypts files using a technique called convergent encryption, which allows the detection and coalescing of identical files even when these files are encrypted with separate keys. Rather than enciphering the contents of a user's files directly with the user's key, the contents of each file are one-way hashed, and the resulting hash value is used as a key for enciphering the file contents. The user's key is then used to encipher the hash value, and this enciphered value is attached to the file as meta-data. The user decrypts a file by first deciphering the hash value and then deciphering the file using the hash value as a key. With this approach, two files with identical plaintext will also have identical ciphertext, irrespective of the users' keys that encrypt them.
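The steps of convergent encryption can be sketched as follows. To keep the example self-contained, it uses a toy XOR stream cipher built from SHA-256 in place of a real cipher; that cipher is NOT secure and is an assumption made purely for illustration, as are the function names and the choice of SHA-256 as the one-way hash. The text does not specify which hash or cipher Farsite uses.

```python
import hashlib


def _toy_cipher(key, data):
    """Toy XOR stream cipher (NOT secure); a stand-in for a real symmetric cipher.

    XOR is its own inverse, so the same function enciphers and deciphers.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def convergent_encrypt(plaintext, user_key):
    """Encipher the file under its own hash, then encipher that hash under the user's key."""
    content_key = hashlib.sha256(plaintext).digest()
    ciphertext = _toy_cipher(content_key, plaintext)
    wrapped_key = _toy_cipher(user_key, content_key)  # stored as file meta-data
    return ciphertext, wrapped_key


def convergent_decrypt(ciphertext, wrapped_key, user_key):
    """Recover the hash with the user's key, then use it to decipher the file."""
    content_key = _toy_cipher(user_key, wrapped_key)
    return _toy_cipher(content_key, ciphertext)


document = b"quarterly report, shared by two co-workers"
alice_ct, alice_meta = convergent_encrypt(document, b"alice-secret-key")
bob_ct, bob_meta = convergent_encrypt(document, b"bob-secret-key")

print(alice_ct == bob_ct)      # identical plaintext gives identical ciphertext
print(alice_meta == bob_meta)  # but each user's wrapped key differs
```

Because the ciphertexts match, the storage layer can coalesce the two files without ever seeing the plaintext, while each user still needs their own key to unwrap the hash and read the file.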
Farsite includes many automatic tuning mechanisms: The local-cache size adapts to the level of file-access activity on each machine. Files are relocated as machine availability changes. When a new machine joins the system, its files are replicated to other machines, and other file replicas are relocated to the new machine. When a machine dies or leaves the system, new replicas are generated to replace the replicas on the departing machine. The number of replicas is adjusted to maintain an appropriate degree of availability and reliability.
For the most part, we are not attempting to optimize disk space. It is true that one thing Farsite enables is the use of disk-space sharing, and it is also true that the purpose of Farsite's duplicate coalescing is to reclaim used disk space, but this is not the primary concern of our project. Our main concerns are providing location transparency, availability, reliability, security, fault tolerance, and adaptive self-tuning.
We do not yet have an architecture for performing distributed backup. However, it's worth noting that there is much less need to back up files in Farsite than there is on a desktop workstation or file server. Since there are several complete copies of each file, the file reliability is far greater than it is with only a single copy (or even the 1 + 1/N copies on a RAID system). Furthermore, Farsite's on-line redundant storage is immediately accessible and verifiable, unlike off-line backups. It is our hope that we can design in sufficient reliability that the only need for backup will be for archive purposes, rather than for reliability purposes.
Yes. There have been a couple of prior serverless distributed file systems: xFS, developed as part of the NOW project at Berkeley, and Frangipani, developed on top of the Petal distributed virtual disk at DEC SRC. However, neither of these systems addresses the problem of an untrusted infrastructure or that of doing without a central administrator.