Dennis Fetterly, Maya Haridasan, Michael Isard, and Swaminathan Sundararaman
1 October 2010
In recent years, there has been an explosion of interest in computing using clusters of commodity, shared nothing computers. In this paper, we describe the design of TidyFS, a simple and small distributed file system that provides the abstractions necessary for data parallel computations on clusters. Similar to other large-scale distributed file systems such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS), the prototypical workload for this file system is high-throughput, write-once, sequential I/O. The primary user visible unit of storage in this system is the stream, which is a sequence of partitions distributed across the local storage of machines in the cluster. The mapping of streams to sequences of partitions is performed by the TidyFS metadata server, which also tracks the locations of each of the partition replicas in the system, the state of each storage machine in the cluster, and per-stream and per-partition attributes. The metadata server is implemented as a state machine and replicated for scalability and fault tolerance. In addition to the metadata server, the system is comprised of a graphical user interface which enables users and administrators to view the state of the system and a small service installed on each cluster machine responsible for replication, validation, and garbage collection. Clients read and write partitions directly to get the best possible I/O performance.
Publisher Microsoft Research