Beyond file systems: understanding the nature of places where people store their data

MSR-TR-2013-26

Published by Microsoft

This paper analyzes the I/O and network behavior of a large class of home, personal, and enterprise applications. Through user studies and measurements, we find that users and application developers increasingly have to deal with a de facto distributed system of specialized storage containers/file systems, each exposing complex data structures and each having its own naming and metadata conventions, caching and prefetching strategies, and transactional properties. Two broad dichotomies emerge from this. First, there is tension between the traditional local file system and cloud storage containers: local file systems offer high performance, but they lack support for rich data structures, such as graphs, that other storage containers provide. Second, distinct cloud storage containers provide different operational semantics and data structures. Transferring data between these containers is often lossy, which adds data management complexity for users and developers. We believe our analysis directly impacts the way users understand their data, the way designers build and evaluate the success of future storage systems, and the way application developers program to the APIs those storage systems provide.