Sharing and Caching Characteristics of Internet Content
To improve the performance of Internet content delivery, many techniques exploit sharing: repeated requests for the same object by multiple clients. One widely deployed technique is Web proxy caching, in which requests for shared objects are served from a proxy cache rather than from the origin server. In this dissertation, we present a network tracing system that enables the study of application-level Internet workloads, and we present three Internet caching studies performed using workloads collected by that system.
The first study investigates Web document sharing patterns from an organizational point of view. We explore the extent of document sharing both within and across organizations. We find that when clients are members of the same organization, the amount of sharing increases measurably when compared with clients that are members of different organizations. However, this increase is not large enough to have a significant impact on cache performance.
The second study explores the performance of cooperative Web proxy caching, focusing on the effectiveness of cooperation over a wide range of client population sizes. Allowing proxy caches to cooperate in effect combines the client populations they serve. This creates new opportunities for sharing and therefore offers the potential to increase cache hit rates. Overall, we find that proxy cooperation provides significant performance benefits only within limited population bounds.
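The claim that cooperation "combines client populations" can be made concrete with a toy model. The sketch below assumes an idealized infinite cache (no capacity limit, no expiration, every object cacheable), in which every request after the first for a given object is a hit. It compares independent proxies against proxies that behave as one cache over the combined request stream. The function names and object identifiers are hypothetical illustrations, not the simulator used in the study, and the model only shows why combining populations can raise hit rates; it says nothing about the population bounds the study measures.

```python
from typing import Iterable, List


def count_hits(requests: Iterable[str]) -> int:
    """Hits for an idealized infinite cache: a request is a hit if the
    same object was requested earlier in this stream."""
    seen = set()
    hits = 0
    for obj in requests:
        if obj in seen:
            hits += 1
        else:
            seen.add(obj)
    return hits


def independent_hit_rate(populations: List[List[str]]) -> float:
    """Aggregate hit rate when each proxy serves only its own clients."""
    total = sum(len(p) for p in populations)
    hits = sum(count_hits(p) for p in populations)
    return hits / total if total else 0.0


def cooperative_hit_rate(populations: List[List[str]]) -> float:
    """Aggregate hit rate when cooperating proxies act as one cache over
    the combined population (idealized: no lookup cost or failures)."""
    total = sum(len(p) for p in populations)
    combined = [obj for p in populations for obj in p]
    return count_hits(combined) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical request streams from two client populations.
    pop_a = ["x", "y", "x", "z"]
    pop_b = ["y", "z", "w", "y"]
    print(independent_hit_rate([pop_a, pop_b]))  # 0.25
    print(cooperative_hit_rate([pop_a, pop_b]))  # 0.5
```

With an infinite cache, the hit count for a stream is simply its length minus the number of distinct objects, so cooperation gains exactly the requests whose first reference came from another population (here "y" and "z" in `pop_b`).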
The final study is motivated by the increasing availability of multimedia Internet content, such as streaming audio and video. We compare the workload characteristics of streaming-media content to those of traditional Web content, and we evaluate the effectiveness of proxy caching and multicast delivery for streaming-media content. We find that these multimedia workloads exhibit strong temporal locality, and we quantify the benefit this locality provides for both caching and multicast delivery.
Finally, we present the design and implementation of our trace collection system. It uses passive network monitoring to observe all Web traffic generated by the University of Washington client population. Our system employs anonymization safeguards to protect users' privacy. It has been deployed at the university's network border for three years, and has scaled to handle a threefold increase in load over that period.
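The abstract does not say which anonymization safeguards the system uses. One common approach, shown here purely as an assumption and not as the system's actual scheme, is to replace client addresses and URLs with a keyed hash (HMAC) before anything is written to disk. A consistent mapping preserves what the caching studies need (whether two requests came from the same client or named the same object) while hiding the raw values. The key must stay secret: an unkeyed hash of an IPv4 address could be reversed by hashing all 2^32 possible addresses. The `Anonymizer` class and its method are hypothetical names for this sketch.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class Anonymizer:
    """Hypothetical keyed-hash anonymizer for trace fields such as client
    IP addresses and URLs. Equal inputs map to equal tokens under the same
    key, so sharing analyses still work on the anonymized trace."""

    def __init__(self, key: Optional[bytes] = None) -> None:
        # A random 256-bit key by default; it should never be logged and
        # can be discarded once tracing ends so tokens cannot be inverted.
        self._key = key if key is not None else secrets.token_bytes(32)

    def anonymize(self, value: str) -> str:
        """Return a 64-character hex token derived from HMAC-SHA256."""
        return hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    anon = Anonymizer()
    ip_token = anon.anonymize("10.0.0.1")
    assert ip_token == anon.anonymize("10.0.0.1")  # consistent mapping
    assert ip_token != anon.anonymize("10.0.0.2")  # distinct clients stay distinct
    print(ip_token)
```

A design consequence of this choice is that results are only comparable within one key's lifetime; rotating the key splits a client's history into unrelated tokens.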