Nectar: Automatic Management of Data and Computation in Datacenters

Pradeep Kumar Gunda, Lenin Ravindranath, Chandramohan A. Thekkath, Yuan Yu, and Li Zhuang

Abstract

Managing data and computation is at the heart of datacenter

computing. Manual management of data can lead to data loss,

wasteful consumption of storage, and laborious bookkeeping. Lack of

proper management of computation can result in lost opportunities to

share common computations across multiple jobs or to compute

results incrementally.

Nectar is a system designed to address the aforementioned

problems. It automates and unifies the management of data and

computation within a datacenter. In Nectar, data and computation are treated interchangeably

by associating data with its computation.

Derived datasets, which are the results of computations, are uniquely identified by the

programs that produce them, and together with their programs, are

automatically managed by a datacenter wide caching service.

Any derived dataset can be transparently regenerated

by re-executing its program, and any computation can be transparently avoided

by using previously cached results. This enables us

to greatly improve datacenter management and resource

utilization: obsolete or infrequently used derived datasets

are automatically garbage collected, and shared common

computations are computed only once and reused by others.

This paper describes the design and implementation of Nectar, and reports on

our evaluation of the system using analytic studies of

logs from several production clusters and an actual deployment on a 240-node cluster.

Details

Publication typeInproceedings
Published inProceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI)
> Publications > Nectar: Automatic Management of Data and Computation in Datacenters