Nobody ever got fired for using Hadoop on a cluster

The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One needs only to read the popular blogs to see the evidence of this. We believe that we could now say that "nobody ever got fired for using Hadoop on a cluster"!

We completely agree that Hadoop on a cluster is the right solution for jobs where the input data is multi-terabyte or larger. However, in this position paper we ask whether this is the right path for general-purpose data analytics. Evidence suggests that many MapReduce-like jobs process relatively small input data sets (less than 14 GB). Memory has reached a GB/$ ratio such that it is now technically and financially feasible to have servers with hundreds of GB of DRAM. We therefore ask: should we be scaling by using single machines with very large memories rather than clusters? We conjecture that, in terms of both hardware and programmer time, this may be a better option for the majority of data processing jobs.

In: 1st International Workshop on Hot Topics in Cloud Data Processing (HotCDP 2012)
Publisher: ACM