VC3: Trustworthy Data Analytics in the Cloud

MSR-TR-2014-39 |

We present VC3, the first practical framework that allows users to run distributed MapReduce computations in the cloud while keeping their code and data secret, and ensuring the correctness and completeness of their results. VC3 runs on unmodified Hadoop, but crucially keeps Hadoop, the operating system and the hypervisor out of the TCB; thus, confidentiality and integrity are preserved even if these large components are compromised. VC3 relies on SGX processors to isolate memory regions on individual computers, and to deploy new protocols that secure distributed MapReduce computations. VC3 optionally enforces region self-integrity invariants for all MapReduce code running within isolated regions, to prevent attacks due to unsafe memory reads and writes. Experimental results on common benchmarks show that VC3 performs well compared with unprotected Hadoop; VC3’s average runtime overhead is negligible for its base security guarantees, 4.5% with write integrity and 8% with read/write integrity.