Main-Memory Join Algorithms: Sort or Hash?

Speaker  Jens Teubner

Affiliation  TU Dortmund University

Host  Ken Eguro

Duration  01:17:37

Date recorded  1 May 2013

The classical wisdom is that hashing is the preferred method for implementing joins in main memory. But this wisdom is many years, if not decades, old, and hardware has evolved considerably in the meantime.

In this talk I will discuss join strategies for execution in main memory, including hash and sort-merge variants. The runtime characteristics of either strategy depend critically on an implementation that respects the intricacies of modern hardware architectures. I will show how hash and sort-merge joins can be implemented to benefit maximally from hardware features such as vector processing (SIMD), multi-level caches, multi-core parallelism, and NUMA-style memory arrangements. I will also point out pitfalls that could misguide conclusions about the “best” join implementation strategy.
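To make the two strategies concrete, here is a minimal, single-threaded sketch of a textbook hash join and a textbook sort-merge equi-join in Python. It is not the SIMD-, cache-, or NUMA-aware implementation discussed in the talk; the function names `hash_join` and `sort_merge_join` and the `(key, payload)` tuple layout are hypothetical choices for illustration only.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Textbook hash equi-join (illustrative sketch, not the talk's code).

    build, probe: lists of (key, payload) tuples.
    Returns (key, build_payload, probe_payload) tuples.
    """
    # Build phase: hash table over the (ideally smaller) build input.
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    # Probe phase: one hash lookup per probe tuple (random memory access).
    out = []
    for key, payload in probe:
        for match in table.get(key, ()):
            out.append((key, match, payload))
    return out


def sort_merge_join(left, right):
    """Textbook sort-merge equi-join (illustrative sketch, not the talk's code).

    Returns (key, left_payload, right_payload) tuples in key order.
    """
    # Sort phase: order both inputs by join key.
    l = sorted(left, key=lambda t: t[0])
    r = sorted(right, key=lambda t: t[0])
    out = []
    i = j = 0
    # Merge phase: advance two cursors sequentially through both inputs.
    while i < len(l) and j < len(r):
        if l[i][0] < r[j][0]:
            i += 1
        elif l[i][0] > r[j][0]:
            j += 1
        else:
            key = l[i][0]
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(l) and l[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(r) and r[j_end][0] == key:
                j_end += 1
            # Emit the cross product of the two matching runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((key, l[a][1], r[b][1]))
            i, j = i_end, j_end
    return out


# Hypothetical example relations R and S.
R = [(1, "r1"), (2, "r2"), (2, "r2b"), (4, "r4")]
S = [(2, "s2"), (3, "s3"), (4, "s4"), (2, "s2b")]
print(hash_join(R, S))
print(sort_merge_join(R, S))
```

Both functions produce the same set of result tuples; only the output order differs. The contrast the talk builds on is visible even in this sketch: the hash join's probe phase performs data-dependent lookups scattered across the hash table, while the sort-merge join's merge phase scans both sorted inputs sequentially, so which one wins depends heavily on how well each maps onto caches, SIMD units, and memory bandwidth.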

The join implementations discussed outperform the state of the art in join processing several-fold. Experiments on modern hardware platforms indicate that the sort-merge strategy is about to surpass hashing on upcoming hardware architectures, which might turn classical wisdom upside down.

©2013 Microsoft Corporation. All rights reserved.