Carnegie Mellon University
Database applications are predominantly memory-intensive workloads, and their performance is substantially influenced by memory access latency. Memory speeds, however, have fundamentally lagged behind processor speeds: today's memory systems incur access latencies up to three orders of magnitude larger than the latency of a single arithmetic operation. Previous work has demonstrated that database system performance suffers from memory access delays caused by data and instruction cache misses, and that similar performance trends exist across commercial database systems. Therefore, to significantly improve database system behavior on modern processor platforms, database software designers should focus on maximizing cache utilization and on keeping data that are likely to be referenced in the cache hierarchy.
This talk will analyze the impact of data placement on database system performance. The data placement scheme used in today's database systems "pushes" unreferenced data into the caches, wasting memory bandwidth, polluting the cache, and exposing hard-to-overlap memory latencies. I will introduce a novel data placement scheme called Partition Attributes Across (PAX). PAX eliminates unnecessary memory accesses by bringing only useful data into the cache. Experimental results on a variety of workloads show that PAX drastically reduces data-related stalls, which in our experiments reduced elapsed execution time by a factor of 2. PAX is easy to implement in any traditional DBMS and can be applied orthogonally to other storage decisions and schemes (e.g., affinity-based vertical partitioning).
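To make the "only useful data" argument concrete, the sketch below counts how many cache lines a scan of a single attribute touches under two page layouts: a conventional row layout, where each record's attributes are stored contiguously, and a PAX-style layout, where the values of each attribute are grouped together in their own region ("minipage") of the same page. This is a simplified illustration, not the PAX implementation: it assumes fixed-width 4-byte attributes, 64-byte cache lines, a cache-line-aligned page, and no page header or slot directory. The function names (`row_offsets`, `pax_offsets`, `lines_touched`) are hypothetical.

```python
CACHE_LINE = 64  # bytes; assumed cache-line size for this illustration


def row_offsets(num_records, num_attrs, attr, width=4):
    """Byte offsets of attribute `attr` when whole records are stored
    one after another in the page (row-at-a-time layout)."""
    record_size = num_attrs * width
    return [r * record_size + attr * width for r in range(num_records)]


def pax_offsets(num_records, num_attrs, attr, width=4):
    """Byte offsets of attribute `attr` when each attribute's values are
    stored contiguously in their own minipage within the same page
    (simplified PAX-style layout; num_attrs is unused but kept for symmetry)."""
    minipage_size = num_records * width
    return [attr * minipage_size + r * width for r in range(num_records)]


def lines_touched(offsets, width=4):
    """Number of distinct cache lines read when accessing every value."""
    lines = set()
    for off in offsets:
        lines.add(off // CACHE_LINE)                # line holding the first byte
        lines.add((off + width - 1) // CACHE_LINE)  # line holding the last byte
    return len(lines)


if __name__ == "__main__":
    # Hypothetical page: 1000 records, 8 attributes of 4 bytes each.
    print(lines_touched(row_offsets(1000, 8, 0)))  # 500
    print(lines_touched(pax_offsets(1000, 8, 0)))  # 63
```

With 32-byte records, the row layout brings one cache line into the cache for every two values scanned, and most of each line holds the seven attributes the query never reads. The PAX-style layout packs sixteen useful values into each line, so the same scan touches roughly an eighth as many lines under these assumptions.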