Incrementally Parallelizing Database Transactions with Thread-Level Speculation Todd Mowry Carnegie Mellon University Alternative Title: Making Large Legacy Software Run Twice as Fast on a Quad-Core With Just One Month of Programmer Effort: A Case Study With BerkeleyDB A major challenge of the industry-wide shift to multicore processing is what to do with large legacy software that was not written with parallelism in mind. Ideally, the transition from sequential to parallel code would involve zero programmer effort: e.g., it would be handled automatically by a parallelizing compiler. In practice, however, this ideal remains beyond the scope of today's compiler technology for most applications. For complex applications that were developed by large teams over many years, the cost of rewriting key parts of the software nearly from scratch to introduce parallelism is prohibitively expensive. Fortunately, there is a "secret weapon" that can dramatically reduce the amount of programmer effort required to exploit parallelism: Thread-Level Speculation (TLS). Through a combination of underlying compiler and hardware support, TLS allows the programmer to optimistically create parallel threads by detecting and recovering from any resulting dependence violations. A side-effect of TLS is that it also profiles the sources of dependence violations across these speculative threads, and this information can be fed back to the programmer (or perhaps an automatic tool) to enable an iterative process of incrementally improving the parallel performance of the code. In other words, TLS is a safety net that allows the programmer to first focus on the likely macro-level sources of parallelism, and postpone worrying about lower-level correctness issues unless they prove to be performance bottlenecks. In all cases, TLS will ensure that the program behaves correctly. In this talk, we present a case study of how we used TLS to successfully extract parallelism from within individual transactions in the BerkeleyDB database management system. Our technique requires a limited number of small, localized changes to a subset of the low-level data structures in the DBMS, and these details are hidden from the transaction programmer. With only a month's worth of effort by a graduate student who was unfamiliar with the code base, we achieved roughly a twofold to threefold speedup for three of the five TPC-C transactions on a simulated quad-core processor. We believe that these encouraging results are just one example of how TLS can help programmers transform legacy software into efficient parallel software with minimal effort.