Everything I Know About Fast Databases I Learned at the Dog Track

11 Mar
Computer Science Colloquium Series
Andy Pavlo, Postdoctoral Researcher, Brown Univeresity
Monday, March 11, 2013 - 4:00pm to 5:00pm
Maxwell Dworkin G125 (Refreshments at 3:30 outside MD G125)

An emerging class of distributed database management systems (DBMS), known as NewSQL, provide the same scalable performance of NoSQL systems while maintaining the consistency guarantees of a traditional, single-node DBMS. These NewSQL systems achieve high throughput rates for data-intensive applications by storing their databases in a cluster of main memory partitions. This partitioning enables them to eschew much of the legacy, disk-oriented architecture that slows down traditional systems, such as heavy-weight concurrency control algorithms, thereby allowing for the efficient execution of single-node transactions. But many applications cannot be partitioned such that all of their transactions execute in this manner; these multi-node transactions require expensive coordination that inhibits performance. Thus, without intelligent methods to overcome these impediments, a NewSQL DBMS will scale no better than a traditional DBMS.

In this talk, I will present our research on integrating machine learning techniques to improve the performance of fast database systems that is inspired by my adventures at greyhound racing tracks. In particular, I will discuss my work on the H-Store parallel, main memory transaction processing system. I will first describe the Houdini framework that uses Markov models to predict transactions’ behaviors to allow a DBMS to selectively enable runtime optimizations. I will then present Hermes, a method for the deterministic execution of speculative transactions whenever a DBMS stalls because of distributed transactions. Together, these projects enable H-Store to support transactional workloads that are beyond what single-node systems can handle.