ACM SIGMOD Jim Gray Dissertation Award W Talk

2020 
The increasing democratization of server hardware with multi-core CPUs and large main memories has been one of the dominant hardware trends of the last decade. "Bare metal" servers with tens of CPU cores and over 100 gigabytes of main memory have been available for several years now. Recently, this large scale hardware has also been available via the cloud. Database systems, with their roots in uniprocessors and paucity of main memory, have unsurprisingly been found wanting on modern hardware. In addition to changes in hardware, database systems have had to contend with changing application requirements and deployment environments. Database systems have long provided applications with an interactive interface, in which an application can communicate with the database over several round-trips in the course of a single request. A large class of applications, however, does not require interactive interfaces, and is unwilling to pay the performance cost associated with overly flexible interfaces. Some of these applications have eschewed database systems altogether in favor of high-performance key-value stores. Finally, modern applications are increasingly deployed at ever increasing scales, often serving hundreds of thousands to millions of simultaneous clients. These large scale deployments are more prone to errors due to consistency issues in their underlying database systems. Ever since their inception, database systems have allowed applications to tradeoff consistency for performance, and often nudge applications towards weak consistency. When deployed at scale, weak consistency exposes latent consistency-related bugs, in the same way that failures are more likely to occur at scale. Nearly every widely deployed database system provides applications with weak consistency consistency by default, and its widespread use in practice significantly complicates application development, leading to latent Heisenbugs that are only exposed in production. This dissertation proposes and explores the use of deterministic execution to address these concerns. Database systems have traditionally been non-deterministic; given an input list of transactions, the final state of the database, which corresponds to some totally ordered execution of transactions, is dependent on non-deterministic factors such as thread scheduling decisions made by the operating system and failures. Deterministic execution, on the other hand, ensures that the database's final state is always determined by its input list of transactions; in other words, the input list of transactions is the same as the total order of transactions that determines the database's state. While non-deterministic database systems expend significant resources in determining valid total orders of transactions, we show that deterministic systems can exploit simple and low-cost up-front total ordering of transactions to execute and schedule transactions much more efficiently. We show that deterministic execution enables low-overhead, highly-parallel scheduling mechanisms, that can address the performance limitations of existing database systems on modern hardware. Deterministic database systems are designed based on the assumption that applications can submit their transactions in one-shot prepared transactions, instead of multiple round-trips. Finally, we attempt to understand the fundamental reason for the observed performance differences between various consistency levels in database systems, and based on this understanding, show that we can exploit deterministic execution to provide strong consistency at a cost that is competitive with that offered by weak consistency levels.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []