Machine Learning for Databases.
Machine learning techniques have been proposed to optimize databases. Traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning) cannot meet the high-performance requirements of large-scale database instances, varied applications, and diverse users, especially on the cloud. Machine learning based techniques can alleviate this problem by learning optimization strategies from historical data or from exploration. In this tutorial, we categorize database tasks into three typical problems that can be optimized by different machine learning models: (i) NP-hard problems (e.g., knob space exploration, index/view selection, and partition-key recommendation for offline optimization; query rewrite and join order selection for online optimization), (ii) regression problems (e.g., cost/cardinality estimation, index/view benefit estimation, query latency prediction), and (iii) prediction problems (e.g., transaction scheduling, trend prediction). We review existing machine learning based techniques that address these problems and outline open research challenges.
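To make the regression category concrete, the following is a minimal sketch of cardinality estimation learned from historical data. It is an illustration under stated assumptions, not a technique taken from the tutorial: the `history` values are made up, the single feature (the width of a range predicate) is hypothetical, and the helper names `fit_line` and `estimate_cardinality` are introduced only for this example. A one-feature least-squares fit in log-log space stands in for the richer models a real system would use.

```python
import math


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical query history (assumed values, not real measurements):
# (width of a range predicate, row count observed after execution).
history = [(10, 1_000), (20, 2_100), (40, 3_900), (80, 8_200)]

# Fit in log-log space so that relative errors, rather than absolute
# errors, drive the fit; cardinalities often span orders of magnitude.
log_widths = [math.log(w) for w, _ in history]
log_cards = [math.log(c) for _, c in history]
slope, intercept = fit_line(log_widths, log_cards)


def estimate_cardinality(width):
    """Predict the row count for an unseen range-predicate width."""
    return math.exp(slope * math.log(width) + intercept)


if __name__ == "__main__":
    for width in (15, 60, 160):
        print(width, round(estimate_cardinality(width)))
```

The same pattern (collect features of past queries, fit a model, query it at optimization time) would apply to the other regression tasks listed above, such as latency prediction, with different features and targets.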