Automatic Database Management System Tuning Through Large-scale Machine Learning

Problem

To tune the configurations of a DBMS using ML models.

The tuner must have administrative privileges to modify the DBMS's configurations.
The cost of restarting a DBMS is ignored.
The physical design is reasonable.
- Proper indexes, materialized views, other database elements have been installed.

Overview

OtterTune collects the internal metrics because those metrics directly relate to the knobs and more predictable when tuning knobs.

Some metrics may redundant because

Steps:

Build a matrix \(X\) where \(X_{ij}\) represents the value of metric \(i\) on configuration set \(j\)
Performs Factor Analysis to reduce the dimension of \(X\) to \(U\) where \(U_{ij}\) represents the value of metric \(i\) on the \(j\)-th factor
Performs k-means clustering and pick up only the most representative metric in each cluster
- \(K\) is determined by a heuristic algorithm without human intervention

Example Results:

Picking Useful Metrics

Use LASSO to evaluate the impact of each knobs
- \(X\): knobs
- \(y\): metrics
- The most common feature selection algorithm
- Computationally efficient
Includes polynomial features to test if there is dependency between two knobs
- For example, product "Buffer Pool Size" and "Log Buffer Size" as a feature to see if LASSO pick up this feature
Use incremental approach (gradually increase the number of selected knobs/features and check the effectiveness)

Example Results:

LASSO

Steps

Interesting insights

Uses not only external metrics but also internal metrics for evaluating the performance of a configuration
The way of picking up the useful metrics

How do they use the dependencies between knobs? Do those become features?
- Not sure
Do they use the variance given by Gaussian Process?
- They use the variance as the confidence level
Does OtterTune use any workload information such as queries or transactions for tuning?
- No