Tastes Great! Less Filling! High Performance and Accurate Training Data Collection for Self-Driving Database Management Systems
- Authors: Matthew Butrovich, Wan Shen Lim, Lin Ma, John Rollinson, William Zhang, Yu Xia, Andrew Pavlo
- Institutions: Carnegie Mellon University, Army Cyber Institute, Massachusetts Institute of Technology
- Published at SIGMOD'22
- Paper Link: https://dl.acm.org/doi/10.1145/3514221.3517845
Background
A self-driving DBMS usually contains a behavior modeling module which predicts the cost of a database action on a given workload.
This module must be trained, so the system needs a way to collect training data.
Motivation
Current training data collection schemes:
- Offline
- Method 1: Clone the database and replay an existing workload trace
- Con:
- Cloning a database is time-consuming
- Recording a workload trace is also difficult
- Method 2: Run a new database with synthetic queries
- Con:
- Needs extra time to run the workloads (possibly days or weeks) before enough data is generated for robust models
- Cannot capture the real metrics of online environments
- Online
- Con: overhead too high
What is needed is an online method with low overhead.
Problem
Requirements
- Needs a method that collects internal features (CPU time, number of concurrent workers, garbage collection (GC) information, etc.)
- External features are not accurate
- Needs a method that collects metrics in kernel space
- Collecting metrics in user space is expensive because of system call and I/O overhead
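To make the user-space cost concrete, here is a minimal sketch (not the paper's collector) that measures CPU time around an operation from user space and then estimates what the measurement itself costs. The names `Sample`, `scan`, `collect`, and `probe_cost_ns` are hypothetical, `scan` is only a stand-in for a DBMS operation, and `num_rows` is an assumed input feature chosen for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical training record: an input feature plus measured metrics."""
    num_rows: int    # input feature (assumed for illustration)
    cpu_ns: int      # CPU time consumed by the calling thread
    elapsed_ns: int  # wall-clock time


def scan(num_rows: int) -> int:
    """Stand-in for a DBMS operation: sums a range of integers."""
    total = 0
    for i in range(num_rows):
        total += i
    return total


def collect(num_rows: int) -> Sample:
    """Measure one execution of scan() entirely from user space."""
    cpu_start = time.thread_time_ns()
    wall_start = time.perf_counter_ns()
    scan(num_rows)
    cpu_ns = time.thread_time_ns() - cpu_start
    elapsed_ns = time.perf_counter_ns() - wall_start
    return Sample(num_rows, cpu_ns, elapsed_ns)


def probe_cost_ns(iterations: int = 10_000) -> float:
    """Average cost of one pair of timer reads with no work in between."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        time.thread_time_ns()
        time.perf_counter_ns()
    return (time.perf_counter_ns() - start) / iterations


if __name__ == "__main__":
    print(collect(100_000))
    print(f"approx. cost per measurement: {probe_cost_ns():.0f} ns")
```

Every sample pays the probe cost twice (start and stop), and a real collector would also have to write the record somewhere, so the overhead grows with the number of instrumented operations. The absolute numbers depend on the platform and are not comparable to the paper's results.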
Method
- How to collect metrics in kernel-space with low overhead?
- By writing Berkeley Packet Filter (BPF) programs, which run inside the kernel and can be written without much kernel expertise.
- Pro:
- Has OS-level privilege
- The DBMS does not need to run with root privileges
- Faster