This guide outlines the steps to benchmark database transfer services using a robust dataset from ClickBench. ClickBench provides a wide table with 70 columns and approximately 100 million records. Follow these steps to ensure efficient and reliable benchmarking.
Find a Dataset
Identify a suitable dataset; many databases publish their own sample datasets or backups.
Prepare the Source Database
- Set up a source database on an EC2 instance.
- Use a pre-production serverless runtime environment.
Load the Data
Import the prepared dataset into the source database, ensuring it aligns with the benchmarking scenario.
Baselines provide reference points to measure performance.
Define Key Metrics
- Key metric: rows per second (preferable to bytes per second, because the size of a row in bytes varies with its column types and contents).
Perform Initial Transfer
Execute an initial transfer from the source to the target database using default settings.
Record Performance Metrics
After setting baselines, fine-tune the transfer settings for better performance.
Activate the Transfer
Deploy the transfer via Helm in your Kubernetes (k8s) cluster.
Expose pprof for Profiling
- Expose the pprof port for profiling; by default, --run-profiler is true.
Download the pprof File
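The CPU-profile endpoint registered by net/http/pprof is /debug/pprof/profile; the server samples for the number of seconds given in the seconds query parameter and then returns the profile. A minimal sketch for saving it to a file, assuming the pprof port is reachable at http://localhost:6060 (for example through kubectl port-forward); the helpers profileURL and downloadProfile, the 30-second window and the file name cpu.pprof are all illustrative choices:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

// profileURL builds the CPU-profile URL served by net/http/pprof.
func profileURL(base string, seconds int) string {
	return fmt.Sprintf("%s/debug/pprof/profile?seconds=%d", base, seconds)
}

// downloadProfile collects a CPU profile for the given number of seconds and writes it to path.
func downloadProfile(base string, seconds int, path string) error {
	// The server blocks while it samples, so the client timeout must exceed the window.
	client := &http.Client{Timeout: time.Duration(seconds+30) * time.Second}
	resp, err := client.Get(profileURL(base, seconds))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	// Hypothetical base URL, e.g. after: kubectl port-forward pod/<pod> 6060:6060
	if err := downloadProfile("http://localhost:6060", 30, "cpu.pprof"); err != nil {
		log.Fatal(err)
	}
	log.Println("saved cpu.pprof")
}
```

Run it while the transfer is busy so the 30-second window captures representative work. Besides uploading the file to a viewer, go tool pprof -http=:8080 cpu.pprof opens it locally.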
Visualize the Profile
- Use tools like Speedscope.
- Upload the profile to analyze call stacks.
- Use the "Left-Heavy" view to identify the most time-consuming call paths.
There’s no silver bullet for performance improvement, but here are some resources:
One Billion Row Challenge in Golang
Techniques for I/O and CPU optimization in data parsing.
Minimizing Allocations in Golang
Focus on reducing object allocations.
Analyzing Go Heap Escapes
Use escape analysis for up to 10% performance improvements.
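One common allocation-reduction pattern from these resources is reusing a slice's backing array across calls instead of building a fresh slice each time. The sketch below is an illustration, not code from the transfer tool: parseInto is a hypothetical parser for a comma-separated line of integers, and testing.AllocsPerRun from the standard library counts heap allocations per call for both approaches. To see which values escape to the heap, go build -gcflags=-m prints the compiler's escape-analysis decisions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"testing"
)

// sink forces the "fresh" result to escape, so the comparison reflects heap allocations.
var sink []int

// parseInto parses a comma-separated line of integers into dst[:0].
// Passing the previous result back in lets the caller reuse its backing array.
func parseInto(dst []int, line string) ([]int, error) {
	dst = dst[:0]
	for len(line) > 0 {
		field := line
		if i := strings.IndexByte(line, ','); i >= 0 {
			field, line = line[:i], line[i+1:]
		} else {
			line = ""
		}
		n, err := strconv.Atoi(field)
		if err != nil {
			return dst, err
		}
		dst = append(dst, n)
	}
	return dst, nil
}

func main() {
	line := "1,2,3,4,5,6,7,8"

	// Fresh slice per call: append has to grow a new backing array every time.
	fresh := testing.AllocsPerRun(1000, func() {
		sink, _ = parseInto(nil, line)
	})

	// Reused slice: capacity is allocated once, outside the measured function.
	buf := make([]int, 0, 16)
	reused := testing.AllocsPerRun(1000, func() {
		buf, _ = parseInto(buf, line)
	})

	fmt.Printf("fresh: %.0f allocs/op, reused: %.0f allocs/op\n", fresh, reused)
}
```

The trade-off is ownership: the caller must not hold on to a previous result after passing it back in, because the next call overwrites it.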
Competitor Baselines:
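- If a 30-second CPU profile records less than 30 seconds of total samples, the process is not keeping even one CPU core busy, so the bottleneck is likely outside the hot path (for example, waiting on the source or target database, the network, or locks).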
After optimization, write benchmarks to simulate real-world workloads.
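- ns/op: Time per iteration (lower is better).
- B/op: Bytes allocated per iteration (lower is better).
- allocs/op: Heap allocations per iteration (lower is better).
- MB/s: Throughput in megabytes per second, computed from the value passed to b.SetBytes (higher is better).
B/op and allocs/op are reported when the benchmark calls b.ReportAllocs or runs with -benchmem.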
Example Code:
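b.SetBytes(int64(totalSize * limit)) // bytes processed per iteration; enables the MB/s column
Example output (1567 is the iteration count; the -10 suffix is the GOMAXPROCS value):
BenchmarkTextFetcher/128_rows-10 1567 756604 ns/op 86.79 MB/s 421344 B/op 10820 allocs/op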
This guide provides a systematic approach to benchmarking and optimizing database transfer tools. Utilize these steps to measure, enhance, and achieve efficient data migrations.