Commit 5eadce7

Check in oxql benchmark.
Add oxql field lookup benchmark. To avoid the complication of generating realistic synthetic data, we provide scripts for fetching and loading fields from a running ClickHouse instance.
1 parent eb8ae2f commit 5eadce7

6 files changed

Lines changed: 247 additions & 0 deletions

Cargo.lock

Lines changed: 1 addition & 0 deletions

oximeter/db/Cargo.toml

Lines changed: 5 additions & 0 deletions
@@ -112,6 +112,7 @@ expectorate.workspace = true
 itertools.workspace = true
 omicron-test-utils.workspace = true
 oximeter-test-utils.workspace = true
+rand.workspace = true
 slog-dtrace.workspace = true
 sqlformat.workspace = true
 sqlparser.workspace = true
@@ -150,3 +151,7 @@ doc = false
 [[bench]]
 name = "protocol"
 harness = false
+
+[[bench]]
+name = "oxql"
+harness = false

oximeter/db/benches/README.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# Oximeter benchmarks

## Field lookup

Filtering and pivoting OxQL field labels can take a significant fraction of overall query time, so we include a benchmark focusing on field lookup. This benchmark queries all timeseries for a given table, filtering on a far-future timestamp so that we don't exercise measurement lookup. Because field lookup latency varies with the number of field tables to be combined, we include metrics that use varying numbers of field types. In the interest of benchmarking realistic queries, this benchmark doesn't generate synthetic data, but instead provides scripts for the operator to back up real field data from a running rack and restore them into a test database.

To fetch field data:

```bash
$ mkdir -p /tmp/oximeter-field-bench
$ oximeter/db/benches/backup_field_tables.sh /tmp/oximeter-field-bench [port]
```

To restore the field data into a test database, run the load script. Take care not to restore into a real Oxide rack. As a safeguard, the load script fails if the destination database already contains rows.

```bash
$ oximeter/db/benches/load_field_tables.sh /tmp/oximeter-field-bench [port]
```

Then run the benchmark:

```bash
$ cargo bench --package oximeter-db --bench oxql -- --save-baseline main
```

To evaluate a performance change, rerun the benchmark on your branch, saving a new baseline:

```bash
$ cargo bench --package oximeter-db --bench oxql -- --save-baseline my-branch
```

Then compare with `critcmp`:

```bash
$ critcmp main my-branch
```

oximeter/db/benches/backup_field_tables.sh

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
#!/bin/bash
#
# Dump ClickHouse field and schema tables to disk in native format. Run against
# a test rack with realistic oximeter data. Used to capture test data for
# benchmarking.
#
# Usage: ./backup_field_tables.sh <output_dir> [port]

set -euo pipefail

if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <output_dir> [port]" >&2
    exit 1
fi

OUTPUT_DIR="$1"
PORT="${2:-9000}"
DATABASE="oximeter"

mkdir -p "$OUTPUT_DIR"

# Back up field tables.
#
# Note: Use SELECT rather than BACKUP because we may not have access to the
# remote ClickHouse's local disk, or have backups enabled at all.
for table in timeseries_schema fields_{bool,i8,i16,i32,i64,ipaddr,string,u8,u16,u32,u64,uuid}; do
    count=$(clickhouse client --port "$PORT" \
        --query "SELECT count() FROM $DATABASE.$table")
    if [[ "$count" -eq 0 ]]; then
        echo "No rows in table $DATABASE.$table; skipping"
        continue
    fi
    output="$OUTPUT_DIR/${table}.native.gz"
    echo "Backing up $DATABASE.$table ($count rows) to $output"
    clickhouse client --port "$PORT" \
        --query "SELECT * FROM $DATABASE.$table FORMAT Native" \
        | gzip > "$output"
done

oximeter/db/benches/load_field_tables.sh

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
#!/bin/bash
#
# Load field table backups into a fresh ClickHouse for benchmarking.
# Exits with an error if the destination database already contains data.
#
# Usage: ./load_field_tables.sh <input_dir> [port]

set -euo pipefail

if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <input_dir> [port]" >&2
    exit 1
fi

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
SCHEMA_DIR="$SCRIPT_DIR/../schema/single-node"

INPUT_DIR="$1"
PORT="${2:-9000}"

DATABASE="oximeter"

# Error if database isn't empty.
echo "Checking for existing data..."
count=$(clickhouse client --port "$PORT" \
    --query "SELECT ifNull(sum(total_rows), 0) FROM system.tables WHERE database = '$DATABASE'" \
    2>/dev/null || echo "0")

if [[ "$count" -gt 0 ]]; then
    echo "Error: $DATABASE database already contains data ($count rows)" >&2
    echo "Refusing to initialize a non-empty database." >&2
    exit 1
fi

# Initialize schema.
echo "Initializing database schema..."
clickhouse client --port "$PORT" --multiquery < "$SCHEMA_DIR/db-init.sql"

# Load backups.
#
# Note: Use INSERT rather than RESTORE because we may not have access to the
# remote ClickHouse's local disk, or have backups enabled at all.
for table in timeseries_schema fields_{bool,i8,i16,i32,i64,ipaddr,string,u8,u16,u32,u64,uuid}; do
    input="$INPUT_DIR/${table}.native.gz"
    if [[ ! -f "$input" ]]; then
        echo "No backup for table $table; skipping"
        continue
    fi
    echo "Loading $table"
    gunzip -c "$input" | clickhouse client --port "$PORT" \
        --query "INSERT INTO $DATABASE.$table FORMAT Native"
done

oximeter/db/benches/oxql.rs

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! Benchmark for OxQL query performance.
//!
//! Tests multiple timeseries with varying numbers of field types.

// Copyright 2026 Oxide Computer Company

use criterion::BenchmarkId;
use criterion::Criterion;
use criterion::{criterion_group, criterion_main};
use oximeter_db::Client;
use oximeter_db::oxql::query::QueryAuthzScope;
use rand::seq::SliceRandom;
use std::net::IpAddr;
use std::net::SocketAddr;
use std::sync::Arc;

const DEFAULT_CLICKHOUSE_PORT: u16 = 9000;

/// Timeseries for field lookup benchmarks, with their field table counts.
///
/// Note: we manually select a subset of metrics, spanning the range of field table counts. If we add metrics or change the schemas of existing metrics, we'll need to revisit the selection.
const FIELD_TIMESERIES: &[(&str, u8)] = &[
    ("crucible_upstairs:flush", 1),
    ("ddm_session:advertisements_received", 2),
    ("virtual_machine:vcpu_usage", 3),
    ("bgp_session:active_connections_accepted", 4),
    ("switch_data_link:bytes_sent", 6),
];

fn get_clickhouse_addr() -> IpAddr {
    std::env::var("CLICKHOUSE_ADDRESS")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or_else(|| IpAddr::from([127, 0, 0, 1]))
}

fn get_clickhouse_port() -> u16 {
    std::env::var("CLICKHOUSE_PORT")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(DEFAULT_CLICKHOUSE_PORT)
}

fn get_client(rt: &tokio::runtime::Runtime) -> Arc<Client> {
    let ip = get_clickhouse_addr();
    let port = get_clickhouse_port();
    let addr = SocketAddr::new(ip, port);
    let log = slog::Logger::root(slog::Discard, slog::o!());

    rt.block_on(async {
        let client = Arc::new(Client::new(addr, &log));
        client.ping().await.unwrap();
        client
    })
}

// Benchmark field lookup. As of this writing, filtering and collating fields
// can make up a significant proportion of overall query time, and its latency
// varies with both the cardinality and the number of field tables that need to
// be combined for the relevant series. Query each series in FIELD_TIMESERIES,
// filtering to a future timestamp so that we only benchmark the performance of
// field lookup, and ignore measurements. Note that the user is responsible for
// populating ClickHouse with test data.
fn oxql_field_lookup(c: &mut Criterion) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap();

    let client = get_client(&rt);
    let mut group = c.benchmark_group("oxql");

    let mut timeseries: Vec<_> = FIELD_TIMESERIES.iter().collect();
    timeseries.shuffle(&mut rand::rng());

    for (timeseries, field_tables) in timeseries {
        // Use a far-future timestamp to benchmark field lookup only, with no
        // measurements.
        let query =
            format!("get {} | filter timestamp > @2200-01-01", timeseries);

        rt.block_on(client.oxql_query(&query, QueryAuthzScope::Fleet)).unwrap();

        let bench_id = format!("{}_tables/{}", field_tables, timeseries);

        group.bench_function(
            BenchmarkId::new("field_lookup", &bench_id),
            |bench| {
                let client = client.clone();
                let query = query.clone();
                bench.to_async(&rt).iter(|| {
                    let client = client.clone();
                    let query = query.clone();
                    async move {
                        client.oxql_query(&query, QueryAuthzScope::Fleet).await
                    }
                })
            },
        );
    }

    group.finish();
}

criterion_group!(
    name = benches;
    config = Criterion::default().sample_size(50).noise_threshold(0.05);
    targets = oxql_field_lookup
);

criterion_main!(benches);
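
The benchmark above needs a populated ClickHouse instance, but two parts of it can be checked without one: how the target address is resolved from `CLICKHOUSE_ADDRESS` and `CLICKHOUSE_PORT`, and what query string is issued per timeseries. The following is a minimal standalone sketch of that logic, not part of the commit. The helper names `resolve_addr` and `field_lookup_query` are hypothetical, introduced here for illustration only. It uses only the standard library. As the sketch shows, a missing or unparsable value falls back to the default without reporting an error, so a mistyped port would target `127.0.0.1:9000`.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Default port, mirroring DEFAULT_CLICKHOUSE_PORT in the benchmark.
const DEFAULT_PORT: u16 = 9000;

/// Hypothetical helper (not in the commit): resolve the target address from
/// optional raw values, mirroring the fallback used by get_clickhouse_addr
/// and get_clickhouse_port. Missing or unparsable values use the defaults.
fn resolve_addr(ip: Option<&str>, port: Option<&str>) -> SocketAddr {
    let ip = ip
        .and_then(|s| s.parse().ok())
        .unwrap_or(IpAddr::V4(Ipv4Addr::LOCALHOST));
    let port = port.and_then(|s| s.parse().ok()).unwrap_or(DEFAULT_PORT);
    SocketAddr::new(ip, port)
}

/// Hypothetical helper (not in the commit): build the query string the
/// benchmark issues for one timeseries. The far-future timestamp filter
/// means no measurements match, leaving only field lookup to be timed.
fn field_lookup_query(timeseries: &str) -> String {
    format!("get {} | filter timestamp > @2200-01-01", timeseries)
}

fn main() {
    // Read the same environment variables the benchmark reads.
    let ip = std::env::var("CLICKHOUSE_ADDRESS").ok();
    let port = std::env::var("CLICKHOUSE_PORT").ok();
    let addr = resolve_addr(ip.as_deref(), port.as_deref());
    println!("target: {addr}");

    // Two of the timeseries names listed in FIELD_TIMESERIES.
    for name in ["crucible_upstairs:flush", "switch_data_link:bytes_sent"] {
        println!("{}", field_lookup_query(name));
    }
}
```

Running this before `cargo bench` with the intended environment variables set is one way to confirm which address the benchmark would connect to.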
