Commit d3ff1f1

connortsui20 and claude authored
[claude] feat(bench): emit v3 JSONL records and dual-write to bench server (#7780)
## Summary

Prototype website: http://ec2-18-219-54-101.us-east-2.compute.amazonaws.com:3000/

This is the first step we should make before we cut over to the new benchmarks website in #7643. This PR allows the CI actions to additionally post data to a server (on my EC2 instance for now). We want to check that this actually works before we start using it for all of our CI.

Note that this does NOT change how the current benchmarks website works; this just does a few extra things on top of that.

Also for reviewers: even though this looks like 1k LoC, the logic here is not that hard to review; a lot of this is boilerplate you can skim over.

Below is a bunch of AI-generated description: read at your own discretion.

<details>

Brings the v3 emitter and CI dual-write plumbing from `ct/benchmarks-v3` onto `develop` **without** the v3 server/website code. CI continues to write v2 results to S3 unchanged; v3 ingest is a side channel that no-ops until the deploy track sets `vars.V3_INGEST_URL`. This is item 2 ("CI ingestion wiring") of the v3 production-readiness checklist in [`benchmarks-website/planning/README.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/README.md). The v3 website itself ships in a separate PR off `ct/benchmarks-v3` once dual-write is verified healthy in production.

### What's included

**Rust emitter (`vortex-bench`)**

- New `vortex-bench/src/v3.rs`: one record per `kind` (`query_measurement`, `compression_time`, `compression_size`, `random_access_time`, `vector_search_run`) plus a serde-tagged `V3Record` enum, a JSONL writer, and `insta` snapshot tests. Field shapes match [`02-contracts.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/02-contracts.md).
- `Dataset::v3_dataset_dims()` (default `(name(), None)`) lets Public-BI map to `(public-bi, <subset>)`.
- `compress` and `runner` capture per-iteration timings and provide `SqlBenchmarkRunner::v3_records()`.
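A rough sketch, in Python, of what the tagged JSONL output might look like. Only the `kind` values are taken from the list above; every other field name here is a guess, since the authoritative shapes live in `02-contracts.md`:

```python
import json

# Hypothetical field names; the real record shapes are defined in 02-contracts.md.
records = [
    {"kind": "query_measurement", "benchmark": "tpch", "query": "q1",
     "format": "vortex", "iteration": 0, "elapsed_ns": 12_345_678},
    {"kind": "compression_size", "dataset": "public-bi", "subset": "Arade",
     "format": "vortex", "bytes": 1_048_576},
]

# JSONL: one internally tagged object per line, bare records, no envelope.
jsonl = "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"

# A reader dispatches on the "kind" tag, mirroring the serde-tagged V3Record enum.
kinds = [json.loads(line)["kind"] for line in jsonl.splitlines()]
print(kinds)  # ['query_measurement', 'compression_size']
```

Serde's internally tagged enum representation serializes exactly this way (the variant name becomes a field on the same flat object), which is why a stdlib `json` consumer can dispatch on `kind` without any schema.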
**Benchmark binaries**

- `compress-bench`, `datafusion-bench`, `duckdb-bench`, `lance-bench`, `random-access-bench`, and `vector-search-bench` all gain `--gh-json-v3 <path>`. Bare records, no envelope. The legacy `-d gh-json -o ...` flow is untouched.

**`bench-orchestrator`**

- `vx-bench run --gh-json-v3 <path>` plumbs the flag through to the underlying benchmark binary.

**`scripts/post-ingest.py`** (Python 3, stdlib only)

- Reads JSONL, fills the `commit` envelope from `git show`, wraps everything in `{run_meta, commit, records}`, and POSTs to `/api/ingest` with `Authorization: Bearer ${INGEST_BEARER_TOKEN}`. Exits non-zero on 4xx/5xx. No retry/spool — deferred.

**Workflows**

- `.github/workflows/bench.yml` and `sql-benchmarks.yml` add `--gh-json-v3 results.v3.jsonl` to the bench runs and a follow-up "Ingest results to v3 server" step.
- New `.github/workflows/v3-commit-metadata.yml` POSTs an empty envelope on every push to `develop` so the v3 `commits` dim stays populated even when no benchmark ran.

### What's NOT included (intentionally)

- Anything under `benchmarks-website/` — the v2 React/Node app stays in production unchanged.
- Workspace member additions for `benchmarks-website/server` and `benchmarks-website/migrate` — those crates don't exist on `develop` yet.
- `.github/workflows/ci.yml` and `publish-bench-server.yml` changes — they reference `vortex-bench-server`, which is also v3-server-only.

## Risk

**Zero.** The v3 ingest step is gated on `vars.V3_INGEST_URL != ''` and `continue-on-error: true`. If the v3 server is down, the variable is unset, or the bearer secret is missing, the workflow no-ops and the v2 path keeps writing to S3 unchanged. The Rust emitter writes JSONL to a local file only; there is no network egress from the binaries themselves.

## Verify

A CI run on this branch should show the new "Ingest results to v3 server" step running and POSTing successfully to the EC2 host at `vars.V3_INGEST_URL`.
## Follow-up

The v3 website itself (server, migrator, web UI) ships in a separate PR off `ct/benchmarks-v3` once dual-write is verified healthy in production. Outbox-style retry on failed POSTs is also a follow-up — not built until we observe a failure in the wild.

## Test plan

- [x] `cargo build -p vortex-bench` — clean.
- [x] `cargo nextest run -p vortex-bench` — 49/49 pass, including 7 new v3 snapshot tests.
- [x] `cargo build -p compress-bench -p datafusion-bench -p duckdb-bench -p lance-bench -p random-access-bench -p vector-search-bench` — clean.
- [x] All six benchmark binaries print `--gh-json-v3 <GH_JSON_V3>` in `--help`.
- [x] `python3 scripts/post-ingest.py --help` — clean.
- [x] `pytest bench-orchestrator/tests/test_executor.py` — 5/5 pass, including 2 new `gh_json_v3` tests.
- [x] `cargo +nightly fmt --all` — no diff.
- [x] `cargo clippy --all-targets --all-features -p vortex-bench` — clean.
- [x] `cargo clippy --all-targets -p compress-bench -p datafusion-bench -p lance-bench -p random-access-bench -p vector-search-bench` — clean. `duckdb-bench` skipped (transitively triggers a pre-existing `cognitive_complexity` lint in `vortex-duckdb/src/convert/expr.rs:47`, present on `develop` and unrelated to these changes).
- [x] `yamllint --strict -c .yamllint.yaml` on the three changed/new workflow files — clean.
- [x] `./scripts/public-api.sh` — N/A. All touched Rust crates have `publish = false`.
- [ ] Real round-trip against the EC2 host — verifies once this branch triggers a CI bench run with `V3_INGEST_URL` set.

---

_Generated by [Claude Code](https://claude.ai/code/session_0154XbxhgQztmbrQfJ4ZSxVo)_

</details>

---------

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
1 parent e0a0527

30 files changed

Lines changed: 1470 additions & 7 deletions

.github/workflows/bench.yml

Lines changed: 14 additions & 1 deletion
```diff
@@ -92,7 +92,7 @@ jobs:
         VORTEX_EXPERIMENTAL_PATCHED_ARRAY: "1"
         FLAT_LAYOUT_INLINE_ARRAY_NODE: "1"
       run: |
-        bash scripts/bench-taskset.sh target/release_debug/${{ matrix.benchmark.id }} --formats ${{ matrix.benchmark.formats }} -d gh-json -o results.json
+        bash scripts/bench-taskset.sh target/release_debug/${{ matrix.benchmark.id }} --formats ${{ matrix.benchmark.formats }} -d gh-json -o results.json --gh-json-v3 results.v3.jsonl

     - name: Setup AWS CLI
       uses: aws-actions/configure-aws-credentials@ec61189d14ec14c8efccab744f656cffd0e33f37 # v6
@@ -105,6 +105,19 @@ jobs:
       run: |
         bash scripts/cat-s3.sh vortex-ci-benchmark-results data.json.gz results.json

+    - name: Ingest results to v3 server
+      if: vars.V3_INGEST_URL != ''
+      continue-on-error: true
+      shell: bash
+      env:
+        INGEST_BEARER_TOKEN: ${{ secrets.INGEST_BEARER_TOKEN }}
+      run: |
+        python3 scripts/post-ingest.py results.v3.jsonl \
+          --server "${{ vars.V3_INGEST_URL }}" \
+          --commit-sha "${{ github.sha }}" \
+          --benchmark-id "${{ matrix.benchmark.id }}" \
+          --repo-url "${{ github.server_url }}/${{ github.repository }}"

     - name: Alert incident.io
       if: failure()
       uses: ./.github/actions/alert-incident-io
```

.github/workflows/sql-benchmarks.yml

Lines changed: 15 additions & 0 deletions
```diff
@@ -376,6 +376,7 @@ jobs:
           bash scripts/bench-taskset.sh uv run --project bench-orchestrator vx-bench run "${{ matrix.subcommand }}" \
             --targets-json '${{ steps.targets.outputs.targets_json }}' \
             --output results.json \
+            --gh-json-v3 results.v3.jsonl \
             --no-build \
             --runner "ec2_${{ inputs.machine_type }}" \
             ${{ matrix.iterations && format('--iterations {0}', matrix.iterations) || '' }} \
@@ -395,6 +396,7 @@ jobs:
           bash scripts/bench-taskset.sh uv run --project bench-orchestrator vx-bench run "${{ matrix.subcommand }}" \
             --targets-json '${{ steps.targets.outputs.targets_json }}' \
             --output results.json \
+            --gh-json-v3 results.v3.jsonl \
             --no-build \
             --runner "ec2_${{ inputs.machine_type }}" \
             ${{ matrix.iterations && format('--iterations {0}', matrix.iterations) || '' }} \
@@ -499,6 +501,19 @@ jobs:
       run: |
         bash scripts/cat-s3.sh vortex-ci-benchmark-results data.json.gz results.json

+    - name: Ingest results to v3 server
+      if: inputs.mode == 'develop' && vars.V3_INGEST_URL != ''
+      continue-on-error: true
+      shell: bash
+      env:
+        INGEST_BEARER_TOKEN: ${{ secrets.INGEST_BEARER_TOKEN }}
+      run: |
+        python3 scripts/post-ingest.py results.v3.jsonl \
+          --server "${{ vars.V3_INGEST_URL }}" \
+          --commit-sha "${{ github.sha }}" \
+          --benchmark-id "${{ matrix.id }}" \
+          --repo-url "${{ github.server_url }}/${{ github.repository }}"

     - name: Upload File Sizes
       if: inputs.mode == 'develop' && matrix.remote_storage == null
       shell: bash
```
.github/workflows/v3-commit-metadata.yml

Lines changed: 35 additions & 0 deletions
```yaml
# Posts a v3 ingest envelope with no records on every push to develop, so the
# `commits` dim stays populated even when no benchmark ran.

name: v3 commit metadata

on:
  push:
    branches: [develop]
  workflow_dispatch: { }

permissions:
  contents: read

jobs:
  commit-metadata:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 2

      - name: Ingest commit metadata to v3 server
        if: vars.V3_INGEST_URL != ''
        continue-on-error: true
        shell: bash
        env:
          INGEST_BEARER_TOKEN: ${{ secrets.INGEST_BEARER_TOKEN }}
        run: |
          echo -n > empty.jsonl
          python3 scripts/post-ingest.py empty.jsonl \
            --server "${{ vars.V3_INGEST_URL }}" \
            --commit-sha "${{ github.sha }}" \
            --benchmark-id "commit-metadata" \
            --repo-url "${{ github.server_url }}/${{ github.repository }}"
```

Cargo.lock

Lines changed: 1 addition & 0 deletions

bench-orchestrator/bench_orchestrator/cli.py

Lines changed: 51 additions & 2 deletions
```diff
@@ -7,6 +7,7 @@
 from contextlib import contextmanager
 from datetime import datetime, timedelta
 from pathlib import Path
+from tempfile import TemporaryDirectory
 from typing import Annotated

 import pandas as pd
@@ -115,6 +116,38 @@ def open_results_output(path: Path | None):
         yield handle


+@contextmanager
+def temporary_v3_output_dir(enabled: bool):
+    """Create a temporary directory for per-backend v3 JSONL files."""
+    if not enabled:
+        yield None
+        return
+
+    with TemporaryDirectory(prefix="vx-bench-v3-") as temp_dir:
+        yield Path(temp_dir)
+
+
+def backend_v3_output_path(temp_dir: Path | None, index: int, backend: Engine) -> Path | None:
+    """Return the v3 JSONL path a backend should write, if v3 output is enabled."""
+    if temp_dir is None:
+        return None
+    return temp_dir / f"{index:02d}-{backend.value}.jsonl"
+
+
+def write_combined_v3_output(output_path: Path, input_paths: list[Path]) -> None:
+    """Concatenate successful per-backend v3 JSONL files into the requested output."""
+    if output_path.parent != Path():
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+
+    with output_path.open("w", encoding="utf-8") as output:
+        for input_path in input_paths:
+            if not input_path.exists():
+                raise RuntimeError(f"v3 output was not written by benchmark backend: {input_path}")
+            with input_path.open("r", encoding="utf-8") as input_file:
+                for line in input_file:
+                    output.write(line)
+
+
 def write_result_line(line: str, store_writer, compatibility_file) -> None:
     """Write a raw result line to the run store and optional compatibility output."""
     store_writer(line)
@@ -210,6 +243,10 @@ def run(
         Path | None,
         typer.Option("--output", help="Optional path for compatibility JSONL output"),
     ] = None,
+    gh_json_v3: Annotated[
+        Path | None,
+        typer.Option("--gh-json-v3", help="Optional path for v3 JSONL records emitted by the benchmark binary"),
+    ] = None,
     options: Annotated[list[str] | None, typer.Option("--opt", help="Engine or benchmark specific options")] = None,
 ) -> None:
     """Run benchmarks with specified configuration."""
@@ -276,10 +313,16 @@ def run(
     soft_failures: list[str] = []

     try:
-        with store.create_run(config, build_config) as ctx, open_results_output(output) as compatibility_file:
-            for backend, backend_targets in backend_groups.items():
+        with (
+            store.create_run(config, build_config) as ctx,
+            open_results_output(output) as compatibility_file,
+            temporary_v3_output_dir(gh_json_v3 is not None) as v3_temp_dir,
+        ):
+            v3_output_parts: list[Path] = []
+            for backend_idx, (backend, backend_targets) in enumerate(backend_groups.items()):
                 executor = BenchmarkExecutor(binary_paths[backend], backend, verbose=verbose)
                 backend_formats = [target.format for target in backend_targets]
+                backend_gh_json_v3 = backend_v3_output_path(v3_temp_dir, backend_idx, backend)

                 try:
                     results = executor.run(
@@ -294,6 +337,7 @@ def run(
                         sample_rate=sample_rate,
                         tracing=tracing,
                         runner=runner,
+                        gh_json_v3=backend_gh_json_v3,
                         on_result=lambda line, store_writer=ctx.write_raw_json, compatibility=compatibility_file: (
                             write_result_line(
                                 line,
@@ -302,6 +346,8 @@ def run(
                             )
                         ),
                     )
+                    if backend_gh_json_v3 is not None:
+                        v3_output_parts.append(backend_gh_json_v3)
                     console.print(f"[green]{backend.value}: {len(results)} results[/green]")
                 except RuntimeError as exc:
                     ctx.metadata.partial = True
@@ -310,6 +356,9 @@ def run(
                     console.print(f"[red]{backend.value} failed: {exc}[/red]")
                     soft_failures.append(str(exc))

+            if gh_json_v3 is not None:
+                write_combined_v3_output(gh_json_v3, v3_output_parts)
+
             ctx.metadata.binaries = {backend.value: str(path) for backend, path in binary_paths.items()}
     except RuntimeError as exc:
         console.print(f"[red]{exc}[/red]")
```

bench-orchestrator/bench_orchestrator/runner/executor.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -40,6 +40,7 @@ def build_command(
         sample_rate: int | None = None,
         tracing: bool = False,
         runner: str | None = None,
+        gh_json_v3: Path | None = None,
     ) -> list[str]:
         """Build the command used to execute a benchmark binary."""
         cmd = [
@@ -67,6 +68,8 @@ def build_command(
             cmd.append("--tracing")
         if runner:
             cmd.extend(["--runner", runner])
+        if gh_json_v3 is not None:
+            cmd.extend(["--gh-json-v3", str(gh_json_v3)])
         if options:
             for key, value in options.items():
                 cmd.extend(["--opt", f"{key}={value}"])
@@ -98,6 +101,7 @@ def run(
         sample_rate: int | None = None,
         tracing: bool = False,
         runner: str | None = None,
+        gh_json_v3: Path | None = None,
         on_result: Callable[[str], None] | None = None,
     ) -> list[str]:
         """
@@ -128,6 +132,7 @@ def run(
             sample_rate=sample_rate,
             tracing=tracing,
             runner=runner,
+            gh_json_v3=gh_json_v3,
         )

         if self.verbose:
```

bench-orchestrator/tests/test_cli.py

Lines changed: 44 additions & 0 deletions
```diff
@@ -105,3 +105,47 @@ def fake_run(self, **kwargs):
     metadata = json.loads((run_dirs[0] / "metadata.json").read_text(encoding="utf-8"))
     assert metadata["targets"] == [{"engine": "datafusion", "format": "parquet"}]
     assert metadata["binaries"] == {"datafusion": str(binary_path)}
+
+
+def test_run_combines_gh_json_v3_output_per_backend(tmp_path, monkeypatch) -> None:
+    run_store = ResultStore(base_dir=tmp_path / "runs")
+    output_path = tmp_path / "artifacts" / "results.v3.jsonl"
+    binary_paths = {
+        cli_module.Engine.DATAFUSION: tmp_path / "datafusion-bench",
+        cli_module.Engine.DUCKDB: tmp_path / "duckdb-bench",
+    }
+    for binary_path in binary_paths.values():
+        binary_path.write_text("", encoding="utf-8")
+
+    monkeypatch.setattr(cli_module, "ResultStore", lambda: run_store)
+    monkeypatch.setattr(cli_module.BenchmarkBuilder, "get_binary_path", lambda self, backend: binary_paths[backend])
+
+    seen_backend_paths = []
+
+    def fake_run(self, **kwargs):
+        backend_output = kwargs["gh_json_v3"]
+        assert backend_output is not None
+        assert backend_output != output_path
+        backend_output.write_text(f"{self.backend.value}-v3\n", encoding="utf-8")
+        seen_backend_paths.append(backend_output)
+        return []
+
+    monkeypatch.setattr(BenchmarkExecutor, "run", fake_run)
+
+    result = runner.invoke(
+        cli_module.app,
+        [
+            "run",
+            "tpch",
+            "--targets-json",
+            '[{"engine":"datafusion","format":"parquet"},{"engine":"duckdb","format":"parquet"}]',
+            "--no-build",
+            "--gh-json-v3",
+            str(output_path),
+        ],
+    )
+
+    assert result.exit_code == 0
+    assert output_path.read_text(encoding="utf-8") == "datafusion-v3\nduckdb-v3\n"
+    assert len(seen_backend_paths) == 2
+    assert seen_backend_paths[0] != seen_backend_paths[1]
```

bench-orchestrator/tests/test_executor.py

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,31 @@ def test_build_command_omits_formats_for_lance_backend() -> None:
4848
assert "1,3" in cmd
4949

5050

51+
def test_build_command_includes_gh_json_v3_when_set() -> None:
52+
executor = BenchmarkExecutor(Path("/tmp/duckdb-bench"), Engine.DUCKDB)
53+
54+
cmd = executor.build_command(
55+
benchmark=Benchmark.TPCH,
56+
formats=[Format.PARQUET],
57+
gh_json_v3=Path("results.v3.jsonl"),
58+
)
59+
60+
assert "--gh-json-v3" in cmd
61+
flag_idx = cmd.index("--gh-json-v3")
62+
assert cmd[flag_idx + 1] == "results.v3.jsonl"
63+
64+
65+
def test_build_command_omits_gh_json_v3_when_unset() -> None:
66+
executor = BenchmarkExecutor(Path("/tmp/duckdb-bench"), Engine.DUCKDB)
67+
68+
cmd = executor.build_command(
69+
benchmark=Benchmark.TPCH,
70+
formats=[Format.PARQUET],
71+
)
72+
73+
assert "--gh-json-v3" not in cmd
74+
75+
5176
def test_run_streams_logs_without_counting_them(tmp_path: Path) -> None:
5277
script = tmp_path / "fake-bench.py"
5378
script.write_text(
