Commit ab5592f

docs(benchmarks-website): refresh planning docs after UI + perf merges
Brings AGENTS.md and README.md in line with what's actually on ct/benchmarks-v3 after the four UI workstreams (tooltip-pr-link, range-scrollbar, global-filters, full-history) and the perf pass (CompressionLayer, LANDING_INLINE_N) landed.

AGENTS.md:
- Bullet 1: note the tower-http CompressionLayer.
- Bullet 7: rewrite. No more 1000-commit cap; fetch is `?n=all`, visual downsampling is client-side LTTB on the visible range, range-scrollbar drives rebuilds in lockstep with the toolbar.
- Bullet 8: groups are all collapsed by default; first group's payloads are inlined at LANDING_INLINE_N=100 commits with lazy-fetch on zoom-out.
- Bullet 9: the URL params are `?n=&engine=&format=` (not `?y=&mode=&hidden=`); per-chart toolbar state is local-only.
- Code map: fix LANDING_INLINE_N name/value; refresh chart-init.js description (LTTB rebuild, range strip, filter chips, click-to-PR); add rows for app.rs and migrate/src/verify.rs; note the new range-strip/filter-chip CSS selectors.
- Things to avoid: nuance the "don't refetch on scope change" rule (the inline-payload zoom-out path is a one-time exception); add the predecessor-walk lesson from PR #7723 review (commits[] is oldest-first by SQL); add the "no server-side commit cap" rule; clarify that the global filter bar is a visibility driver, not a forbidden page-level toolbar.

README.md:
- Status: list the recent merges (LTTB UI, compression, inline trim).
- Item 1 (secrets): mark done.
- Item 2 (CI wiring): note the f7fd270 dual-write commit; flag end-to-end verification as the still-open subtask.
- Item 3 (test deployment): bump the host DNS to the current EC2 box; mention the c6a.4xlarge build-on-box path.
- Item 4 (smoke test): mark in-progress; record the Random Access recovery so future agents see the regression class to watch for.
- Deferred UI follow-ups: drop client-side LTTB (now done); refresh the collect_group_charts N+1 line range (1131-1162 → 1202-1233); expand the mobile legend item with the actual cause (matchMedia is read once at construction).

Signed-off-by: Claude <noreply@anthropic.com>
1 parent 10b07e8 commit ab5592f

2 files changed

Lines changed: 69 additions & 41 deletions

benchmarks-website/planning/AGENTS.md

Lines changed: 39 additions & 14 deletions
@@ -24,6 +24,7 @@ it as `vortex-bench-server` at `benchmarks-website/server/`.
 
 - Single Rust binary: `axum` (HTTP) + `maud` (SSR HTML) + embedded `duckdb-rs`. All static assets
 (`chart.umd.js`, `chart-init.js`, `style.css`) are `include_bytes!`'d into the binary. No CDN.
+A `tower-http` `CompressionLayer` wraps every response (gzip/brotli).
 - One DuckDB file on local disk holds five fact tables (compression time, query measurement, vector
 search, RAG, random access) plus a `commits` dim table. Schema in
 [`01-schema.md`](./01-schema.md).
@@ -37,13 +38,23 @@ it as `vortex-bench-server` at `benchmarks-website/server/`.
 - Charts render inline on the landing page. Each `<canvas>` is paired with a
 `<script id="chart-data-N">` JSON payload that `chart-init.js` hydrates lazily via
 `IntersectionObserver`.
-- Per-chart toolbar with zoom-as-scope: each chart fetches up to 1000 commits once, then the Show /
-Y / Mode buttons and slider adjust the visible range via `chart.update("none")`. Mouse wheel pans
-history. Slider uses `input` events with a 16ms throttle + 150ms debounce.
+- Per-chart toolbar with zoom-as-scope. Each chart fetches its full raw history once
+(`?n=all`); visual downsampling is **client-side LTTB** in `chart-init.js`
+(`MAX_VISIBLE_POINTS = 500`, applied only to the currently visible commit range — zoomed-in
+views render raw). Drag-pan, drag-rectangle-zoom, wheel-pan, the toolbar slider, and a
+horizontal range-scrollbar strip below each chart all drive the same `rebuildVisibleAndUpdate`
+so LTTB and the strip stay in lockstep. A "downsampled · K / N" badge surfaces when LTTB is
+active.
 - Group ordering is hard-coded to match v2's `origin/ct/vfvb:benchmarks-website/index.html` order.
-Each group is wrapped in a `<details>`; only the first is open by default.
-- URL state (`?n=&y=&mode=&hidden=`) is honored only on permalink pages (`/chart`, `/group`).
-Landing page resets to defaults on open; users customize per-chart in place.
+Every group is wrapped in a `<details>`, all collapsed by default. The first group's chart
+payloads are still inlined (capped at `LANDING_INLINE_N = 100` commits) so opening it skips a
+fetch round-trip; `chart-init.js` lazy-fetches `?n=all` once when the user zooms past the
+inlined window.
+- A sticky filter bar at the top of every page exposes engine/format chips that drive series
+visibility across every chart at once. Clicking a data point opens that commit's PR (parsed
+from `(#NNNN)` in the message; falls back to the commit URL). URL params `?engine=&format=&n=`
+survive permalink shares and refreshes; per-chart toolbar state (Y axis, slider) is
+intentionally local-only.
 - `vortex-bench-migrate` reads v2 records, runs each through a classifier in
 `migrate/src/classifier.rs`, and either routes the record into one of the five fact tables or
 marks it `Skip(reason)` with a typed reason. The run **fails if more than 5% of records come back
@@ -54,13 +65,15 @@ it as `vortex-bench-server` at `benchmarks-website/server/`.
 | Path | What lives here |
 | ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `benchmarks-website/server/src/main.rs` | Binary entrypoint. Reads `INGEST_BEARER_TOKEN`, `VORTEX_BENCH_BIND` (default `127.0.0.1:3000`), `VORTEX_BENCH_DB` (default `./bench.duckdb`), `VORTEX_BENCH_LOG`. |
+| `benchmarks-website/server/src/app.rs` | `AppState` (DB handle + bearer + path) and the `Router` composition. `CompressionLayer` wraps every response. |
 | `benchmarks-website/server/src/api.rs` | `chart_payload(conn, &ChartKey, &CommitWindow)` — the shared implementation behind `/api/chart/{slug}`, the inline `<script>` JSON, and `collect_group_charts`. Known N+1 in `collect_group_charts` — flagged with a TODO. |
-| `benchmarks-website/server/src/html.rs` | Three HTML routes and the `<details>`-per-group landing page. `LANDING_DEFAULT_N: u32 = 50`. `UiQuery` parses `?n=&y=&mode=&hidden=` on permalink routes. |
+| `benchmarks-website/server/src/html.rs` | Three HTML routes and the `<details>`-per-group landing page. `LANDING_INLINE_N: u32 = 100` caps the first group's inlined chart JSON; HTML routes default to `CommitWindow::All`. `UiQuery` parses `?n=&engine=&format=`. |
 | `benchmarks-website/server/src/slug.rs` | `ChartKey` / `GroupKey` enums and `to_slug` / `from_slug` round-trip. |
-| `benchmarks-website/server/static/chart-init.js` | Hydration, `IntersectionObserver`, custom external tooltip with delta rows, inline `afterDatasetsDraw` plugin for the dashed crosshair. |
-| `benchmarks-website/server/static/style.css` | `.chart-tooltip-host` is `position: absolute; pointer-events: none;` (do not change — fixes the flicker). `.chart-card` is `position: relative`. |
+| `benchmarks-website/server/static/chart-init.js` | Hydration, `IntersectionObserver`, lazy-fetch on `<details>` toggle, `rebuildVisibleAndUpdate` (client-side LTTB on the visible range, `MAX_VISIBLE_POINTS = 500`), custom external tooltip + delta rows + click-to-PR, range-scrollbar strip, global filter chips, inline crosshair plugin. |
+| `benchmarks-website/server/static/style.css` | `.chart-tooltip-host` is `position: absolute; pointer-events: none;` (do not change — fixes the flicker). `.chart-card` is `position: relative`. `.chart-range-strip*` and `.filter-*` selectors back the range scrollbar and global filter chips. |
 | `benchmarks-website/server/tests/web_ui.rs` | `insta` snapshot tests, seeded by POSTing to `/api/ingest`. No external fixtures. |
 | `benchmarks-website/migrate/src/classifier.rs` | `classify_outcome` routes records into a fact table, `Skip(reason)`, or `Unknown`. >5% Unknown gates the run. |
+| `benchmarks-website/migrate/src/verify.rs` | Structural diff between a migrated DuckDB and v2's live `/api/metadata`. Exits non-zero if any v2 group is missing in v3 — gates a CI step. |
 
 ## Local dev / smoke test
 

@@ -111,13 +124,25 @@ See the root [`CLAUDE.md`](/CLAUDE.md) for Rust style, test layout, and CI norms
 [`README.md`](./README.md) first — it is almost certainly already deferred.
 - **Don't write a server-side classifier for live ingest.** The emitter is responsible for v3-shape
 records. The migrator's classifier exists only to translate v2 records once.
-- **Don't rebuild a global page-level toolbar.** Controls are per-chart. This was a real failure
-mode the first time around — the page-level toolbar drove every chart together, which is not what
-users want.
+- **Don't rebuild a global page-level toolbar with chart-state controls.** Per-chart controls
+(slider, Y-axis, scope) stay per-chart. The sticky filter bar at the top of every page is the
+exception — it drives series *visibility* across every chart at once, which is what users want
+for the engine/format dimension. Don't extend it with per-chart settings.
 - **Don't bind a slider's reactive logic to `change` events.** Use `input` events with a small
 throttle + debounce, otherwise the slider only updates on release and feels broken.
-- **Don't refetch on scope change.** Each chart fetches a generous window once; scope buttons +
-slider operate on that buffer via `chart.update("none")`.
+- **Don't refetch every time the scope changes.** The chart fetches its full history once; scope
+buttons, slider, drag-pan, wheel-pan, and the range strip all rebuild via the in-memory LTTB
+pass on the cached payload. The one exception is the inline-payload zoom-out path: when the user
+zooms past the first group's inlined `LANDING_INLINE_N` window for the first time,
+`chart-init.js` lazy-fetches `?n=all` once and replaces the payload.
+- **Don't re-introduce a server-side commit cap.** `?n=all` is the default for HTML routes and the
+upper bound is unbounded everywhere. Visual downsampling lives client-side in `chart-init.js`,
+not on the wire.
+- **Don't reverse the predecessor walk in the tooltip.** The chart payload's `commits[]` is sorted
+oldest-first by SQL — `commits[0]` is the oldest commit, `commits[N-1]` is the newest. For
+per-row delta the chronological predecessor of `commits[idx]` lives at `idx - 1`. We caught a
+regression where a "fix" flipped this to `idx + 1`; the original walk-backward direction was
+right.
 - **Don't re-introduce `pointer-events: auto` on the tooltip host.** The tooltip is positioned at
 the cursor; making it pointer-interactive causes a flicker loop. Keep it `pointer-events: none`
 and offset via `transform: translate(12px, 12px)`.
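
Reviewer sketch for the AGENTS.md hunks above: the docs describe client-side LTTB applied only to the visible commit range, capped at `MAX_VISIBLE_POINTS = 500`. The following is a minimal, generic Largest-Triangle-Three-Buckets implementation under stated assumptions, not the code in `chart-init.js`. The function names `lttb` and `downsampleVisible` and the `{x, y}` point shape are hypothetical; only the constant name and value come from the diff.

```javascript
// Value taken from the AGENTS.md text; everything else here is illustrative.
const MAX_VISIBLE_POINTS = 500;

// Generic LTTB: keep the first and last points, split the rest into
// (threshold - 2) buckets, and from each bucket keep the point forming the
// largest triangle with the previously kept point and the next bucket's mean.
// Assumes `points` is an array of {x, y} sorted by x.
function lttb(points, threshold) {
  const n = points.length;
  if (threshold >= n || threshold < 3) {
    return points.slice(); // nothing to reduce (or threshold too small to bucket)
  }
  const sampled = [points[0]];
  const bucketSize = (n - 2) / (threshold - 2);
  let prev = 0; // index of the most recently kept point

  for (let i = 0; i < threshold - 2; i++) {
    // Mean of the next bucket (for the final bucket this is the last point).
    const nextStart = Math.floor((i + 1) * bucketSize) + 1;
    const nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, n);
    let avgX = 0;
    let avgY = 0;
    for (let j = nextStart; j < nextEnd; j++) {
      avgX += points[j].x;
      avgY += points[j].y;
    }
    avgX /= nextEnd - nextStart;
    avgY /= nextEnd - nextStart;

    // Pick the point in the current bucket with the largest triangle area.
    const start = Math.floor(i * bucketSize) + 1;
    const end = Math.floor((i + 1) * bucketSize) + 1;
    let maxArea = -1;
    let maxIdx = start;
    for (let j = start; j < end; j++) {
      const area = Math.abs(
        (points[prev].x - avgX) * (points[j].y - points[prev].y) -
        (points[prev].x - points[j].x) * (avgY - points[prev].y)
      ) / 2;
      if (area > maxArea) {
        maxArea = area;
        maxIdx = j;
      }
    }
    sampled.push(points[maxIdx]);
    prev = maxIdx;
  }
  sampled.push(points[n - 1]);
  return sampled;
}

// Hypothetical wrapper: downsample only the visible window [lo, hi] (inclusive
// indices into the cached full history), so zoomed-in views stay raw.
function downsampleVisible(points, lo, hi) {
  const visible = points.slice(lo, hi + 1);
  return lttb(visible, MAX_VISIBLE_POINTS);
}
```

Because the pass runs on the cached payload, every interaction that changes `lo` or `hi` can call the same wrapper without a refetch, which is consistent with the "don't refetch every time the scope changes" rule in the hunk above.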

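Reviewer sketch for two tooltip rules stated in the AGENTS.md diff: click-to-PR parses `(#NNNN)` from the commit message and falls back to the commit URL, and the per-row delta looks at `idx - 1` because `commits[]` is oldest-first. This is an illustration under assumptions, not the `chart-init.js` source. The helper names, the `message` and `sha` field names, the GitHub URL layout for `vortex-data/vortex`, and the choice to skip missing values while walking backward are all hypothetical.

```javascript
// Assumed repository base URL; the repo name appears in README.md, the URL form is an assumption.
const REPO_URL = "https://github.com/vortex-data/vortex";

// Return the PR URL if the message contains "(#NNNN)", else the commit URL.
// `commit.message` and `commit.sha` are hypothetical field names.
function commitLink(commit) {
  const match = /\(#(\d+)\)/.exec(commit.message);
  return match
    ? `${REPO_URL}/pull/${match[1]}`
    : `${REPO_URL}/commit/${commit.sha}`;
}

// Delta against the chronological predecessor. `values` is aligned with
// commits[] (oldest first), so the predecessor of idx is found by walking
// toward index 0. Skipping null/undefined entries is an assumption.
function deltaFromPredecessor(values, idx) {
  if (values[idx] == null) {
    return null;
  }
  for (let p = idx - 1; p >= 0; p--) {
    if (values[p] != null) {
      return values[idx] - values[p];
    }
  }
  return null; // oldest commit, or no earlier measurement
}
```

Walking toward `idx + 1` instead would compare each commit against a newer one and flip the sign of every delta, which is the regression class the AGENTS.md bullet warns about.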
benchmarks-website/planning/README.md

Lines changed: 30 additions & 27 deletions
@@ -10,9 +10,11 @@ the v2 Node/React stack.
 
 ## Status
 
-- **Alpha shipped** to `ct/benchmarks-v3`. Server, migrator, and inline-charts UI are merged.
-- **In production-readiness phase.** v2 is still serving `bench.vortex.dev`. v3 has not been
-deployed publicly yet.
+- **Alpha shipped** to `ct/benchmarks-v3`. Server, migrator, full-history UI (client-side LTTB,
+range scrollbar, global filter chips, click-to-PR tooltip), response compression, and the
+`LANDING_INLINE_N` cold-load trim are all merged.
+- **In production-readiness phase.** v2 is still serving `bench.vortex.dev`. v3 runs on a
+throwaway EC2 host for smoke-testing; not deployed publicly yet.
 - **UI follow-ups** are owned by the user, not by agents (see "Deferred UI follow-ups" below).
 
 A 10-bullet architecture summary lives at the top of [`AGENTS.md`](./AGENTS.md). Use that for
@@ -22,34 +24,33 @@ handoffs and external sharing.
 
 In rough order. Each item is a separate task; do not bundle.
 
-### 1. Repo secrets
+### 1. Repo secrets — done
 
-Two GitHub repository secrets on `vortex-data/vortex` (admins only):
+`INGEST_BEARER_TOKEN` and `V3_INGEST_URL` are set as repo-level secrets on `vortex-data/vortex`.
+They're fine at this scope for the test phase. Move to an Environment-scoped secret (gated to
+`ct/benchmarks-v3` / protected branches) before prod. Rotate `INGEST_BEARER_TOKEN` if the test
+value was ever shared in a comment / Slack / PR review.
 
-- `INGEST_BEARER_TOKEN` — random 32+ byte token. Same value gets set as the `INGEST_BEARER_TOKEN`
-env var on whatever host runs the v3 server. Generate with `openssl rand -hex 32`.
-- `V3_INGEST_URL` — full URL of the v3 ingest endpoint, e.g. `http://<host>:3000/api/ingest` for
-the test box, or `https://bench.vortex.dev/api/ingest` for prod.
+### 2. CI ingestion wiring — partial
 
-Repo-level secrets are fine for the test phase. Move to an Environment-scoped secret (gated to
-`ct/benchmarks-v3` / protected branches) before prod.
-
-### 2. CI ingestion wiring
-
-Confirm whichever workflow runs the benchmark suites and pushes results uses
-`secrets.INGEST_BEARER_TOKEN` and `secrets.V3_INGEST_URL`, and POSTs the versioned envelope shape
-defined in [`02-contracts.md`](./02-contracts.md). The current workflow targets the v2 endpoint;
-needs to either dual-write or flip.
+The dual-write step is wired into `bench.yml` and `sql-benchmarks.yml` via commit `f7fd270`. Still
+to do: an end-to-end run that triggers the workflow on a feature branch, POSTs to the EC2 box, and
+confirms the envelope lands in DuckDB intact. Outbox-style retry on failed POSTs is a follow-up;
+not built until we observe a failure.
 
 ### 3. Test deployment
 
 Currently a manual EC2 box for smoke-testing. Latest test host:
 
-- DNS: `ec2-18-116-241-0.us-east-2.compute.amazonaws.com`
+- DNS: `ec2-18-219-54-101.us-east-2.compute.amazonaws.com` (changes on stop/start unless an Elastic
+IP is associated)
 - Port: `3000` (open to `0.0.0.0/0` in the security group)
 - Bind: `VORTEX_BENCH_BIND=0.0.0.0:3000` (default `127.0.0.1` does not work for external access)
-- HTTP only, no TLS. Public IP changes on stop/start unless an Elastic IP is associated. Throwaway
-token only — don't reuse for prod.
+- HTTP only, no TLS. Throwaway bearer token only — don't reuse for prod.
+
+Build path: build narrow on the box itself (it's a `c6a.4xlarge` to avoid local-vs-EC2 arch
+mismatches). The v2 migration source is fetched directly from the public S3 bucket; no AWS creds
+needed.
 
 Smoke test from a laptop:
 
@@ -59,10 +60,12 @@ curl -i http://<host>:3000/
 
 Should return HTTP 200 with the landing HTML.
 
-### 4. Smoke test with migrated data
+### 4. Smoke test with migrated data — in progress
 
-Run `vortex-bench-migrate` against the v2 source, copy the resulting `bench.duckdb` to the deployed
-host, point `VORTEX_BENCH_DB` at it, and walk every group's charts in a browser.
+Run `vortex-bench-migrate` against the v2 source, point `VORTEX_BENCH_DB` at the result, walk every
+group's charts in a browser. Done so far: Random Access (caught and fixed a missing-chart
+regression — see `1228e530`); LTTB downsampling, range scrollbar, filter chips, and click-to-PR
+all behave on real data. Still to walk: every other group at least once.
 
 ### 5. Operational hygiene (not yet done)
 
@@ -123,10 +126,10 @@ These are user/owner decisions, not agent decisions.
 display-name map. Cosmetic but visible.
 - **Deferred UI follow-ups.** The user is handling these directly; agents should not pre-empt
 them:
-- `collect_group_charts` N+1 refactor in `api.rs:1131-1162`.
-- Mobile legend resize handler.
+- `collect_group_charts` N+1 refactor in `api.rs:1202-1233`.
+- Mobile legend resize handler. The position is picked once at chart construction via
+`matchMedia("(max-width: 768px)")`; it doesn't update if the viewport crosses the breakpoint.
 - Zoom-sync within a group.
-- LTTB downsampling for very long histories.
 - Swap the inline crosshair plugin for `chartjs-plugin-crosshair`.
 
 ## Reading order (alpha-era reference)
