Commit 4575b1d

plan: close PR-5.1.5 (Step 2.5) — read-path perf accepted at 3-cycle cap; advance to PR-5.1
PR-5.1.5 shipped, deployed, live-verified, gauntlet-accepted (3 cycles, executor=parallel). Cold render /api/groups 6.0s->~1s; tpch chart 13.6s->0.094s. Write-path commit_timestamp + migrations 006/007 done; PR-5.0 deferred data-checks resolved. Current State advanced to PR-5.1.

Signed-off-by: "Connor Tsui" <connor@spiraldb.com>
1 parent b20904a commit 4575b1d

1 file changed

Lines changed: 20 additions & 9 deletions

.big-plans/ct__bench-v4.md

@@ -8,27 +8,27 @@ branch: ct/bench-v4
88
planning_sub_flow: null
99
current_phase: "Phase 5: Cutover + decommission"
1010
phase_index: 5
11-
current_pr: PR-5.1.5
12-
pr_index: 3
11+
current_pr: PR-5.1
12+
pr_index: 4
1313
outstanding_must_fix: 0
14-
deferred_items_total: 17
15-
last_user_touchpoint: 2026-06-11T23:30:00Z
16-
last_user_touchpoint_what: "SKIP-SCAN + WRITE-PATH DONE, DEPLOYED, LIVE-VERIFIED. Skip-scans (15b778b01): collectQuerySummary 3-branch successor walk (prod tpcds 2796ms->63ms), collectQueryGroups 15-branch NULL-aware walk (2333ms->20ms), collectFilterUniverse single-col skips (565ms->0.2ms); all byte-identical vs replaced queries on testcontainer + full prod seed; web vitest 211 green; NO migration 008 needed (NULLS LAST handled via IS NOT NULL descent + IS NULL fallback probe). LIVE: /api/groups cache-cold 0.64-1.07s (was ~6.0s), landing cold 1.2s (was ~6.9s), x-vercel-cache MISS verified, payload sane (16 groups / 13 summaries / 372 charts). Write-path (e3861734e): post-ingest.py upsert stamps commit_timestamp via scalar subquery both paths + new pytest (100 green); migrate loader post-COPY 006-backfill UPDATE in same txn + e2e asserts 0 NULL/0 drift (100 green); e2e init now applies 001+006. User approved prod reads this session (EXPLAIN-verified on prod as bench_read). REMAINING: gauntlet review (Step 2.3), PR-5.0 deferred data-checks (slug-match + ~5 chart slugs vs v2), Step 2.5 close. PRIOR touchpoint (2026-06-11T22:30Z): user chose (a) recursive-CTE skip-scan. Committed + Docker-verified + PUSHED (origin ct/bench-v4 @ a4834ba1f): sargable (2e637401e), parallelize e (629b5b0b6), denormalize c + migration 006 + d indexes (680b30e6e), covering index migration 007 + DISTINCT ON summary (a4834ba1f). Prod: instance upsized db.t4g.medium + vortex-bench-pg16(work_mem 32MB); migrations 006+007 APPLIED to prod as master + VACUUM ANALYZE; web-deploy.yml deploy SUCCEEDED incl the CDN probe (the push-unblock the handoff waited for). LIVE measurements: tpch chart 13.6s->0.094s; /api/groups 38s->0.079s cached / ~6.0s cache-cold server render (x-vercel-cache MISS); landing / ~6.9s cold. Prod SQL: chart 75ms, tpcds summary 2.4s (Index Only Scan, Heap Fetches 0 via idx_query_measurements_summary covering INCLUDE value_ns), DISTINCT engine 458ms, discovery 1.3s. 
REMAINING for PR-5.1.5 close-out: (1) cold render 6s->faster (concurrency bump band-aid OR recursive-CTE skip-scan for ~ms summaries — user to choose); (2) (c) WRITE-PATH still TODO (post-ingest.py + migrate/src/postgres.rs populate commit_timestamp — needed before develop merge; existing data backfilled by 006 so deployed site is correct now, NULLS LAST + a re-backfill cover the transient); (3) gauntlet review of the PR; (4) PR-5.0 deferred data-checks (slug-match + ~5 chart slugs vs v2) now verifiable on the fast site; (5) Step 2.5 close. (b)/(d) folded into the ~6s, acceptable. Backfill UPDATE created bloat -> VACUUM ANALYZE was required post-006 (lesson). schema-deploy.yml is develop-only so the ct/bench-v4 push only fired web-deploy; at the develop merge schema-deploy no-ops 006/007 (already in prod ledger, master-pre-applied like 005)."
17-
subagent_invocations_this_pr: 2
18-
subagent_invocations_total: 147
14+
deferred_items_total: 22
15+
last_user_touchpoint: 2026-06-12T00:30:00Z
16+
last_user_touchpoint_what: "PR-5.1.5 CLOSED (Step 2.5). Read-path perf shipped, deployed, live-verified, and gauntlet-accepted at the project ~3-cycle cap; full close-out in the PR-5.1.5 Implementation status entry. Skip scans (15b778b01) + sargable (2e637401e) + write-path commit_timestamp (e3861734e) + 3 gauntlet-fix commits (572b3bdf6 cycle 1, cbe2b97b2 cycle 2, b20904ae0 cycle 3). Live: /api/groups cold 0.64-1.07s (was 6.0s) / 0.079s cached; tpch chart 13.6s->0.094s. Migrations 006+007 applied to prod as master + VACUUM ANALYZE. PR-5.0 deferred data-checks RESOLVED (chart-count sets match live v2 bench.vortex.dev; 5-slug latest-value spot-check within variance). All suites green (web vitest 214, migrate Rust 100, python 154). Gauntlet: 3 cycles, executor=parallel (Claude+Codex per lens); cycle-3 must-fixes were operator-runbook doc drift (infra/README.md + provision.sh) + tests, zero production-logic, accepted at cap. NEXT: PR-5.1 (promote v4 --postgres ingest to required + drop v3 --server write from the 3 CI workflows; pre-promotion gate = re-run the PR-3.5 cross-check clean; ship scripts/psql-bench.sh). Each prod write remains harness-gated. Pre-squash backup ref backup/bench-v4-pre-squash-* still MISSING — recreate before PR-5.3/final squash."
17+
subagent_invocations_this_pr: 0
18+
subagent_invocations_total: 162
1919
review_cycles_this_pr: 0
2020
phase_entry_sha: 9f68717b8
2121
phase_end_cycle: 0
2222
phase_end_reject_cycles: 0
2323
last_phase_end_verdict: null
2424
current_pr_is_ci_reopen: null
25-
last_commit: e3861734e
25+
last_commit: b20904ae0
2626
last_cycle_commits: []
2727
```
2828

2929
## SESSION HANDOFF — 2026-06-11 (PR-5.1.5 read-path-perf DEPLOYED; fresh-conversation takeover)
3030

31-
**READ THIS FIRST on resume.** Resume via `/spiral:big-plans` in the `vortex4` worktree. `Current State` routes you: `status: executing`, `current_pr: PR-5.1.5` (read-path-perf, ordinal 3 in Phase 5, ahead of PR-5.1). The PR is **deployed + live + working**; it is NOT closed (Step 2.5 + gauntlet still pending). The older "## SESSION HANDOFF — 2026-06-11 (PR-5.0 + read-path perf)" section below is **HISTORICAL/SUPERSEDED** (PR-5.0 closed; the read-path PR it predicted is PR-5.1.5, now mostly done).
31+
**READ THIS FIRST on resume.** Resume via `/spiral:big-plans` in the `vortex4` worktree. **PR-5.1.5 is now CLOSED** (Step 2.5 done; gauntlet accepted at the project ~3-cycle cap; deployed + live + working; full close-out in its Implementation status entry). `Current State` now routes you to `current_pr: PR-5.1` (promote v4 ingest to required + drop v3-write from CI). This handoff section and the cold-render decision section below are **HISTORICAL** (the cold-render decision resolved as recursive-CTE skip-scan; all listed REMAINING items are done). The older "## SESSION HANDOFF — 2026-06-11 (PR-5.0 + read-path perf)" section is doubly superseded.
3232

3333
### COLD-RENDER DECISION — RESOLVED 2026-06-11 (user chose (a) skip-scan)
3434
The site is deployed and fast when cached, but **cold render of `/api/groups` is ~6s**: each of the ~13 per-group summaries is an index-only scan of the group's full ~1.8M-row history at ~2.4s, and they run 8 at a time. For this low-traffic dashboard the 5-min CDN cache often expires between visits, so cold renders are common. **User decided: (a) recursive-CTE skip-scan** on the summaries (jump straight to each series' latest row, so each summary takes ~ms and cold render drops to ~1s); apply the same pattern to discovery + DISTINCT-engine where it falls out naturally (handoff item 4). Options (b) concurrency bump and (c) accept ~6s were declined.
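The skip-scan idea in the decision above can be sketched as follows. This is a minimal illustration, not the project's SQL: the `measurements(series, ts, value_ns)` table, the `(series, ts)` index, and every name in the block are hypothetical. The SQL string shows the standard recursive-CTE loose index scan for distinct keys; the TypeScript functions simulate the same walk over a sorted array so the "one probe per series instead of one visit per row" behavior can be checked without a database.

```typescript
// Hypothetical schema for illustration only: measurements(series, ts, value_ns)
// with a btree index on (series, ts). Not the project's actual SQL.
const DISTINCT_SERIES_SKIP_SCAN = `
WITH RECURSIVE walk AS (
  (SELECT series FROM measurements ORDER BY series LIMIT 1)
  UNION ALL
  SELECT (SELECT m.series FROM measurements m
          WHERE m.series > walk.series
          ORDER BY m.series LIMIT 1)
  FROM walk
  WHERE walk.series IS NOT NULL
)
SELECT series FROM walk WHERE series IS NOT NULL`;

interface Row {
  series: string;
  ts: number;
  valueNs: number;
}

// Binary search: first index at or after `lo` whose series sorts strictly
// after `series`. Stands in for one index descent.
function firstIndexAfter(rows: readonly Row[], series: string, lo: number): number {
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].series <= series) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// `rows` must be sorted by (series ASC, ts ASC), like the assumed index.
// Each loop iteration jumps past a whole series, so the number of probes
// equals the number of distinct series, not the number of rows.
function latestPerSeries(rows: readonly Row[]): { latest: Row[]; probes: number } {
  const latest: Row[] = [];
  let probes = 0;
  let i = 0;
  while (i < rows.length) {
    const next = firstIndexAfter(rows, rows[i].series, i);
    probes++;
    latest.push(rows[next - 1]); // last row of the run is the newest ts
    i = next;
  }
  return { latest, probes };
}
```

With 3 series of 1,000 rows each, `latestPerSeries` performs 3 probes rather than visiting 3,000 rows, which is the same reason the summaries drop from seconds to milliseconds once the query walks the index by key.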
@@ -1350,6 +1350,17 @@ _Step 2 — snapshot acquisition + rebuild (2026-06-10):_
13501350
- Deferred items: 1 new (seeded end-to-end `collectGroups` test for statpopgen/polarsignals -> web test-hardening pass; Docker unavailable locally to verify it; wiring independently verified by both lenses; `deferred_items_total` 16 -> 17). Resolved-by: web test-hardening pass (pre-develop-merge).
13511351
- Surprises during implementation: (process) the gauntlet `Skill` was initially uninvokable ("Unknown skill") despite appearing in the skill list; a `/reload-plugins` fixed it, so this PR got a real gauntlet review (unlike PR-5.0's faithful manual reconstruction). (mechanics) a mid-PR `git checkout -- queries.ts` used to undo a mutation-verification test ALSO reverted the uncommitted feature edit; caught immediately (grep returned 0 special-cases) and re-applied before committing — no bad state shipped.
13521352

1353+
### PR-5.1.5: read-path perf — scale the v4 read path to the full prod seed (4 impl + 3 gauntlet-fix commits, 3 inner-loop 2-vote cycles accepted at the project ~3-cycle cap, ending at `b20904ae0`, 2026-06-11)
1354+
- Scope shipped (read-side, `15b778b01` for skip scans + earlier `2e637401e`/`629b5b0b6`/`680b30e6e`/`a4834ba1f` for sargable/parallelize/006/007):
  - (a) **sargable WHERE** (`col IS NULL` / `col = $n` instead of `IS NOT DISTINCT FROM`) across `queries.ts` `chartPayload` + `summary.ts`, so `idx_query_measurements_chart` seeks past the leading `dataset=` (tpch chart 13.6s -> 0.094s live).
  - (b/c/d) **recursive-CTE skip scans** replacing the whole-group/whole-table scans:
    - `collectQuerySummary` latest-per-series (3-branch successor walk + per-series latest probe; prod tpcds 2796ms -> 63ms).
    - `collectQueryGroups` discovery (15-branch NULL-aware successor walk vs a 4.85M-row GROUP BY; 2333ms -> 20ms).
    - `collectFilterUniverse` distinct engine/format (single-column skips; 565ms -> 0.2ms).
    - Each probe selects via a constant-ordinal `ORDER BY br LIMIT 1` (cycle-1 hardening) so arm choice is SQL-guaranteed, not a reliance on Append's undocumented syntactic order.
    - NULLS-LAST "latest" is emulated with an `IS NOT NULL` index descent + a `commits`-joined `IS NULL` fallback (cycle-2 fix made the all-NULL fallback deterministic).
    - All three rewrites verified byte-identical against the replaced queries on both the testcontainer seed and the full prod seed.
  - (e) **bounded-concurrency** summary fan-out (`mapWithConcurrency`, `SUMMARY_CONCURRENCY`/`BENCH_DB_POOL_MAX` 8).
  - (f) operator **RDS upsize** db.t4g.micro -> db.t4g.medium + `vortex-bench-pg16` param group (work_mem 32MB), done 2026-06-11.
  - **Write-path (c)** (`e3861734e`): `post-ingest.py` stamps `commit_timestamp` via a scalar `commits` subquery on both upsert paths; the migrate Rust loader runs a drift-repairing (`IS DISTINCT FROM`) post-COPY backfill inside the load transaction.
  - Migrations **006** (denormalize `commit_timestamp` + backfill + b/d indexes) and **007** (covering `INCLUDE (value_ns)` summary index) live in repo-root `migrations/`, carry the `requires-superuser` marker, and were applied to prod as master + `VACUUM ANALYZE` (the 4.85M-row backfill bloated the table; VACUUM was required for the planner to use the new indexes).
1355+
- DEPLOYED + LIVE-VERIFIED: `web-deploy.yml` succeeded on the ct/bench-v4 push incl. the CDN probe. Live: `/api/groups` cache-cold **0.64-1.07s** (was ~6.0s) / **0.079s** cached, landing cold **1.2s** (was ~6.9s), payload sane (16 groups / 13 summaries / 372 charts). The cold-render approach (recursive-CTE skip-scan) was a **user decision (2026-06-11)** over a concurrency band-aid / accept-~6s.
1356+
- Tests: web vitest 214 pass (3 new skip-scan-fidelity tests: stamped-beats-newer-NULL + deterministic all-NULL fallback, summary successor-branch enumeration, discovery-vs-GROUP-BY-oracle across NULL partitions); migrate Rust 100 (e2e asserts 0 NULL / 0 drift post-load, schema init applies 001+006+007); python 154 (006 backfill on pre-existing rows; INCLUDE/DESC index pins; commit_timestamp stamping count-over-all-rows on both upsert paths; 006/007 requires-superuser marker pinned). tsc + eslint + prettier + ruff green.
1357+
- PR-5.0 deferred data-checks RESOLVED (user approved prod reads this session): all 13 query-group chart-count sets match the live v2 site `bench.vortex.dev` exactly; 5 representative chart latest-values within run-to-run variance; the one missing series (lance TPC-H S3 SF=10) is a 6-months-stale series correctly outside v4's chart window; remaining structural diffs are the intentional v3-parity compression-group shape + v2's empty config placeholders.
1358+
- Review: 2-vote (preset=pr-2, fresh + correctness), executor=parallel (each lens on BOTH Claude and Codex). Real `Skill(spiral:gauntlet)` v0.5.3 with the synthesizer subagent each cycle. **Cycle 1** reject (4 must-fix): branch-ordinal hardening + NULLS-LAST/successor/backfill/discovery-oracle tests + INCLUDE/DESC index pins. **Cycle 2** reject (2 must-fix): operator-doc drift (migrations/README.md + migrate-schema.py 002/004/005 list stale vs 006/007's new marker) + a non-deterministic all-NULL fallback arm (now LEFT-joins commits + ORDER BY c.timestamp DESC NULLS LAST). **Cycle 3** reject (2 must-fix): sibling operator-runbook drift (infra/README.md + provision.sh master-apply list) — fixed; plus should-fixes (006/007 marker test, re-stamp privilege note, count-over-all-rows test). Notable: the executor-asymmetry held — across cycles 2/3 the Codex sides surfaced the operator-doc-drift blockers the Claude sides cleared or did not visit.
1359+
- Cycle cap: accepted at the project ~3-cycle cap. Cycle-3 must-fixes were operator-doc + test-only (zero production-logic change) and are all resolved at `b20904ae0`; per the spine's anti-spiral Key decision the loop was not spiraled into a 4th review. Production logic was accepted by 3 of 4 reviewers across two clean cycles before cycle 3.
1360+
- Confidence: high. Acceptance criteria met: prod EXPLAIN/timings captured (above) show index-descent plans + sub-second charts; `/api/groups` returns in ~1s cold and its slug list + ~5 chart slugs match the live v2 site (resolves both PR-5.0-deferred checks); migration 006/007 apply cleanly (testcontainer green) and the sargable + skip-scan rewrites are pinned semantically identical; vitest + build + lint + ruff + pytest green; post-upsize FreeableMemory healthy.
1361+
- Deferred items (nits/should-fixes, cycle triage): derive `SUMMARY_CONCURRENCY` from pool config; share `sargableDimEq`/`QueryParams` with `summary.ts`; a `mapWithConcurrency` unit test + poolMax default pin; summary-path coverage of the remaining groupPred pin combinations; trim the summary.ts skip-scan comment block (documents load-bearing non-obvious planner behavior, low priority). `deferred_items_total` 17 -> 22.
1362+
- Surprises: (mechanics) the 006 backfill UPDATE bloated `query_measurements` enough that the planner ignored the new indexes until a `VACUUM ANALYZE` — a real lesson, now documented in migrations/README.md. (process) prototyped all three skip scans against a 2.1M-row synthetic `postgres:16-alpine` container before writing TS, which surfaced the two load-bearing planner gotchas (row-comparisons aren't btree quals past index column 1; `IS NULL` pins don't reduce ORDER BY pathkeys) that shaped the final SQL. (process) `schema-deploy.yml` is develop-only so the ct/bench-v4 pushes only fired `web-deploy.yml`; at the develop merge schema-deploy no-ops 006/007 (already in the prod ledger, master-pre-applied like 005).
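
The sargable-WHERE item in the scope list above (`col IS NULL` / `col = $n` instead of `IS NOT DISTINCT FROM`) might look roughly like the sketch below. The plan names `sargableDimEq` and `QueryParams` but does not show them, so the shapes here are assumptions made for illustration, and the real helpers in `queries.ts` may differ.

```typescript
// Hypothetical shapes: the plan names `sargableDimEq` and `QueryParams`
// but does not show their definitions.
class QueryParams {
  readonly values: unknown[] = [];

  // Registers a bind value and returns its 1-based placeholder, e.g. "$1".
  add(value: unknown): string {
    this.values.push(value);
    return `$${this.values.length}`;
  }
}

// Emits a predicate the planner can use as a btree index qual:
// `col IS NULL` for a NULL dimension, `col = $n` otherwise.
// `col IS NOT DISTINCT FROM $n` matches the same rows, but per the plan
// it kept the chart index from seeking past its leading column.
function sargableDimEq(column: string, value: string | null, params: QueryParams): string {
  // The column name is interpolated into SQL, so accept plain identifiers only.
  if (!/^[a-z_][a-z0-9_]*$/.test(column)) {
    throw new Error(`unsafe column name: ${column}`);
  }
  return value === null ? `${column} IS NULL` : `${column} = ${params.add(value)}`;
}

// Usage: combine one predicate per dimension into a WHERE clause.
const exampleParams = new QueryParams();
const exampleWhere = [
  sargableDimEq("dataset", "tpch", exampleParams),
  sargableDimEq("engine", null, exampleParams),
].join(" AND ");
// exampleWhere is "dataset = $1 AND engine IS NULL"; exampleParams.values is ["tpch"].
```

Branching in application code keeps each predicate in a form that maps directly onto an index condition, at the cost of generating a different SQL text per NULL pattern.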
1363+
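
The bounded-concurrency summary fan-out from item (e) can be sketched as a small worker-pool helper. The plan names `mapWithConcurrency` and a limit of 8 (`SUMMARY_CONCURRENCY`), but the signature below is an assumption; treat it as one plausible shape rather than the code that shipped.

```typescript
// Hypothetical signature: the plan names `mapWithConcurrency` but does not
// show its definition.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker claims the next index until the list is exhausted, so at
  // most `limit` calls to `fn` are in flight at any moment.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `items`, regardless of completion order
}
```

Called as `mapWithConcurrency(groups, 8, loadSummary)` (with `groups` and `loadSummary` standing in for the real inputs), this keeps the number of simultaneous queries at or below the pool size, which is presumably why the plan pairs `SUMMARY_CONCURRENCY` with `BENCH_DB_POOL_MAX`. If any call rejects, the returned promise rejects as well, while calls already in flight are left to settle.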
13531364
## Superseded (re-planned) phase-end must-fix items — Phase 1: RDS + schema + hash port — cycle 1
13541365

13551366
| Severity | File:line | Description | Implicated PR | Resolved |
