feat(m2): reporter compare mode + crucible compare --html #5
Open
suzuke wants to merge 3 commits into
Side-by-side static HTML for two ledgers, useful for "greedy vs bfts-lite on the same example" demo-gate comparisons. Strict read-only: no orchestrator changes, no ledger mutation, no config normalization.

Renderer: `crucible.reporter.compare.render_comparison_html(left, right, *, left_label, right_label, …)`. Reuses `html_tree`'s `_render_tree` / `_render_summary` / `_best_node_id` / `_color_for` so the per-side cards look identical to the single-view report.

CLI: `crucible compare a b --html [--html-out PATH]` writes `<project>/reports/compare-a-vs-b.html` by default. `--right-project DIR` opts into cross-project comparison (e.g. compress-greedy workspace vs compress-bfts workspace from M1b demo gate). Cross-project default output is cwd to avoid writing into the wrong project.

Reviewer round 1 verdict: ACCEPT with constraints; all addressed:
- Missing data → "n/a" / empty panel, never silently zero
- Δ line shown ONLY when both sides agree on metric direction (and both bests exist); otherwise omitted (no auto-winner verdict)
- Output path: explicit `--html-out` or predictable default
- Strict read-only: no writes anywhere outside the report file
- Renderer extraction: kept `html_tree.py` as stable single-view facade; `compare.py` imports underscore helpers without changing their API

Tests: 11 new in `test_reporter_compare.py` + 4 new CLI tests in `test_cli.py`. Full suite: 2413 passed / 4 skipped, 0 regressions over M2 PR 10 baseline (2397).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
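The missing-data and Δ constraints from round 1 can be sketched as a small pure function. This is a minimal illustration, not the code in `compare.py`: `SideSummary`, `format_metric`, `delta_line`, and the `"maximize"` / `"minimize"` direction strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SideSummary:
    """Hypothetical per-side summary; field names are illustrative only."""
    best_metric: Optional[float]  # None when the ledger has no best node
    direction: Optional[str]      # e.g. "maximize" / "minimize"; None if unknown


def format_metric(value: Optional[float]) -> str:
    # Missing data renders as "n/a", never as a silent zero.
    return "n/a" if value is None else f"{value:.4f}"


def delta_line(left: SideSummary, right: SideSummary) -> Optional[str]:
    """Return the Δ line, or None when it must be omitted.

    Shown only when both bests exist and both sides agree on the metric
    direction. The value is the raw arithmetic difference (right − left);
    no winner is declared.
    """
    if left.best_metric is None or right.best_metric is None:
        return None
    if left.direction is None or left.direction != right.direction:
        return None
    delta = right.best_metric - left.best_metric
    return f"Δ best metric (right − left) = {delta:+.4f}"
```

With the M2 demo-gate bests (2.2528 left, 2.5013 right) and matching directions, `delta_line` yields `+0.2485`; if one side's direction differs or a best is missing, the line is omitted instead of defaulting to zero.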
Reviewer round 2 REJECTED the original PR 11 because both ledgers in a compare HTML normally share AttemptNode ids (n000001, n000002…), so rendering two trees produced duplicate `id="n000001"` elements and ambiguous `href="#n000001"` anchors. Fixed by namespacing every DOM id and intra-document anchor with a side-scoped prefix.

Changes:
- `_render_tree`, `_render_card`, `_render_summary` accept `anchor_prefix: str = ""` (kwarg-only). Default empty → single-view output unchanged.
- `compare.py` passes `"left-"` / `"right-"` so `id="left-n000001"` and `id="right-n000001"` coexist; parent links and best-summary links use the same prefixed anchors. Display text remains the bare node id; the prefix is an implementation detail, not user-facing.

Tests:
- Existing compare tests updated to assert side-scoped anchors AND that bare ids (which would collide) do NOT appear.
- 2 new dedicated tests: `test_compare_dom_ids_are_unique_per_side` (no collision across 3-node × 2-side ledger) and `test_compare_best_link_uses_side_anchor` (best-link clicks land on the same-side card).
- HTML validator tightened to assert `not p.tags_open` at EOF (reviewer non-blocker; catches stray unclosed tags).

Full suite: 2415 passed / 4 skipped, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
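A simplified sketch of side-scoped anchoring, assuming a card is just a `<div>` with an id, a heading, and an optional parent link. `render_card` and `duplicate_ids` are hypothetical stand-ins for `_render_card` and the test suite's validator; only the `anchor_prefix` keyword mirrors the change described above. The duplicate check uses the standard-library `html.parser.HTMLParser`.

```python
import html
from collections import Counter
from html.parser import HTMLParser
from typing import List, Optional


def render_card(node_id: str, parent_id: Optional[str], *, anchor_prefix: str = "") -> str:
    """Hypothetical, simplified stand-in for `_render_card`.

    The prefix is applied to the DOM id and to intra-document hrefs only;
    the visible text keeps the bare node id.
    """
    anchor = html.escape(anchor_prefix + node_id, quote=True)
    parts = [f'<div class="card" id="{anchor}">', f"<h3>{html.escape(node_id)}</h3>"]
    if parent_id is not None:
        parent_anchor = html.escape(anchor_prefix + parent_id, quote=True)
        parts.append(f'<a href="#{parent_anchor}">parent: {html.escape(parent_id)}</a>')
    parts.append("</div>")
    return "".join(parts)


class IdCollector(HTMLParser):
    """Collects every id attribute so duplicates can be detected."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def duplicate_ids(document: str) -> List[str]:
    """Return every id that appears more than once in the document."""
    parser = IdCollector()
    parser.feed(document)
    parser.close()
    return [node_id for node_id, n in Counter(parser.ids).items() if n > 1]
```

Rendering the same three nodes once with `"left-"` and once with `"right-"` yields no duplicate ids, while rendering them twice with the default empty prefix reports all three as duplicates, which is the collision round 2 rejected.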
First demo where BFTS-lite materially outperforms greedy: greedy hits the max_retries=5 hard-stop, while BFTS keeps exploring via BranchFrom + doom-loop pruning.

- Greedy: 9 iter, best 2.2528, stopped at the 5-consecutive-failure wall
- BFTS: 30 iter, best 2.5013, clean max_iterations stop
- Total: $2.05, ~55 min wall (parallel runs)

BFTS ledger shows 6 BranchFrom events and 4 nodes explicitly pruned by the M2 PR 10 doom-loop seam (n3, n21, n20, n19 each had 3 trailing failures → pruned from candidate set). Best result (2.5013 at iter 21) came from a deep path n1→n2→n9→n12→n13→n14→n17→n19→n20→n21, 10 levels deep, well beyond what greedy reached before its hard-stop.

Compare HTML rendered via the new `crucible compare --html` (M2 PR 11); file at /tmp/m2-30-compare.html locally, not committed (126 KB).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
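The contrast between greedy's hard-stop and BFTS's per-node pruning can be sketched as follows. This assumes "trailing failures" means consecutive failed attempts at the end of a node's ordered child outcomes; `trailing_failures`, `greedy_should_stop`, and `prune_candidates` are hypothetical helpers, not the M2 PR 10 seam itself. The thresholds (3 for pruning, `max_retries=5` for greedy) come from the run described above.

```python
from typing import Mapping, Sequence, Set


def trailing_failures(outcomes: Sequence[bool]) -> int:
    """Count consecutive failures (False) at the end of an ordered outcome list."""
    count = 0
    for ok in reversed(outcomes):
        if ok:
            break
        count += 1
    return count


def greedy_should_stop(outcomes: Sequence[bool], max_retries: int = 5) -> bool:
    # Greedy follows a single chain, so a run of failures ends the whole search.
    return trailing_failures(outcomes) >= max_retries


def prune_candidates(
    child_outcomes: Mapping[str, Sequence[bool]], threshold: int = 3
) -> Set[str]:
    """Return the node ids that remain expandable.

    Assumption: a node leaves the candidate set once the attempts expanded
    from it end in `threshold` or more consecutive failures. Other nodes stay
    available, so the search can branch elsewhere instead of stopping.
    """
    return {
        node
        for node, outcomes in child_outcomes.items()
        if trailing_failures(outcomes) < threshold
    }
```

For example, `prune_candidates({"a": [True, False, False, False], "b": [False, True]})` keeps only `"b"`; these outcome lists are illustrative, not taken from the ledger.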
This was referenced Apr 25, 2026
Summary
Stacked on #4 (M2 PR 10 doom-loop). Adds `crucible compare a b --html`: a side-by-side static HTML report for two ledgers. Useful for "greedy vs bfts-lite on the same example" demo-gate comparisons.

What's new
Renderer
`crucible.reporter.compare.render_comparison_html(left, right, *, left_label, right_label, …)` outputs a single self-contained HTML doc with two columns, each rendered using the existing tree view from M1b. Reuses `_render_tree` / `_render_summary` / `_best_node_id` / `_color_for` so the per-side cards match the single-view report.

Every `id="..."` and `href="#..."` in compare mode is namespaced with `left-` / `right-` prefixes so two trees with identical AttemptNode ids (`n000001`, `n000002`…) coexist in one document without collisions. Single-view output is bit-identical to before this PR (default `anchor_prefix=""`).

CLI
`crucible compare a b --html [--html-out PATH]` writes `<project>/reports/compare-a-vs-b.html` by default. `--right-project DIR` is an opt-in cross-project compare (e.g. `compress-greedy/` workspace vs `compress-bfts/` workspace from the M1b demo gate). Cross-project output defaults to cwd to avoid writing into the wrong project.

Strict read-only
No orchestrator changes. No ledger mutation. No config normalization. The only file written is the rendered HTML at `--html-out` (or its default).

Reviewer trail
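- Round 1: ACCEPT with constraints (missing data → n/a; Δ only when directions agree; explicit output path; strict read-only). All addressed.
- Round 2: REJECTED. Duplicate `id="n000001"` / `href="#n000001"` ambiguous; required side-anchor namespacing. Fixed with an `anchor_prefix` kwarg on shared helpers; single-view byte-identical; comprehensive uniqueness tests added.

Stats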
- 13 new tests in `test_reporter_compare.py` (covering uniqueness, Δ rules, label escaping, custom title, per-side direction) + 4 new CLI tests

Test plan
Unit (`test_reporter_compare.py`, 13 cases)
- … `href` …
- … (`<script>` → `&lt;script&gt;`)
- … `<title>` and `<h1>`

CLI (`test_cli.py`, 4 new cases)
- `crucible compare a b --html` writes to default `<project>/reports/compare-a-vs-b.html`
- `--html-out PATH` honoured
- `--right-project` requires `--html` (rejected otherwise)

Manual smoke
… ★ best, parent links resolve to same-side cards.

End-to-end on real ledgers (M2 demo gate)
`crucible compare m2-30 m2-30 --html --project-dir .../compress-greedy --right-project .../compress-bfts` rendered the M2 30-iter demo gate's two ledgers into a single 126 KB HTML doc. Greedy's 9-node linear chain and BFTS's 30-node branching tree visibly contrast; the Δ best metric line shows `right − left = +0.2485` (raw arithmetic delta, no winner verdict). DOM ids are correctly namespaced as `left-nXXXXX` / `right-nXXXXX`, with no anchor collisions despite both ledgers using the same id range. See `docs/M2-DEMO-GATE.md` §4 for screenshots/details.

Known limitations (non-blockers)
… `n/a` if added later.

🤖 Generated with Claude Code