From 10dbe3103676857d25d0cb3795cdeb76d607108f Mon Sep 17 00:00:00 2001
From: Anthony Costanzo <acostanzo@users.noreply.github.com>
Date: Mon, 27 Apr 2026 03:52:20 +0000
Subject: [PATCH 1/2] =?UTF-8?q?docs(pronto):=20file=20H4=20ticket=20?=
 =?UTF-8?q?=E2=80=94=20observations-aware=20scorer=20(open=20questions=20i?=
 =?UTF-8?q?nside)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../phase-2-h4-observations-aware-scorer.md   | 153 ++++++++++++++++++
 1 file changed, 153 insertions(+)
 create mode 100644 project/tickets/open/phase-2-h4-observations-aware-scorer.md
diff --git a/project/tickets/open/phase-2-h4-observations-aware-scorer.md b/project/tickets/open/phase-2-h4-observations-aware-scorer.md
new file mode 100644
index 0000000..dcc6584
--- /dev/null
+++ b/project/tickets/open/phase-2-h4-observations-aware-scorer.md
@@ -0,0 +1,153 @@
+---
+id: h4
+plan: phase-2-pronto
+status: open
+updated: 2026-04-26
+---
+
+# H4 — Observations-aware scorer in pronto
+
+## Scope
+
+H3 (merged) bumped the wire contract to schema 2 and specified `observations[]` as the rubric-scoring channel. Without H4, siblings can emit `observations[]` per the new contract and pronto's scorers don't know what to do with them — the architecture exists on paper but doesn't run. New Phase 2 siblings (2a/2b/2c) all ship emitting `observations[]` from day one, so H4 is on the critical path before any sibling PR.
+
+This ticket extends pronto's scoring path to:
+
+- Read `observations[]` from a sibling's audit JSON.
+- Look up the per-observation translation rule in `rubric.md` (keyed on observation `id`).
+- Apply the rule to produce a 0–100 dimension score (`ratio >= 0.8 → 80/100`, count threshold ladders, presence boolean mapping, score passthrough).
+- Fall back to the legacy `composite_score` field via the back-compat passthrough rule from ADR-005 §3 when `observations[]` is absent. That rule treats the v1 `composite_score` as a single coarse observation of `kind: score` and passes it through unchanged.
+
+## Architecture
+
+A pre-implementation plan agent surveyed the current scoring path (SKILL.md Phase 4.1 + Phase 5, the existing `score-<sibling>.sh` scorers, the test harness) and produced the recommendations below. Three alternatives were explicitly considered: inline jq in SKILL.md, folding into `compose-composite.sh`, and inlining into each per-sibling scorer. None of those preserves the observe-vs-score split that ADR-005 §3 ratifies; a standalone shell helper is the cleanest cut.
+
+### New helper: `plugins/pronto/agents/parsers/scorers/observations-to-score.sh`
+
+The translator. Takes `<dimension-slug> <scorer-json-path>`, reads the rubric stanza for that dimension, applies the per-observation rules, and emits to stdout:
+
+```json
+{
+  "composite_score": 78,
+  "observations_applied": [
+    { "id": "claude-md-redundancy-ratio", "kind": "ratio", "score": 70, "rule": "ladder" }
+  ],
+  "passthrough_used": false,
+  "dropped": []
+}
+```
+
+`SKILL.md` Phase 4.1 captures the scorer's stdout exactly as today (the H2d direct-shell dispatch shape), then pipes that JSON through `observations-to-score.sh`, takes its `composite_score` as the dimension score, and folds entries from `dropped[]` into `sibling_integration_notes`. The `passthrough_used` flag travels through unchanged for visibility but no special handling. Pure shell + jq for arithmetic, plus a YAML extractor for the rubric stanzas (see open question Q1 below).
+
+### `rubric.md` shape — per-observation translation rules
+
+Add a new section `## Observation translation rules` after the existing `## Mechanical vs judgment split`. Per-dimension stanzas live next to the rubric row that owns them. Each stanza is fenced YAML:
+
+````markdown
+### `claude-code-config` translation rules
+
+```yaml
+observations:
+  - id: claude-md-redundancy-ratio
+    kind: ratio
+    rule: ladder
+    bands:
+      - { gte: 0.20, score: 40 }
+      - { gte: 0.10, score: 70 }
+      - { gte: 0.05, score: 85 }
+      - { else: 100 }
+    weight: 0.20
+  - id: mcp-server-count
+    kind: count
+    rule: ladder
+    bands:
+      - { gte: 6, score: 50 }
+      - { gte: 1, score: 100 }
+      - { else: 0 }
+    weight: 0.15
+default_rule: passthrough   # for kind: score observations with no explicit rule
+```
+````
+
+`presence` rules are `{rule: boolean, present: 100, absent: 0}`. `score` rules are `{rule: passthrough}`.
+
+H4 ships stanzas only for the three currently parser-driven dimensions: `claude-code-config`, `skills-quality`, `commit-hygiene`. Phase 2 sibling PRs (2a/2b/2c) add stanzas for their own dimensions in their own work.
+
+### Behavior on missing rubric rule
+
+When an observation's `id` has no matching rubric rule, drop the observation and record it in `sibling_integration_notes` (`"<plugin>:<dimension>: dropped observation '<id>' (no rubric rule registered)"`). Score the dimension from the *remaining* observations. If after dropping there are zero observations, fall through to legacy `composite_score` passthrough; if no `composite_score` either, degrade to presence-cap.
+
+Rationale: matches the contract's existing posture for unknown `kind` and missing-required-field cases (the H3 doc's `Validation` section says "drop that entry, record the drop in sibling_integration_notes, continue scoring with the remaining observations"). Falling back to `score: 0` would punish siblings for shipping a new observation faster than pronto's rubric updates. Falling back to legacy `score` per-observation would make rule-drift undetectable.
+
+## Implementation order
+
+1. **`plugins/pronto/references/rubric.md`** — add the `## Observation translation rules` section with stanzas for `claude-code-config`, `skills-quality`, `commit-hygiene`. Stanzas are stub-but-syntactically-complete (real values calibrated against current scorer behavior).
+2. **`plugins/pronto/agents/parsers/scorers/observations-to-score.sh`** — new helper per the contract above.
+3. **`plugins/pronto/agents/parsers/scorers/observations-to-score.test.sh`** — exhaustive cases: each ratio band edge, count ladder, presence true/false, score passthrough, missing rule (drop + warn), all-dropped fallback, both `observations[]` and `composite_score` present (prefers observations), v1 payload (uses passthrough). Following the `compatible-pronto-check.test.sh` `expect_branch` pattern.
+4. **`plugins/pronto/skills/audit/SKILL.md` Phase 4.1** — insert one paragraph between "Capture stdout" and "Validate": pipe scorer JSON through `observations-to-score.sh`, take its `composite_score`, append `dropped[]` entries to `sibling_integration_notes`.
+5. **`plugins/pronto/agents/parsers/scorers/score-fixture-observations.sh`** — synthetic fixture script emitting a v2 payload with hand-crafted `observations[]` covering all four kinds. Used by the unit suite, not by the eval harness.
+6. **Eval harness on `mid` fixture** — verify composite stddev still ≤ 1.0 and per-dimension means within ±0.5 of the H2d-closeout baseline (composite=61, all dimensions stddev=0). Shipped scorers still emit v1 today, so this run exercises the passthrough rule on every dimension; byte-equivalence to pre-H4 is the key invariant.
+
+## Open questions (need sign-off before implementation)
+
+These are real architectural choices that warrant Anthony's call before code lands. Recommendations are mine; defer if disagreed.
+
+### Q1. `yq` as a runtime dependency
+
+The fenced YAML stanzas in `rubric.md` need a YAML→JSON step in `observations-to-score.sh`. Two paths:
+
+- **(a) Add `yq` to pronto's runtime deps.** `yq` is already in batdev's toolchain. Cleanest extractor; one tool, well-trodden CLI shape.
+- **(b) Hand-rolled awk/jq YAML→JSON for the rule subset we use.** No new dep, but more code to test and maintain; deviations from spec become silent extractor bugs.
+
+**Recommendation: (a) — add `yq`.** Plugin runtime deps already include `jq`; adding `yq` is incremental, not categorical. Hand-rolled YAML extraction in shell is the kind of code that bites later.
+
+### Q2. Per-observation weights vs equal weights within a dimension
+
+The example schema gives each observation an explicit `weight` summing to 1.0 within the dimension. Alternative: equal weights, derived (one observation → 1.0; two → 0.5 each; etc.). Explicit weights are more flexible but more rules to maintain; equal weights are simpler but mean adding an observation rebalances all the others.
+
+**Recommendation: explicit weights.** Pronto's existing per-dimension rubric weights (in the `rubric.md` table) are explicit; matching that shape internally to the dimension keeps the surface consistent. Sibling PRs that add observations will need to set weights anyway; equal-weights would force them to not.
+
+### Q3. Surfacing `passthrough_used` in the audit report
+
+When a sibling emits v1 (no observations), `passthrough_used: true` flows through. Should pronto:
+
+- **(a) Always surface in `sibling_integration_notes`** ("\<plugin\>: scored via legacy passthrough — no observations[] emitted").
+- **(b) Gate behind a verbose flag** (`--explain` or similar).
+- **(c) Not surface at all** — passthrough is the steady-state for in-flight migration.
+
+**Recommendation: (b).** Until 2a/2b/2c ship, every audit will have three (claudit, skillet, commventional) passthroughs in the notes — that's noise on every report. Gate behind a verbose flag during the migration window; surface unconditionally once a deprecation policy is set.
+
+### Q4. Stanza coverage in this ticket
+
+Should H4 add observation-rule stanzas to `rubric.md` for *all* eight rubric dimensions, or only the three currently parser-driven (`claude-code-config`, `skills-quality`, `commit-hygiene`)?
+
+**Recommendation: only the three currently parser-driven.** Phase 2 sibling PRs (2a/2b/2c) own their own dimension's stanza as part of their tickets — that's the per-PR ownership pattern. Adding stubs for `code-documentation`, `lint-posture`, `event-emission` here would land empty rules that would either need calibration before the sibling ships (out-of-order work) or shadowed-rules placeholder code in the translator (unnecessary complexity).
+
+`agents-md` and `project-record` are kernel- and avanti-scored respectively and don't go through observations; they don't need stanzas.
+
+## Acceptance
+
+- Fixture with a sibling emitting `observations[]` produces a deterministic dimension score via the new path (synthetic fixture exercises this).
+- Fixture with a sibling emitting only the legacy `composite_score` field produces the same score it does today via the passthrough.
+- Fixture with both present prefers `observations[]`.
+- Eval harness on the existing `mid` fixture set: composite stddev still ≤ 1.0 *and* per-dimension means within ±0.5 of the H2d-closeout baseline (composite=61, all dimensions stddev=0). Byte-equivalence to pre-H4 is the real invariant — passthrough must not perturb shipped-sibling scoring.
+- Unit suite (`observations-to-score.test.sh`) passes with all branches covered.
+
+## Estimated scope
+
+**Medium.** Three files of meaningful new code (helper + tests + synthetic fixture), one section addition to `rubric.md`, one paragraph edit to `SKILL.md`, plus a harness run. Not small because the translator is real logic with four `kind` branches and a fallback ladder. Not large because no new dispatch surface, no sibling-side changes, and synthetic test fixtures don't require Phase 2 sibling work.
+
+## Out of scope
+
+- Phase 2 sibling PRs (2a/2b/2c) ship their own observation stanzas and emit `observations[]` against this scorer.
+- Already-shipped siblings (claudit, skillet, commventional) keep emitting v1 — they ride passthrough until their own work cycle migrates them.
+- A formal deprecation policy for the v1 passthrough rule (when does v1 stop being accepted?). Plan-level concern; not a Phase 2 ticket.
+
+## References
+
+- `project/plans/active/phase-2-pronto.md` — H4 sits in the Hardening group; closes after H3
+- `project/adrs/005-sibling-skill-conventions.md` §3 — the architectural source of truth for observations + passthrough
+- `project/tickets/closed/phase-2-h3-wire-contract-schema-2.md` — the wire-contract spec H4 consumes
+- `plugins/pronto/references/sibling-audit-contract.md` — the v2 contract doc
+- `plugins/pronto/agents/parsers/scorers/compatible-pronto-check.test.sh` — test pattern to follow for `observations-to-score.test.sh`
+- `plugins/pronto/skills/audit/SKILL.md` Phase 4.1 — current scoring path the translator slots into

From 2379698dc35eb5142e7845279b69f2ae9b4e4c8c Mon Sep 17 00:00:00 2001
From: Anthony Costanzo <acostanzo@users.noreply.github.com>
Date: Mon, 27 Apr 2026 15:43:57 +0000
Subject: [PATCH 2/2] docs(phase-2): record H4 decisions and add legacy-sibling
 migration arc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Q1-Q4 on the H4 ticket were decided 2026-04-27. Encoding the agreed
answers and dropping the open-questions framing:

- Q1 rules format: JSON fenced in rubric.md, not YAML. Drops the yq
  runtime dep entirely. jq parses the rules directly.
- Q2 weights: equal-share default (1/n), explicit `weight` field
  remains an opt-in override. Mixed configurations are rejected by
  the stanza loader.
- Q3 passthrough surfacing: single summary line in
  sibling_integration_notes, always on. "N/M siblings scored via
  legacy passthrough — observations[] migration pending." The
  invisible-migration concern outweighs the per-report-noise
  concern when the surface is one line, not three.
- Q4 stanza coverage: three parser-driven dimensions only
  (claude-code-config, skills-quality, commit-hygiene) — unchanged
  from original recommendation.
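
A rough sketch of how the Q1/Q2 decisions combine, written in Python
for brevity (the helper itself is specified as shell + jq). The names
apply_ladder, resolve_weights and dimension_score are hypothetical.
Two assumptions not settled by the ticket: ladder bands are evaluated
in listed order with the first match winning, and explicit weights
are normalized by their sum.

```python
def apply_ladder(bands, value):
    """Return the score of the first band that matches `value`.

    Assumption: bands are checked in listed order; a band with an
    "else" key is the fallback and matches anything.
    """
    for band in bands:
        if "gte" in band and value >= band["gte"]:
            return band["score"]
        if "else" in band:
            return band["else"]
    raise ValueError("no band matched and no 'else' band present")


def resolve_weights(rules):
    """Return one weight per kept rule, per Decision Q2.

    No rule has `weight`   -> equal share 1/n.
    Every rule has it      -> explicit weights, normalized (assumption).
    Some do, some do not   -> configuration error.
    """
    if not rules:
        return []
    explicit = ["weight" in rule for rule in rules]
    if any(explicit) and not all(explicit):
        raise ValueError("mixed explicit and absent weights in stanza")
    if not any(explicit):
        return [1.0 / len(rules)] * len(rules)
    total = sum(rule["weight"] for rule in rules)
    if total <= 0:
        raise ValueError("explicit weights must sum to a positive value")
    return [rule["weight"] / total for rule in rules]


def dimension_score(rules, values):
    """Weighted mean of per-observation ladder scores.

    `values` maps observation id -> observed value. Only ladder rules
    are handled here; boolean and passthrough rules are omitted.
    """
    weights = resolve_weights(rules)
    scores = [apply_ladder(rule["bands"], values[rule["id"]]) for rule in rules]
    return sum(s * w for s, w in zip(scores, weights))


# The two example rules from the claude-code-config stanza.
RATIO_BANDS = [
    {"gte": 0.20, "score": 40},
    {"gte": 0.10, "score": 70},
    {"gte": 0.05, "score": 85},
    {"else": 100},
]
COUNT_BANDS = [{"gte": 6, "score": 50}, {"gte": 1, "score": 100}, {"else": 0}]
RULES = [
    {"id": "claude-md-redundancy-ratio", "kind": "ratio", "bands": RATIO_BANDS},
    {"id": "mcp-server-count", "kind": "count", "bands": COUNT_BANDS},
]
```

With a redundancy ratio of 0.12 and three MCP servers, the two
observations land in the 70 and 100 bands; with no explicit weights
each contributes half of the dimension score.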

Plan also gains a "Post-Phase-2 — legacy sibling migration" section
with M1/M2/M3 tickets for migrating claudit, skillet, and
commventional off the v1 wire shape onto observations[]. These
don't gate Phase 2 closure but they're tracked on the roadmap so
the arc is visible end-to-end. Closing trigger for the back-compat
passthrough rule deprecation: passthrough-count line reads 0/3.
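
For illustration only, the passthrough count behind that trigger can
be sketched in Python as below. The function names are hypothetical;
the only field taken from the ticket is passthrough_used in the
translator's output, and the sketch assumes one translator output per
sibling.

```python
def passthrough_count(results):
    """Return (N, M): siblings scored via passthrough, and total siblings.

    `results` holds one translator output dict per sibling, each
    carrying the `passthrough_used` flag.
    """
    n = sum(1 for result in results if result.get("passthrough_used"))
    return n, len(results)


def count_label(results):
    """Render the N/M prefix of the summary line."""
    n, m = passthrough_count(results)
    return f"{n}/{m}"


def deprecation_ready(results):
    """True once no sibling rides the passthrough rule any more."""
    n, m = passthrough_count(results)
    return m > 0 and n == 0
```

Today all three shipped siblings would count toward N; the label only
reaches 0/3 after M1, M2 and M3 have all landed.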

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 project/plans/active/phase-2-pronto.md        | 24 +++++
 .../phase-2-h4-observations-aware-scorer.md   | 98 ++++++++++---------
 2 files changed, 74 insertions(+), 48 deletions(-)

diff --git a/project/plans/active/phase-2-pronto.md b/project/plans/active/phase-2-pronto.md
index 8583ec1..28e3ba1 100644
--- a/project/plans/active/phase-2-pronto.md
+++ b/project/plans/active/phase-2-pronto.md
@@ -190,6 +190,30 @@ Full depth: composite across (a) structured logging ratio, (b) metrics presence,
 
 ---
 
+## Post-Phase-2 — legacy sibling migration
+
+The three siblings shipping today (claudit, skillet, commventional) emit the v1 wire shape (`composite_score`, no `observations[]`). H4's scorer handles them via the back-compat passthrough rule from ADR-005 §3, so they continue to score correctly through the transition. But passthrough is a transitional posture, not a destination — the architectural goal is every sibling emitting `observations[]` so pronto's rubric translation rules govern *all* scoring uniformly.
+
+H4 surfaces the count of legacy-passthrough siblings as a single line in `sibling_integration_notes` (`N/M siblings scored via legacy passthrough — observations[] migration pending`). When that count reaches `0/M`, the migration is complete and the back-compat passthrough rule itself can be deprecated.
+
+These migrations don't gate Phase 2 closure (they're closing-out commitments, not phase deliverables) — but they're tracked here so the roadmap sees the whole arc.
+
+### Migration tickets
+
+| Ticket | Plugin | Dimension | Notes |
+|---|---|---|---|
+| **M1** | claudit | `claude-code-config` | Refactor `:audit` skill to emit `observations[]` against the H4 stanzas. Eval invariant: byte-equivalent scoring on existing fixtures. |
+| **M2** | skillet | `skills-quality` | Same shape as M1. |
+| **M3** | commventional | `commit-hygiene` | Same shape as M1. |
+
+All three are structurally identical: replace the v1 emit with an `observations[]` emit whose IDs match the H4 rubric stanzas for that dimension. Each plugin's scorer logic is preserved; only the wire shape changes. They can land in any order, in parallel, in their own work cycles.
+
+### Closing-out trigger — deprecate the passthrough rule
+
+Once M1+M2+M3 ship and the passthrough-count line reads `0/3`, file a follow-up ticket to deprecate the back-compat passthrough rule in pronto: stop accepting v1 payloads, fail loudly on `composite_score` without `observations[]`. That deprecation is a separate work cycle and its acceptance is *outside* Phase 2.
+
+---
+
 ## Links
 
 - Pronto meta: `project/plans/active/phase-1-pronto.md`, `project/plans/active/phase-1-5-pronto.md`
diff --git a/project/tickets/open/phase-2-h4-observations-aware-scorer.md b/project/tickets/open/phase-2-h4-observations-aware-scorer.md
index dcc6584..2fcf711 100644
--- a/project/tickets/open/phase-2-h4-observations-aware-scorer.md
+++ b/project/tickets/open/phase-2-h4-observations-aware-scorer.md
@@ -2,7 +2,7 @@
 id: h4
 plan: phase-2-pronto
 status: open
-updated: 2026-04-26
+updated: 2026-04-27
 ---
 
 # H4 — Observations-aware scorer in pronto
@@ -37,39 +37,46 @@ The translator. Takes `<dimension-slug> <scorer-json-path>`, reads the rubric st
 }
 ```
 
-`SKILL.md` Phase 4.1 captures the scorer's stdout exactly as today (the H2d direct-shell dispatch shape), then pipes that JSON through `observations-to-score.sh`, takes its `composite_score` as the dimension score, and folds entries from `dropped[]` into `sibling_integration_notes`. The `passthrough_used` flag travels through unchanged for visibility but no special handling. Pure shell + jq for arithmetic, plus a YAML extractor for the rubric stanzas (see open question Q1 below).
+`SKILL.md` Phase 4.1 captures the scorer's stdout exactly as today (the H2d direct-shell dispatch shape), then pipes that JSON through `observations-to-score.sh`, takes its `composite_score` as the dimension score, and folds entries from `dropped[]` into `sibling_integration_notes`. The translator also accumulates a passthrough count for the audit-level summary line (see Decision Q3). Pure shell + jq throughout — rules are JSON, no YAML conversion step needed.
 
 ### `rubric.md` shape — per-observation translation rules
 
-Add a new section `## Observation translation rules` after the existing `## Mechanical vs judgment split`. Per-dimension stanzas live next to the rubric row that owns them. Each stanza is fenced YAML:
+Add a new section `## Observation translation rules` after the existing `## Mechanical vs judgment split`. Per-dimension stanzas live next to the rubric row that owns them. Each stanza is fenced JSON (parsed by `jq` directly — see Decision Q1):
 
 ````markdown
 ### `claude-code-config` translation rules
 
-```yaml
-observations:
-  - id: claude-md-redundancy-ratio
-    kind: ratio
-    rule: ladder
-    bands:
-      - { gte: 0.20, score: 40 }
-      - { gte: 0.10, score: 70 }
-      - { gte: 0.05, score: 85 }
-      - { else: 100 }
-    weight: 0.20
-  - id: mcp-server-count
-    kind: count
-    rule: ladder
-    bands:
-      - { gte: 6, score: 50 }
-      - { gte: 1, score: 100 }
-      - { else: 0 }
-    weight: 0.15
-default_rule: passthrough   # for kind: score observations with no explicit rule
+```json
+{
+  "observations": [
+    {
+      "id": "claude-md-redundancy-ratio",
+      "kind": "ratio",
+      "rule": "ladder",
+      "bands": [
+        { "gte": 0.20, "score": 40 },
+        { "gte": 0.10, "score": 70 },
+        { "gte": 0.05, "score": 85 },
+        { "else": 100 }
+      ]
+    },
+    {
+      "id": "mcp-server-count",
+      "kind": "count",
+      "rule": "ladder",
+      "bands": [
+        { "gte": 6, "score": 50 },
+        { "gte": 1, "score": 100 },
+        { "else": 0 }
+      ]
+    }
+  ],
+  "default_rule": "passthrough"
+}
 ```
 ````
 
-`presence` rules are `{rule: boolean, present: 100, absent: 0}`. `score` rules are `{rule: passthrough}`.
+`presence` rules are `{"rule": "boolean", "present": 100, "absent": 0}`. `score` rules are `{"rule": "passthrough"}`. `weight` is optional per observation; absent → equal-weight share within the dimension (see Decision Q2). Comments live in the markdown surrounding the JSON fence, not in the JSON itself.
 
 H4 ships stanzas only for the three currently parser-driven dimensions: `claude-code-config`, `skills-quality`, `commit-hygiene`. Phase 2 sibling PRs (2a/2b/2c) add stanzas for their own dimensions in their own work.
 
@@ -88,42 +95,37 @@ Rationale: matches the contract's existing posture for unknown `kind` and missin
 5. **`plugins/pronto/agents/parsers/scorers/score-fixture-observations.sh`** — synthetic fixture script emitting a v2 payload with hand-crafted `observations[]` covering all four kinds. Used by the unit suite, not by the eval harness.
 6. **Eval harness on `mid` fixture** — verify composite stddev still ≤ 1.0 and per-dimension means within ±0.5 of the H2d-closeout baseline (composite=61, all dimensions stddev=0). Shipped scorers still emit v1 today, so this run exercises the passthrough rule on every dimension; byte-equivalence to pre-H4 is the key invariant.
 
-## Open questions (need sign-off before implementation)
-
-These are real architectural choices that warrant Anthony's call before code lands. Recommendations are mine; defer if disagreed.
-
-### Q1. `yq` as a runtime dependency
+## Decisions
 
-The fenced YAML stanzas in `rubric.md` need a YAML→JSON step in `observations-to-score.sh`. Two paths:
+The four architectural questions originally filed against this ticket were decided by Anthony on 2026-04-27. The agreed answers are encoded above; this section is the audit trail.
 
-- **(a) Add `yq` to pronto's runtime deps.** `yq` is already in batdev's toolchain. Cleanest extractor; one tool, well-trodden CLI shape.
-- **(b) Hand-rolled awk/jq YAML→JSON for the rule subset we use.** No new dep, but more code to test and maintain; deviations from spec become silent extractor bugs.
+### Q1. Rules format — JSON, not YAML
 
-**Recommendation: (a) — add `yq`.** Plugin runtime deps already include `jq`; adding `yq` is incremental, not categorical. Hand-rolled YAML extraction in shell is the kind of code that bites later.
+**Decided: rules are JSON fenced inside `rubric.md`.** Original recommendation was YAML with a new `yq` runtime dependency (cleaner human editing). Anthony pushed back: pronto already parses JSON via `jq`; adding `yq` is a categorical dependency, not an incremental one; the only editors are him and me, and neither of us suffers over JSON braces. Co-location in `rubric.md` is preserved (the dimension stanza sits next to the rubric prose for that dimension); inline comments move to the surrounding markdown, where they're more discoverable anyway.
 
-### Q2. Per-observation weights vs equal weights within a dimension
+Net effect: drop `yq` entirely. `observations-to-score.sh` extracts the JSON fences from `rubric.md` with a small awk/sed step (or a markdown-aware extraction helper) and pipes straight into `jq` for evaluation.
 
-The example schema gives each observation an explicit `weight` summing to 1.0 within the dimension. Alternative: equal weights, derived (one observation → 1.0; two → 0.5 each; etc.). Explicit weights are more flexible but more rules to maintain; equal weights are simpler but mean adding an observation rebalances all the others.
+### Q2. Weights — equal-share default, explicit weights as opt-in
 
-**Recommendation: explicit weights.** Pronto's existing per-dimension rubric weights (in the `rubric.md` table) are explicit; matching that shape internally to the dimension keeps the surface consistent. Sibling PRs that add observations will need to set weights anyway; equal-weights would force them to not.
+**Decided: equal weights derived from `1/n` are the default; an observation may opt in to explicit `weight` to override.** Original recommendation was always-explicit weights matching the rubric table's per-dimension weight shape. The pushback: that table is at *dimension* level, not *observation* level — different scope, different math, internal consistency at one level doesn't require it at the next. At our scale (~2–4 observations per dimension) the rebalancing cost of explicit weights outweighs the tuning benefit. Equal-weight default keeps sibling PRs friction-free; explicit weights remain available when a dimension genuinely needs to express dominance.
 
-### Q3. Surfacing `passthrough_used` in the audit report
+The translator treats absent `weight` as `1/n` where `n` is the count of *kept* observations after drops. A mixed stanza (some weights explicit, some absent) is a configuration error and is rejected by the translator's stanza loader.
 
-When a sibling emits v1 (no observations), `passthrough_used: true` flows through. Should pronto:
+### Q3. Passthrough surfacing — single summary line, always on
 
-- **(a) Always surface in `sibling_integration_notes`** ("\<plugin\>: scored via legacy passthrough — no observations[] emitted").
-- **(b) Gate behind a verbose flag** (`--explain` or similar).
-- **(c) Not surface at all** — passthrough is the steady-state for in-flight migration.
+**Decided: surface a single summary line in `sibling_integration_notes` reporting the passthrough count, always on.** Original recommendation was to gate behind a verbose flag. Anthony's read: invisible passthroughs make the migration invisible — six months later you might still have half the fleet on v1 and never notice from reading reports.
 
-**Recommendation: (b).** Until 2a/2b/2c ship, every audit will have three (claudit, skillet, commventional) passthroughs in the notes — that's noise on every report. Gate behind a verbose flag during the migration window; surface unconditionally once a deprecation policy is set.
+Format: `N/M siblings scored via legacy passthrough — observations[] migration pending`. When `N` reaches `0`, the migration is complete; that's the trigger point to file a follow-up deprecating the back-compat passthrough rule itself.
 
-### Q4. Stanza coverage in this ticket
+This replaces the per-sibling warning shape (which would add three lines of noise to every report today) with a single trend-tracking line whose count decreases as siblings migrate. Concrete, low-noise, always visible.
 
-Should H4 add observation-rule stanzas to `rubric.md` for *all* eight rubric dimensions, or only the three currently parser-driven (`claude-code-config`, `skills-quality`, `commit-hygiene`)?
+### Q4. Stanza coverage — three parser-driven dimensions only
 
-**Recommendation: only the three currently parser-driven.** Phase 2 sibling PRs (2a/2b/2c) own their own dimension's stanza as part of their tickets — that's the per-PR ownership pattern. Adding stubs for `code-documentation`, `lint-posture`, `event-emission` here would land empty rules that would either need calibration before the sibling ships (out-of-order work) or shadowed-rules placeholder code in the translator (unnecessary complexity).
+**Decided: only `claude-code-config`, `skills-quality`, and `commit-hygiene` get stanzas in this ticket.** Reasoning unchanged from the original recommendation:
 
-`agents-md` and `project-record` are kernel- and avanti-scored respectively and don't go through observations; they don't need stanzas.
+- The other observation-using dimensions (`code-documentation`, `lint-posture`, `event-emission`) are exactly what Phase 2 sibling PRs (2a/2b/2c) introduce. Each of those PRs owns its dimension's stanza — that's the per-PR ownership pattern, and pre-writing those stanzas here means making decisions inside another ticket's scope and calibrating against behavior that doesn't exist yet.
+- `agents-md` and `project-record` are kernel- and avanti-scored respectively and don't go through observations; they don't need stanzas.
+- The three covered dimensions have shipped siblings emitting v1 today, so their stanzas can be calibrated against current scorer behavior — the rules will produce identical scores to today's path on day one (the passthrough invariant).
 
 ## Acceptance
 
@@ -140,8 +142,8 @@ Should H4 add observation-rule stanzas to `rubric.md` for *all* eight rubric dim
 ## Out of scope
 
 - Phase 2 sibling PRs (2a/2b/2c) ship their own observation stanzas and emit `observations[]` against this scorer.
-- Already-shipped siblings (claudit, skillet, commventional) keep emitting v1 — they ride passthrough until their own work cycle migrates them.
-- A formal deprecation policy for the v1 passthrough rule (when does v1 stop being accepted?). Plan-level concern; not a Phase 2 ticket.
+- Already-shipped siblings (claudit, skillet, commventional) keep emitting v1 — they ride passthrough until tickets M1/M2/M3 migrate them. Those tickets are tracked in `phase-2-pronto.md` under "Post-Phase-2 — legacy sibling migration."
+- The follow-up that deprecates the back-compat passthrough rule itself (stop accepting v1 payloads) — fires once M1/M2/M3 ship and the passthrough-count line reads `0/3`. Separate work cycle.
 
 ## References