Plan: move PR 14 earlier (prerequisite for blob tests) + add handoff doc

jucor · claude · jucor · commit bf2dd99a6f3b · 2026-03-19T10:43:38.000Z
PR 14 (vectorized code refactor) is now a prerequisite for all formula
fix PRs, not a post-parity cleanup. It branches off jc/clj-parity-d9-fix
(Stack 13) to make the vectorized production path readable and testable
against Clojure blob values. Remaining dead code cleanup split to PR 14b.

Handoff doc at delphi/docs/HANDOFF_PR14_VECTORIZED_REFACTOR.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/delphi/docs/HANDOFF_PR14_VECTORIZED_REFACTOR.md b/delphi/docs/HANDOFF_PR14_VECTORIZED_REFACTOR.md
@@ -0,0 +1,134 @@
+# Handoff: PR 14 — Make Vectorized Code Readable + Blob Injection Tests
+
+## Goal
+
+The scalar functions (`comment_stats`, `add_comparative_stats`, `repness_metric`,
+`finalize_cmt_stats`) read like a step-by-step recipe. Their vectorized replacement
+(`compute_group_comment_stats_df`) buries the same logic in 150 lines of DataFrame
+plumbing. The vectorized path is the ONLY production code path — the scalar functions
+are dead code, called only from tests and benchmarks.
+
+**PR 14 must make the vectorized code at least as readable as the scalar code, then
+delete the scalar functions.** This also enables vectorized blob injection tests —
+the only non-tautological way to verify correctness against the Clojure math blob.
+
+## Starting point
+
+Branch off `jc/clj-parity-d9-fix` (Stack 13). This is BELOW D5-D8 in the stack,
+so the refactor will be inherited by all formula fix PRs.
+
+## Current stack (as of 2026-03-17)
+
+```
+Stack 13: jc/clj-parity-d9-fix          ← branch off HERE for PR 14
+Stack 14: jc/clj-parity-d5-prop-test     (draft PR #2448)
+Stack 15: jc/clj-parity-d6-two-prop-test (draft PR #2449)
+Stack 16: jc/clj-parity-d7-repness-metric (draft PR #2450)
+Stack 17: jc/clj-parity-d8-finalize-stats (draft PR #2451)
+Stack 18: jc/clj-parity-d15-moderation-handling-zeros-vs-removes (PR #2452)
+Stack 19: jc/clj-parity-kmeans-k-divergence (k-divergence fix)
+Stack 20: jc/clj-parity-d10-rep-comment-selection
+Stack 21: jc/clj-parity-d11-consensus-comment-selection
+Stack 22: jc/clj-parity-d3-k-smoother-buffer
+Stack 23: jc/clj-parity-d12-comment-priorities
+Stack 24: jc/clj-parity-d1-pca-sign-flip-prevention
+Stack 25: jc/clj-parity-pr15-load-votes-sort
+```
+
+D5-D8 are draft PRs waiting for vectorized blob tests before being marked ready.
+
+## Task sequence
+
+### 1. Refactor `compute_group_comment_stats_df`
+
+Split into two phases:
+
+**(a) DataFrame construction** (the plumbing):
+- Build participant→group mapping
+- Compute total counts per comment
+- Create full (group, comment) cross-product index
+- Join total counts, compute "other" counts
+
+**(b) Statistics computation** (the readable part):
+A new function with clean DataFrame inputs (columns: `na`, `nd`, `ns`,
+`other_agree`, `other_disagree`, `other_votes`) → outputs (columns: `pa`, `pd`,
+`pat`, `pdt`, `ra`, `rd`, `rat`, `rdt`, `agree_metric`, `disagree_metric`, `repful`).
+
+This function should read like the scalar recipe:
+```python
+# Probabilities with pseudocounts
+df['pa'] = (df['na'] + PSEUDO_COUNT/2) / (df['ns'] + PSEUDO_COUNT)
+# Proportion tests
+df['pat'] = prop_test_vectorized(df['na'], df['ns'])
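+# Representativeness ratios (other_pa assumed: same pseudocount formula on the "other" counts)
+df['other_pa'] = (df['other_agree'] + PSEUDO_COUNT/2) / (df['other_votes'] + PSEUDO_COUNT)
+df['ra'] = df['pa'] / df['other_pa']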
+# ...etc
+```
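+
+A slightly fuller sketch of that recipe, limited to the probability and ratio
+columns. It assumes `PSEUDO_COUNT = 2`, that `other_votes` is the "other"
+analogue of `ns`, and that the "other" probabilities use the same pseudocount
+formula as the in-group ones. `add_probabilities_and_ratios` is a hypothetical
+name, not the function that exists in `repness.py`.
+
+```python
+import pandas as pd
+
+PSEUDO_COUNT = 2.0  # assumption: one pseudo-agree plus one pseudo-disagree
+
+
+def add_probabilities_and_ratios(df: pd.DataFrame) -> pd.DataFrame:
+    """Hypothetical helper: add pa/pd and ra/rd to a frame of vote counts."""
+    out = df.copy()
+    half = PSEUDO_COUNT / 2
+    # In-group probabilities with pseudocounts
+    out["pa"] = (out["na"] + half) / (out["ns"] + PSEUDO_COUNT)
+    out["pd"] = (out["nd"] + half) / (out["ns"] + PSEUDO_COUNT)
+    # Same formula applied to everyone outside the group (assumed)
+    other_pa = (out["other_agree"] + half) / (out["other_votes"] + PSEUDO_COUNT)
+    other_pd = (out["other_disagree"] + half) / (out["other_votes"] + PSEUDO_COUNT)
+    # Representativeness ratios: in-group probability relative to the rest
+    out["ra"] = out["pa"] / other_pa
+    out["rd"] = out["pd"] / other_pd
+    return out
+
+
+# Group counts from the vw example (A=2, D=11, S=14); "other" counts are made up.
+counts = pd.DataFrame({
+    "na": [2], "nd": [11], "ns": [14],
+    "other_agree": [20], "other_disagree": [5], "other_votes": [30],
+})
+stats = add_probabilities_and_ratios(counts)
+print(stats[["pa", "pd", "ra", "rd"]])
+```
+
+Keeping this step free of joins and index manipulation is what would let a
+blob test feed it hand-built counts directly.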
+
+### 2. Write vectorized blob injection tests
+
+Inject Clojure group memberships + votes into the new statistics function,
+compare output to blob values. Test the PRODUCTION code path.
+
+For each `(group, tid)` in the blob's `repness`:
+- Compare `pat` (or `pdt`) to blob `p-test`
+- Compare `rat` (or `rdt`) to blob `repness-test`
+- Compare `pa` (or `pd`) to blob `p-success`
+- Compare `repful` to blob `repful-for`
+
+The blob only stores the winning side's values. `repful-for` tells you which
+side won: if `agree`, then `n-success`=na, `p-test`=pat, `repness-test`=rat.
+If `disagree`, then `n-success`=nd, `p-test`=pdt, `repness-test`=rdt.
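+
+The side selection above can be sketched as a lookup table. The mapping follows
+the field list under "Key blob context"; `BLOB_TO_COLUMN` and `expected_columns`
+are hypothetical names, and every value in the sample entry except `n-success`
+and `n-trials` is made up for illustration.
+
+```python
+# Hypothetical mapping from blob repness fields to vectorized column names,
+# keyed by the winning side recorded in `repful-for`.
+BLOB_TO_COLUMN = {
+    "agree": {
+        "n-success": "na", "n-trials": "ns", "p-success": "pa",
+        "p-test": "pat", "repness": "ra", "repness-test": "rat",
+    },
+    "disagree": {
+        "n-success": "nd", "n-trials": "ns", "p-success": "pd",
+        "p-test": "pdt", "repness": "rd", "repness-test": "rdt",
+    },
+}
+
+
+def expected_columns(entry: dict) -> dict:
+    """Return {column_name: blob_value} for one blob repness entry."""
+    side = entry["repful-for"]
+    return {col: entry[field] for field, col in BLOB_TO_COLUMN[side].items()}
+
+
+# Sample entry: counts from the vw example, remaining values are placeholders.
+entry = {
+    "tid": 49, "repful-for": "disagree",
+    "n-success": 11, "n-trials": 14, "p-success": 0.75,
+    "p-test": 2.0, "repness": 4.0, "repness-test": 3.0,
+}
+print(expected_columns(entry))
+```
+
+A test can then loop over the returned dict and compare each value with the
+matching column of the vectorized output row for that `(group, tid)`.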
+
+### 3. Verify scalar ≡ vectorized
+
+Run both paths on all datasets and compare outputs field by field. The outputs
+must match exactly, not just approximately, since both paths implement the same formulas.
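+
+A minimal sketch of such an exact comparison, assuming both paths can be
+brought into DataFrames keyed by `group_id` and `tid` (those column names and
+the helper name `assert_identical` are assumptions):
+
+```python
+import pandas as pd
+
+
+def assert_identical(scalar_df: pd.DataFrame, vectorized_df: pd.DataFrame) -> None:
+    """Hypothetical helper: fail unless both frames match exactly."""
+    keys = ["group_id", "tid"]  # assumed key columns
+    left = scalar_df.sort_values(keys).reset_index(drop=True)
+    right = vectorized_df.sort_values(keys).reset_index(drop=True)
+    # Align column order, then compare with no floating-point tolerance
+    pd.testing.assert_frame_equal(left, right[left.columns], check_exact=True)
+
+
+a = pd.DataFrame({"group_id": [0, 1], "tid": [49, 49], "pa": [0.1875, 0.65625]})
+b = a.iloc[::-1]  # same rows, different order
+assert_identical(a, b)  # passes: row order does not matter
+```
+
+If this fails only in the last few bits, the likely cause is a different order
+of floating-point operations between the two paths rather than a formula bug.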
+
+### 4. Delete scalar functions
+
+Remove: `comment_stats`, `add_comparative_stats`, `repness_metric`,
+`finalize_cmt_stats`, and scalar `prop_test`/`two_prop_test` if no longer
+needed (the vectorized versions remain).
+
+Update all tests that called the scalar API.
+
+### 5. Cascade rebase
+
+Rebase D5→D6→D7→D8→D15→k-divergence→D10→D11→D3→D12→D1→PR15 on top.
+Use `.claude/skills/pr-stack/rebase-stack.sh` for the cascade.
+
+### 6. Add per-PR vectorized blob injection tests
+
+For each of D5, D6, D8: insert a RED blob test commit before the fix,
+verify it fails, then verify the fix makes it pass. This is the same TDD
+pattern already used for the scalar blob tests in those branches.
+
+Then mark #2448-#2451 as ready (un-draft).
+
+## Key blob context
+
+- `n-trials` in the Clojure blob = `S` (total seen, INCLUDING passes), not
+  `A+D` (agrees + disagrees). Verified: `prop_test(11, 14)` matches blob
+  `p-test` for tid=49, group 0 in the vw dataset (A=2, D=11, S=14, A+D=13);
+  the disagree side won there, so `n-success` = D = 11.
+- `group-votes[gid].votes[tid]` = `{A: agrees, D: disagrees, S: total_seen}`
+- Blob `repness[gid][i]` fields: `n-success`, `n-trials`, `p-success`,
+  `p-test`, `repness` (=ra or rd), `repness-test` (=rat or rdt),
+  `repful-for`, `tid`, `best-agree` (optional).
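+
+As a rough sketch of how `group-votes` could be flattened into the count
+columns the statistics function expects: the helper name, the `group_id`
+column, and the string keys are assumptions, and the "other" counts here sum
+over groups only, which ignores any participants outside all groups.
+
+```python
+import pandas as pd
+
+
+def group_votes_to_frame(group_votes: dict) -> pd.DataFrame:
+    """Hypothetical helper: one row per (group, tid) with own and other counts."""
+    rows = [
+        {"group_id": int(gid), "tid": int(tid),
+         "na": v["A"], "nd": v["D"], "ns": v["S"]}
+        for gid, group in group_votes.items()
+        for tid, v in group["votes"].items()
+    ]
+    df = pd.DataFrame(rows)
+    # "Other" counts = per-comment totals minus this group's own counts
+    totals = df.groupby("tid")[["na", "nd", "ns"]].transform("sum")
+    df["other_agree"] = totals["na"] - df["na"]
+    df["other_disagree"] = totals["nd"] - df["nd"]
+    df["other_votes"] = totals["ns"] - df["ns"]
+    return df
+
+
+# Group 0 uses the vw example counts; group 1 is made up for illustration.
+blob_group_votes = {
+    "0": {"votes": {"49": {"A": 2, "D": 11, "S": 14}}},
+    "1": {"votes": {"49": {"A": 20, "D": 5, "S": 30}}},
+}
+print(group_votes_to_frame(blob_group_votes))
+```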
+
+## Files to modify
+
+- `delphi/polismath/pca_kmeans_rep/repness.py` — the refactor
+- `delphi/tests/test_discrepancy_fixes.py` — vectorized blob tests
+- `delphi/tests/test_repness_unit.py` — update for new API
+- `delphi/tests/test_old_format_repness.py` — update for new API
+- `delphi/polismath/benchmarks/bench_repness.py` — update imports
+
+## Reference
+
+- Journal: `delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md` — Session 11 entry,
+  PR 14 readability goal in Notes for Future Sessions
+- Plan: `delphi/docs/PLAN_DISCREPANCY_FIXES.md` — mandatory blob comparison
+  section in Testing Principles
+- Clojure source: `math/src/polismath/math/stats.clj` (prop-test, two-prop-test),
+  `math/src/polismath/math/repness.clj` (finalize-cmt-stats, repness-metric)
diff --git a/delphi/docs/PLAN_DISCREPANCY_FIXES.md b/delphi/docs/PLAN_DISCREPANCY_FIXES.md
@@ -387,11 +387,51 @@ This is non-trivial and should be one of the last fixes.
 
 ---
 
-### PR 14: Cleanup — Remove Dead Code
+### PR 14: Refactor Vectorized Code for Readability + Blob Injection Tests
+
+**MOVED EARLIER**: PR 14 is now a prerequisite for all formula fix PRs (D5-D8+),
+not a post-parity cleanup. It branches off `jc/clj-parity-d9-fix` (Stack 13),
+below all formula fixes. Reason: the vectorized production path
+(`compute_group_comment_stats_df`) is too monolithic to test against the Clojure
+blob. The refactor makes it testable AND readable.
+
+**The problem**: The scalar functions (`comment_stats`, `add_comparative_stats`,
+`repness_metric`, `finalize_cmt_stats`) read like a step-by-step recipe. The
+vectorized replacement (`compute_group_comment_stats_df`) buries the same logic
+in 150 lines of DataFrame plumbing. The scalar path is dead code in production —
+only called from tests and benchmarks.
+
+**Task**:
+1. Split `compute_group_comment_stats_df` into (a) DataFrame construction
+   (group mapping, cross-product index, joins) and (b) statistics computation
+   as its own function with clean inputs/outputs — readable AND testable.
+2. Write vectorized blob injection tests: inject Clojure group memberships +
+   votes, compare output to blob values. Tests the PRODUCTION code path.
+3. Verify scalar and vectorized paths produce identical output on all datasets.
+4. Delete scalar functions. Update tests.
+
+**Files**: `polismath/pca_kmeans_rep/repness.py`, `tests/test_discrepancy_fixes.py`,
+`tests/test_repness_unit.py`, `tests/test_old_format_repness.py`,
+`polismath/benchmarks/bench_repness.py`
+
+See `delphi/docs/HANDOFF_PR14_VECTORIZED_REFACTOR.md` for full details.
+
+After PR 14, each fix PR gets vectorized blob injection tests added in the RED→GREEN
+TDD pattern. This includes D5-D8 (repness formula fixes), D10/D11 (selection),
+D15 (moderation), D12 (priorities). For D3 (k-smoother) and D1 (PCA sign flip),
+which are incremental-only features, add synthetic tests + skip markers for
+incremental blob comparison pending replay infrastructure (see Replay PRs A/B/C).
+
+**After adding vectorized tests to each PR, update the plan AND journal** to
+record what was tested, what blob fields were compared, and any discrepancies found.
+This is mandatory — the plan and journal are how future sessions know what's done.
+
+---
+
+### PR 14b: Cleanup — Remove Remaining Dead Code (after parity)
 
 **Files**: Multiple (see `08-dead-code.md`)
 - Custom kmeans chain in `clusters.py`
-- Non-vectorized repness functions in `repness.py`
 - Buggy `_compute_votes_base()` (after D12 replaces it)
 - `stats.py` inconsistencies (after D9 makes `repness.py` authoritative)
 
@@ -436,6 +476,7 @@ By this point, we should have good test coverage from all the per-discrepancy te
 | D13 | Subgroup clustering | — | — | **Deferred** (unused) |
 | D14 | Large conv optimization | — | — | **Deferred** (Python fast enough) |
 | D15 | Moderation handling | PR 12 | — | Fix |
+| Replay | Replay infrastructure (A/B/C) | — | — | NOT BUILT — D3/D1 used synthetic tests only. Needed for incremental blob comparison. |
 
 ### Non-discrepancy PRs in the stack