Skip to content

Commit bf2dd99

Browse files
jucorclaude
andcommitted
Plan: move PR 14 earlier (prerequisite for blob tests) + add handoff doc
PR 14 (vectorized code refactor) is now a prerequisite for all formula fix PRs, not a post-parity cleanup. It branches off jc/clj-parity-d9-fix (Stack 13) to make the vectorized production path readable and testable against Clojure blob values. Remaining dead code cleanup split to PR 14b. Handoff doc at delphi/docs/HANDOFF_PR14_VECTORIZED_REFACTOR.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 798e970 commit bf2dd99

2 files changed

Lines changed: 177 additions & 2 deletions

File tree

Lines changed: 134 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
# Handoff: PR 14 — Make Vectorized Code Readable + Blob Injection Tests
2+
3+
## Goal
4+
5+
The scalar functions (`comment_stats`, `add_comparative_stats`, `repness_metric`,
6+
`finalize_cmt_stats`) read like a step-by-step recipe. Their vectorized replacement
7+
(`compute_group_comment_stats_df`) buries the same logic in 150 lines of DataFrame
8+
plumbing. The vectorized path is the ONLY production code path — the scalar functions
9+
are dead code, called only from tests and benchmarks.
10+
11+
**PR 14 must make the vectorized code at least as readable as the scalar code, then
12+
delete the scalar functions.** This also enables vectorized blob injection tests —
13+
the only non-tautological way to verify correctness against the Clojure math blob.
14+
15+
## Starting point
16+
17+
Branch off `jc/clj-parity-d9-fix` (Stack 13). This is BELOW D5-D8 in the stack,
18+
so the refactor will be inherited by all formula fix PRs.
19+
20+
## Current stack (as of 2026-03-17)
21+
22+
```
23+
Stack 13: jc/clj-parity-d9-fix ← branch off HERE for PR 14
24+
Stack 14: jc/clj-parity-d5-prop-test (draft PR #2448)
25+
Stack 15: jc/clj-parity-d6-two-prop-test (draft PR #2449)
26+
Stack 16: jc/clj-parity-d7-repness-metric (draft PR #2450)
27+
Stack 17: jc/clj-parity-d8-finalize-stats (draft PR #2451)
28+
Stack 18: jc/clj-parity-d15-moderation-handling-zeros-vs-removes (PR #2452)
29+
Stack 19: jc/clj-parity-kmeans-k-divergence (k-divergence fix)
30+
Stack 20: jc/clj-parity-d10-rep-comment-selection
31+
Stack 21: jc/clj-parity-d11-consensus-comment-selection
32+
Stack 22: jc/clj-parity-d3-k-smoother-buffer
33+
Stack 23: jc/clj-parity-d12-comment-priorities
34+
Stack 24: jc/clj-parity-d1-pca-sign-flip-prevention
35+
Stack 25: jc/clj-parity-pr15-load-votes-sort
36+
```
37+
38+
D5-D8 are draft PRs waiting for vectorized blob tests before being marked ready.
39+
40+
## Task sequence
41+
42+
### 1. Refactor `compute_group_comment_stats_df`
43+
44+
Split into two phases:
45+
46+
**(a) DataFrame construction** (the plumbing):
47+
- Build participant→group mapping
48+
- Compute total counts per comment
49+
- Create full (group, comment) cross-product index
50+
- Join total counts, compute "other" counts
51+
52+
**(b) Statistics computation** (the readable part):
53+
A new function with clean DataFrame inputs (columns: `na`, `nd`, `ns`,
54+
`other_agree`, `other_disagree`, `other_votes`) → outputs (columns: `pa`, `pd`,
55+
`pat`, `pdt`, `ra`, `rd`, `rat`, `rdt`, `agree_metric`, `disagree_metric`, `repful`).
56+
57+
This function should read like the scalar recipe:
58+
```python
59+
# Probabilities with pseudocounts
60+
df['pa'] = (df['na'] + PSEUDO_COUNT/2) / (df['ns'] + PSEUDO_COUNT)
61+
# Proportion tests
62+
df['pat'] = prop_test_vectorized(df['na'], df['ns'])
63+
# Representativeness ratios
64+
df['ra'] = df['pa'] / df['other_pa']
65+
# ...etc
66+
```
67+
68+
### 2. Write vectorized blob injection tests
69+
70+
Inject Clojure group memberships + votes into the new statistics function,
71+
compare output to blob values. Test the PRODUCTION code path.
72+
73+
For each `(group, tid)` in the blob's `repness`:
74+
- Compare `pat` (or `pdt`) to blob `p-test`
75+
- Compare `rat` (or `rdt`) to blob `repness-test`
76+
- Compare `pa` (or `pd`) to blob `p-success`
77+
- Compare `repful` to blob `repful-for`
78+
79+
The blob only stores the winning side's values. `repful-for` tells you which
80+
side won: if `agree`, then `n-success`=na, `p-test`=pat, `repness-test`=rat.
81+
If `disagree`, then `n-success`=nd, `p-test`=pdt, `repness-test`=rdt.
82+
83+
### 3. Verify scalar ≡ vectorized
84+
85+
Run both paths on all datasets, compare outputs field by field. Must be
86+
identical (not just close — exact match) since they implement the same formulas.
87+
88+
### 4. Delete scalar functions
89+
90+
Remove: `comment_stats`, `add_comparative_stats`, `repness_metric`,
91+
`finalize_cmt_stats`, and scalar `prop_test`/`two_prop_test` if no longer
92+
needed (the vectorized versions remain).
93+
94+
Update all tests that called the scalar API.
95+
96+
### 5. Cascade rebase
97+
98+
Rebase D5→D6→D7→D8→D15→k-divergence→D10→D11→D3→D12→D1→PR15 on top.
99+
Use `.claude/skills/pr-stack/rebase-stack.sh` for the cascade.
100+
101+
### 6. Add per-PR vectorized blob injection tests
102+
103+
For each of D5, D6, D8: insert a RED blob test commit before the fix,
104+
verify it fails, then verify the fix makes it pass. Same TDD pattern used
105+
for the scalar blob tests already in those branches.
106+
107+
Then mark #2448-#2451 as ready (un-draft).
108+
109+
## Key blob context
110+
111+
- `n-trials` in the Clojure blob = `S` (total seen, INCLUDING passes), not
112+
`A+D` (agrees + disagrees). Verified: `prop_test(11, 14)` matches blob
113+
`p-test` for tid=49 group 0 in vw (A=2, D=11, S=14, A+D=13).
114+
- `group-votes[gid].votes[tid]` = `{A: agrees, D: disagrees, S: total_seen}`
115+
- Blob `repness[gid][i]` fields: `n-success`, `n-trials`, `p-success`,
116+
`p-test`, `repness` (=ra or rd), `repness-test` (=rat or rdt),
117+
`repful-for`, `tid`, `best-agree` (optional).
118+
119+
## Files to modify
120+
121+
- `delphi/polismath/pca_kmeans_rep/repness.py` — the refactor
122+
- `delphi/tests/test_discrepancy_fixes.py` — vectorized blob tests
123+
- `delphi/tests/test_repness_unit.py` — update for new API
124+
- `delphi/tests/test_old_format_repness.py` — update for new API
125+
- `delphi/polismath/benchmarks/bench_repness.py` — update imports
126+
127+
## Reference
128+
129+
- Journal: `delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md` — Session 11 entry,
130+
PR 14 readability goal in Notes for Future Sessions
131+
- Plan: `delphi/docs/PLAN_DISCREPANCY_FIXES.md` — mandatory blob comparison
132+
section in Testing Principles
133+
- Clojure source: `math/src/polismath/math/stats.clj` (prop-test, two-prop-test),
134+
`math/src/polismath/math/repness.clj` (finalize-cmt-stats, repness-metric)

delphi/docs/PLAN_DISCREPANCY_FIXES.md

Lines changed: 43 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -387,11 +387,51 @@ This is non-trivial and should be one of the last fixes.
387387

388388
---
389389

390-
### PR 14: Cleanup — Remove Dead Code
390+
### PR 14: Refactor Vectorized Code for Readability + Blob Injection Tests
391+
392+
**MOVED EARLIER**: PR 14 is now a prerequisite for all formula fix PRs (D5-D8+),
393+
not a post-parity cleanup. It branches off `jc/clj-parity-d9-fix` (Stack 13),
394+
below all formula fixes. Reason: the vectorized production path
395+
(`compute_group_comment_stats_df`) is too monolithic to test against the Clojure
396+
blob. The refactor makes it testable AND readable.
397+
398+
**The problem**: The scalar functions (`comment_stats`, `add_comparative_stats`,
399+
`repness_metric`, `finalize_cmt_stats`) read like a step-by-step recipe. The
400+
vectorized replacement (`compute_group_comment_stats_df`) buries the same logic
401+
in 150 lines of DataFrame plumbing. The scalar path is dead code in production —
402+
only called from tests and benchmarks.
403+
404+
**Task**:
405+
1. Split `compute_group_comment_stats_df` into (a) DataFrame construction
406+
(group mapping, cross-product index, joins) and (b) statistics computation
407+
as its own function with clean inputs/outputs — readable AND testable.
408+
2. Write vectorized blob injection tests: inject Clojure group memberships +
409+
votes, compare output to blob values. Tests the PRODUCTION code path.
410+
3. Verify scalar and vectorized paths produce identical output on all datasets.
411+
4. Delete scalar functions. Update tests.
412+
413+
**Files**: `polismath/pca_kmeans_rep/repness.py`, `tests/test_discrepancy_fixes.py`,
414+
`tests/test_repness_unit.py`, `tests/test_old_format_repness.py`,
415+
`polismath/benchmarks/bench_repness.py`
416+
417+
See `delphi/docs/HANDOFF_PR14_VECTORIZED_REFACTOR.md` for full details.
418+
419+
After PR 14, each fix PR gets vectorized blob injection tests added in RED→GREEN
420+
TDD pattern. This includes D5-D8 (repness formula fixes), D10/D11 (selection),
421+
D15 (moderation), D12 (priorities). For D3 (k-smoother) and D1 (PCA sign flip),
422+
which are incremental-only features, add synthetic tests + skip markers for
423+
incremental blob comparison pending replay infrastructure (see Replay PRs A/B/C).
424+
425+
**After adding vectorized tests to each PR, update the plan AND journal** to
426+
record what was tested, what blob fields were compared, and any discrepancies found.
427+
This is mandatory — the plan and journal are how future sessions know what's done.
428+
429+
---
430+
431+
### PR 14b: Cleanup — Remove Remaining Dead Code (after parity)
391432

392433
**Files**: Multiple (see `08-dead-code.md`)
393434
- Custom kmeans chain in `clusters.py`
394-
- Non-vectorized repness functions in `repness.py`
395435
- Buggy `_compute_votes_base()` (after D12 replaces it)
396436
- `stats.py` inconsistencies (after D9 makes `repness.py` authoritative)
397437

@@ -436,6 +476,7 @@ By this point, we should have good test coverage from all the per-discrepancy te
436476
| D13 | Subgroup clustering ||| **Deferred** (unused) |
437477
| D14 | Large conv optimization ||| **Deferred** (Python fast enough) |
438478
| D15 | Moderation handling | PR 12 || Fix |
479+
| Replay | Replay infrastructure (A/B/C) ||| NOT BUILT — D3/D1 used synthetic tests only. Needed for incremental blob comparison. |
439480

440481
### Non-discrepancy PRs in the stack
441482

0 commit comments

Comments
 (0)