| 1 | +# Handoff: PR 14 — Make Vectorized Code Readable + Blob Injection Tests |
| 2 | + |
| 3 | +## Goal |
| 4 | + |
| 5 | +The scalar functions (`comment_stats`, `add_comparative_stats`, `repness_metric`, |
| 6 | +`finalize_cmt_stats`) read like a step-by-step recipe. Their vectorized replacement |
| 7 | +(`compute_group_comment_stats_df`) buries the same logic in 150 lines of DataFrame |
| 8 | +plumbing. The vectorized path is the ONLY production code path — the scalar functions |
| 9 | +are dead code, called only from tests and benchmarks. |
| 10 | + |
| 11 | +**PR 14 must make the vectorized code at least as readable as the scalar code, then |
| 12 | +delete the scalar functions.** This also enables vectorized blob injection tests — |
| 13 | +the only non-tautological way to verify correctness against the Clojure math blob. |
| 14 | + |
| 15 | +## Starting point |
| 16 | + |
| 17 | +Branch off `jc/clj-parity-d9-fix` (Stack 13). This is BELOW D5-D8 in the stack, |
| 18 | +so the refactor will be inherited by all formula fix PRs. |
| 19 | + |
| 20 | +## Current stack (as of 2026-03-17) |
| 21 | + |
| 22 | +``` |
| 23 | +Stack 13: jc/clj-parity-d9-fix ← branch off HERE for PR 14 |
| 24 | +Stack 14: jc/clj-parity-d5-prop-test (draft PR #2448) |
| 25 | +Stack 15: jc/clj-parity-d6-two-prop-test (draft PR #2449) |
| 26 | +Stack 16: jc/clj-parity-d7-repness-metric (draft PR #2450) |
| 27 | +Stack 17: jc/clj-parity-d8-finalize-stats (draft PR #2451) |
| 28 | +Stack 18: jc/clj-parity-d15-moderation-handling-zeros-vs-removes (PR #2452) |
| 29 | +Stack 19: jc/clj-parity-kmeans-k-divergence (k-divergence fix) |
| 30 | +Stack 20: jc/clj-parity-d10-rep-comment-selection |
| 31 | +Stack 21: jc/clj-parity-d11-consensus-comment-selection |
| 32 | +Stack 22: jc/clj-parity-d3-k-smoother-buffer |
| 33 | +Stack 23: jc/clj-parity-d12-comment-priorities |
| 34 | +Stack 24: jc/clj-parity-d1-pca-sign-flip-prevention |
| 35 | +Stack 25: jc/clj-parity-pr15-load-votes-sort |
| 36 | +``` |
| 37 | + |
| 38 | +D5-D8 are draft PRs waiting for vectorized blob tests before being marked ready. |
| 39 | + |
| 40 | +## Task sequence |
| 41 | + |
| 42 | +### 1. Refactor `compute_group_comment_stats_df` |
| 43 | + |
| 44 | +Split into two phases: |
| 45 | + |
| 46 | +**(a) DataFrame construction** (the plumbing): |
| 47 | +- Build participant→group mapping |
| 48 | +- Compute total counts per comment |
| 49 | +- Create full (group, comment) cross-product index |
| 50 | +- Join total counts, compute "other" counts |
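The four plumbing steps above can be sketched roughly as follows. This is a minimal illustration, not the existing implementation: the helper name `build_stats_inputs`, the long-format `votes` layout (`pid`, `tid`, `vote`), and the sign convention (1 = agree, -1 = disagree, 0 = pass) are all assumptions made for the example.

```python
import pandas as pd


def build_stats_inputs(votes: pd.DataFrame, group_of: dict) -> pd.DataFrame:
    """Return one row per (gid, tid) with in-group and out-of-group counts.

    Hypothetical helper. `votes` is long format with columns pid, tid, vote
    (assumed 1=agree, -1=disagree, 0=pass); `group_of` maps pid -> gid.
    """
    # Step 1: participant -> group mapping; ungrouped participants are dropped.
    v = votes.assign(gid=votes["pid"].map(group_of)).dropna(subset=["gid"]).copy()
    v["gid"] = v["gid"].astype(int)
    v["na"] = (v["vote"] == 1).astype(int)
    v["nd"] = (v["vote"] == -1).astype(int)
    v["ns"] = 1  # every row counts as seen, passes included

    # Step 2: total counts per comment.
    counts = ["na", "nd", "ns"]
    totals = v.groupby("tid")[counts].sum().add_prefix("total_").reset_index()
    per_group = v.groupby(["gid", "tid"])[counts].sum()

    # Step 3: full (group, comment) cross-product, zero-filled where a group
    # never saw a comment.
    full_index = pd.MultiIndex.from_product(
        [sorted(v["gid"].unique()), sorted(v["tid"].unique())],
        names=["gid", "tid"],
    )
    df = per_group.reindex(full_index, fill_value=0).reset_index()

    # Step 4: join totals and derive the "other" counts by subtraction.
    df = df.merge(totals, on="tid", how="left")
    df["other_agree"] = df["total_na"] - df["na"]
    df["other_disagree"] = df["total_nd"] - df["nd"]
    df["other_votes"] = df["total_ns"] - df["ns"]
    return df.drop(columns=["total_na", "total_nd", "total_ns"])
```

The output carries exactly the input columns listed for phase (b), so the statistics function never needs to know how the counts were assembled.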
| 51 | + |
| 52 | +**(b) Statistics computation** (the readable part): |
| 53 | +A new function with clean DataFrame inputs (columns: `na`, `nd`, `ns`, |
| 54 | +`other_agree`, `other_disagree`, `other_votes`) → outputs (columns: `pa`, `pd`, |
| 55 | +`pat`, `pdt`, `ra`, `rd`, `rat`, `rdt`, `agree_metric`, `disagree_metric`, `repful`). |
| 56 | + |
| 57 | +This function should read like the scalar recipe: |
| 58 | +```python |
| 59 | +# Probabilities with pseudocounts |
| 60 | +df['pa'] = (df['na'] + PSEUDO_COUNT/2) / (df['ns'] + PSEUDO_COUNT) |
| 61 | +# Proportion tests |
| 62 | +df['pat'] = prop_test_vectorized(df['na'], df['ns']) |
| 63 | +# Representativeness ratios (other_pa must first be derived from other_agree and other_votes) |
| 64 | +df['ra'] = df['pa'] / df['other_pa'] |
| 65 | +# ...etc |
| 66 | +``` |
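A runnable version of the probability and ratio steps might look like the sketch below. It assumes `PSEUDO_COUNT = 2.0`, that `pd` mirrors the `pa` formula with `nd` in place of `na`, and that the "other" probabilities use the same pseudocounts; the function name `add_probabilities_and_ratios` is hypothetical. The proportion tests are left out because their formula is not stated here.

```python
import pandas as pd

PSEUDO_COUNT = 2.0  # assumption: check the constant actually used in repness.py


def add_probabilities_and_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: add pa, pd, ra, rd to a frame of raw counts."""
    out = df.copy()
    half = PSEUDO_COUNT / 2

    # Probabilities with pseudocounts (in-group).
    out["pa"] = (out["na"] + half) / (out["ns"] + PSEUDO_COUNT)
    out["pd"] = (out["nd"] + half) / (out["ns"] + PSEUDO_COUNT)

    # Same probabilities for everyone outside the group (assumed symmetric).
    other_pa = (out["other_agree"] + half) / (out["other_votes"] + PSEUDO_COUNT)
    other_pd = (out["other_disagree"] + half) / (out["other_votes"] + PSEUDO_COUNT)

    # Representativeness ratios: in-group probability over out-of-group probability.
    out["ra"] = out["pa"] / other_pa
    out["rd"] = out["pd"] / other_pd
    return out
```

Keeping `other_pa` and `other_pd` as local variables rather than columns keeps the output schema limited to the documented columns.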
| 67 | + |
| 68 | +### 2. Write vectorized blob injection tests |
| 69 | + |
| 70 | +Inject Clojure group memberships + votes into the new statistics function, |
| 71 | +compare output to blob values. Test the PRODUCTION code path. |
| 72 | + |
| 73 | +For each `(group, tid)` in the blob's `repness`: |
| 74 | +- Compare `pat` (or `pdt`) to blob `p-test` |
| 75 | +- Compare `rat` (or `rdt`) to blob `repness-test` |
| 76 | +- Compare `pa` (or `pd`) to blob `p-success` |
| 77 | +- Compare `repful` to blob `repful-for` |
| 78 | + |
| 79 | +The blob only stores the winning side's values. `repful-for` tells you which |
| 80 | +side won: if `agree`, then `n-success`=na, `p-test`=pat, `repness-test`=rat. |
| 81 | +If `disagree`, then `n-success`=nd, `p-test`=pdt, `repness-test`=rdt. |
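The winning-side mapping above could be encoded as a small lookup table, sketched below. The helper name `blob_mismatches`, the tolerances, and the assumption that the vectorized `repful` column holds the same string as `repful-for` are illustrative, not taken from the codebase.

```python
import math

# Which vectorized column each blob field corresponds to, per winning side.
SIDE_COLUMNS = {
    "agree": {"n-success": "na", "p-success": "pa", "p-test": "pat",
              "repness": "ra", "repness-test": "rat"},
    "disagree": {"n-success": "nd", "p-success": "pd", "p-test": "pdt",
                 "repness": "rd", "repness-test": "rdt"},
}


def blob_mismatches(blob_entry: dict, row: dict, rel_tol: float = 1e-9) -> list:
    """Return (blob_field, blob_value, python_value) for every disagreement.

    Hypothetical helper: `row` is one (group, tid) row of the vectorized output.
    """
    side = blob_entry["repful-for"]
    mismatches = []
    # Assumption: the `repful` column stores "agree" / "disagree" as strings.
    if row.get("repful") != side:
        mismatches.append(("repful-for", side, row.get("repful")))
    for blob_key, column in SIDE_COLUMNS[side].items():
        if blob_key not in blob_entry:
            continue  # only compare fields the blob actually stores
        if not math.isclose(blob_entry[blob_key], row[column],
                            rel_tol=rel_tol, abs_tol=1e-12):
            mismatches.append((blob_key, blob_entry[blob_key], row[column]))
    return mismatches
```

A test can then assert `blob_mismatches(entry, row) == []` for each blob entry, which yields a readable failure message listing the fields that diverge.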
| 82 | + |
| 83 | +### 3. Verify scalar ≡ vectorized |
| 84 | + |
| 85 | +Run both paths on all datasets, compare outputs field by field. Must be |
| 86 | +identical (not just close — exact match) since they implement the same formulas. |
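One way to enforce the exact-match requirement is `pandas.testing.assert_frame_equal` with `check_exact=True`, as in the sketch below. The helper name and the `gid`/`tid` key columns are assumptions; the scalar results would first need to be collected into a DataFrame with the same columns.

```python
import pandas as pd


def assert_paths_identical(scalar_df: pd.DataFrame, vectorized_df: pd.DataFrame) -> None:
    """Hypothetical check: both frames must match bit for bit, field by field."""
    key = ["gid", "tid"]  # assumed key columns
    left = scalar_df.sort_values(key).reset_index(drop=True)
    right = vectorized_df.sort_values(key).reset_index(drop=True)
    right = right[left.columns]  # align column order; raises KeyError if one is missing
    # check_exact=True disables the default floating-point tolerance.
    pd.testing.assert_frame_equal(left, right, check_exact=True)
```

If this check fails only in the last few bits, the two paths are likely evaluating the same formula in a different operation order, which is worth knowing before the scalar functions are deleted.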
| 87 | + |
| 88 | +### 4. Delete scalar functions |
| 89 | + |
| 90 | +Remove: `comment_stats`, `add_comparative_stats`, `repness_metric`, |
| 91 | +`finalize_cmt_stats`, and scalar `prop_test`/`two_prop_test` if no longer |
| 92 | +needed (the vectorized versions remain). |
| 93 | + |
| 94 | +Update all tests that called the scalar API. |
| 95 | + |
| 96 | +### 5. Cascade rebase |
| 97 | + |
| 98 | +Rebase D5→D6→D7→D8→D15→k-divergence→D10→D11→D3→D12→D1→PR15 on top. |
| 99 | +Use `.claude/skills/pr-stack/rebase-stack.sh` for the cascade. |
| 100 | + |
| 101 | +### 6. Add per-PR vectorized blob injection tests |
| 102 | + |
| 103 | +For each of D5, D6, D8: insert a RED blob test commit before the fix, |
| 104 | +verify it fails, then verify the fix makes it pass. Same TDD pattern used |
| 105 | +for the scalar blob tests already in those branches. |
| 106 | + |
| 107 | +Then mark #2448-#2451 as ready (un-draft). |
| 108 | + |
| 109 | +## Key blob context |
| 110 | + |
| 111 | +- `n-trials` in the Clojure blob = `S` (total seen, INCLUDING passes), not |
| 112 | + `A+D` (agrees + disagrees). Verified: `prop_test(11, 14)` matches blob |
| 113 | + `p-test` for tid=49 group 0 in vw (A=2, D=11, S=14, A+D=13). |
| 114 | +- `group-votes[gid].votes[tid]` = `{A: agrees, D: disagrees, S: total_seen}` |
| 115 | +- Blob `repness[gid][i]` fields: `n-success`, `n-trials`, `p-success`, |
| 116 | + `p-test`, `repness` (=ra or rd), `repness-test` (=rat or rdt), |
| 117 | + `repful-for`, `tid`, `best-agree` (optional). |
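The `n-trials` point can be made concrete with a small lookup, sketched below using the tid=49, group 0 counts quoted above. The helper name and the use of string keys (as they would appear after loading JSON) are assumptions.

```python
def successes_and_trials(group_votes: dict, gid: str, tid: str, side: str) -> tuple:
    """Hypothetical helper: return the (n-success, n-trials) pair for one side."""
    counts = group_votes[gid]["votes"][tid]
    n_success = counts["A"] if side == "agree" else counts["D"]
    # n-trials is S (total seen, passes included), not A + D.
    return n_success, counts["S"]


# Counts quoted above for tid=49, group 0.
example = {"0": {"votes": {"49": {"A": 2, "D": 11, "S": 14}}}}
n_success, n_trials = successes_and_trials(example, "0", "49", "disagree")
```

Here `n_success, n_trials` is `(11, 14)`, the same arguments as the `prop_test(11, 14)` call that was verified against the blob; using `A + D` would give 13 trials instead.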
| 118 | + |
| 119 | +## Files to modify |
| 120 | + |
| 121 | +- `delphi/polismath/pca_kmeans_rep/repness.py` — the refactor |
| 122 | +- `delphi/tests/test_discrepancy_fixes.py` — vectorized blob tests |
| 123 | +- `delphi/tests/test_repness_unit.py` — update for new API |
| 124 | +- `delphi/tests/test_old_format_repness.py` — update for new API |
| 125 | +- `delphi/polismath/benchmarks/bench_repness.py` — update imports |
| 126 | + |
| 127 | +## Reference |
| 128 | + |
| 129 | +- Journal: `delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md` — Session 11 entry, |
| 130 | + PR 14 readability goal in Notes for Future Sessions |
| 131 | +- Plan: `delphi/docs/PLAN_DISCREPANCY_FIXES.md` — mandatory blob comparison |
| 132 | + section in Testing Principles |
| 133 | +- Clojure source: `math/src/polismath/math/stats.clj` (prop-test, two-prop-test), |
| 134 | + `math/src/polismath/math/repness.clj` (finalize-cmt-stats, repness-metric) |