Commit ddb4e01

jucor and claude committed
Journal: add session 6 (D4 fix), update plan marking D4 done
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 93542b2 commit ddb4e01

2 files changed

Lines changed: 48 additions & 5 deletions

delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md

Lines changed: 46 additions & 2 deletions
@@ -343,8 +343,52 @@ Will re-record after those are resolved and rebased.

 ### What's Next

-1. **PR 2 — Fix D4 (Pseudocount)**: `PSEUDO_COUNT = 1.5` → `2.0` to match Clojure's Beta(2,2) prior.
-2. **PR 3 — Fix D9 (Z-score thresholds)**: Switch from two-tailed to one-tailed z-scores.
+1. **PR 3 — Fix D9 (Z-score thresholds)**: Switch from two-tailed to one-tailed z-scores.
+2. Regression test performance investigation (see `HANDOFF_REGRESSION_TEST_PERF.md`)
+
+---
+
+## PR 2: Fix D4 — Pseudocount Formula
+
+### TDD steps
+1. **Baseline**: 25 passed, 3 skipped, 28 xfailed (discrepancy tests)
+2. **Red**: Removed xfail from 3 D4 tests → 6 failures (constant check, pa values × 4 datasets, synthetic)
+3. **Fix**: `PSEUDO_COUNT = 1.5` → `2.0` in `repness.py`
+4. **Green**: All 6 D4 tests pass
+5. **Full suite**: 258 passed, 3 skipped, 30 xfailed, 0 failures (public datasets)
+6. **Private datasets**: 60 passed, 6 skipped, 53 xfailed (discrepancy tests with --include-local)
+7. Re-recorded golden snapshots for all 7 datasets
+
+### Changes
+- `repness.py`: `PSEUDO_COUNT = 2.0`, updated comment to reference Beta(2,2) prior
+- `test_discrepancy_fixes.py`: removed xfail from 3 D4 tests
+- `test_repness_unit.py`, `test_old_format_repness.py`: import `PSEUDO_COUNT` instead of hardcoding 1.5
+- `simplified_repness_test.py`: updated hardcoded constant
+
+### Side finding: regression test performance
+Large private datasets take 1-5 minutes per regression test due to:
+- Benchmark mode running pipeline 3× (n_runs=3)
+- O(participants × comments) per-participant loop in `_compute_participant_info_optimized`
+- Intermediate stages computed redundantly
+
+Fitted model: `t ≈ 1.66 + 3.87e-5 × votes + 9.90e-7 × (ptpts × cmts)` (R²=0.9995).
+Detailed analysis in `HANDOFF_REGRESSION_TEST_PERF.md` for a future session.
+
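Reviewer note: the fitted model in the side finding can be evaluated directly to check whether a slow run is in line with dataset size. This is a minimal sketch, assuming `t` is in seconds (consistent with the 317s and 179s timings quoted in the journal) and that `votes`, `ptpts`, and `cmts` are the dataset's vote, participant, and comment counts. The function name `predicted_seconds` and the sample sizes are hypothetical and do not come from the codebase.

```python
def predicted_seconds(votes: int, ptpts: int, cmts: int) -> float:
    """Evaluate the journal's fitted timing model.

    Coefficients are copied from the journal entry; the unit (seconds)
    is an assumption based on the timings quoted there.
    """
    return 1.66 + 3.87e-5 * votes + 9.90e-7 * (ptpts * cmts)


# Hypothetical dataset sizes, for illustration only.
estimate = predicted_seconds(votes=1_000_000, ptpts=10_000, cmts=1_000)
print(f"predicted regression test time: {estimate:.1f} s")
```

The constant term is the fixed overhead, so an empty dataset predicts 1.66 under this model.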
+### Session 6 (2026-03-11)
+
+- Created branch `jc/clj-parity-d4-fix` stacked on `jc/clj-parity-d2-fix`
+- D4 fix (TDD): red (6 tests failed) → fix PSEUDO_COUNT → green (all 6 pass)
+- Fixed 2 unit tests that hardcoded old pseudocount value (used import instead)
+- Re-recorded golden snapshots for all 7 datasets (public + private)
+- Investigated regression test performance (engage 317s, pakistan 179s) — confirmed
+  consistent with O(votes + ptpts×cmts) complexity, not a regression from D4
+- Created `HANDOFF_REGRESSION_TEST_PERF.md` for future optimization work
+- Pushed branch, created PR
+
+### What's Next
+
+1. **PR 3 — Fix D9 (Z-score thresholds)**: `Z_90=1.645` → `1.2816`, `Z_95=1.96` → `1.6449`
+2. Regression test performance optimization (separate session)

 ---
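Reviewer note: the D9 thresholds named in the hunk above are standard normal quantiles, so the two-tailed and one-tailed values can be checked with the Python standard library. This is a minimal sketch; the helper names `z_one_tailed` and `z_two_tailed` are illustrative and are not taken from `repness.py`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_one_tailed(confidence: float) -> float:
    """Critical z with `confidence` of the probability mass below it."""
    return STD_NORMAL.inv_cdf(confidence)


def z_two_tailed(confidence: float) -> float:
    """Critical z with `confidence` of the probability mass inside [-z, z]."""
    return STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)


for level in (0.90, 0.95):
    print(level, round(z_two_tailed(level), 4), round(z_one_tailed(level), 4))
```

At the 90% and 95% levels the two-tailed quantiles come out at about 1.645 and 1.96, and the one-tailed quantiles at about 1.2816 and 1.6449, which matches the old and new constants listed for PR 3.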

delphi/docs/PLAN_DISCREPANCY_FIXES.md

Lines changed: 2 additions & 3 deletions
@@ -162,8 +162,7 @@ All test docstrings explain what would break under delta processing.

 **File**: `delphi/polismath/pca_kmeans_rep/repness.py`

-**Current**: `PSEUDO_COUNT = 1.5` → `pa = (na + 0.75) / (ns + 1.5)` (Beta(1.75,1.75) prior)
-**Target**: `PSEUDO_COUNT = 2.0` → `pa = (na + 1) / (ns + 2)` (Beta(2,2) prior, matching Clojure)
+**Current**: ~~`PSEUDO_COUNT = 1.5`~~ **DONE**: `PSEUDO_COUNT = 2.0` → `pa = (na + 1) / (ns + 2)` (Beta(2,2) prior, matching Clojure)

 **Test-first approach**:
 1. Add unit test: for known (na, ns) pairs from Clojure math blob repness, verify `pa` values match
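Reviewer note: one way to read the prior labels in this hunk is that `pa` is the mode (MAP estimate) of a Beta posterior. Under that reading a symmetric Beta(a, a) prior gives `pa = (na + a - 1) / (ns + 2a - 2)`, which reproduces both the old and the new formula. The sketch below assumes `repness.py` computes `pa = (na + PSEUDO_COUNT / 2) / (ns + PSEUDO_COUNT)`, as the two formulas in the diff suggest. The function names `smoothed_pa` and `beta_map` and the sample counts are hypothetical.

```python
def smoothed_pa(na: int, ns: int, pseudo_count: float = 2.0) -> float:
    """Smoothed agree probability for na agrees out of ns votes seen.

    Assumed form of the repness formula; not copied from repness.py.
    """
    return (na + pseudo_count / 2) / (ns + pseudo_count)


def beta_map(na: int, ns: int, a: float) -> float:
    """Mode of the Beta(a + na, a + ns - na) posterior (both parameters > 1)."""
    return (a + na - 1) / (2 * a + ns - 2)


# Illustrative counts, not taken from any dataset.
na, ns = 3, 4
assert abs(smoothed_pa(na, ns, 2.0) - beta_map(na, ns, 2.0)) < 1e-12   # Beta(2,2)
assert abs(smoothed_pa(na, ns, 1.5) - beta_map(na, ns, 1.75)) < 1e-12  # Beta(1.75,1.75)
print(smoothed_pa(na, ns, 2.0), smoothed_pa(na, ns, 1.5))
```

With no votes at all, both constants give `pa = 0.5`. The constants differ only in how strongly small samples are pulled toward that midpoint.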
@@ -409,7 +408,7 @@ By this point, we should have good test coverage from all the per-discrepancy te
 | D2c | Vote count source (raw vs filtered matrix) | **PR 1** | **DONE** |
 | D2d | In-conv monotonicity (once in, always in) | **PR 1** | **DONE** ✓ (5 guard tests, T1-T5) |
 | D3 | K-smoother buffer | PR 10 | Fix |
-| D4 | Pseudocount formula | **PR 2** | Fix |
+| D4 | Pseudocount formula | **PR 2** | **DONE** |
 | D5 | Proportion test | PR 4 | Fix |
 | D6 | Two-proportion test | PR 5 | Fix |
 | D7 | Repness metric | PR 6 | Fix (with flag for old formula) |

0 commit comments
