Commit 4c2d5a7

jucor and claude committed
Journal: add session 3 findings (D2c vote count source, D2d monotonicity)
- D2c: structural discrepancy in vote counts and n_cmts — both must use raw_rating_mat to match Clojure's zero-out-columns behavior
- D2d: full recompute guarantees monotonicity without persistence, unlike Clojure which persists because it uses delta processing
- Corrected earlier "not needed" note about raw_rating_mat
- Updated What's Next: D2c → D2d → D4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9a45392 commit 4c2d5a7

1 file changed

Lines changed: 72 additions & 4 deletions

delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md

@@ -142,9 +142,41 @@ After rebase onto updated `origin/kmeans_analysis_docs`:
 `generate_cold_start_clojure.py` with the prodclone DB and Clojure math service.
 This is delegated to a separate session.

-### What was NOT needed
-- `raw_rating_mat` vs `rating_mat` — not needed, the existing vote counting works
-- Greedy fallback / monotonic persistence — not needed for parity (cold-start only)
+### What was NOT needed (revised — see D2c/D2d below)
+- ~~`raw_rating_mat` vs `rating_mat` — not needed~~ **WRONG**: See D2c below, this IS needed.
+- Greedy fallback / monotonic persistence — not needed for cold-start parity, but monotonicity tests needed for future-proofing (see D2d).
+
+### D2c: Vote count source — raw vs filtered matrix (discovered in session 3)
+
+Deep investigation of Clojure vs Python revealed a **structural discrepancy** in how
+votes are counted for the in-conv threshold, independent of delta vs full processing:
+
+| Aspect | Clojure | Python (current) |
+|--------|---------|-----------------|
+| Vote counts per participant | From `raw-rating-mat` (conversation.clj:217-225) — includes votes on moderated-out comments | From `self.rating_mat` (conversation.py:1226/1244) — excludes moderated-out columns |
+| `n_cmts` (threshold cap) | From `rating-mat` (conversation.clj:214-215) — columns zeroed but still present, so count includes moderated-out | From `self.rating_mat` (conversation.py:1268) — moderated-out columns removed entirely |
+
+Key insight: Clojure's `zero-out-columns` (named_matrix.clj:214-228) sets moderated-out
+column values to 0 but **keeps the columns** in `rating-mat`. Python's `_apply_moderation`
+(conversation.py:308) **removes columns entirely**. This means both `user-vote-counts`
+and `n-cmts` differ between implementations.
+
+**Fix**: Both `_compute_user_vote_counts` and `n_cmts` must use `self.raw_rating_mat`.
+Two xfail unit tests planned (vote count source + n_cmts threshold).
+
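Editor's illustration of the D2c discrepancy, as a minimal sketch on toy data. Everything here is hypothetical: the helper names, the `None`-for-missing encoding, and the `min(7, n_cmts)` threshold form are assumptions made for the example, not code taken from `conversation.py` or `conversation.clj`.

```python
# Toy data (hypothetical): rows are participants, columns are comments.
# None means "no vote"; 1 / -1 / 0 stand for agree / disagree / pass.
raw_rating_mat = [
    [1, -1, None, 1],
    [None, 1, None, None],
    [1, None, -1, 0],
]
moderated_out = {1}  # column index of a moderated-out comment


def zero_out_columns(mat, cols):
    """Clojure-style moderation: overwrite the columns with 0 but keep them."""
    return [[0 if j in cols else v for j, v in enumerate(row)] for row in mat]


def remove_columns(mat, cols):
    """Moderation as the journal describes Python's current behavior: drop the columns."""
    return [[v for j, v in enumerate(row) if j not in cols] for row in mat]


def vote_counts(mat):
    """Number of non-missing entries per participant."""
    return [sum(v is not None for v in row) for row in mat]


def n_cmts(mat):
    """Number of comment columns in the matrix."""
    return len(mat[0]) if mat else 0


def in_conv(counts, n, base_threshold=7):
    """Assumed threshold form: a participant is in-conv at min(base, n_cmts) votes."""
    return [c >= min(base_threshold, n) for c in counts]


zeroed = zero_out_columns(raw_rating_mat, moderated_out)
removed = remove_columns(raw_rating_mat, moderated_out)

# Clojure-like: counts from the raw matrix, n_cmts from the zeroed matrix.
clojure_like = (vote_counts(raw_rating_mat), n_cmts(zeroed))
# Python (current): both taken from the column-filtered matrix.
python_current = (vote_counts(removed), n_cmts(removed))

clojure_in_conv = in_conv(*clojure_like)
python_in_conv = in_conv(*python_current)
```

On this toy input the third participant falls below the threshold under the Clojure-like rule but clears it under the column-removing rule, which is the kind of divergence the two planned xfail tests are meant to pin down.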
+### D2d: In-conv monotonicity — full recompute vs persistence (discovered in session 3)
+
+Investigated whether Python needs to persist in-conv to DynamoDB (like Clojure persists to
+`math_main`). Finding: **no, full recompute is better**.
+
+Clojure persists in-conv (conv_man.clj:55) because it uses delta vote processing. Python
+rebuilds `raw_rating_mat` from all votes every time. Since votes in PostgreSQL are never
+deleted (only added or updated), a participant's vote count never decreases → monotonicity
+is a free consequence of full recompute from `raw_rating_mat`.
+
+**Decision**: No DynamoDB persistence. Instead, 6 tests (T1-T6) guard the monotonicity
+invariant, each documenting that switching to delta processing would require adding
+persistence. See plan PR 1bis for test details. Ref: compdemocracy/polis#2358.
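Editor's illustration of the D2d argument, as a minimal sketch. The vote log, the `(participant, comment, value)` tuple layout, and `rebuild_vote_counts` are hypothetical; the sketch also assumes that a pass (0) counts as a vote and that an update replaces an earlier vote on the same comment.

```python
def rebuild_vote_counts(votes):
    """Full recompute: keep the latest vote per (participant, comment), then count."""
    latest = {}
    for pid, tid, value in votes:
        latest[(pid, tid)] = value  # an update overwrites; nothing is ever removed
    counts = {}
    for pid, _tid in latest:
        counts[pid] = counts.get(pid, 0) + 1
    return counts


# Hypothetical append-only vote log; the fourth entry updates an earlier vote.
vote_log = [
    ("a", 1, 1),
    ("a", 2, -1),
    ("b", 1, 1),
    ("a", 2, 1),   # update of a's vote on comment 2
    ("a", 3, 0),   # pass, still counted as a vote
]

# Recompute from scratch after each new log entry.
history = [rebuild_vote_counts(vote_log[:i]) for i in range(len(vote_log) + 1)]

# Monotonicity: no participant's count ever drops between consecutive recomputes.
monotone = all(
    later.get(pid, 0) >= n
    for earlier, later in zip(history, history[1:])
    for pid, n in earlier.items()
)
```

Because each recompute reads the whole log and the log only grows, the counts cannot shrink. A delta-based variant that saw only the newest batch would have to carry the earlier state forward, which is the persistence the T1-T6 tests are meant to flag.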

 ### D2b: Base-cluster sort order (added from Copilot review)

@@ -170,7 +202,23 @@ matching Clojure's `sort-by :id`.
 - D9 repness_not_empty × 6 — test too weak (checks non-empty, not correct count)
 - **4 errors**: engage dataset has duplicate vote files (pre-existing data issue)

-### What's Next: PR 2 — Fix D4 (Pseudocount)
+### Golden snapshot re-recording (BLOCKED)
+
+After D2b fix, regression tests fail on all datasets except vw (as expected — sort order
+change cascades to different cluster assignments). biodiversity was re-recorded and verified.
+Private datasets (FLI, bg2018, bg2050, pakistan) need re-recording but the recorder crashes
+on `engage` (pre-existing duplicate vote files) before reaching the others. Need to record
+them individually, but **blocked on fixes to PRs earlier in the stack** (#2393, #2397).
+Will re-record after those are resolved and rebased.
+
+### What's Next
+
+1. **PR 1 (D2c)**: Implement the vote count source fix — switch `_compute_user_vote_counts`
+and `n_cmts` to use `self.raw_rating_mat`. Write xfail tests first, then fix, then verify.
+2. **PR 1bis (D2d)**: Write the 6 monotonicity tests (T1-T6). These should pass immediately
+after D2c is fixed (full recompute + raw_rating_mat = monotonicity for free). Add the
+code comment block and PR description documenting the design decision.
+3. **PR 2 — Fix D4 (Pseudocount)**: Next in pipeline order after D2 is fully resolved.
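Editor's illustration of the "xfail first, then fix" step for PR 1, as a rough sketch. It uses the standard library's `unittest.expectedFailure` as a stand-in for an xfail marker so that it runs without extra dependencies. `compute_user_vote_counts_current` is a hypothetical stand-in for the pre-fix behavior, not the real `_compute_user_vote_counts`.

```python
import io
import unittest


def compute_user_vote_counts_current(raw_rating_mat, moderated_out):
    """Hypothetical pre-fix behavior: count votes only on columns that survive moderation."""
    return [
        sum(v is not None for j, v in enumerate(row) if j not in moderated_out)
        for row in raw_rating_mat
    ]


class VoteCountSourceTest(unittest.TestCase):
    @unittest.expectedFailure  # drop this marker once the D2c fix lands
    def test_counts_include_moderated_out_votes(self):
        # Column 1 is moderated-out, yet votes on it should still be counted.
        raw = [[1, -1, None], [None, 1, None]]
        counts = compute_user_vote_counts_current(raw, moderated_out={1})
        self.assertEqual(counts, [2, 1])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(VoteCountSourceTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

The test is recorded as an expected failure while the stand-in ignores moderated-out columns. Once the counting switches to the raw matrix, the marker starts reporting an unexpected success, which is the signal to remove it. With pytest, `@pytest.mark.xfail(strict=True)` plays a similar role.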

 ---

@@ -214,6 +262,26 @@ matching Clojure's `sort-by :id`.
 - Discovered 3 private datasets have incomplete Clojure blobs (4 keys instead of 23) — delegated regeneration to separate session
 - Committed and pushed D2 fix (`df2d013ec`)

+### Session 3 (2026-03-04)
+
+- Deep investigation of in-conv vote counting: compared Clojure `user-vote-counts`
+(conversation.clj:217-225, uses `raw-rating-mat`) vs Python `_compute_user_vote_counts`
+(conversation.py:1226, uses `self.rating_mat`)
+- Discovered D2c: structural discrepancy in both vote count source AND `n_cmts` — Clojure's
+`zero-out-columns` keeps moderated-out columns (zeroed), Python's `_apply_moderation`
+removes them entirely. Both vote counts and threshold cap differ.
+- Investigated whether to persist in-conv to DynamoDB (like Clojure does to `math_main`).
+Found: Clojure persists because it uses delta vote processing. Python does full recompute
+→ monotonicity is free. No persistence needed.
+- Verified Clojure persistence path: `prep-main` (conv_man.clj:55) writes `:in-conv`,
+`restructure-json-conv` (conv_man.clj:182) restores it on restart.
+- Updated plan: added D2c (vote count source, 2 xfail tests) and D2d (monotonicity, 6 tests
+guarding against future delta-processing refactor). Ref: compdemocracy/polis#2358.
+- Corrected earlier journal entry that said `raw_rating_mat` was "not needed" — it IS needed.
+- Added terminology rule to CLAUDE.local.md: always say "moderated-out" or "moderated-in",
+never just "moderated".
+- Committed plan update (`f2bf77c38`)
+
 ---

 ## TDD Discipline
## TDD Discipline
