You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Fix D2c: use raw_rating_mat for vote counts and n_cmts threshold
Vote counts and n_cmts now come from raw_rating_mat (includes votes on
moderated-out comments), matching Clojure's user-vote-counts which reads
from raw-rating-mat. Previously, _compute_user_vote_counts and
_get_in_conv_participants used the filtered rating_mat, causing
participants to drop below the in-conv threshold when their voted
comments were moderated-out.
Also adds D2d monotonicity tests (T1-T5) guarding the invariant that
once a participant qualifies for in-conv, they can never be removed.
These pass for free with full recompute from raw_rating_mat; documented
that switching to delta vote processing would require persisting in-conv
to DynamoDB (see #2358).
Tests: 253 passed, 5 skipped, 36 xfailed (0 failures)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
**D2c**: ~~Vote counts and `n_cmts` from `rating_mat`~~ → **DONE**: Both use `raw_rating_mat` (includes moderated-out comments). Matches Clojure's `user-vote-counts` (conversation.clj:217-225).
108
+
**D2b**: ~~Base clusters sorted by size~~ → **DONE**: Sort by k-means ID (matches Clojure's `sort-by :id`).
108
109
109
-
**Additional changes**:
110
-
- **D2c — Vote count source**: Use `self.raw_rating_mat` instead of `self.rating_mat` for computing per-participant vote counts. Clojure's `user-vote-counts` (conversation.clj:217-225) counts votes from `raw-rating-mat`, which includes votes on moderated-out comments. Python currently uses the filtered `rating_mat`, which excludes moderated-out columns — undercounting votes and potentially dropping participants below the in-conv threshold. This is a **structural** discrepancy that occurs on every computation (cold-start or incremental), not just with delta updates — because the difference is in *which matrix* the count is derived from. Additionally, `n_cmts` differs: Clojure's `n-cmts` (conversation.clj:214-215) counts columns of `rating-mat` which still includes moderated-out columns (zeroed but present, via `zero-out-columns` in named_matrix.clj:214-228). Python's `n_cmts = len(self.rating_mat.columns)` excludes moderated-out columns entirely (removed by `_apply_moderation`, line 308). This means the `min(7, n_cmts)` threshold itself also differs. Both aspects (vote count source and `n_cmts` source) must use `raw_rating_mat` to match Clojure. (This is related to but distinct from D15 — moderation handling — which tracks the broader zero-out vs remove discrepancy.)
111
-
- Add greedy fallback (top-15 voters if <15 qualify)
112
-
- Add monotonic persistence (once in, always in)
113
-
114
-
**Test-first approach**:
115
-
1. Add test comparing in-conv participant count/set between Python and Clojure for ALL datasets
116
-
2. Add synthetic edge-case tests: tiny conversation (fewer comments than threshold), conversation where greedy fallback triggers
117
-
3.**D2c unit test — vote count source**: Synthetic test with 10 comments, 3 moderated-out. Participant P voted on all 10. Verify `_compute_user_vote_counts` returns 10 for P (from `raw_rating_mat`), not 7 (from `rating_mat`). Mark xfail before fix.
118
-
4.**D2c unit test — n_cmts threshold**: Same setup. Verify `n_cmts` used in the `min(7, n_cmts)` threshold equals 10 (all comments including moderated-out), not 7 (only non-moderated-out). This matters when a conversation has fewer than 7 non-moderated-out comments but more than 7 total: the threshold would be artificially low in Python, changing who qualifies. Mark xfail before fix.
119
-
5. Run tests → expect failure on D2c tests
120
-
4. Apply fix
121
-
5. Re-run ALL tests — cluster count may change (possibly 3→2 for biodiversity, matching Clojure)
122
-
6.**Cluster comparison test (CRITICAL)**: After fixing D2, add a test that compares Python and Clojure cluster assignments at the participant level — number of clusters AND which participants are in which cluster (using Jaccard similarity or exact match). The Python clustering calls sklearn K-means; Clojure uses a two-level approach. If clusters still don't match after the threshold fix, investigate why and potentially create an additional PR (PR 1b) to fix the remaining clustering discrepancy before moving on to repness fixes.
123
-
7. Document new baseline in journal
110
+
**Remaining (deferred)**:
111
+
- Greedy fallback (top-15 voters if <15 qualify) — not needed for current datasets
112
+
- Cluster comparison test at participant level — deferred to after repness fixes
**Related upstream issue**: [compdemocracy/polis#2358](https://github.com/compdemocracy/polis/issues/2358) — "Non-Deterministic K-Means Clustering Due to Worker Restart". That issue focuses on `group-clusterings` not being persisted. In-conv IS persisted in Clojure (see below), but we take a different — and better — approach.
**Current**: `_get_in_conv_participants()` (line 1256) recomputes from scratch using `self.rating_mat` (the filtered matrix). With the D2c fix (PR 1), it will use `self.raw_rating_mat`. Since `raw_rating_mat` contains all historical votes, monotonicity is guaranteed without explicit persistence.
157
-
158
-
**Test-first approach — tests that guard the monotonicity invariant**:
159
-
160
-
These tests must capture every scenario that would break if someone switched to delta processing without adding persistence. Each test should have a docstring explaining: "This passes today because we do full recompute. If you switch to delta vote processing, you MUST persist in-conv to DynamoDB — see issue #2358."
161
-
162
-
1.**T1 — Basic monotonicity across updates**: Participant P votes on 7 comments in batch 1 → qualifies. Batch 2 adds new comments but P doesn't vote on them. P's count stays 7. Assert P is still in-conv after batch 2.
163
-
164
-
2.**T2 — Monotonicity survives moderation-out**: P votes on 7 comments in batch 1 → qualifies. Between batches, 3 of those comments are moderated-out. Batch 2 arrives with new votes from other participants. Assert P is still in-conv (because `raw_rating_mat` still has P's 7 votes).
165
-
166
-
3.**T3 — Worker restart with moderation**: P votes on 7 comments in batch 1 → qualifies. Destroy the `Conversation` object (simulates worker death). Moderate-out 3 of P's comments. Create a new `Conversation` object from all votes (simulates worker restart with full recompute). Assert P is still in-conv. This is the key test that would FAIL under delta processing without persistence.
167
-
168
-
4.**T4 — Worker restart with no new votes**: Same as T3 but no new votes arrive after restart — just moderation happened. Rebuild `Conversation` from existing votes. Assert P is still in-conv. (Tests that recompute alone is sufficient, no "trigger" of new votes needed.)
169
-
170
-
5.**T5 — Vote on comment that is later moderated-out, then new votes arrive**: P votes on c1-c7. c1-c3 are moderated-out. New participant Q votes on c4-c10. Rebuild conversation from all votes. Assert both P (7 votes on raw matrix) and Q (7 votes on non-moderated-out comments) are in-conv.
145
+
**DONE**: `_get_in_conv_participants()` uses `self.raw_rating_mat` (D2c fix). Monotonicity is a free consequence. Code comment on the function documents the design decision and the delta-processing caveat.
171
146
172
-
6.**T6 — Greedy fallback after moderation**: Conversation has 8 comments, 14 participants who each voted on all 8, plus participant P who voted on exactly 7. All qualify. Then 5 comments are moderated-out, leaving only 3 non-moderated-out. After full recompute from `raw_rating_mat`: `n_cmts` = 8 (from `raw_rating_mat`), threshold = `min(7, 8)` = 7. P still has 7 votes → still qualifies. Verify the greedy fallback (top-15) is not erroneously triggered.
147
+
**Tests implemented** (T1-T5 in `TestD2dInConvMonotonicity`):
-**Code comment** on `_get_in_conv_participants()`: block comment explaining the full-recompute-guarantees-monotonicity design, the delta-processing caveat, and references to Clojure's persistence approach and issue #2358.
176
-
-**PR description**: dedicated "Design Decision: In-Conv Monotonicity" section explaining why we chose full recompute over persistence, with the future-proofing warning.
177
-
-**Test docstrings**: each test explains what it guards against and what would need to change under delta processing.
155
+
All test docstrings explain what would break under delta processing.
178
156
179
157
---
180
158
@@ -426,10 +404,10 @@ By this point, we should have good test coverage from all the per-discrepancy te
| D2b | Base-cluster sort order |**PR 1**|Fix (keep k-means ID order, match Clojure sort-by :id)|
431
-
| D2c | Vote count source (raw vs filtered matrix) |**PR 1**|Fix (use `raw_rating_mat` for vote counts, so votes on moderated-out comments are included)|
432
-
| D2d | In-conv monotonicity (once in, always in) |**PR 1bis**|Full recompute from `raw_rating_mat` guarantees monotonicity without persistence. 6 tests guarding the invariant for future delta-processing refactors. Ref: [#2358](https://github.com/compdemocracy/polis/issues/2358)|
407
+
| D2 | In-conv threshold |**PR 1**|**DONE** ✓|
408
+
| D2b | Base-cluster sort order |**PR 1**|**DONE** ✓|
0 commit comments