Commit 9c06a4d

jucor and claude committed
Update journal: xpassed test breakdown with all 7 datasets

Document the 10 xpassed tests (all strict=False):
- D2 in-conv × 2 on vw (thresholds coincide on small dataset)
- D6 two_prop_test × 1 (pseudocount diff too small)
- D9 repness_not_empty × 7 (test too weak — checks non-empty, not correct count. TODO: tighten when fixing D9)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent cb0418c commit 9c06a4d

1 file changed

Lines changed: 11 additions & 8 deletions

delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md
@@ -64,16 +64,19 @@ In-conv: list of 67 participant IDs (vw)
 After rebase onto updated `origin/kmeans_analysis_docs`:
 
 ```
-5 passed, 2 skipped, 18 xfailed, 5 xpassed
+7 passed, 19 skipped, 39 xfailed, 10 xpassed (with --include-local, 7 datasets)
 ```
 
-- **5 passed**: Clojure formula sanity checks (prop_test, repness metric product, repful rat>rdt) + Clojure blob consistency checks (pat values for vw and biodiversity)
-- **2 skipped**: D15 moderation — vw and biodiversity have no moderated comments
-- **18 xfailed**: Discrepancy tests correctly fail (D2-D12 constants, formulas, and real-data comparisons)
-- **5 xpassed** (all `strict=False`, so green):
-  - D2 in-conv × 2 on vw (small dataset where thresholds coincide)
-  - D9 repness_not_empty × 2 on vw+biodiversity (rebased code produces non-empty `comment_repness` — the full list of all (group, comment) pairs is populated even with wrong thresholds; only `group_repness` selection is affected)
-  - D6 two_prop_test × 1 (the pseudocount difference is small enough for this particular test case)
+- **7 passed**: Clojure formula sanity checks (prop_test, repness metric product, repful rat>rdt) + Clojure blob consistency checks (pat values)
+- **19 skipped**: D15 moderation (no moderated comments), incomplete Clojure blobs, engage duplicate files
+- **39 xfailed**: Discrepancy tests correctly fail (D2-D12 constants, formulas, and real-data comparisons)
+- **10 xpassed** (all `strict=False`, so green):
+  - D2 in-conv × 2 on vw — small dataset where old/new thresholds coincide
+  - D6 two_prop_test × 1 — pseudocount difference too small to matter for this test case
+  - D9 repness_not_empty × 7 on all datasets — `comment_repness` list is populated (all
+    (group, comment) pairs) even with wrong thresholds; only `group_repness` selection is
+    affected. **TODO**: tighten this test when fixing D9 to check correct *number* of
+    representative comments, not just non-emptiness
 
 ### Design decisions
 - All tests that verify targets not yet implemented are marked `@pytest.mark.xfail` with the discrepancy ID in the reason
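
The TODO in the D9 bullet can be illustrated with a minimal sketch of why a non-emptiness check xpasses while a count check would not. Everything here is an assumption made for illustration: the function names `weak_check` and `tightened_check`, the data shapes, and the sample values are hypothetical and are not taken from the repository or from any of the 7 datasets.

```python
from typing import Dict, List, Tuple

# Hypothetical shapes; the real structures in the repository may differ.
# comment_repness: one entry per (group, comment) pair, always populated.
CommentRepness = List[Tuple[int, int]]
# group_repness: comments selected as representative, keyed by group id.
GroupRepness = Dict[int, List[int]]


def weak_check(comment_repness: CommentRepness) -> bool:
    """Non-emptiness check: passes as soon as any pair is present."""
    return len(comment_repness) > 0


def tightened_check(group_repness: GroupRepness, expected: Dict[int, int]) -> bool:
    """Count check: every group must select the expected number of comments."""
    if group_repness.keys() != expected.keys():
        return False
    return all(len(group_repness[g]) == n for g, n in expected.items())


# Illustrative values only.
comment_repness = [(0, 10), (0, 11), (1, 10), (1, 11)]
wrong_selection = {0: [10], 1: [10, 11]}  # group 0 selects too few comments
expected_counts = {0: 2, 1: 2}

print(weak_check(comment_repness))                        # True
print(tightened_check(wrong_selection, expected_counts))  # False
```

Under these assumptions the weak check succeeds regardless of which thresholds drive the selection, which matches the xpassed behaviour described in the journal. A count-based assertion would likely keep the D9 tests in the xfailed bucket until the selection itself is fixed.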
