docs(genetics): D-GEN-CHAODA-ENSEMBLE increment 1 RUN — ensemble clears the synthetic bar (AUC 0.62 -> 0.99)
Records the ndarray #220 result: the multi-method CHAODA ensemble
resolves the kernel-level blocker the #219 spike surfaced.
MEASURED (ndarray #220, same synthetic fixture as #219):
single-LFD AUC 0.6240 -> ensemble AUC 0.9906 (+0.3667, clears 0.85)
The dominant signal is the parent-child path-minority ratio (immune to
the leaf-fragmentation that defeated a naive leaf-cardinality/degree
attempt at AUC 0.621), averaged with connected-component cardinality.
Updates:
- Sequencing table: split P0 into P0a (ensemble, DONE, AUC 0.991) and
P0b (genomic probe, unblocked at kernel level but gated on real
corpora). Blocker note flipped from surfaced to resolved-at-kernel.
- Added a FOLLOW-UP block under PROBE-CHAODA-1000G with the ensemble
measurement and the honest scope (synthetic only).
- D-GEN-CHAODA-ENSEMBLE: marked INCREMENT 1 DONE (ndarray #220);
listed what remains (random-walk method; Step 3 wiring lands with
D-GEN-1+2). Noted the ~half-day actual vs ~1-week estimate.
- GENETIC_RESEARCH_VIA_STACK.md § 1.4: caveat flipped from
"NOT a working detector" to "kernel capability now demonstrated via
ensemble_anomaly_scores; genomic claim still gated on D-GEN-1+2."
Honest scope preserved throughout: synthetic smoke test proves the
ensemble approach; genomic novelty detection remains unproven until the
VCF->feature-vector pipeline exists.
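
For readers who want to sanity-check how ROC-AUC figures like the ones above are computed, here is a minimal, standalone Rust sketch. It is not the ndarray #220 code. The function names (`roc_auc`, `mean_ensemble`, `toy_fixture_aucs`) and the toy numbers are hypothetical, the two input signals are assumed to be already scaled to a comparable range, and "averaged" is assumed to mean a plain arithmetic mean.

```rust
/// ROC-AUC as the probability that a randomly chosen outlier scores higher
/// than a randomly chosen inlier (ties count as half a win).
/// Returns None when either class is empty, since AUC is undefined then.
/// O(n_pos * n_neg), which is fine for a small smoke test.
fn roc_auc(scores: &[f64], is_outlier: &[bool]) -> Option<f64> {
    assert_eq!(scores.len(), is_outlier.len());
    let mut pos = Vec::new();
    let mut neg = Vec::new();
    for (&s, &o) in scores.iter().zip(is_outlier) {
        if o { pos.push(s) } else { neg.push(s) }
    }
    if pos.is_empty() || neg.is_empty() {
        return None;
    }
    let mut wins = 0.0;
    for &p in &pos {
        for &n in &neg {
            if p > n {
                wins += 1.0;
            } else if p == n {
                wins += 0.5;
            }
        }
    }
    Some(wins / (pos.len() * neg.len()) as f64)
}

/// Element-wise arithmetic mean of two per-point anomaly signals.
/// ASSUMPTION: both signals are already on a comparable scale.
fn mean_ensemble(a: &[f64], b: &[f64]) -> Vec<f64> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| 0.5 * (x + y)).collect()
}

/// Hypothetical six-point fixture (four inliers, two outliers).
/// The values are made up for illustration and are NOT the #219/#220 fixture.
fn toy_fixture_aucs() -> (f64, f64, f64) {
    let labels = [false, false, false, false, true, true];
    // Stand-in for a path-minority-style signal: one inlier scores too high.
    let signal_a = [0.1, 0.2, 0.8, 0.3, 0.9, 0.4];
    // Stand-in for a component-cardinality-style signal: a different inlier leaks.
    let signal_b = [0.2, 0.1, 0.1, 0.7, 0.6, 0.9];
    let combined = mean_ensemble(&signal_a, &signal_b);
    (
        roc_auc(&signal_a, &labels).expect("both classes present"),
        roc_auc(&signal_b, &labels).expect("both classes present"),
        roc_auc(&combined, &labels).expect("both classes present"),
    )
}
```

On this toy fixture each signal alone ranks 7 of the 8 outlier/inlier pairs correctly (AUC 0.875), while the mean ranks all 8 correctly (AUC 1.0), because the two signals misrank different inliers. This only illustrates the mechanism by which averaging can help. It says nothing about the measured 0.624 and 0.991 values.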
https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
.claude/plans/genetics-probes-v1.md: 53 additions & 19 deletions
@@ -21,15 +21,20 @@
| Phase | Probe | Cost | Status | Gates |
|---|---|---|---|---|
-
| **P0** | PROBE-CHAODA-1000G | ~3 days (after D-GEN-1+2) | ⚠ **spike RUN — AUC 0.624, BELOW bar** (ndarray #219) | The "CHAODA-as-novelty-detector" line of the entire plan |
+
| **P0b** | PROBE-CHAODA-1000G (genomic) | ~3 days (after D-GEN-1+2) | ⏳ unblocked at kernel level; gated on real corpora | The "CHAODA-as-novelty-detector" line of the entire plan |
| **P1** | PROBE-KRAS-COUNTERFACTUAL-DET | ~2 days (included in D-GEN-7) | queued | D-GEN-7 flagship dynamics-axis claim |
docs/GENETIC_RESEARCH_VIA_STACK.md: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ impl ClamTree {
**The composition:** build a CLAM tree on your per-variant feature vectors; CHAODA scores every variant against the local manifold's intrinsic dimensionality. A novel variant in a region of high LFD lights up as `AnomalyScore { score → 1.0, awareness → AwarenessState::Noise }` (the `score ≥ 0.75` quartile per `clam.rs:1556`) because its position differs from the population's local manifold — *without you having to train a classifier or annotate a truth set first*. This is *unsupervised* outlier detection on the same tree your range queries walk.
-
> **⚠ MEASURED CAVEAT (2026-06-16, ndarray PR #219):** the *shipped* `anomaly_scores` implements **only the single-method leaf-LFD signal**, not the full multi-method CHAODA ensemble of Ishaq et al. 2021. A spike on ideal synthetic data (clean Gaussian clusters + far outliers) measured **ROC-AUC = 0.624** — well below the ≥ 0.85 bar a novelty detector needs. Leaf LFD captures *intra-leaf* geometry complexity, not *inter-leaf* isolation, so isolated outliers and dense-cluster points end up in the same score band. **As shipped today, this composition is NOT a working novel-variant detector.** Realising the claim requires porting the multi-method CHAODA ensemble (relative/component cardinality, graph neighbourhood, random-walk stationary distribution, vertex degree) — see `PROBE-CHAODA-1000G` in `.claude/plans/genetics-probes-v1.md`. The pattern match is real; the *single shipped signal* is not yet sufficient.
+
> **⚠→✅ MEASURED CAVEAT (2026-06-16):** the *original* `anomaly_scores` implements **only the single-method leaf-LFD signal**. A spike (ndarray PR #219) on ideal synthetic data measured **ROC-AUC = 0.624** — below the ≥ 0.85 bar — because leaf LFD captures *intra-leaf* geometry complexity, not *inter-leaf* isolation. **The multi-method ensemble has since been built** (ndarray PR #220, `ClamTree::ensemble_anomaly_scores`: parent-child path-minority ⊕ connected-component cardinality) and measured at **ROC-AUC = 0.991** on the same fixture — clearing the bar. So the *kernel* now does isolation-aware novelty detection; use `ensemble_anomaly_scores`, not the single-LFD `anomaly_scores`, for this composition. **Still gated:** this is synthetic-only proof. Genomic novelty detection (`PROBE-CHAODA-1000G` on 1000-Genomes + ClinVar) remains unproven until the VCF→feature-vector pipeline (plan D-GEN-1+2) exists. The pattern match is real and the kernel capability is now demonstrated; the genomic claim is not yet measured.