.claude/plans/genetics-probes-v1.md: 53 additions, 19 deletions
@@ -21,15 +21,20 @@
| Phase | Probe | Cost | Status | Gates |
|---|---|---|---|---|
-|**P0**| PROBE-CHAODA-1000G |~3 days (after D-GEN-1+2) | ⚠ **spike RUN — AUC 0.624, BELOW bar** (ndarray #219) | The "CHAODA-as-novelty-detector" line of the entire plan |
+|**P0b**| PROBE-CHAODA-1000G (genomic) |~3 days (after D-GEN-1+2) | ⏳ unblocked at kernel level; gated on real corpora | The "CHAODA-as-novelty-detector" line of the entire plan |
|**P1**| PROBE-KRAS-COUNTERFACTUAL-DET |~2 days (included in D-GEN-7) | queued | D-GEN-7 flagship dynamics-axis claim |
docs/GENETIC_RESEARCH_VIA_STACK.md: 1 addition, 1 deletion
@@ -71,7 +71,7 @@ impl ClamTree {
**The composition:** build a CLAM tree on your per-variant feature vectors; CHAODA scores every variant against the local manifold's intrinsic dimensionality. A novel variant in a region of high LFD lights up as `AnomalyScore { score → 1.0, awareness → AwarenessState::Noise }` (the `score ≥ 0.75` quartile per `clam.rs:1556`) because its position differs from the population's local manifold — *without you having to train a classifier or annotate a truth set first*. This is *unsupervised* outlier detection on the same tree your range queries walk.
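The LFD signal this composition leans on can be illustrated in isolation. The snippet below is a minimal, std-only sketch assuming the usual CLAM definition, LFD = log2(|B(c, r)| / |B(c, r/2)|), i.e. how much the point count around a centre grows when the radius doubles. `local_fractal_dimension` and `euclidean` are hypothetical helpers for illustration, not the `ClamTree` API in `clam.rs`, and the Euclidean metric is an assumption.

```rust
/// Euclidean distance between two equal-length feature vectors.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
}

/// Hypothetical helper: local fractal dimension at `center` for radius `r`,
/// LFD = log2(|B(center, r)| / |B(center, r/2)|).
/// Returns None when the inner ball is empty, because the ratio is undefined.
fn local_fractal_dimension(points: &[Vec<f64>], center: &[f64], r: f64) -> Option<f64> {
    // Count the points whose distance to the centre is at most `radius`.
    let count_within = |radius: f64| {
        points
            .iter()
            .filter(|p| euclidean(p.as_slice(), center) <= radius)
            .count()
    };
    let outer = count_within(r);
    let inner = count_within(r / 2.0);
    if inner == 0 {
        return None;
    }
    Some((outer as f64 / inner as f64).log2())
}

fn main() {
    // Two points lie within r/2 = 1.0 and eight within r = 2.0,
    // so the ratio is 8 / 2 = 4 and LFD = log2(4) = 2.
    let points: Vec<Vec<f64>> = vec![
        vec![0.0, 0.0], vec![0.5, 0.0],
        vec![1.5, 0.0], vec![-1.5, 0.0], vec![0.0, 1.5], vec![0.0, -1.5],
        vec![1.0, 1.0], vec![-1.0, -1.0],
    ];
    println!("{:?}", local_fractal_dimension(&points, &[0.0, 0.0], 2.0));
}
```

Uniformly spread points give an LFD close to their intrinsic dimension (about 1 along a line, about 2 across a plane), which is why LFD describes local geometric complexity rather than how isolated a region is.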
-> **⚠ MEASURED CAVEAT (2026-06-16, ndarray PR #219):** the *shipped* `anomaly_scores` implements **only the single-method leaf-LFD signal**, not the full multi-method CHAODA ensemble of Ishaq et al. 2021. A spike on ideal synthetic data (clean Gaussian clusters + far outliers) measured **ROC-AUC = 0.624** — well below the ≥ 0.85 bar a novelty detector needs. Leaf LFD captures *intra-leaf* geometry complexity, not *inter-leaf* isolation, so isolated outliers and dense-cluster points end up in the same score band. **As shipped today, this composition is NOT a working novel-variant detector.** Realising the claim requires porting the multi-method CHAODA ensemble (relative/component cardinality, graph neighbourhood, random-walk stationary distribution, vertex degree) — see `PROBE-CHAODA-1000G` in `.claude/plans/genetics-probes-v1.md`. The pattern match is real; the *single shipped signal* is not yet sufficient.
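The ROC-AUC figures quoted in this caveat can be read as a pairwise probability: the chance that a randomly chosen true outlier receives a higher anomaly score than a randomly chosen inlier, with ties counting half, so 0.5 means no better than random ordering. Below is a minimal sketch of that computation, assuming boolean ground-truth labels. `roc_auc` is a hypothetical helper; the harness used for the PR #219 spike may compute the same quantity differently (for example with a rank-sum formula).

```rust
/// Hypothetical helper: ROC-AUC of anomaly scores against labels
/// (true = outlier), as the fraction of outlier/inlier pairs in which the
/// outlier scores higher, counting ties as half a pair.
/// Returns None if either class is empty.
fn roc_auc(scores: &[f64], is_outlier: &[bool]) -> Option<f64> {
    assert_eq!(scores.len(), is_outlier.len(), "one label per score");
    // Split scores by ground-truth class.
    let mut pos: Vec<f64> = Vec::new();
    let mut neg: Vec<f64> = Vec::new();
    for (&s, &o) in scores.iter().zip(is_outlier) {
        if o {
            pos.push(s);
        } else {
            neg.push(s);
        }
    }
    if pos.is_empty() || neg.is_empty() {
        return None;
    }
    // O(|pos| * |neg|) pairwise comparison; fine for spike-sized fixtures.
    let mut wins = 0.0_f64;
    for &p in &pos {
        for &n in &neg {
            if p > n {
                wins += 1.0;
            } else if p == n {
                wins += 0.5;
            }
        }
    }
    Some(wins / (pos.len() * neg.len()) as f64)
}

fn main() {
    // Two outliers, three inliers; one outlier ties an inlier at 0.8.
    // 5.5 of the 6 outlier/inlier pairs are ordered correctly, i.e. 11/12.
    let scores = [0.9, 0.8, 0.3, 0.2, 0.8];
    let labels = [true, true, false, false, false];
    println!("ROC-AUC = {:?}", roc_auc(&scores, &labels));
}
```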
+> **⚠→✅ MEASURED CAVEAT (2026-06-16):** the *original* `anomaly_scores` implements **only the single-method leaf-LFD signal**. A spike (ndarray PR #219) on ideal synthetic data measured **ROC-AUC = 0.624** — below the ≥ 0.85 bar — because leaf LFD captures *intra-leaf* geometry complexity, not *inter-leaf* isolation. **The multi-method ensemble has since been built** (ndarray PR #220, `ClamTree::ensemble_anomaly_scores`: parent-child path-minority ⊕ connected-component cardinality) and measured at **ROC-AUC = 0.991** on the same fixture — clearing the bar. So the *kernel* now does isolation-aware novelty detection; use `ensemble_anomaly_scores`, not the single-LFD `anomaly_scores`, for this composition. **Still gated:** this is synthetic-only proof. Genomic novelty detection (`PROBE-CHAODA-1000G` on 1000-Genomes + ClinVar) remains unproven until the VCF→feature-vector pipeline (plan D-GEN-1+2) exists. The pattern match is real and the kernel capability is now demonstrated; the genomic claim is not yet measured.
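For intuition about the connected-component cardinality signal that the ensemble combines, here is a minimal sketch under our own assumptions, not the code in PR #220. Clusters are treated as balls that get an edge when their centres are no farther apart than the sum of their radii, components are found with union-find, and each cluster is scored 1 − (points in its component ÷ all points). The `Cluster` struct, the helper names and that scoring rule are all hypothetical.

```rust
/// Hypothetical stand-in for a CLAM cluster: centre, covering radius and
/// number of member points. Not the ndarray type.
struct Cluster {
    center: Vec<f64>,
    radius: f64,
    cardinality: usize,
}

/// Euclidean distance between two equal-length vectors.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
}

/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut i: usize) -> usize {
    while parent[i] != i {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    i
}

/// Assumed scoring rule: overlapping clusters join one component, and each
/// cluster scores 1 - (points in its component / all points), so members of
/// small, isolated components score close to 1.
fn component_cardinality_scores(clusters: &[Cluster]) -> Vec<f64> {
    let n = clusters.len();
    let total: usize = clusters.iter().map(|c| c.cardinality).sum();
    if total == 0 {
        return vec![0.0; n];
    }
    // Connect every pair of clusters whose balls overlap.
    let mut parent: Vec<usize> = (0..n).collect();
    for i in 0..n {
        for j in (i + 1)..n {
            let gap = euclidean(&clusters[i].center, &clusters[j].center);
            if gap <= clusters[i].radius + clusters[j].radius {
                let ri = find(&mut parent, i);
                let rj = find(&mut parent, j);
                parent[ri] = rj;
            }
        }
    }
    // Sum member counts per component root.
    let mut component_points = vec![0usize; n];
    for (i, c) in clusters.iter().enumerate() {
        let root = find(&mut parent, i);
        component_points[root] += c.cardinality;
    }
    (0..n)
        .map(|i| 1.0 - component_points[find(&mut parent, i)] as f64 / total as f64)
        .collect()
}

fn main() {
    // The first two balls overlap (95 points together); the third is isolated (5 points).
    let clusters = vec![
        Cluster { center: vec![0.0, 0.0], radius: 1.0, cardinality: 50 },
        Cluster { center: vec![1.5, 0.0], radius: 1.0, cardinality: 45 },
        Cluster { center: vec![10.0, 10.0], radius: 0.5, cardinality: 5 },
    ];
    println!("{:?}", component_cardinality_scores(&clusters));
}
```

In this toy input the isolated cluster scores about 0.95 and the two overlapping ones about 0.05: the score reflects *inter-leaf* isolation, which is exactly what a leaf-LFD-only score does not capture.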