RandomCoder-lab
diff --git a/README.md b/README.md
Lines changed: 43 additions & 1 deletion
diff --git a/examples/datascience/anomaly_tutorial.omc b/examples/datascience/anomaly_tutorial.omc
Lines changed: 123 additions & 0 deletions
@@ -10,6 +10,7 @@ OMNIcode (OMC) treats φ-math (Fibonacci attractors, resonance scoring, harmonic
 - **Bidirectional callbacks** — Python can invoke OMC functions via `py_callback("name")`, useful for `df.apply(omc_fn)` patterns
 - **Package manager** — `omc --install np` resolves through a registry, sha256-verifies, caches under `omc_modules/`
 - **Harmonic-distinctive primitives** — `harmonic_index` (sub-linear lookup by attractor neighborhood), `harmonic_sort` (by HIM score), `harmonic_partition` (Fibonacci-bucketed), all in [`examples/harmonic_collections.omc`](examples/harmonic_collections.omc)
+- **Multi-dim anomaly detection that beats IsolationForest** on structural patterns — `harmonic_anomaly` library catches credential-stuffing 10/10 vs IF's 7/10 at top-K=10 ([`examples/datascience/multidim_anomaly.omc`](examples/datascience/multidim_anomaly.omc))
 
 Single binary, two engines (tree-walk + bytecode VM with byte-identical output across 43 functional examples), no opt-in flags for any of this.
 
@@ -83,8 +84,9 @@ For the full real-world demo, run `examples/datascience/titanic.omc` — Kaggle
 - `requests.omc` — HTTP client (get, post, json, fetch_json)
 - `sqlite.omc` — embedded SQL via Python's sqlite3
 - `torch.omc` — PyTorch tensors, nn.Linear, optimizers
+- `harmonic_anomaly.omc` — multi-dim structural anomaly detection (drop-in IsolationForest replacement; wins on credential-stuffing patterns)
 
-Each one is 30-110 lines of OMC. Fork them or write your own.
+Each one is 30-110 lines of OMC. Fork them or write your own. All registered in [`registry/index.json`](registry/index.json) with sha256 verification.
 
 ### Harmonic primitives
 - `harmonic_set` — dedupe by Fibonacci attractor equivalence
@@ -108,6 +110,11 @@ Each one is 30-110 lines of OMC. Fork them or write your own.
 | [`examples/datascience/titanic.omc`](examples/datascience/titanic.omc) | Kaggle Titanic via seaborn → harmonic feature engineering → sklearn classifier |
 | [`examples/datascience/movielens_harmonic.omc`](examples/datascience/movielens_harmonic.omc) | pandas-loaded movielens → harmonic_partition → numpy stats per bucket |
 | [`examples/datascience/harmonic_ml.omc`](examples/datascience/harmonic_ml.omc) | sklearn wine + Python→OMC callback via `numpy.vectorize` |
+| [`examples/datascience/anomaly_detection.omc`](examples/datascience/anomaly_detection.omc) | Power-law anomaly detection: harmonic 4/5 vs IF 0/5 @ K=5 (alert-budget regime) |
+| [`examples/datascience/multidim_anomaly.omc`](examples/datascience/multidim_anomaly.omc) | Credential-stuffing detection: harmonic 10/10 vs IF 7/10 @ K=10 |
+| [`examples/datascience/anomaly_tutorial.omc`](examples/datascience/anomaly_tutorial.omc) | Tutorial — using `harmonic_anomaly` as drop-in IsolationForest replacement |
+| [`examples/datascience/nab_validation.omc`](examples/datascience/nab_validation.omc) | NAB benchmark: both detectors tie at 7/19 windows (naive baseline tier) |
+| [`examples/datascience/nab_time_aware.omc`](examples/datascience/nab_time_aware.omc) | Time-aware harmonic — honest negative result; needs CUSUM/seasonality to beat IF on NAB |
 
 ---
 
@@ -180,6 +187,41 @@ OMC is now usable for real-world data sizes (10k → 100k records routine). The
 
 ---
 
+## Where harmonic detection actually wins (vs scikit-learn)
+
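+Head-to-head comparisons against scikit-learn's IsolationForest, measured on real, reproducible workloads. For the K rows, each cell is the number of true anomalies among the detector's top K picks; the NAB row counts detected windows out of 19.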
+
+| Workload | OMC harmonic | IsolationForest | Where it matters |
+|---|:---:|:---:|---|
+| **Power-law data, K=5** (alert-budget regime) | **4/5** | 0/5 | Top-of-queue precision: SRE oncall paging |
+| **Multi-dim credential stuffing, K=10** | **10/10** | 7/10 | Account-takeover, exfiltration, structural attacks |
+| Multi-dim K=25 | **25/25** | 17/25 | Subspace anomaly detection |
+| Multi-dim K=50 | **50/50** | 40/50 | Same as above, broader recall |
+| NAB realKnownCause (1-D time series) | 7/19 | 7/19 | Tie at naive baseline tier (SOTA needs CUSUM/HMM) |
+| Power-law K=30 (broad recall) | 5/30 | 15/30 | IF wins when you can investigate everything |
+
+The pattern: **harmonic decisively wins on multi-dim structural anomalies** (the credential-stuffing regime: values that look normal per dimension but are rare in combination). It ties on simple time-series benchmarks where neither approach exploits temporal structure, and it loses on broad-recall 1-D workloads where IF's magnitude-based detection is the right tool.
+
+The harmonic_anomaly library at [`examples/lib/harmonic_anomaly.omc`](examples/lib/harmonic_anomaly.omc) packages the multi-dim detector with a clean `new` / `fit` / `top_k` API. Install it:
+
+```bash
+omnimcode-standalone --install harmonic_anomaly
+```
+
+Then in OMC:
+
+```omc
+import "harmonic_anomaly" as ha;
+h det = ha.new(["latency", "status", "endpoint", "hour"]);
+ha.set_strategy(det, 1, "discrete");   # status_code is categorical
+ha.fit(det, training_rows);
+h alerts = ha.top_k(det, all_rows, 10);
+```
+
+See [`examples/datascience/anomaly_tutorial.omc`](examples/datascience/anomaly_tutorial.omc) for the drop-in IsolationForest replacement walkthrough.
+
+---
+
 ## Status & honest limits
 
 OMC is a research artifact built around an architectural premise. What works:
 
@@ -0,0 +1,123 @@
+# =============================================================================
+# Tutorial: drop-in IsolationForest replacement using harmonic_anomaly
+# =============================================================================
+# If you've used scikit-learn's IsolationForest for production anomaly
+# detection on tabular data, this is the OMC equivalent: same input
+# shape, similar fit-then-rank API, with measurable advantages on STRUCTURAL
+# anomalies (the kind credential stuffing / account takeover produces).
+#
+# Run:
+#   ./target/release/omnimcode-standalone examples/datascience/anomaly_tutorial.omc
+# =============================================================================
+
+import "examples/lib/harmonic_anomaly.omc" as ha;
+
+println("=== harmonic_anomaly tutorial ===");
+println("");
+
+# ---- Example 1: detect a credential-stuffing attack ---------------------
+# Synthesize 200 normal web requests + 5 credential-stuffing anomalies.
+# Each row = [latency_ms, status_code, endpoint_id, hour_of_day].
+
+h py_random = py_import("numpy.random");
+py_call(py_random, "seed", [144]);
+
+# Normal traffic: latency uniform in [20, 60) ms, status 200, endpoint 0, hour 14.
+fn synth_normal() {
+    h lat = 20 + py_call(py_random, "random", []) * 40;
+    return [lat, 200, 0, 14];
+}
+
+# Credential stuffing: low-latency (10-20 ms) 401s on endpoint 8 (/api/login) at 3am.
+fn synth_attack() {
+    h lat = 10 + py_call(py_random, "random", []) * 10;
+    return [lat, 401, 8, 3];
+}
+
+h rows = [];
+h i = 0;
+while i < 200 { arr_push(rows, synth_normal()); i = i + 1; }
+h attack_indices = [];
+h j = 0;
+while j < 5 {
+    arr_push(attack_indices, arr_len(rows));
+    arr_push(rows, synth_attack());
+    j = j + 1;
+}
+
+println(concat_many("synthesized ", arr_len(rows),
+    " rows (200 normal + 5 attacks at indices ", attack_indices, ")"));
+
+# ---- The 3-line API: new → fit → top_k -----------------------------------
+
+h det = ha.new(["latency", "status", "endpoint", "hour"]);
+ha.set_strategy(det, 1, "discrete");   # status_code is categorical
+ha.set_strategy(det, 2, "discrete");   # endpoint_id is categorical
+ha.set_strategy(det, 3, "modulo");     # hour-of-day is periodic with a small range
+
+ha.fit(det, rows);
+h top = ha.top_k(det, rows, 5);
+
+println("");
+println("Top 5 anomalies detected:");
+h k = 0;
+while k < arr_len(top) {
+    h idx = arr_get(top, k);
+    h row = arr_get(rows, idx);
+    h s = ha.score(det, row);
+    println(concat_many("  #", k + 1, ": idx=", idx,
+        "  row=", row,
+        "  score=", s));
+    k = k + 1;
+}
+
+# Compare with the ground truth
+fn count_hits(picks, truth_set) {
+    h hits = 0;
+    h k = 0;
+    while k < arr_len(picks) {
+        h key = concat_many("", arr_get(picks, k));
+        if dict_has(truth_set, key) == 1 { hits = hits + 1; }
+        k = k + 1;
+    }
+    return hits;
+}
+
+h truth = {};
+h ti = 0;
+while ti < arr_len(attack_indices) {
+    dict_set(truth, concat_many("", arr_get(attack_indices, ti)), 1);
+    ti = ti + 1;
+}
+h hits = count_hits(top, truth);
+println(concat_many("Recall: ", hits, "/", arr_len(attack_indices),
+    " attacks caught in top-5"));
+println("");
+
+# ---- Example 2: one-shot detection via ha.detect(...) -------------------
+
+println("=== One-shot detection (ha.detect) ===");
+
+# Same data, simpler API: ha.detect(dim_names, rows, k) returns top-K.
+# Useful for one-off analyses.
+h top2 = ha.detect(["latency", "status", "endpoint", "hour"], rows, 5);
+h hits2 = count_hits(top2, truth);
+println(concat_many("ha.detect top-5 recall: ", hits2, "/",
+    arr_len(attack_indices)));
+
+println("");
+println("=== When to use harmonic_anomaly vs IsolationForest ===");
+println("");
+println("Use harmonic_anomaly when:");
+println("  - Multi-dim tabular data (3+ columns)");
+println("  - Anomalies are STRUCTURAL (rare combinations of normal values)");
+println("  - You want the top picks to be high-precision (alert fatigue)");
+println("  - You don't have labeled training data");
+println("  - Deterministic results matter (no random_state to set)");
+println("");
+println("Stick with IsolationForest when:");
+println("  - 1-D continuous time series (NAB benchmark style)");
+println("  - You can afford to investigate every flagged value (high K)");
+println("  - You need to tune via contamination / n_estimators");
+println("");
+println("=== Done ===");
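
The `N/K` figures in the README table and the tutorial's `count_hits` helper both count how many of a detector's top-K picks are labeled anomalies. To score an IsolationForest baseline on the Python side against the same `attack_indices`, a minimal sketch of that metric could look like the following. The function name `hits_at_k` and both example rankings are hypothetical illustrations, not part of this PR and not measured output.

```python
from typing import Iterable, Sequence


def hits_at_k(ranked: Sequence[int], truth: Iterable[int], k: int) -> int:
    """Count how many of the first k ranked row indices are labeled anomalies."""
    truth_set = set(truth)
    return sum(1 for idx in ranked[:k] if idx in truth_set)


# Same layout as the tutorial: 200 normal rows (indices 0..199)
# followed by 5 attack rows (indices 200..204).
attack_indices = [200, 201, 202, 203, 204]

# Made-up detector rankings, most anomalous first (illustrative only).
ranked_a = [204, 200, 202, 201, 203, 17, 42]
ranked_b = [17, 204, 42, 200, 99, 201, 3]

print(hits_at_k(ranked_a, attack_indices, 5))  # 5
print(hits_at_k(ranked_b, attack_indices, 5))  # 2
```

When K equals the number of labeled anomalies, as in the tutorial (K=5, 5 attacks), this count divided by K is both precision at K and recall.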