| 1 | +# JIT vs real-world workloads — first honest measurement |
| 2 | + |
| 3 | +**TL;DR:** the JIT works exactly as designed on pure-int + array + float OMC fns (proven by 41 codegen tests + bench harness), but the *currently-shipped* `harmonic_anomaly` library uses dicts and string-keyed frequency tables — both outside the JIT's current op coverage. Only **1 of 4** user fns JIT'd on the NSL-KDD validation, and that fn isn't in the hot loop. **Net wall-clock change: zero.** |
| 4 | + |
| 5 | +The gap is well-defined and the architecture's path forward is clear. |
| 6 | + |
| 7 | +## What the bench actually showed |
| 8 | + |
| 9 | +Workload: `examples/datascience/nsl_kdd_validation.omc` — runs the harmonic_anomaly library's `fit + top_k` against a 5000-row NSL-KDD sample. |
| 10 | + |
| 11 | +``` |
| 12 | +OMC_HBIT_JIT=1 OMC_HBIT_JIT_VERBOSE=1 ./omnimcode-standalone examples/datascience/nsl_kdd_validation.omc |
| 13 | +``` |
| 14 | + |
| 15 | +JIT log: |
| 16 | + |
| 17 | +``` |
| 18 | +[OMC_HBIT_JIT] JIT'd 1/4 user fns to dual-band native code |
| 19 | + - extract_features |
| 20 | +``` |
| 21 | + |
| 22 | +Wall-clock comparison: |
| 23 | + |
| 24 | +| Mode | User time | Wall-clock | |
| 25 | +|---|--:|--:| |
| 26 | +| Tree-walk (no `OMC_HBIT_JIT`) | 2.98s | 1.58s | |
| 27 | +| `OMC_HBIT_JIT=1` | 2.98s | 1.54s | |
| 28 | + |
| 29 | +Within measurement noise. The JIT didn't make this workload faster because the JIT'd fn (`extract_features`) runs once over 5000 rows at startup; the hot loop is in `harmonic_anomaly.fit()` which the JIT couldn't compile. |
| 30 | + |
| 31 | +## Why the harmonic library doesn't JIT |
| 32 | + |
| 33 | +The fns that the JIT **rejected**: |
| 34 | + |
| 35 | +1. **`fit(detector, rows)`** — uses `dict_set(freq, key, ...)` to build per-dim frequency tables; uses `concat_many("", bkt)` to build dict keys. Both ops have no JIT lowering today. |
| 36 | +2. **`score(detector, row)`** — same dict + string ops in the inner per-dim loop. |
| 37 | +3. **`top_k(detector, rows, k)`** — calls `score_all` which calls `score`; transitively excluded. |
| 38 | + |
| 39 | +The JIT is conservative: any fn whose body uses an unsupported op causes the whole fn to be silently skipped (Sessions D/H established this: partial fns get erased so the rest of the module compiles cleanly). The 4th fn, `extract_features`, uses only pure-int and array ops, so it compiles. |
| 40 | + |
| 41 | +The `csv_parse` builtin is also unsupported, but the JIT verbose output lists `extract_features` as the 1 of 4 fns that JIT'd, so `csv_parse` cannot be in its body. It is a separate top-level call made before the fn. That checks out. |
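
The rejection rule above (a fn is skipped if its own body uses an unsupported op, or if it calls a fn that was skipped) can be modelled in a few lines. This is a minimal Python sketch of the rule, not the actual codegen: the op names `int_add` and `array_get`, the body contents of `score_all` and `top_k`, and the helper `jit_eligible` are all assumptions made for illustration.

```python
# Hypothetical model of the conservative eligibility rule.
# The op sets below are illustrative placeholders, not the real codegen tables.
UNSUPPORTED_OPS = {"dict_set", "concat_many", "csv_parse"}

# fn name -> (ops used directly in the body, user fns it calls)
FUNCTIONS = {
    "extract_features": ({"int_add", "array_get"}, set()),
    "fit": ({"dict_set", "concat_many"}, set()),
    "score": ({"dict_set", "concat_many"}, set()),
    "score_all": ({"array_get"}, {"score"}),
    "top_k": ({"array_get"}, {"score_all"}),
}


def jit_eligible(functions, unsupported):
    """Return the set of fn names that survive the whole-fn rejection rule."""
    # Pass 1: keep only fns whose own bodies avoid every unsupported op.
    eligible = {
        name for name, (ops, _) in functions.items() if not (ops & unsupported)
    }
    # Pass 2: repeatedly drop fns that call a rejected fn, until nothing changes.
    changed = True
    while changed:
        changed = False
        for name in sorted(eligible):
            _, callees = functions[name]
            if not callees <= eligible:
                eligible.discard(name)
                changed = True
    return eligible


if __name__ == "__main__":
    print(sorted(jit_eligible(FUNCTIONS, UNSUPPORTED_OPS)))
```

Under these assumed op sets, only `extract_features` survives, which matches the verbose log: `fit` and `score` are rejected directly, and `top_k` is rejected transitively through `score_all`.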
| 42 | + |
| 43 | +## What this tells us about the architecture |
| 44 | + |
| 45 | +The architecture is sound — Sessions A–H + Path A.1–A.4 + Path D shipped 41 codegen tests covering every JIT-eligible op. The bench harness shows 250–1000× speedups on workloads that fit those ops. |
| 46 | + |
| 47 | +What the architecture *doesn't yet have* is the op coverage to JIT the harmonic libraries as they're written today. Two viable paths to fix: |
| 48 | + |
| 49 | +### Option 1: extend codegen (the structural fix) |
| 50 | + |
| 51 | +Add JIT support for: |
| 52 | +- **Dicts** — would need a hash-table representation in LLVM. Significant: needs key hashing (probably an extern Rust call), bucket arrays, collision handling. Feasible but ~1 session of careful work. |
| 53 | +- **Strings** — needs heap allocation (libc malloc) + pointer-based representation. Could share infrastructure with arrays. Another session. |
| 54 | +- **`concat_many` / `csv_parse` / other builtins** — most wouldn't get JIT'd directly; they'd remain tree-walk. The JIT'd fn would call back through the dispatch hook into tree-walk for unsupported builtins. Needs a "fallback to tree-walk for one builtin" mechanism — currently the whole fn falls back if it hits an unsupported op. |
| 55 | + |
| 56 | +**Cost:** 2-3 sessions. **Reward:** harmonic libs JIT, ~250× speedup applies to real workloads. |
| 57 | + |
| 58 | +### Option 2: rewrite the harmonic libs (the empirical fix) |
| 59 | + |
| 60 | +The frequency tables in `harmonic_anomaly` use `dict_set(freq, str_key, count)` because string keys are convenient for the multi-dim case (the key is the bucketed value rendered as a string). They could use **arrays of hashed-int keys** instead: |
| 61 | +- `freq_keys: [int]` — hashes of bucket values |
| 62 | +- `freq_counts: [int]` — counts parallel to keys |
| 63 | +- Lookup via linear scan or sorted-array binary search |
| 64 | + |
| 65 | +This is a real rewrite (~half a day of substantive work) but it produces a library that: |
| 66 | +1. JITs end-to-end with current codegen |
| 67 | +2. Runs in ~5 ms instead of ~135 ms (the projected speedup if the inner loop hits the JIT) |
| 68 | +3. Stays substrate-aligned (the bucket math doesn't change) |
| 69 | + |
| 70 | +**Cost:** ~half a session of library refactor. **Reward:** the same ~250× speedup applies, AND the library demonstrates that JIT-friendly idioms have a measurable payoff. |
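
The parallel-array layout proposed in Option 2 can be sketched as follows. This is an illustration in Python rather than OMC, under stated assumptions: `bucket_hash`, `find_slot`, `freq_increment`, and `freq_lookup` are hypothetical names, the hash function is a placeholder, and whether array insertion is itself JIT-eligible in OMC is not something the measurements above establish.

```python
def bucket_hash(bucket):
    """Hypothetical int hash of a list of bucketed ints.

    Stands in for the string key that the dict-based version builds.
    """
    h = 17
    for b in bucket:
        h = (h * 31 + b) % 2147483647
    return h


def find_slot(freq_keys, key):
    """Binary search: index of the first entry in sorted freq_keys that is >= key."""
    lo, hi = 0, len(freq_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if freq_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def freq_increment(freq_keys, freq_counts, key):
    """Add 1 to the count for key, keeping freq_keys sorted and the arrays parallel."""
    i = find_slot(freq_keys, key)
    if i < len(freq_keys) and freq_keys[i] == key:
        freq_counts[i] += 1
    else:
        freq_keys.insert(i, key)
        freq_counts.insert(i, 1)


def freq_lookup(freq_keys, freq_counts, key):
    """Return the stored count for key, or 0 if the key was never seen."""
    i = find_slot(freq_keys, key)
    if i < len(freq_keys) and freq_keys[i] == key:
        return freq_counts[i]
    return 0


if __name__ == "__main__":
    keys, counts = [], []
    for bucket in ([1, 2], [3, 4], [1, 2]):
        freq_increment(keys, counts, bucket_hash(bucket))
    print(freq_lookup(keys, counts, bucket_hash([1, 2])))
```

Everything here is integer arithmetic plus array reads and writes, which is the point of the rewrite. One trade-off to keep in mind: two distinct buckets can hash to the same int, so their counts would merge unless the hash is chosen to be collision-free for the bucket ranges actually used.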
| 71 | + |
| 72 | +## The honest position |
| 73 | + |
| 74 | +Path B as conceived asked: "does enabling JIT on a real OMC program produce real speedup?" The answer is **not yet** for the harmonic libraries as currently written, but **yes structurally** based on every microbench we've run since Session E. The JIT works; the libraries don't yet exercise it. |
| 75 | + |
| 76 | +The path forward isn't "make the JIT work harder" — it's either to extend codegen to cover dicts (Option 1) or rewrite the hot path to use already-supported ops (Option 2). Either gets us to "harmonic libraries run 100×+ faster with `OMC_HBIT_JIT=1`." |
| 77 | + |
| 78 | +This is the kind of honest negative result the architecture needed. The 277× number from Session E isn't a microbench artifact — but it doesn't automatically apply to libraries written for tree-walk's strengths (dicts, strings, dynamic dispatch). |
| 79 | + |
| 80 | +## Reproduction |
| 81 | + |
| 82 | +```bash |
| 83 | +# Tree-walk baseline |
| 84 | +PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 \ |
| 85 | + time ./target/release/omnimcode-standalone examples/datascience/nsl_kdd_validation.omc |
| 86 | + |
| 87 | +# JIT mode |
| 88 | +PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 OMC_HBIT_JIT=1 OMC_HBIT_JIT_VERBOSE=1 \ |
| 89 | + time ./target/release/omnimcode-standalone examples/datascience/nsl_kdd_validation.omc |
| 90 | +``` |
| 91 | + |
| 92 | +Numbers taken on 2026-05-15. If you want bigger numbers, choose Option 2 above and rewrite `examples/lib/harmonic_anomaly.omc` with array-based frequency tables. |