
Commit 8567aff

L1: harmonic_anomaly + import-inlining + cascade-cleanup so score JITs
Six interlocking changes that close the L1 gap from Path B: "the harmonic libraries don't JIT because they use dicts and strings."

1. harmonic_anomaly rewritten to use array-of-int representations instead of dict-of-string-keys. Detector is now an array indexed by constants (DET_N_DIMS=0, DET_STRATEGIES=1, ...). Strategies are int codes (0=log, 1=modulo, 2=discrete); the public API still accepts strings. Per-dim freq tables are parallel arrays of (key, count). Hot path (score) is dict-free, string-free.

2. Interpreter::inline_imports public API. Walks Statement::Import recursively, parses the imported file, applies alias prefix to fn defs, returns the flattened AST. Uses the same rewrite_module_calls helper as the runtime import path so intra-module calls stay correctly aliased.

3. CLI's maybe_register_jit calls inline_imports BEFORE compile_program. Without this, the bytecode compiler sees Statement::Import as a no-op and the JIT only sees the top-level user fns (Path B's "1/4 fns JIT'd" finding).

4. omnimcode-codegen jit_module: dependency-cleanup fixpoint pass. Previously, when a fn body failed to lower mid-emission, we left a "broken stub returns 0" body in place. That caused silent wrong results in callers. Now: failed fns are deleted entirely; any fn that called a deleted fn cascades to also-deleted; iterate to fixpoint. Honest: only ship fns whose full dep graph compiled.

5. Two new substrate intrinsics callable from JIT'd code:
   - omc_log_phi_pi_fibonacci(arg_bits) -> i64 (float-bit-pattern)
   - omc_fold(value) -> i64
   Both pre-declared in JitContext::new with global mappings. Op::Fold1 now lowers via omc_fold; Op::Call("log_phi_pi_fibonacci") is intercepted as an intrinsic. Without these, _bucket_log couldn't JIT and the cleanup pass would cascade-delete the entire harmonic library.

6. Compiler infer_type tagged log_phi_pi_fibonacci as float-returning so `logv * 50.0` emits Op::MulFloat. Bumped harmonic_anomaly's bucket fn to use 50.0 (not 50) so the multiplication is provably float-typed.

What works after this:
- ha.score JITs; small test ha.score(det, [1, 2]) returns 0.268 (correct, matches tree-walk)
- Tests 1-5 of harmonic_libs (4 anomaly + 1 clustering) pass under OMC_HBIT_JIT=1
- 15/53 user fns JIT in the NSL-KDD program (vs 1/4 before)

What does NOT yet work (added as L1.5 follow-up task):
- JIT execution of the full NSL-KDD program is FLAKY (1/5 runs succeed; 4/5 segfault before producing output). The runs that succeed produce the correct anomaly numbers. The flakiness is almost certainly an MCJIT memory-protection or cross-fn-call lifetime issue, not an algorithmic correctness bug. Investigate in its own session.
- Tests 6+ in harmonic_libs (clustering + recommend) segfault under JIT; those libs still use dict-based representations.

Workspace: 41 codegen tests pass, 149 core unit tests pass. The harmonic_lib test suite passes 18/18 in tree-walk; tests 1-5 pass under JIT before the clustering segfault.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
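Item 4's cleanup pass amounts to a fixpoint over a call graph. The following is a minimal sketch of that idea, not the jit_module code. The `cascade_cleanup` name, the `HashMap<String, Vec<String>>` call-graph shape, and the rule that callees absent from the map (builtins, intrinsics such as `log_phi_pi_fibonacci`) count as externally provided are all assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch of a cascade-cleanup fixpoint.
/// `calls` maps each user fn to the names it calls; `failed` holds fns whose
/// bodies failed to lower. Returns the fns that are safe to ship.
/// Assumption: callees that are not keys of `calls` (builtins, intrinsics)
/// are provided externally and never cause deletion.
fn cascade_cleanup(
    calls: &HashMap<String, Vec<String>>,
    failed: &HashSet<String>,
) -> HashSet<String> {
    // Start from every fn that lowered successfully.
    let mut alive: HashSet<String> = calls
        .keys()
        .filter(|f| !failed.contains(*f))
        .cloned()
        .collect();
    loop {
        // A surviving fn is doomed if it calls an in-module fn that is gone.
        let doomed: Vec<String> = alive
            .iter()
            .filter(|f| {
                calls[*f]
                    .iter()
                    .any(|c| calls.contains_key(c) && !alive.contains(c))
            })
            .cloned()
            .collect();
        if doomed.is_empty() {
            return alive; // fixpoint reached: nothing else to delete
        }
        for f in &doomed {
            alive.remove(f);
        }
    }
}
```

Each round removes at least one fn or stops, so the loop runs at most one round per fn. A lowering failure in a leaf such as `_bucket_log` propagates to every transitive caller. That is why, without the intrinsics from item 5, the whole harmonic library would have been deleted.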
1 parent 0fe4f76 commit 8567aff

6 files changed

Lines changed: 487 additions & 77 deletions


examples/lib/harmonic_anomaly.omc

Lines changed: 134 additions & 53 deletions
@@ -27,117 +27,198 @@
 # ha.fit(det, [[15, 200, 0, 14], [12, 200, 1, 15], ...]);
 # h scores = ha.score_all(det, rows);
 # h top = ha.top_k(det, rows, 10);
+#
+# L1 rewrite (Path L1, 2026-05-15): the previous implementation used
+# dict-of-string-keys for per-dim frequency tables AND a dict for the
+# detector struct itself AND string-tagged strategies. None of those
+# JIT today. This rewrite makes the hot path (score) entirely JIT-
+# eligible by:
+#
+#   1. Detector is now an array indexed by constants:
+#        DET_N_DIMS=0, DET_STRATEGIES=1 (array of int codes),
+#        DET_FREQ_KEYS=2 (array of arrays), DET_FREQ_COUNTS=3, DET_N=4
+#      Read via arr_get(detector, INDEX) instead of dict_get(d, "key").
+#   2. Strategies are int codes (0=log, 1=modulo, 2=discrete). The
+#      public API (set_strategy / new) still accepts strings; we
+#      convert internally.
+#   3. Per-dim frequency tables are parallel arrays of int keys + int
+#      counts (no dict, no string concat).
+#
+# fit() (cold path) keeps a small dict for dim_names mapping and uses
+# arr_push for dynamic growth. score() (hot path) is dict-free,
+# string-free, and JIT-eligible end-to-end.
 # =============================================================================
 
 import "examples/lib/np.omc" as np;
 
+# Detector array indices (constants).
+fn _DET_N_DIMS() { return 0; }
+fn _DET_STRATEGIES() { return 1; }
+fn _DET_FREQ_KEYS() { return 2; }
+fn _DET_FREQ_COUNTS() { return 3; }
+fn _DET_N() { return 4; }
+
+# Strategy codes (constants).
+fn _STRAT_LOG() { return 0; }
+fn _STRAT_MODULO() { return 1; }
+fn _STRAT_DISCRETE() { return 2; }
+
 # ---- Bucketing per dim ---------------------------------------------------
 # Three bucketing strategies depending on dim type:
-#   "discrete" — value IS the bucket (status codes, endpoint IDs)
-#   "log"      — fold(log_phi_pi_fibonacci(v) * 50) — substrate-routed
-#                log bucketing for ranges spanning multiple magnitudes
-#                (latency, bytes). Uses OMC's canonical φ-π-fibonacci
-#                substrate so buckets align with the attractor lattice
-#                rather than arbitrary base-10 decades.
-#   "modulo"   — fold(v) — for periodic small ints (hour-of-day)
+#   0 = "log"      — fold(log_phi_pi_fibonacci(v) * 50) — substrate-routed
+#                    log bucketing for ranges spanning multiple magnitudes
+#                    (latency, bytes). Uses OMC's canonical φ-π-fibonacci
+#                    substrate so buckets align with the attractor lattice
+#                    rather than arbitrary base-10 decades.
+#   1 = "modulo"   — fold(v) — for periodic small ints (hour-of-day)
+#   2 = "discrete" — value IS the bucket (status codes, endpoint IDs)
 #
-# Default: "log" if numeric and varies, else "discrete".
+# Default: 0 (log) for everything.
 
 fn _bucket_log(v) {
     if v <= 0 { return 0; }
     h logv = log_phi_pi_fibonacci(to_float(v));
-    return fold(to_int(logv * 50));
+    # 50.0 (not 50) so the compiler emits MulFloat — the JIT path
+    # treats float bit-patterns and ints differently for *. With the
+    # int literal `50`, the multiplication would be Op::Mul (untyped)
+    # which the JIT would treat as integer multiplication of a float
+    # bit-pattern (garbage).
+    return fold(to_int(logv * 50.0));
 }
 fn _bucket_modulo(v) { return fold(to_int(v)); }
 fn _bucket_discrete(v) { return v; }
 
-fn _bucket_for(strategy, v) {
-    if strategy == "log" { return _bucket_log(v); }
-    if strategy == "modulo" { return _bucket_modulo(v); }
+# Strategy-coded bucket lookup. JIT-eligible: pure int comparison +
+# arithmetic. No string equality.
+fn _bucket_for_code(code, v) {
+    if code == 0 { return _bucket_log(v); }
+    if code == 1 { return _bucket_modulo(v); }
     return _bucket_discrete(v);
 }
 
+# Convert public string strategy → internal int code.
+fn _strategy_to_code(s) {
+    if s == "log" { return 0; }
+    if s == "modulo" { return 1; }
+    if s == "discrete" { return 2; }
+    return 0;
+}
+
+# Linear scan: find the index of `target` in `keys`, or -1 if absent.
+# JIT-eligible (pure arr_get + comparison + while loop).
+fn _find_key(keys, target) {
+    h n = arr_len(keys);
+    h i = 0;
+    while i < n {
+        if arr_get(keys, i) == target { return i; }
+        i = i + 1;
+    }
+    return 0 - 1;
+}
+
 # ---- Detector lifecycle --------------------------------------------------
 
 # Create a fresh detector. dim_names is an array of strings (one per
-# dimension). dim_strategies is an OPTIONAL array of equal length;
-# default is "log" for everything.
+# dimension). Default strategy is 0 (log) for every dim.
+#
+# Returns an array layout: [n_dims, strategies, freq_keys, freq_counts, n].
+# Use ha.set_strategy() / ha.fit() / ha.score_all() to interact.
 fn new(dim_names) {
+    h n_dims = arr_len(dim_names);
     h strategies = [];
     h k = 0;
-    while k < arr_len(dim_names) {
-        arr_push(strategies, "log");
+    while k < n_dims {
+        arr_push(strategies, 0);
         k = k + 1;
     }
-    return {
-        "dims": dim_names,
-        "strategies": strategies,
-        "freqs": [],
-        "n": 0
-    };
+    h freq_keys = [];
+    h freq_counts = [];
+    h d = 0;
+    while d < n_dims {
+        arr_push(freq_keys, []);
+        arr_push(freq_counts, []);
+        d = d + 1;
+    }
+    h det = [];
+    arr_push(det, n_dims);
+    arr_push(det, strategies);
+    arr_push(det, freq_keys);
+    arr_push(det, freq_counts);
+    arr_push(det, 0);
+    return det;
 }
 
 # Override one dim's bucket strategy. Useful when you have a discrete
 # field (status_code, country_code) where log-bucketing makes no sense.
 #   ha.set_strategy(det, 1, "discrete")   # 2nd dim is categorical
 fn set_strategy(detector, dim_idx, strategy) {
-    h strats = dict_get(detector, "strategies");
-    arr_set(strats, dim_idx, strategy);
-    dict_set(detector, "strategies", strats);
+    h strats = arr_get(detector, 1);
+    arr_set(strats, dim_idx, _strategy_to_code(strategy));
     return detector;
 }
 
 # Fit the detector to a corpus of rows. Each row is an array of
-# values parallel to dim_names. Builds per-dim frequency tables.
+# values parallel to dim_names. Builds per-dim frequency arrays.
+# Cold path — runs once at startup; uses arr_push for dynamic growth.
 fn fit(detector, rows) {
-    h dims = dict_get(detector, "dims");
-    h strategies = dict_get(detector, "strategies");
-    h n_dims = arr_len(dims);
+    h n_dims = arr_get(detector, 0);
+    h strategies = arr_get(detector, 1);
+    h freq_keys = arr_get(detector, 2);
+    h freq_counts = arr_get(detector, 3);
     h n_rows = arr_len(rows);
 
-    # Build one frequency dict per dim.
-    h freqs = [];
-    h d = 0;
-    while d < n_dims { arr_push(freqs, {}); d = d + 1; }
-
     h r = 0;
     while r < n_rows {
         h row = arr_get(rows, r);
         h di = 0;
         while di < n_dims {
-            h strategy = arr_get(strategies, di);
-            h bkt = _bucket_for(strategy, arr_get(row, di));
-            h freq = arr_get(freqs, di);
-            h key = concat_many("", bkt);
-            dict_set(freq, key, dict_get(freq, key, 0) + 1);
+            h code = arr_get(strategies, di);
+            h bkt = _bucket_for_code(code, arr_get(row, di));
+            h keys = arr_get(freq_keys, di);
+            h counts = arr_get(freq_counts, di);
+            h idx = _find_key(keys, bkt);
+            if idx < 0 {
+                arr_push(keys, bkt);
+                arr_push(counts, 1);
+            } else {
+                arr_set(counts, idx, arr_get(counts, idx) + 1);
+            }
             di = di + 1;
         }
         r = r + 1;
     }
-    dict_set(detector, "freqs", freqs);
-    dict_set(detector, "n", n_rows);
+    arr_set(detector, 4, n_rows);
     return detector;
 }
 
 # Score a single row. Returns sum-of-marginal-log-rarities; higher =
 # more structurally anomalous.
+#
+# Hot path — called once per scored row. Uses ONLY JIT-eligible ops:
+# arr_get, arr_len, while loop, arithmetic, log_phi_pi_fibonacci,
+# to_float, int comparison. No dict ops, no string ops. The whole
+# function compiles in dual-band mode.
 fn score(detector, row) {
-    h dims = dict_get(detector, "dims");
-    h strategies = dict_get(detector, "strategies");
-    h freqs = dict_get(detector, "freqs");
-    h n = dict_get(detector, "n");
-    h n_dims = arr_len(dims);
+    h n_dims = arr_get(detector, 0);
+    h strategies = arr_get(detector, 1);
+    h freq_keys = arr_get(detector, 2);
+    h freq_counts = arr_get(detector, 3);
+    h n = arr_get(detector, 4);
     h total = 0.0;
     h di = 0;
     while di < n_dims {
-        h strategy = arr_get(strategies, di);
-        h bkt = _bucket_for(strategy, arr_get(row, di));
-        h freq = arr_get(freqs, di);
-        h key = concat_many("", bkt);
-        h count = dict_get(freq, key, 1);
+        h code = arr_get(strategies, di);
+        h bkt = _bucket_for_code(code, arr_get(row, di));
+        h keys = arr_get(freq_keys, di);
+        h counts = arr_get(freq_counts, di);
+        h idx = _find_key(keys, bkt);
+        h count = 1;
+        if idx >= 0 {
+            count = arr_get(counts, idx);
+        }
         # Critical: float division, not int division.
-        h p = to_float(count) / n;
-        if p <= 0 { p = 1.0 / to_float(n); }
+        h p = to_float(count) / to_float(n);
+        if p <= 0.0 { p = 1.0 / to_float(n); }
         # Substrate-routed rarity. -log(p) = log(1/p); use the
         # φ-π-fibonacci substrate so rarity is measured in the same
         # units as resonance/HIM elsewhere in OMC. Monotonic transform
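The array-of-int detector layout from item 1 maps naturally onto Rust. The following is a hypothetical mirror of `fit`/`score`, not a port of the library. It supports only the "discrete" strategy (each value is its own bucket) and uses natural `ln` in place of the substrate `log_phi_pi_fibonacci`, since the diff is cut off before the rarity transform is shown. The parts that do follow the OMC code are the parallel `keys`/`counts` arrays, the linear-scan lookup, the default count of 1 for unseen buckets, and float division by `n`.

```rust
/// Hypothetical Rust mirror of the array-based detector:
/// per-dim parallel key/count vectors instead of a dict keyed by strings.
struct Detector {
    freq_keys: Vec<Vec<i64>>,   // like DET_FREQ_KEYS: one key array per dim
    freq_counts: Vec<Vec<i64>>, // like DET_FREQ_COUNTS: parallel count arrays
    n: i64,                     // like DET_N: number of fitted rows
}

/// Linear scan, like `_find_key`: position of `target`, or None if absent.
fn find_key(keys: &[i64], target: i64) -> Option<usize> {
    keys.iter().position(|&k| k == target)
}

impl Detector {
    fn new(n_dims: usize) -> Self {
        Detector {
            freq_keys: vec![Vec::new(); n_dims],
            freq_counts: vec![Vec::new(); n_dims],
            n: 0,
        }
    }

    /// Cold path: bump each value's count, appending a new key on first sight.
    /// Only the "discrete" strategy is modeled: the value is the bucket.
    fn fit(&mut self, rows: &[Vec<i64>]) {
        for row in rows {
            for (d, &v) in row.iter().enumerate() {
                let keys = &mut self.freq_keys[d];
                let counts = &mut self.freq_counts[d];
                match find_key(keys, v) {
                    Some(i) => counts[i] += 1,
                    None => {
                        keys.push(v);
                        counts.push(1);
                    }
                }
            }
        }
        self.n = rows.len() as i64;
    }

    /// Hot path: sum over dims of -ln(p), where p = count / n and an unseen
    /// bucket counts as 1. `ln` is a stand-in for the substrate log.
    fn score(&self, row: &[i64]) -> f64 {
        let n = self.n as f64;
        row.iter()
            .enumerate()
            .map(|(d, &v)| {
                let count = find_key(&self.freq_keys[d], v)
                    .map_or(1, |i| self.freq_counts[d][i]);
                let p = count as f64 / n; // float division, as in the OMC code
                -p.ln()
            })
            .sum()
    }
}
```

Rows are assumed to have exactly as many values as the detector has dims; a longer row would index past `freq_keys` and panic. The linear scan costs O(k) per lookup for k distinct buckets, which stays cheap as long as each dim has few distinct buckets.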

omnimcode-cli/src/main.rs

Lines changed: 18 additions & 1 deletion
@@ -120,7 +120,24 @@ fn maybe_register_jit(
     if std::env::var("OMC_HBIT_JIT").as_deref() != Ok("1") {
         return;
     }
-    let module = match omnimcode_core::compiler::compile_program(statements) {
+    // Inline imports BEFORE compile_program. The bytecode compiler
+    // treats Statement::Import as a no-op (the tree-walk interpreter
+    // normally handles imports at statement-execution time), so
+    // without inlining the JIT can only see top-level user fns and
+    // misses the entire imported library surface. This was the L1
+    // measurement gap on NSL-KDD: harmonic_anomaly's score/fit/top_k
+    // live in the imported library, so jit_module never saw them.
+    let inlined = match Interpreter::inline_imports(statements.to_vec()) {
+        Ok(v) => v,
+        Err(e) => {
+            eprintln!(
+                "[OMC_HBIT_JIT] inline_imports failed: {} — falling back to tree-walk",
+                e
+            );
+            return;
+        }
+    };
+    let module = match omnimcode_core::compiler::compile_program(&inlined) {
         Ok(m) => m,
         Err(e) => {
             eprintln!("[OMC_HBIT_JIT] compile_program failed: {} — falling back to tree-walk", e);
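Items 2 and 3 (inlining imports before `compile_program`) can be illustrated with a toy AST. Everything here is hypothetical: the two-variant `Stmt` enum, the in-memory `files` map standing in for reading and parsing files, and the `alias.name` prefixing scheme. The behaviors taken from the commit are recursive expansion of imports, alias-prefixing of fn definitions, rewriting intra-module calls to match, and returning an error the caller can use to fall back to tree-walk.

```rust
use std::collections::HashMap;

/// Toy AST standing in for the real Statement type (hypothetical).
#[derive(Clone, Debug, PartialEq)]
enum Stmt {
    Import { path: String, alias: String },
    FnDef { name: String, calls: Vec<String> },
}

/// Replace each Import with the imported module's fns, recursively,
/// prefixing names defined by that module (and calls to them) with `alias.`.
/// `files` stands in for reading and parsing a file from disk.
fn inline_imports(
    stmts: Vec<Stmt>,
    files: &HashMap<String, Vec<Stmt>>,
) -> Result<Vec<Stmt>, String> {
    let mut out = Vec::new();
    for stmt in stmts {
        match stmt {
            Stmt::Import { path, alias } => {
                let parsed = files
                    .get(&path)
                    .ok_or_else(|| format!("cannot import {}", path))?;
                // Flatten the module's own imports first.
                let inner = inline_imports(parsed.clone(), files)?;
                // Names this module defines; only calls to these get aliased.
                let local: Vec<String> = inner
                    .iter()
                    .filter_map(|s| match s {
                        Stmt::FnDef { name, .. } => Some(name.clone()),
                        Stmt::Import { .. } => None,
                    })
                    .collect();
                for s in inner {
                    if let Stmt::FnDef { name, calls } = s {
                        let calls: Vec<String> = calls
                            .into_iter()
                            .map(|c| if local.contains(&c) { format!("{}.{}", alias, c) } else { c })
                            .collect();
                        out.push(Stmt::FnDef { name: format!("{}.{}", alias, name), calls });
                    }
                }
            }
            other => out.push(other),
        }
    }
    Ok(out)
}
```

This sketch has no cycle detection, so mutually importing files would recurse forever. Calls to names the module does not define (builtins such as `arr_len`, or the importer's own fns) are left untouched.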

omnimcode-codegen/src/dual_band.rs

Lines changed: 64 additions & 0 deletions
@@ -393,6 +393,36 @@ impl<'ctx, 'a> DualBandLowerer<'ctx, 'a> {
                 let val = self.emit_array_index(arr_v, idx_v, i)?;
                 stack.push(self.splat(val, "aidx_v")?);
             }
+            // L1: substrate fold (snap to nearest Fibonacci attractor).
+            // Calls the extern omc_fold helper with α; splats result.
+            Op::Fold1 => {
+                let v_v = self.pop(&mut stack, i, "Fold1 arg")?;
+                let alpha = self
+                    .builder
+                    .build_extract_element(v_v, i64_type.const_int(0, false), "fold_a")
+                    .map_err(|e| format!("hbit Fold1 extract at op{}: {}", i, e))?;
+                let alpha_iv = match alpha {
+                    BasicValueEnum::IntValue(iv) => iv,
+                    _ => return Err(format!("hbit Fold1 α not int at op{}", i)),
+                };
+                let fold_fn = self
+                    .module
+                    .get_function("omc_fold")
+                    .ok_or_else(|| format!("omc_fold not declared at op{}", i))?;
+                let call = self
+                    .builder
+                    .build_call(fold_fn, &[alpha_iv.into()], "fold_call")
+                    .map_err(|e| format!("hbit Fold1 call at op{}: {}", i, e))?;
+                let ret = call
+                    .try_as_basic_value()
+                    .left()
+                    .ok_or_else(|| format!("hbit Fold1 call no value at op{}", i))?;
+                let ret_iv = match ret {
+                    BasicValueEnum::IntValue(iv) => iv,
+                    _ => return Err(format!("hbit Fold1 call ret not int at op{}", i)),
+                };
+                stack.push(self.splat(ret_iv, "fold_ret_v")?);
+            }
             // Path D: array writes. ArrSetNamed(name) is the
             // optimized form the compiler emits for
             // `arr_set(name, idx, val)` where `name` is a literal
@@ -593,6 +623,40 @@ impl<'ctx, 'a> DualBandLowerer<'ctx, 'a> {
                     stack.push(new_v);
                     continue;
                 }
+                if name == "log_phi_pi_fibonacci" && *argc == 1 {
+                    // L1: substrate-routed log via extern Rust call.
+                    // Arg is float-bit-pattern in α lane; pass scalar
+                    // to the extern; splat the f64-bit-pattern result.
+                    let v_v = self.pop(&mut stack, i, "log_phi_pi_fibonacci arg")?;
+                    let alpha = self
+                        .builder
+                        .build_extract_element(v_v, i64_type.const_int(0, false), "log_a")
+                        .map_err(|e| format!("hbit log_phi extract at op{}: {}", i, e))?;
+                    let alpha_iv = match alpha {
+                        BasicValueEnum::IntValue(iv) => iv,
+                        _ => return Err(format!("hbit log_phi α not int at op{}", i)),
+                    };
+                    let log_fn = self
+                        .module
+                        .get_function("omc_log_phi_pi_fibonacci")
+                        .ok_or_else(|| {
+                            format!("omc_log_phi_pi_fibonacci not declared at op{}", i)
+                        })?;
+                    let call = self
+                        .builder
+                        .build_call(log_fn, &[alpha_iv.into()], "log_call")
+                        .map_err(|e| format!("hbit log_phi call at op{}: {}", i, e))?;
+                    let ret = call
+                        .try_as_basic_value()
+                        .left()
+                        .ok_or_else(|| format!("hbit log_phi call no value at op{}", i))?;
+                    let ret_iv = match ret {
+                        BasicValueEnum::IntValue(iv) => iv,
+                        _ => return Err(format!("hbit log_phi call ret not int at op{}", i)),
+                    };
+                    stack.push(self.splat(ret_iv, "log_ret_v")?);
+                    continue;
+                }
                 if name == "to_int" && *argc == 1 {
                     let v_v = self.pop(&mut stack, i, "to_int arg")?;
                     let f64_type = self.ctx.f64_type();
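Both new intrinsics pass floats through `i64` lanes as raw IEEE-754 bit patterns, and that is also why item 6 matters. The following is an illustration under stated assumptions, not the real intrinsic. `demo_log_intrinsic` uses `f64::ln` as a stand-in for the substrate log, and `mul_float_bits`/`mul_int_bits` are hypothetical helpers contrasting what `Op::MulFloat` must do with what an untyped integer multiply does to the same bits.

```rust
/// Hypothetical intrinsic with the same shape as omc_log_phi_pi_fibonacci:
/// an f64 arrives as its raw bit pattern in an i64, and the result goes back
/// the same way. `ln` is a stand-in; the real substrate log is not shown here.
extern "C" fn demo_log_intrinsic(arg_bits: i64) -> i64 {
    let x = f64::from_bits(arg_bits as u64); // reinterpret the bits, no numeric conversion
    x.ln().to_bits() as i64                  // hand back the result's bit pattern
}

/// What a float multiply on bit-pattern operands has to do:
/// decode, multiply as f64, re-encode.
fn mul_float_bits(a_bits: i64, b: f64) -> i64 {
    (f64::from_bits(a_bits as u64) * b).to_bits() as i64
}

/// What an untyped integer multiply does to the same bits (the bug item 6 avoids).
fn mul_int_bits(a_bits: i64, b: i64) -> i64 {
    a_bits.wrapping_mul(b)
}
```

For example, the bit pattern of `2.0` is `0x4000_0000_0000_0000`; multiplying it by the integer 50 wraps to `0x8000_0000_0000_0000`, which decodes as `-0.0` rather than `100.0`.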
