
Commit 8128844

Merge claude/phi-field-llm-evolution: hybrid harmonic LLM experiments + substrate unification
13 commits landing two streams of work:

Stream 1 — hybrid harmonic/transformer experiments (exp 0 through 5)

Pure-OMC measurements of where the harmonic substrate wins vs standard
transformer components. Headline results:

- exp 1: OmniWeight loses to softmax as a per-head scorer (perturbed
  query recovery: softmax wins at every noise level).
- exp 3: Multi-channel phi-fold PE loses to multi-channel sinusoidal on
  length-distinct positional encoding (L=200: harmonic 34 unique,
  sinusoidal 64 unique).
- exp 4: Harmonic OOD gate wins when magnitudes are matched (AUROC 0.956
  vs L2's 0.946). Loses when magnitudes differ (L2's free magnitude
  separation makes it hard to beat).
- exp 5: HBit cross-cutting tension is a REFERENCE-FREE OOD signal.
  Perfect detection (AUROC 1.0) on scenario A; complementary to L2 and
  marginal-rarity on scenario B. Three-gate combined detector is the
  deployable artefact.

Cumulative read: the harmonic substrate is a STRUCTURAL DETECTOR, not a
primary computation. Right hybrid is auxiliary signals on top of a
standard transformer, not substitution at the per-component level.

Stream 2 — substrate unification (exp 6 through 9 + refactor)

Investigates the architectural inconsistency where phi.fold and
HInt::compute_resonance — the substrate's foundational operations — were
implemented with linear scans over duplicated Fibonacci tables instead of
using phi_pi_fib search against the canonical table.

Findings + refactor:

- exp 6: Compression-gate prototype (model = library + chain of keys).
  Compression: ~34x vs equivalent dense table. Death tolerance: all 12
  library deletions complete cleanly via nearest-key fallback.
  Interchangeability: 6 chains over same library yield 6 distinct
  behaviours.
- exp 7: Wired phi_pi_fib_search and friends as OMC builtins. Empirically
  sublinear: N=8 -> 3.79 compares, N=1024 -> 12.57.
- exp 8: Head-to-head bench of three search algorithms decisively settled
  the PHI_PI_FIB_ALGORITHM.md vs TIER_4_HONEST_REVISION contradiction in
  favour of HONEST_REVISION. Binary search wins on raw compares; the v2
  F(k)/phi^(pi*k) algorithm is 28% slower than even Fibonacci-step search.
- exp 9 + re-investigation: the RIGHT metric is substrate coherence, not
  raw compare count. phi_pi_fib's step sizes are F(k) by construction
  (~90% Fibonacci empirically); binary's halving steps are only 66%
  accidentally-Fibonacci.
- Refactor: nearest_attractor_with_dist promoted to substrate root.
  HInt::compute_resonance, value::is_fibonacci, phi.fold's interpreter
  dispatch, the VM and bytecode optimiser's fold paths, plus 9 other
  duplicate fibs sites — ALL routed through the canonical FIBONACCI table
  via phi_pi_fib search. 16 of 17 duplicate Fibonacci arrays deleted (last
  one is the harmonic_split chunk-size lookup, a different semantic).
  Counter split into EXPLICIT vs BACKGROUND channels so experiment numbers
  stay clean.

Side effect: res(value) for |value| > 610 is now strictly more accurate.
Old behaviour saturated at 610 (res(1000) = 0.610); new behaviour finds
987 (res(1000) = 0.987). Nothing in the test suite depended on saturation.

Tests + audits:

- 148/148 tests pass (92 + 1 + 46 + 9, including all conformance
  physics-lock tests).
- All 10 hybrid_llm experiments audit byte-identical between tree-walk and
  bytecode VM.

CLAUDE.md (separate documentation branch) NOT included in this merge.
2 parents 9e8c71f + 0973799 commit 8128844

16 files changed

Lines changed: 3870 additions & 179 deletions

experiments/hybrid_llm/README.md

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
# Hybrid Harmonic / Transformer LLM

This branch (`claude/phi-field-llm-evolution`) explores using OMC's φ-math
primitives to replace or augment specific transformer components, with the
goal of producing measurable behavior differences on real sequence tasks.

The existing pure-OMC demos (`examples/phi_field_llm_demo.omc`,
`examples/phi_field_llm_multilayer.omc`) prove that geodesic
attention — picking the Fibonacci attractor with the highest
`OmniWeight w = φ^(-|e|)` — runs end-to-end. They don't yet show
**when** that's better than softmax-QK attention and **what it costs**.
This experiment series answers that.

## The substitutions we want to test

Three transformer pieces map cleanly onto OMC's harmonic primitives:

| Transformer piece | Harmonic replacement | What we're measuring |
|---|---|---|
| **Sinusoidal positional encoding** | Golden-angle rotation (`pos * 2π/φ²`) folded onto Fibonacci attractors via `phi.fold`. | Length-generalization: does a model trained on length N still work at 2N? Sinusoidal PE is known to extrapolate poorly. |
| **Softmax attention scoring** | OmniWeight: `w(q, k) = φ^(-\|q − k\| / max(\|k\|, 1))`. Per-position; pick argmax instead of weighted average. | Sharpness vs. softness. OmniWeight is winner-take-all. Useful for copy/lookup tasks; lossy for averaging tasks. |
| **Layer-norm + residual** | `phi.fold(residual_blend)` (already implemented in `phi_field_llm_multilayer.omc`). | Whether the φ-fold provides a useful regularizer that keeps activations on-attractor. |

Phase 0 of this branch focuses on the second row of the table (OmniWeight
attention) because it's the most isolated and the existing demos already
implement it.
The other two come later.
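
As a rough illustration of the second row, the two scoring rules can be sketched in Python. This is a minimal sketch and not the OMC implementation: the formulas are copied from the table, while `argmax_first`, the sample context, and the off-attractor query value are hypothetical choices made for the example.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def omni_weight(q, k):
    """OmniWeight as defined in the table: phi^(-|q - k| / max(|k|, 1))."""
    return PHI ** (-abs(q - k) / max(abs(k), 1))


def softmax_score(q, k):
    """Unnormalised softmax-style score exp(-|q - k|); argmax ignores the normaliser."""
    return math.exp(-abs(q - k))


def argmax_first(scores):
    """Index of the highest score; ties go to the first occurrence (hypothetical helper)."""
    best = 0
    for i, s in enumerate(scores):
        if s > scores[best]:
            best = i
    return best


context = [5, 8, 13, 21]
query = 10  # hypothetical off-attractor query, between 8 and 13

omni_pick = context[argmax_first([omni_weight(query, k) for k in context])]
soft_pick = context[argmax_first([softmax_score(query, k) for k in context])]
print(omni_pick, soft_pick)  # prints: 13 8
```

On an exact match both rules score 1.0 and agree; the example only diverges because the query sits off-attractor, where the `max(|k|, 1)` denominator rescales the error per key.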

## Experiment 0: Copy task — OmniWeight vs softmax

The simplest task that distinguishes the two approaches:

- **Input:** a sequence of 8 Fibonacci-aligned tokens drawn at random
  from `{1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233}`, plus a separator,
  plus a "query" token that copies one of the inputs verbatim.
  Example (shortened to five context tokens): `[34, 8, 89, 13, 21, |, 89]`
  → expected next token `89`.
- **Models:**
  - OmniWeight-attention head over the input (the current
    `best_attractor` mechanism).
  - Softmax-attention head over the same inputs, where the score is
    `exp(-|q − k|)` normalized. Both use **no learned weights** — this
    isolates the scoring function from training dynamics.
- **Metric:** exact-match accuracy on 100 random instances, broken
  down by (a) whether the query exactly matches an input, (b) how
  many distractors share the query's nearest attractor.

If OmniWeight wins on (a) and loses on (b), that confirms the
"winner-take-all" thesis and tells us where to apply it in a larger model.

**Status:** `experiment_0_copy_task.omc` runs this comparison.
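
The copy-task protocol above can be approximated in Python to see how first-occurrence tie-breaking interacts with index-level scoring. This is a sketch under assumptions: it uses Python's `random` module rather than OMC's generator, so its counts will not reproduce the OMC run, and the helper names are hypothetical.

```python
import math
import random

PHI = (1 + math.sqrt(5)) / 2
VOCAB = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]


def omni_weight(q, k):
    return PHI ** (-abs(q - k) / max(abs(k), 1))


def softmax_score(q, k):
    return math.exp(-abs(q - k))


def argmax_first(context, query, score):
    """Position with the highest score; ties keep the earliest position."""
    best = 0
    for i in range(1, len(context)):
        if score(query, context[i]) > score(query, context[best]):
            best = i
    return best


def run(trials=100, ctx_len=8, seed=42):
    rng = random.Random(seed)
    omni_hits = soft_hits = disagreements = dup_misses = 0
    for _ in range(trials):
        context = [rng.choice(VOCAB) for _ in range(ctx_len)]
        target = rng.randrange(ctx_len)
        query = context[target]  # exact-match query
        omni = argmax_first(context, query, omni_weight)
        soft = argmax_first(context, query, softmax_score)
        omni_hits += omni == target
        soft_hits += soft == target
        disagreements += omni != soft
        # A miss here means an earlier position already holds the query value.
        if omni != target and context[omni] == query:
            dup_misses += 1
    return omni_hits, soft_hits, disagreements, dup_misses


omni_hits, soft_hits, disagreements, dup_misses = run()
print(omni_hits, soft_hits, disagreements, dup_misses)
```

Under these assumptions the two scorers always agree on exact-match queries, and every index-level miss is a duplicate-value trial whose target is not the first occurrence.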

## Why no torch yet

The current remote environment has no torch / numpy. Pure-OMC
experiments give us:

1. Deterministic, reproducible runs inside the standalone binary.
2. No dependency on `python-embed` for the experiment itself.
3. A baseline that any later torch-based experiment must match
   byte-for-byte on the harmonic side.

Once we have a winning harmonic primitive, the next branch step is to
port the same scoring rule to PyTorch (via `examples/lib/torch.omc` or
a stand-alone Python script) and bench against a real learned model
on a real corpus.

## How to run

```bash
# Build (one time)
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 cargo build --release

# Run experiment 0 (tree-walk)
./target/release/omnimcode-standalone experiments/hybrid_llm/experiment_0_copy_task.omc

# Same under the bytecode VM
OMC_VM=1 ./target/release/omnimcode-standalone experiments/hybrid_llm/experiment_0_copy_task.omc

# Audit: bytecode VM must match tree-walk
./target/release/omnimcode-standalone --audit experiments/hybrid_llm/experiment_0_copy_task.omc
```

## Results so far

| Experiment | Setting | Headline number |
|---|---|---|
| 0 | Copy task, exact-match query, 100 trials | OmniWeight 82/100, softmax 82/100, 0 disagreements. Confirms both scorers agree on exact match (the 18 "misses" are duplicate-value trials, both tie-break to first occurrence). |
| 1 | Perturbed query (query = true_val + noise), 200 trials per noise level | Softmax wins everywhere. noise=1: 189 vs 170. noise=7: 118 vs 99. noise=50: 42 vs 33. OmniWeight's \|k\|-normalised denominator shrinks the penalty for larger keys, so it pulls toward larger-magnitude attractors regardless of perturbation direction, which hurts the "recover the original value" objective. |
| 2 | Single-channel PE distinctness + lookup at L = 8 / 14 / 24 / 48 | Sinusoidal wins at short L (8/8 vs 6/8). At L=48 harmonic appears to overtake: 38/48 vs 26/48 (79% vs 54%). Flagged as a likely metric artefact — single-int "closest code" lookup favours monotonic over periodic encodings. |
| 3 | 4-channel PE (harmonic primes 7/11/13/17, sin/cos periods 8/64), L2 lookup, L = 8 → 200 | **Sinusoidal regains its lead decisively at every L ≥ 16.** L=48: 48/48 vs 21/48. L=200: 72/200 vs 34/200. Harmonic saturates at 22 unique vectors by L=64; sinusoidal stays perfectly distinct up to L=64 then saturates at 64. The single-channel L=48 harmonic "win" was a metric artefact, exactly as suspected. |
| 4A | Harmonic OOD gate vs L2-NN baseline on 4-dim synthetic vectors (N_REF=300, 150 in-dist test, 150 OOD test). OOD = uniform [1, 90]. | L2 wins. AUROC L2 0.961 vs harmonic 0.910. TPR @ FPR=10%: L2 0.91 vs harmonic 0.71. L2 has a trivial magnitude advantage — mean L2 score 87 (in-dist) vs 1313 (OOD), since OOD vectors are larger on average and the harmonic gate's `phi.fold` discards magnitude. |
| 4B | Same gates, **magnitude-matched** structural OOD (inverted attractor weights: 10%/30%/60% small/med/large vs in-dist's 60%/30%/10%). | **Harmonic edges past L2 in AUROC: 0.956 vs 0.946.** At low FPR L2 still wins (TPR@FPR=1%: L2 0.60 vs harmonic 0.48), but on overall ranking the structural rarity signal beats the L2 metric once magnitude is no longer a giveaway. |
| 5 | HBit cross-cutting tension (no reference) + combined gate (sum of z-normalised HBit, marginal rarity, L2) on both scenarios. | **Scenario A: HBit tension AUROC = 1.0** (perfect — mean tension 0.0 in-dist vs 20.1 OOD). Combined: 0.999. **Scenario B: HBit AUROC = 0.5** (random — both sides on-manifold, tension = 0 everywhere). Combined: 0.967, beating every single gate. Each gate owns a different OOD axis: HBit→off-manifold, marginal→distribution-shift, L2→magnitude. |
| 6 | Phi-Pi-Fib compression gate: model as `(library + chain of keys)` instead of dense weights. 12-primitive library keyed by Fibonacci attractors, gate = nearest-key lookup, chains = "parameters". | Composition: trace `[3, 8, 13, 5, 21]` on state 7 → 9. Compression: 29 ints (library+chain) vs ~1001 ints dense table over [0,1000] = ~34× smaller (extrapolates to 9 orders of magnitude at LLM scale). **Death tolerance: all 12 library deletions complete without crashing — biggest deltas: kill key=13 → +12, kill key=5 → +5, kill key=21 → +3. 8 of 12 deletions invisible to output (unused capabilities or path coincidence).** Interchangeability: 6 different chains over the same library yield 6 different outputs (9, 22, 9, 5, 5, 52). |
| 7 | Wire `phi_pi_fib::fibonacci_search` in as four OMC builtins (`phi_pi_fib_search`, `phi_pi_fib_nearest`, `phi_pi_fib_stats`, `phi_pi_fib_reset`). Rerun exp 6's gate using the real Fibonacci-step search; measure comparison counts vs library size. | **Sublinear scaling confirmed.** N=8 → 3.8 compares/search, N=1024 → 12.6. Going 128× wider in library size grows the per-lookup work only ~3.3×, vs ~128× for a linear scan. Empirically tracks `~log₂(N)`, slightly better than `log_φ_π_fibonacci(N) ≈ 1.44·log₂(N)`. Sanity check passes (same final state as exp 6). Death tolerance preserved across all 12 library deletions. 148/148 existing tests still pass. |
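
To give a feel for the compare counts in row 7, here is a textbook Fibonacci search in Python with a comparison counter. It is a sketch only: it is not the code in `phi_pi_fib.rs`, the counting convention (one count per probed element) is an assumption, and the keys are plain integers rather than the experiment's library, so the averages it prints need not match the numbers in the table.

```python
def fibonacci_search(arr, x):
    """Search sorted arr for x. Returns (index or -1, number of probes)."""
    n = len(arr)
    fib2, fib1 = 0, 1        # F(m-2), F(m-1)
    fib = fib2 + fib1        # F(m): smallest Fibonacci number >= n
    while fib < n:
        fib2, fib1 = fib1, fib
        fib = fib2 + fib1
    offset = -1              # everything at or before offset is < x
    compares = 0
    while fib > 1:
        i = min(offset + fib2, n - 1)
        compares += 1
        if arr[i] < x:
            # Discard the left part: step the Fibonacci window down by one.
            fib, fib1, fib2 = fib1, fib2, fib1 - fib2
            offset = i
        elif arr[i] > x:
            # Discard the right part: step the window down by two.
            fib, fib1, fib2 = fib2, fib1 - fib2, fib2 - (fib1 - fib2)
        else:
            return i, compares
    if fib1 and offset + 1 < n:
        compares += 1
        if arr[offset + 1] == x:
            return offset + 1, compares
    return -1, compares


def mean_compares(n):
    """Average probes per successful lookup over n hypothetical sorted keys."""
    keys = list(range(n))
    return sum(fibonacci_search(keys, k)[1] for k in keys) / n


for n in (8, 1024):
    print(n, round(mean_compares(n), 2))
```

The step sizes are Fibonacci numbers by construction, which is the property the later experiments care about; the probe count grows logarithmically in the number of keys.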

### Cumulative read across experiments 0–5

The six experiments now form a complete picture. Each OOD axis has
a gate that owns it:

| Failure mode | Owning gate | Cost | Scenario A AUROC | Scenario B AUROC |
|---|---|---|---|---|
| Off-manifold values | **HBit cross-cutting tension** | **Reference-free** | **1.000** | 0.500 |
| Wrong attractor distribution | Marginal log-rarity (exp 4 harmonic) | needs reference | 0.910 | 0.956 |
| Wrong magnitude | L2 nearest-neighbour | needs reference | 0.961 | 0.946 |
| Any of the above | Sum of z-normalised triple | needs reference | 0.999 | 0.967 |
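
For readers unfamiliar with the two metrics in these tables, AUROC and TPR at a fixed FPR can be computed directly from per-sample gate scores. The sketch below assumes higher scores mean "more OOD" and uses the pairwise-ranking definition of AUROC; the function names and sample scores are hypothetical and are not taken from the experiment scripts.

```python
def auroc(in_scores, ood_scores):
    """Probability that a random OOD score outranks a random in-dist score (ties = 0.5)."""
    wins = 0.0
    for o in ood_scores:
        for i in in_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))


def tpr_at_fpr(in_scores, ood_scores, fpr):
    """Fraction of OOD points flagged when at most `fpr` of in-dist points are flagged.

    Assumes 0 <= fpr < 1. A point is flagged when its score is strictly
    above the threshold.
    """
    ranked = sorted(in_scores, reverse=True)
    allowed = int(fpr * len(ranked))   # in-dist points allowed above the threshold
    threshold = ranked[allowed]
    return sum(s > threshold for s in ood_scores) / len(ood_scores)


# A gate that scores 0.0 everywhere cannot rank anything: AUROC is 0.5.
print(auroc([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))   # prints: 0.5
# A gate that is 0.0 in-dist and positive on every OOD point is perfect.
print(auroc([0.0, 0.0], [3.0, 20.0]))            # prints: 1.0
print(tpr_at_fpr(list(range(1, 11)), [9.5, 8, 12, 10.5], 0.1))  # prints: 0.75
```

The two AUROC cases mirror the HBit column: all-zero tension on both sides gives exactly 0.5, and zero in-dist tension against strictly positive OOD tension gives 1.0.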

The HBit gate is the cheapest possible: `sum_d |v[d] − phi.fold(v[d])|`.
It needs zero fitting and zero reference set, and it is a perfect detector
when the OOD axis is "value isn't a Fibonacci attractor". It is useless
when both sides are on-manifold (scenario B mean tension is 0.0 on both
in-dist and OOD, so the gate can't see any difference).
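
The tension formula is small enough to sketch in full. The version below assumes that `phi.fold` snaps a value to the nearest Fibonacci number; that reading, the fixed `FIBS` table, and the tie-break toward the smaller attractor are assumptions made for illustration, not a description of the OMC builtin.

```python
from bisect import bisect_left

# Assumed attractor table; the real canonical table may extend further.
FIBS = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]


def fold(x):
    """Assumed stand-in for phi.fold: nearest entry of FIBS (ties go to the smaller one)."""
    i = bisect_left(FIBS, x)
    if i == 0:
        return FIBS[0]
    if i == len(FIBS):
        return FIBS[-1]
    lo, hi = FIBS[i - 1], FIBS[i]
    return lo if x - lo <= hi - x else hi


def hbit_tension(v):
    """sum_d |v[d] - fold(v[d])|: zero exactly when every component is an attractor."""
    return sum(abs(x - fold(x)) for x in v)


print(hbit_tension([5, 13, 34, 89]))   # on-manifold vector, prints: 0
print(hbit_tension([6, 17, 40, 70]))   # off-manifold vector, prints: 26
```

No reference set appears anywhere in the computation, which is what "reference-free" means here: the score depends only on the test vector and the attractor table.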

The combined gate is the clear winner across both scenarios. It is the sum
of z-normalised per-gate scores, with the z-normalisation parameters
fit on **in-dist scores only** (the combiner doesn't peek at OOD data).
Scenario A: 0.999, almost perfect; it gets HBit's free wins plus L2 and
marginal contributions. Scenario B: 0.967, which beats the best single
gate (marginal rarity, 0.956) by about one AUROC point.
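
A minimal sketch of the combiner, assuming each gate already produces a scalar score per vector. The function names, the zero-spread guard, and the sample numbers are hypothetical; only the recipe (fit mean and spread on in-dist scores, then sum the z-scores) comes from the description above.

```python
import statistics


def fit_znorm(in_dist_scores):
    """Fit (mean, std) on in-distribution scores only.

    The fallback to 1.0 when the spread is zero is an assumption of this
    sketch; it keeps a constant-score gate (e.g. HBit at 0.0) usable.
    """
    mean = statistics.fmean(in_dist_scores)
    std = statistics.pstdev(in_dist_scores)
    return mean, (std if std > 0 else 1.0)


def combined_score(gate_scores, params):
    """Sum of per-gate z-scores; higher means more likely OOD."""
    return sum((s - m) / sd for s, (m, sd) in zip(gate_scores, params))


# Hypothetical in-dist scores per gate, in the order (HBit, marginal rarity, L2).
in_dist = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 2.0],
    [80.0, 90.0, 85.0, 95.0],
]
params = [fit_znorm(column) for column in in_dist]

typical = combined_score((0.0, 2.0, 87.5), params)   # sits at every in-dist mean
shifted = combined_score((20.1, 2.0, 87.5), params)  # only the HBit score moves
print(typical, shifted)
```

Because the parameters are fit without any OOD data, the combiner can be frozen before deployment and applied to one test vector at a time.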

What this means concretely:

1. **Reference-free OOD detection is real on harmonic-structured
   data.** If your in-distribution lives on (or near) the Fibonacci
   attractor manifold, HBit tension is a free OOD signal you can
   compute on a single test point with no model fitting. Cost is
   D `phi.fold` lookups and D float subtractions per test point.

2. **The "harmonic substrate is a structural detector" thesis is
   now empirically grounded for OOD gating**, with quantified
   contribution from each piece. Exp 0-3 ruled out using harmonic
   primitives as drop-in replacements for transformer components.
   Exp 4-5 found their actual home: as auxiliary detectors layered
   onto raw features (or activations) to catch failure modes that
   L2 alone misses.

3. **The combined gate is the deployable artifact.** Three
   complementary axes, z-normalised on the reference, summed.
   Wins on both magnitude-shifted and structural OOD. Beats every
   single-gate baseline.

### What changed between experiment 2 and experiment 3

Experiment 2 used **single-integer codes** and a **closest-int**
lookup metric. Single-integer codes can't capture the geometric
frequency layering that makes sinusoidal PE work in real
transformers — once the period wraps, the encoding is dead.

Experiment 3 used **4-channel vectors** and **L2 distance**. That
gives sinusoidal a long-period channel (P=64) that stays distinct
well past the short-period channel's wrap. Harmonic gets four
prime-multiplier channels but they all saturate at the same
Fibonacci ceiling, so the joint vector hits its uniqueness budget
fast (22 unique vectors total) and stays there forever.
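
The sinusoidal side of that argument is easy to check numerically. The sketch below assumes the four channels are sin and cos at periods 8 and 64; the exact channel layout, the rounding used to compare vectors, and the function names are assumptions. The harmonic encoder is not reproduced because its exact formula is not given here.

```python
import math


def sinusoidal_code(pos, periods=(8, 64), digits=9):
    """4-channel code: (sin, cos) at each period, rounded so equal vectors compare equal."""
    vec = []
    for p in periods:
        # Reduce pos modulo the period first so wrapped positions match exactly.
        angle = 2 * math.pi * (pos % p) / p
        vec.append(round(math.sin(angle), digits))
        vec.append(round(math.cos(angle), digits))
    return tuple(vec)


def unique_codes(length):
    """Number of distinct code vectors among positions 0 .. length-1."""
    return len({sinusoidal_code(pos) for pos in range(length)})


for L in (8, 48, 64, 200):
    print(L, unique_codes(L))
```

Under these assumptions the count equals L up to L=64 and stays at 64 afterwards, which is the "perfectly distinct up to L=64 then saturates" behaviour reported for experiment 3.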

The lesson is one of the project's existing themes spelled out
again: **measure honestly, and let the measurement reshape the
plan.** Experiment 2's headline number was reproducible and
audited, but the framing was wrong. Adding experiment 3 — same
question, fairer comparison — flipped the answer. The README is
updated to reflect the cumulative read, not just the latest
result.

## Roadmap on this branch

- **0** Copy task: OmniWeight vs softmax scoring. ✓ done
- **1** Perturbed-query divergence study. ✓ done
- **2** Single-channel positional-encoding distinctness + lookup. ✓ done
- **3** Multi-channel PE with L2 lookup. ✓ done
- **4** Harmonic OOD gate vs L2-NN baseline, two scenarios. ✓ done
- **5** HBit cross-cutting tension + 3-gate combined detector. ✓ done
- **6** Phi-Pi-Fib compression gate: model = library + chain. ✓ done
- **7** Wire `omnimcode-core/src/phi_pi_fib.rs::fibonacci_search` in
  as four OMC builtins; rerun exp 6's gate on top; measure compare
  counts. ✓ done
- **8** Learnable routing policy: a function `state -> chain` that
  picks WHICH chain to run from input state. Start with a simple
  hand-authored policy (if state on small attractor use chain A,
  else chain B); then explore phi-folded state as a hash into a
  policy table. This is the "compression gate as learned component"
  half — exp 6 had only the library + nearest-key fallback.
- **9** Layer-norm-matched OOD setup (was the old exp 6):
  pre-normalise to unit L2 and re-run scenarios A and B from exp 4.
  Confirms HBit's magnitude-invariance.
- **10** Bake the combined OOD gate into a reusable library:
  `experiments/hybrid_llm/lib/ood_gate.omc` exposing
  `ood_gate.fit(ref_corpus)` and `ood_gate.score(vec)`. Then once
  torch is available, replicate on real transformer activations.
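
To make roadmap item 8 and the exp 6 gate concrete, here is a toy Python sketch of a library keyed by Fibonacci attractors, a chain of keys, nearest-key fallback, and a hand-authored routing policy. Everything in it is hypothetical: the six primitives, both chains, and the routing threshold are invented for illustration and are not the 12-primitive library used in the experiment.

```python
# Hypothetical library: Fibonacci key -> primitive acting on an integer state.
LIBRARY = {
    1: lambda s: s + 1,
    2: lambda s: s * 2,
    3: lambda s: s - 1,
    5: lambda s: s + 5,
    8: lambda s: s // 2,
    13: lambda s: s % 13,
}


def nearest_key(library, key):
    """Gate: the surviving key closest to the requested one (ties go to the smaller key)."""
    return min(library, key=lambda k: (abs(k - key), k))


def run_chain(library, chain, state):
    """Apply the primitives named by the chain, falling back to the nearest key."""
    for key in chain:
        state = library[nearest_key(library, key)](state)
    return state


CHAIN_A = [1, 2, 5]
CHAIN_B = [8, 3, 13]


def route(state, small_limit=5):
    """Hand-authored policy: small states run chain A, everything else chain B."""
    return CHAIN_A if state <= small_limit else CHAIN_B


print(run_chain(LIBRARY, route(3), 3))   # chain A, prints: 13
print(run_chain(LIBRARY, route(7), 7))   # chain B, prints: 2

# "Death tolerance": delete key 8 and the same chain still completes.
pruned = {k: f for k, f in LIBRARY.items() if k != 8}
print(run_chain(pruned, route(7), 7))    # prints: 11
```

The "parameters" of such a model are the chains, so swapping the chain or the routing policy changes behaviour without touching the library, and a missing key changes the output rather than crashing the run.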
Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
# =============================================================================
# Experiment 0 — OmniWeight vs softmax scoring on a copy task.
#
# Setup:
# - 8-token "context" sampled from a Fibonacci-attractor vocab.
# - The separator token is never scored, so this script does not place it
#   in the context array.
# - 1 "query" token that equals one of the context tokens verbatim.
# - Both scoring rules must point at the matching context position.
#
# We are NOT training anything. We're isolating the SCORING FUNCTION:
# - OmniWeight (harmonic): w(q, k) = φ^(-|q - k| / max(|k|, 1))
# - Softmax-style: w(q, k) = exp(-|q - k|), normalized
# Both pick the argmax position; we measure exact-match accuracy.
#
# Run:
# ./target/release/omnimcode-standalone experiments/hybrid_llm/experiment_0_copy_task.omc
# OMC_VM=1 ./target/release/omnimcode-standalone experiments/hybrid_llm/experiment_0_copy_task.omc
# =============================================================================

h PHI = 1.6180339887498948;

# Vocabulary: 12 Fibonacci attractors at indices 0..11, all of which can be
# sampled. Any out-of-range index falls back to 1.
h VOCAB_SIZE = 12;
fn vocab_at(i) -> int {
    if i == 0 { return 1; }
    if i == 1 { return 2; }
    if i == 2 { return 3; }
    if i == 3 { return 5; }
    if i == 4 { return 8; }
    if i == 5 { return 13; }
    if i == 6 { return 21; }
    if i == 7 { return 34; }
    if i == 8 { return 55; }
    if i == 9 { return 89; }
    if i == 10 { return 144; }
    if i == 11 { return 233; }
    return 1;
}

# ---------------------------------------------------------------------------
# Scoring functions
# ---------------------------------------------------------------------------

# OmniWeight: harmonic geodesic distance. The error is scaled per key by
# max(|k|, 1); there is no normalization across positions.
fn omni_weight(q, k) -> float {
    h diff = to_float(q - k);
    if diff < 0.0 { diff = 0.0 - diff; }
    h denom = to_float(k);
    if denom < 0.0 { denom = 0.0 - denom; }
    if denom < 1.0 { denom = 1.0; }
    h e = diff / denom;
    return pow(PHI, 0.0 - e);
}

# Softmax-style score: exp(-|q - k|). Returned unnormalized; argmax is
# scale-invariant so the softmax denominator doesn't affect the
# selection. (We'd need it for KL divergence or sampling, not here.)
fn softmax_score(q, k) -> float {
    h diff = to_float(q - k);
    if diff < 0.0 { diff = 0.0 - diff; }
    return exp(0.0 - diff);
}

# ---------------------------------------------------------------------------
# Argmax over a length-N context using a chosen scoring function.
# `which_score` = 0 → OmniWeight, 1 → softmax.
# Returns the INDEX of the highest-scoring position. The strict `>` below
# means ties resolve to the first occurrence.
# ---------------------------------------------------------------------------
fn argmax_score(context, n, query, which_score) -> int {
    h best_idx = 0;
    h k0 = arr_get(context, 0);
    h best_score = 0.0;
    if which_score == 0 {
        best_score = omni_weight(query, k0);
    } else {
        best_score = softmax_score(query, k0);
    }

    h i = 1;
    while i < n {
        h k = arr_get(context, i);
        h s = 0.0;
        if which_score == 0 {
            s = omni_weight(query, k);
        } else {
            s = softmax_score(query, k);
        }
        if s > best_score {
            best_score = s;
            best_idx = i;
        }
        i = i + 1;
    }
    return best_idx;
}

# ---------------------------------------------------------------------------
# Build one trial: random 8-token context, query equals context[target_idx].
# Returns the predicted index from each scorer.
# We pre-seed with random_seed() so the trials are reproducible.
# ---------------------------------------------------------------------------
fn run_trial(target_idx, ctx_len) -> array {
    h context = arr_new(ctx_len, 0);
    h i = 0;
    while i < ctx_len {
        # random_int is inclusive on both ends in OMC, so clamp upper.
        h v = random_int(0, VOCAB_SIZE - 1);
        arr_set(context, i, vocab_at(v));
        i = i + 1;
    }
    h query = arr_get(context, target_idx);

    h omni_pick = argmax_score(context, ctx_len, query, 0);
    h soft_pick = argmax_score(context, ctx_len, query, 1);

    # Pack result as [target, omni_pick, soft_pick] for downstream tallying.
    h out = arr_new(3, 0);
    arr_set(out, 0, target_idx);
    arr_set(out, 1, omni_pick);
    arr_set(out, 2, soft_pick);
    return out;
}

# ---------------------------------------------------------------------------
# Main loop.
# ---------------------------------------------------------------------------
random_seed(42);

h N_TRIALS = 100;
h CTX_LEN = 8;

print("== Experiment 0: OmniWeight vs softmax-style argmax (copy task) ==");
print(concat_many("trials=", N_TRIALS, " ctx_len=", CTX_LEN, " vocab_size=", VOCAB_SIZE));
print("");

h omni_correct = 0;
h soft_correct = 0;
h disagreements = 0;
h trial = 0;
while trial < N_TRIALS {
    # Pick a target position uniformly in [0, CTX_LEN). random_int is
    # inclusive on both ends, so the upper bound is CTX_LEN - 1.
    h target = random_int(0, CTX_LEN - 1);
    h trial_out = run_trial(target, CTX_LEN);
    h tgt = arr_get(trial_out, 0);
    h omni = arr_get(trial_out, 1);
    h soft = arr_get(trial_out, 2);

    # A "correct" pick = the picked INDEX equals the target index.
    # NOTE: when several positions hold the same Fibonacci value, all of
    # them are valid argmaxes and both scorers tie-break to the first
    # occurrence. If the target is a later duplicate, the trial counts as
    # a miss for both scorers (the README's 18 "misses" out of 100).
    if omni == tgt { omni_correct = omni_correct + 1; }
    if soft == tgt { soft_correct = soft_correct + 1; }
    if omni != soft { disagreements = disagreements + 1; }
    trial = trial + 1;
}

print(concat_many("OmniWeight argmax correct: ", omni_correct, " / ", N_TRIALS));
print(concat_many("Softmax-style argmax correct: ", soft_correct, " / ", N_TRIALS));
print(concat_many("Disagreements between scorers: ", disagreements, " / ", N_TRIALS));
print("");

# When q == k exactly, both score functions return their maximum:
# OmniWeight → φ^0 = 1; softmax → e^0 = 1. Both therefore pick a position
# that holds the query value. The interesting case is the tie-break:
# both implementations pick the first occurrence, so they should AGREE
# on every trial.
#
# If disagreements > 0, we've found a case where φ-geodesic distance
# and exponential distance rank differently. That's the seed for
# experiment 1 (perturbed-query task) — handle it there.

print("== Sanity check ==");
print("Both scorers monotonically decrease in |q - k|, so on EXACT-MATCH");
print("queries they should always agree. Non-zero disagreement is a bug");
print("OR an environment ambiguity (multiple positions sharing the query");
print("value, tie-broken differently).");
print("");
print("Next: experiment_1_perturbed.omc — query is off-attractor.");
print("There, OmniWeight's denominator (max(|k|, 1)) normalises the");
print("error differently from softmax's raw |q - k|, so they diverge");
print("in measurable ways.");
print("== End ==");