Skip to content

Latest commit

 

History

History
173 lines (134 loc) · 7.17 KB

File metadata and controls

173 lines (134 loc) · 7.17 KB

tape_abs + tape_phi_log: standard vs substrate-native primitives

Headline

Two new tape autograd primitives. One boring, one substrate-native invention. They are mathematically equivalent on the Q6 attention modulation path, and the A/B benchmark confirms the substrate-native fusion introduces no training-time divergence — composed and fused agree to ~1e-7 after AdamW training.

That means the substrate-native primitive is a free abstraction: it matches the standard composed math exactly, runs as one tape node instead of four, and exposes the substrate basis (π·ln φ) at the AST level rather than hiding it in a scalar constant.

What was added

tape_abs(x) — boring PyTorch parity

Element-wise |x|, with subgradient sign(x) at x ≠ 0, 0 at x = 0. Filled an obvious hole — the autograd tape had tape_log, tape_exp, tape_sin, tape_cos, tape_relu, tape_sigmoid, tape_tanh, but no absolute value. Q6 modulation needs |q·scale|, which requires this.

tape_phi_log(x, scale=10.0) — substrate-native

One fused op that computes ln(|x · scale| + 1) / (π · ln φ) — the exact log-distance formula from the Q6 PyTorch finding, but expressed as a single tape node with the substrate basis (π·ln φ) baked into the backward derivation.

Forward:

y = ln(|x · scale| + 1) / (π · ln φ)

Backward:

dy/dx = scale · sign(x) / ((|x · scale| + 1) · π · ln φ)

Properties the boring tape_abs + tape_log composition lacks:

  • Defined at zero: tape_log(0) returns -∞; tape_phi_log(0) = 0 cleanly.
  • One tape node instead of four (tape_abstape_mul_scalartape_logtape_div_scalar): less allocation, simpler backward graph.
  • Substrate basis visible: π·ln φ appears in the op's name/derivation, not as a magic constant.

The Q6 A/B in pure-OMC Prometheus

The Q6 attention modulation can be written either way:

Composed (boring PyTorch-parity path):

ten = tape_const(10.0)
qs = tape_mul(q, ten)
qs_abs = tape_abs(qs)
qs_abs1 = tape_add(qs_abs, tape_const(1.0))
ln_qs = tape_log(qs_abs1)
log_d = tape_div(ln_qs, tape_const(π · ln φ))      # 1.5119192...

Fused (substrate-native path):

log_d = tape_phi_log(q, 10.0)

Both yield the same log_d (verified to 1e-9 forward, 1e-9 backward in unit tests). Both then flow through modulation = tape_exp(-γ · log_d) and q_mod = q * modulation.

Result: composed and fused agree under AdamW training

examples/prometheus_q6_ab.omc, single-block substrate-K transformer, seq_len=6, d_model=8, ff_dim=16, 80 steps, AdamW lr=0.01:

seed off (no Q6) composed Q6 fused Q6 composed − fused
42 2.5688 2.5580 2.5580 2.3 × 10⁻¹¹
7 2.5688 2.5713 2.5713 8.6 × 10⁻⁷
123 2.5698 2.5297 2.5297 5.2 × 10⁻⁷
mean 2.5692 2.5530 2.5530 1.2 × 10⁻⁷

The composed-fused divergence sits at the floor of float64 accumulation noise after ~80 forward+backward passes through AdamW. The two paths produce parameter trajectories that agree to single-precision rounding. The fused abstraction does not pay any precision cost — it computes the same answer as the four-op composition does.

Q6 vs off baseline (directional Q6 evidence in OMC)

mean val Δ vs off seeds Q6 wins
off (no Q6) 2.5692
composed Q6 2.5530 −0.0162 (−0.63%) 2/3
fused Q6 2.5530 −0.0162 (−0.63%) 2/3

Q6 wins 2/3 seeds at this tiny scale (45-char corpus, d_model=8, single head, 80 steps). PyTorch finding was -12.15% 6/6 seeds at TinyShakespeare L1-MH — a much stronger test. The OMC small-scale signal is directionally consistent: Q6 helps, both paths agree it helps by the same amount.

This is the first cross-runtime validation of Q6 in OMC — the PyTorch finding now has an OMC-side replication.

Pre-existing tape_div / tape_mul backward bug, fixed in the same chapter

While wiring Q6, the tape_div backward was found to panic with col-broadcast denominators (bv.cols == 1). The substrate-modulated softmax path (prom_substrate_softmax with smod_alpha > 0) ends in tape_div(attn_unnorm[N, N], row_sums[N, 1]), and the backward was indexing bv.at(i, j) for j up to N-1 in a [N, 1] matrix — out of bounds.

Fix: both Mul and Div backwards now respect broadcast shapes on both operands. They iterate the OUTPUT shape (dy's shape), reduce indices against the operand's actual extent, and accumulate gradient sums across the broadcast axes.

This bug had latently affected any training that combined S-MOD (smod_alpha > 0) with substrate-K — the path was never being exercised to completion in OMC before because it would panic during backward. Now it works, which means L1-MH + S-MOD α=1.0 can be cross-validated in pure-OMC Prometheus, not just PyTorch.

Tests

  • examples/tests/test_tape_abs_phi_log.omc — 12 tests covering forward, backward, edge cases (0, negative), and composed-vs-fused equivalence at the primitive level
  • examples/tests/test_q6_modulate.omc — 4 tests covering the prom_q6_modulate dispatch with off/composed/fused modes, including forward and backward equivalence of composed and fused

Full suite: 1103/1103 pass after these additions and the broadcast-backward fix.

What this opens up

The fused tape_phi_log is the precedent-setting substrate-native primitive. It shows the path for replacing other ad-hoc tape compositions with substrate-native fused ops:

  • tape_substrate_resample (currently does tape_value snapshot → attractor-distance modulator → tape_consttape_mul) could become one fused op with substrate-aware backward
  • tape_attractor_snap — forward snaps to nearest Fibonacci attractor, backward is the substrate-aware gradient (full at attractors, dampened off-attractor)
  • tape_phi_log_v2 — same forward as tape_phi_log but with attractor-modulated backward (gradient amplified at off-attractor inputs to drive drift toward attractors)

Each one is its own A/B against the boring reference, with the same protocol: verify composed ≡ fused at the unit level first, then measure end-to-end training divergence. If the substrate-aware backward variant beats the mathematically-equivalent baseline, that is the proof that the substrate is the architecture, not a postprocessing step.

Files

  • omnimcode-core/src/interpreter.rs — added TapeOp::Abs, TapeOp::PhiLog(usize, f64), forward and backward; fixed broadcast handling in Mul/Div backwards
  • examples/lib/prometheus.omc — added prom_q6_modulate(q, scale, gamma, mode) with three modes; wired q6_mode field into prom_attention_substrate_k_*
  • examples/prometheus_q6_ab.omc — the OMC-side A/B harness
  • examples/tests/test_tape_abs_phi_log.omc — primitive unit tests
  • examples/tests/test_q6_modulate.omc — modulation dispatch tests

Reproduction

cargo build --release -p omnimcode-cli
./target/release/omnimcode-standalone --test examples/tests/test_tape_abs_phi_log.omc
./target/release/omnimcode-standalone --test examples/tests/test_q6_modulate.omc
./target/release/omnimcode-standalone examples/prometheus_q6_ab.omc