| 1 | +# OMC Substrate Primitives (v1.8.x) |
| 2 | + |
| 3 | +This document describes the substrate capabilities promoted into the OMC core language in the |
| 4 | +v1.8.x series — content-addressing, an addressable heap, content-similarity, verify-gated |
| 5 | +self-modification, correct-by-construction synthesis, and HBit dual-band computation. Every claim |
| 6 | +here is backed by a test or a runnable demo; honest limits are stated alongside. |
| 7 | + |
| 8 | +The one idea underneath all of it: **in OMC a value's identity can be its content, not its |
| 9 | +location.** `address = f(content)`, where `haddr` is *uniform* (equal-area faces, χ²≈9 on |
| 10 | +uniform points) and `locality_fp` supplies *locality* (similar content lands near, §3). That inversion is what makes |
| 11 | +memoization a property of identity, equality O(1), programs navigable, and self-modification safe. |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## 1. Content-addressing — `haddr` |
| 16 | + |
| 17 | +`haddr` gives a uniform dodecahedral address for any string or value. The 12 face normals are the icosahedral |
| 18 | +vertices `(0, ±1, ±φ)` and cyclic permutations — equal solid angles, so a decorrelated hash lands |
| 19 | +on each face with equal probability. |
| 20 | + |
| 21 | +``` |
| 22 | +haddr(text) -> {face: 0..11, sub_face: 0..2, zeck: [Fibonacci values]} |
| 23 | +haddr_face(text) -> int 0..11 |
| 24 | +haddr_distance(a, b) -> float (a, b are address dicts or strings) |
| 25 | +``` |
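The `zeck` field lists Fibonacci values. As a rough illustration only, and assuming it is a Zeckendorf-style decomposition (a sum of non-consecutive Fibonacci numbers) of some integer derived from the content, the greedy construction looks like this in Python. The function name `zeckendorf` and the choice of input integer are hypothetical; the source does not say which integer OMC decomposes.

```python
def zeckendorf(n: int) -> list[int]:
    """Greedy Zeckendorf decomposition: non-consecutive Fibonacci numbers summing to n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Build the Fibonacci numbers 1, 2, 3, 5, 8, ... until one exceeds n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    # Repeatedly take the largest Fibonacci number that still fits.
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts

print(zeckendorf(100))  # [89, 8, 3]
```

Every non-negative integer has exactly one such decomposition, which is what makes it usable as a canonical address component.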
| 26 | + |
| 27 | +**Verified:** face χ² = 9.16 on uniform sphere points, 4.90 on 20k hashed strings (12 faces give 11 |
| 28 | +degrees of freedom, so the uniform expectation is ≈ 11; the old sin/cos fingerprint scored ≈216). |
| 29 | +`haddr` is for **exact keys / uniform buckets**, *not* similarity (see §3). |
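The face-uniformity check above can be approximated outside OMC. The Python sketch below is an illustration, not OMC's implementation: it samples uniform directions on the sphere, assigns each to the face whose normal has the largest dot product, and computes the χ² statistic over the 12 counts. The helper names (`face_normals`, `nearest_face`, `chi_square`), the sample size, and the fixed seed are assumptions made for the example.

```python
import math
import random

PHI = (1 + math.sqrt(5)) / 2

def face_normals():
    """The 12 icosahedral vertices (0, ±1, ±φ) and cyclic permutations, normalized."""
    scale = math.sqrt(1 + PHI * PHI)
    normals = []
    for a in (1, -1):
        for b in (PHI, -PHI):
            x, y, z = 0.0, a / scale, b / scale
            normals += [(x, y, z), (y, z, x), (z, x, y)]  # cyclic permutations
    return normals

def nearest_face(p, normals):
    """Index of the normal with the largest dot product with p."""
    return max(range(len(normals)),
               key=lambda i: sum(u * v for u, v in zip(p, normals[i])))

def chi_square(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(0)
normals = face_normals()
counts = [0] * 12
for _ in range(20000):
    # Three independent Gaussians give a uniformly distributed direction.
    p = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
    counts[nearest_face(p, normals)] += 1

print(counts, round(chi_square(counts), 2))
```

A statistic far above 11 (the old fingerprint's ≈216, for instance) signals that some faces are systematically over- or under-populated.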
| 30 | + |
| 31 | +--- |
| 32 | + |
| 33 | +## 2. The addressable heap + `@memo` |
| 34 | + |
| 35 | +A value's content hash is its address. Compute something once; find it by content forever. |
| 36 | + |
| 37 | +``` |
| 38 | +value_addr(v) -> dodecahedral address of any value (structural) |
| 39 | +value_hash(v) -> content key (string) |
| 40 | +same_value(a,b) -> bool O(1) semantic equality (structural, provenance-independent) |
| 41 | +cas_put(v) -> key store in the content-addressed heap (persists to ~/.omc/cas) |
| 42 | +cas_get(key) -> value retrieve (memory, then disk) |
| 43 | +cas_has(key) -> bool |
| 44 | +``` |
| 45 | + |
| 46 | +`@memo` provides transparent, content-addressed memoization that **persists across runs**. The cache key |
| 47 | +includes a hash of the function body, so editing the function invalidates stale results. |
| 48 | + |
| 49 | +```omc |
| 50 | +@memo |
| 51 | +fn fib(n) { if n < 2 { return n; } return fib(n-1) + fib(n-2); } |
| 52 | +print(fib(90)); // 2880067194370816120, instant; naive recursion makes ~9.3e18 calls |
| 53 | +``` |
| 54 | + |
| 55 | +**Honest limits:** `@memo`/`@dualband` require purity (the body must not call I/O, random, time, |
| 56 | +etc.); impure functions are refused at definition. Persistence is a plain on-disk pool under |
| 57 | +`~/.omc/cas`; sync it yourself if you need to share it across machines. |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## 3. Content-similarity — `locality_fp` |
| 62 | + |
| 63 | +`haddr` is uniform but has **no** content locality (a one-character change scrambles the address). |
| 64 | +For "find the most similar X", use the locality fingerprint: a normalized byte histogram, so |
| 65 | +similar content → similar vector. |
| 66 | + |
| 67 | +``` |
| 68 | +locality_fp(text, [bigram]) -> float[] (unigram 256-dim; bigram=1 → 4096-dim) |
| 69 | +locality_sim(a, b, [bigram]) -> float in [0,1] |
| 70 | +locality_nearest(query, candidates) -> index of the most similar candidate |
| 71 | +nearest_fn(need) -> name of the closest-by-locality defined function |
| 72 | +call_nearest(need, args) -> dispatch to that function and call it |
| 73 | +``` |
| 74 | + |
| 75 | +**Verified:** on a corrupted-retrieval task, recall@1 = 0.99 (locality) vs 0.02 (φ/haddr). |
| 76 | +**Two fingerprints, two jobs:** `haddr` for keys, `locality_fp` for similarity. |
| 77 | + |
| 78 | +**Honest limit:** locality matches *character distribution*, so it is typo/variant-tolerant |
| 79 | +(`"quicksrt"` → `quicksort`) but is **not** semantic NL→code (`"greatest common divisor"` will not |
| 80 | +find `gcd`) — that needs a learned encoder. |
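To make the behaviour above concrete, here is a small Python sketch of a byte-histogram fingerprint. It is an illustration under assumptions rather than OMC's implementation: the source says only "normalized byte histogram", so normalizing to sum 1 and comparing with cosine similarity are choices made here, and the bigram variant is left out. The function names mirror the OMC builtins purely for readability.

```python
import math
from collections import Counter

def locality_fp(text: str) -> list[float]:
    """256-bin byte histogram, normalized to sum to 1."""
    data = text.encode("utf-8")
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]

def locality_sim(a: str, b: str) -> float:
    """Cosine similarity of two fingerprints; in [0, 1] since bins are non-negative."""
    fa, fb = locality_fp(a), locality_fp(b)
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb) if na and nb else 0.0

def locality_nearest(query: str, candidates: list[str]) -> int:
    """Index of the candidate most similar to query."""
    return max(range(len(candidates)),
               key=lambda i: locality_sim(query, candidates[i]))

names = ["quicksort", "mergesort", "gcd"]
print(names[locality_nearest("quicksrt", names)])                 # quicksort
print(names[locality_nearest("greatest common divisor", names)])  # mergesort, not gcd
```

The second lookup shows the stated limit: the phrase shares more letters with `mergesort` than with `gcd`, so character distribution alone picks the wrong function.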
| 81 | + |
| 82 | +--- |
| 83 | + |
| 84 | +## 4. Verify-gated self-modification |
| 85 | + |
| 86 | +The interpreter is the gate. A candidate is installed and tested in a sandbox; it is kept only if |
| 87 | +it passes, otherwise rolled back. Nothing that fails its spec is ever accepted. |
| 88 | + |
| 89 | +``` |
| 90 | +fn_swap_verified(name, new_source, test_source) -> {accepted, error, result} |
| 91 | +fns_on_face(face) -> functions bucketed by name-address |
| 92 | +``` |
| 93 | + |
| 94 | +```omc |
| 95 | +fn slow(n) { return 0-1; } // a stub to improve |
| 96 | +h good = "fn slow(n) { return n*n; }"; |
| 97 | +h test = "slow(5) == 25"; |
| 98 | +print(fn_swap_verified("slow", good, test)); // {accepted: true, ...} |
| 99 | +``` |
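The install, test, then keep-or-roll-back flow can be sketched in Python as follows. This is an illustration of the gating pattern, not OMC's implementation: `swap_verified` and the `registry` dict are hypothetical names, and running `exec` against a copied dict is only a stand-in for a sandbox and gives no real isolation.

```python
def swap_verified(registry: dict, name: str, new_source: str, test_source: str) -> dict:
    """Try new_source against a copy of registry; commit only if test_source is truthy."""
    sandbox = dict(registry)           # candidate runs against a copy, not the live table
    try:
        exec(new_source, sandbox)      # define the candidate in the sandbox
        result = eval(test_source, sandbox)
        candidate = sandbox[name]
    except Exception as exc:           # a crash counts as a failed spec
        return {"accepted": False, "error": repr(exc), "result": None}
    if not result:
        return {"accepted": False, "error": "test failed", "result": result}
    registry[name] = candidate         # commit: the live table changes only here
    return {"accepted": True, "error": None, "result": result}

fns = {}
exec("def slow(n):\n    return -1", fns)   # a stub to improve

print(swap_verified(fns, "slow", "def slow(n):\n    return n * n", "slow(5) == 25"))
print(swap_verified(fns, "slow", "def slow(n):\n    return n + 1", "slow(5) == 25"))
print(fns["slow"](5))  # 25: the failing candidate was never installed
```

Rollback is implicit in this design: a rejected candidate only ever existed in the discarded copy.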
| 100 | + |
| 101 | +--- |
| 102 | + |
| 103 | +## 5. Correct-by-construction synthesis |
| 104 | + |
| 105 | +The generator emits only grammar-legal structure, so **every program parses**. It also tracks |
| 106 | +declared variables, guards division, and bounds loops, so (almost) every program runs. |
| 107 | + |
| 108 | +``` |
| 109 | +gen_omc([seed]) -> a valid-by-construction OMC program string |
| 110 | +gen_at(address_or_text) -> same address/need → the same valid program (deterministic) |
| 111 | +``` |
| 112 | + |
| 113 | +**Verified:** parse-rate 1.000, run-rate 1.000 over 300 seeds, checked by the real |
| 114 | +parser+interpreter. Pair with `code_parse_check` / `eval_omc` / `fn_swap_verified` for a |
| 115 | +generate → verify → accept loop. A standard LM sampling without grammar constraints cannot guarantee a parse-rate of 1.0. |
| 116 | + |
| 117 | +**Honest limit:** the generator covers the executable core (functions, declarations, assignment, |
| 118 | +if/else, while, for, return, print, arithmetic, calls). It does not emit every construct (e.g. |
| 119 | +try/match/class) yet. |
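The same three safeguards (reference only declared variables, guard every division, bound every loop) can be shown with a toy generator. The Python sketch below emits small Python programs rather than OMC, and `gen_program` is a hypothetical name; it illustrates the construction principle and does not describe how `gen_omc` is built.

```python
import ast
import random

def gen_program(seed: int) -> str:
    """Emit a small Python program that is valid by construction."""
    rng = random.Random(seed)           # same seed -> same program
    declared = []                       # only declared names may be referenced

    def expr(depth: int = 0) -> str:
        if depth >= 2 or rng.random() < 0.3:
            if declared and rng.random() < 0.6:
                return rng.choice(declared)
            return str(rng.randint(0, 9))
        op = rng.choice(["+", "-", "*", "//"])
        left, right = expr(depth + 1), expr(depth + 1)
        if op == "//":
            right = f"(abs({right}) + 1)"   # guarded: divisor is always >= 1
        return f"({left} {op} {right})"

    lines = []
    for i in range(rng.randint(1, 4)):
        lines.append(f"v{i} = ({expr()}) % 997")   # modulus keeps values small
        declared.append(f"v{i}")
    lines.append(f"for _ in range({rng.randint(1, 5)}):")   # bounded loop
    lines.append(f"    v0 = ({expr()}) % 997")
    lines.append(f"result = {expr()}")
    return "\n".join(lines)

ok = 0
for seed in range(300):
    src = gen_program(seed)
    ast.parse(src)        # raises SyntaxError if the program does not parse
    exec(src, {})         # raises if the program fails at run time
    ok += 1
print(ok, "of 300 programs parsed and ran")
```

Validity here comes from the emitter's structure, so the parser and interpreter act as an independent check rather than a filter.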
| 120 | + |
| 121 | +--- |
| 122 | + |
| 123 | +## 6. HBit dual-band computation |
| 124 | + |
| 125 | +Two bands run together: **α** (the exact value) and **β** (its harmonic *shadow* — the |
| 126 | +"what if we'd stayed on the Fibonacci attractor lattice" companion). α is always the exact answer; |
| 127 | +β only records how far a computation has drifted from the lattice. The drift is the *gate*: trust |
| 128 | +a fast/addressed path while in tune, fall back to exact when dissonant. |
| 129 | + |
| 130 | +### Per-value (pervasive) |
| 131 | +Ordinary values are single-band and behave exactly as before. `phi_shadow` attaches β; it then |
| 132 | +rides through arithmetic. |
| 133 | + |
| 134 | +``` |
| 135 | +phi_shadow(v) -> v with β = nearest Fibonacci attractor of α |
| 136 | +bands(v) -> [α, β] |
| 137 | +harmony(v) -> 0..1000 (1000 = in tune; reads the carried bands) |
| 138 | +value_divergence(v) -> 0..1000 (0 = on the lattice, high = dissonant) |
| 139 | +hbit_harmony(a, b) -> 0..1000 two-band resonance of explicit a, b |
| 140 | +hbit_divergence(a,b) -> 0..1000 the gate value (0 = in tune) |
| 141 | +``` |
| 142 | + |
| 143 | +```omc |
| 144 | +h s = phi_shadow(10); // bands [10, 8] |
| 145 | +h t = (s + 1) * 3; // α: 33 ; β: (8+1)*3 = 27 |
| 146 | +print(bands(t)); // [33, 27] |
| 147 | +print(value_divergence(t)); // drift of 33 vs 27 |
| 148 | +``` |
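The band arithmetic in the example above can be mirrored in a few lines of Python. This is a sketch under assumptions, not OMC's implementation: the `Banded` class and `nearest_fib` helper are hypothetical, ties between two equally close Fibonacci numbers go to the smaller one by choice, and the 0..1000 harmony and divergence scales are left out because the source does not give their formulas.

```python
from dataclasses import dataclass

def nearest_fib(n: int) -> int:
    """Closest Fibonacci number to n (ties go to the smaller one)."""
    a, b = 0, 1
    while b < abs(n):
        a, b = b, a + b
    # a <= |n| <= b are now neighbouring Fibonacci numbers.
    best = a if abs(n) - a <= b - abs(n) else b
    return best if n >= 0 else -best

@dataclass(frozen=True)
class Banded:
    alpha: int   # exact value
    beta: int    # shadow value carried alongside

    def _lift(self, other):
        # A plain int acts on both bands identically.
        return other if isinstance(other, Banded) else Banded(other, other)

    def __add__(self, other):
        o = self._lift(other)
        return Banded(self.alpha + o.alpha, self.beta + o.beta)

    def __mul__(self, other):
        o = self._lift(other)
        return Banded(self.alpha * o.alpha, self.beta * o.beta)

def phi_shadow(v: int) -> Banded:
    """Attach a shadow band: beta starts at the nearest Fibonacci number."""
    return Banded(v, nearest_fib(v))

s = phi_shadow(10)          # bands (10, 8)
t = (s + 1) * 3             # alpha: 33, beta: (8 + 1) * 3 = 27
print((t.alpha, t.beta))    # (33, 27)
```

The exact band never depends on the shadow band, which matches the guarantee that α is always the exact answer.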
| 149 | + |
| 150 | +### Per-function (opt-in) |
| 151 | +```omc |
| 152 | +@dualband |
| 153 | +fn sq(n) { return n*n; } |
| 154 | +print(sq(8)); // 64 (exact α — always) |
| 155 | +print(band_divergence("sq")); // 0 on-lattice, high when dissonant |
| 156 | +print(band_route("sq")); // fast-substrate / cached-exact / linear |
| 157 | +``` |
| 158 | + |
| 159 | +`@dualband` also takes the exact-memo fast path (the A→Z skip) when an exact result is cached. |
| 160 | + |
| 161 | +**Honest limits:** today the dual band is a *coherence monitor and exact-skip router* — α is always |
| 162 | +computed as ground truth, so a strict speedup comes from the exact-memo hit, not yet from skipping |
| 163 | +α on the strength of the gate. The snap-to-Fibonacci gate measures **lattice-coherence**, which is |
| 164 | +the right signal for "is this on the harmonic lattice" but, as measured, **not** a predictor of |
| 165 | +interpolation safety on arbitrary functions. Approximate skipping is viable only on *smooth* |
| 166 | +domains (near inputs → near outputs); on discrete/chaotic functions it is not (this is a measured |
| 167 | +result, not a hope). |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +## How it scales (and why it's CPU, not GPU) |
| 172 | + |
| 173 | +A dense transformer gains capability by adding parameters → more FLOPs/query → GPU. The substrate |
| 174 | +gains capability by adding **addressed content** + composition + a **verify** step — all CPU. |
| 175 | + |
| 176 | +Measured (1896-function corpus, real interpreter as oracle): |
| 177 | +- correctness rises with coverage `0.04 → 1.00`, |
| 178 | +- per-query exact-key retrieval is flat `0.059µs → 0.060µs` across 100× more content, |
| 179 | +- the verify step is constant (one interpreter run, independent of store size), |
| 180 | +- the O(N) similarity scan is what addressing (O(1)) removes. |
| 181 | + |
| 182 | +So capability scales at flat per-query CPU cost. The "ceiling" is coverage + composition, both |
| 183 | +CPU-scalable. (Scope: verified code synthesis over a corpus; generalizing beyond stored content is |
| 184 | +bounded by generator quality — but that too is CPU.) |
| 185 | + |
| 186 | +--- |
| 187 | + |
| 188 | +## Reproduce |
| 189 | + |
| 190 | +``` |
| 191 | +cargo test -p omnimcode-core --lib # 172 tests incl. address/cas/locality/synth |
| 192 | +cargo build -p omnimcode-cli --release |
| 193 | +./target/release/omnimcode-standalone experiments/transformerless_lm/valueband_demo.omc |
| 194 | +``` |
| 195 | + |
| 196 | +Full evidence ledger: `experiments/transformerless_lm/AUTONOMOUS_LOG.md` and |
| 197 | +`experiments/transformerless_lm/SUBSTRATE_INTEGRATION_ROADMAP.md`. |