|
1 | 1 | # OMC Roadmap |
2 | 2 |
|
3 | | -Current chapter: **v0.6-fibtier-memory** (shipped 2026-05-17). |
4 | | -Next chapter: GPU Prometheus scaffold (in flight). The six-chapter symbolic-context arc (v0.3 → v0.6) has landed. |
| 3 | +Current release: **v1.8.5** (2026-05-30). The **substrate-into-core** arc has landed: the proven |
| 4 | +φ-substrate discoveries are now first-class language primitives (content-addressing, an addressable |
| 5 | +heap, persistent `@memo`, locality similarity, verify-gated self-modification, correct-by-construction |
| 6 | +synthesis, and HBit dual-band real at the Value level). See [SUBSTRATE.md](SUBSTRATE.md) for the full |
| 7 | +reference with verified numbers, and `experiments/transformerless_lm/SUBSTRATE_INTEGRATION_ROADMAP.md` |
| 8 | +for the phase plan + evidence ledger. |
5 | 9 |
|
6 | | -See [CHANGELOG.md](CHANGELOG.md) and [GitHub Releases](https://github.com/RandomCoder-lab/OMC/releases) for the chapter-by-chapter history of how OMC got here. This file describes what's on the path going forward. |
| 10 | +See [CHANGELOG.md](CHANGELOG.md) and [GitHub Releases](https://github.com/RandomCoder-lab/OMC/releases) |
| 11 | +for how OMC got here. This file describes what's on the path forward. |
7 | 12 |
|
8 | 13 | --- |
9 | 14 |
|
10 | | -## Post-v0.5 candidates (none committed yet) |
| 15 | +## Shipped (v1.8.x) — substrate into the core language |
11 | 16 |
|
12 | | -### v0.6 candidate A — fibtier-bounded memory |
| 17 | +| Area | Primitives | Verified | |
| 18 | +|---|---|---| |
| 19 | +| Content-addressing | `haddr`, `haddr_face`, `haddr_distance` | face χ² ≈ 9 (uniform) | |
| 20 | +| Addressable heap | `cas_put/get/has`, `value_addr`, `value_hash`, `same_value` | dedup + O(1) semantic equality | |
| 21 | +| Memoization | `@memo` (transparent, persistent across runs, body-aware) | `fib(90)` instant; cross-process | |
| 22 | +| Similarity | `locality_fp/sim/nearest`, `nearest_fn`, `call_nearest` | recall 0.99 vs φ 0.02 | |
| 23 | +| Self-modification | `fn_swap_verified`, `fns_on_face` | verify-gated accept/rollback | |
| 24 | +| Synthesis | `gen_omc`, `gen_at` | parse/run 1.000 over 300 seeds | |
| 25 | +| Dual-band (Value-level) | `phi_shadow`, `bands`, `harmony`, `value_divergence`, `@dualband`, `hbit_*`, `band_*` | β rides through arithmetic; α always exact | |
13 | 26 |
|
14 | | -v0.5 ships substrate-keyed memory but the store grows unbounded. Long-running agents need pruning. Wire fibtier's tier-bounded eviction into `MemoryStore`: |
15 | | - |
16 | | -- Each namespace gets a tier-state file alongside the index |
17 | | -- Stores cascade into higher tiers via the fibtier fold mechanism |
18 | | -- Old entries get summarized/aggregated as they fold upward |
19 | | -- Bounded total entries across all tiers (default ~4180 = Fib(18)) |
20 | | - |
21 | | -### v0.6 candidate B — Prometheus rerank pass |
22 | | - |
23 | | -The substrate-ranked predict candidates can be reranked by a learned probability overlay. Train a small Prometheus model on the corpus, score top-k candidates' next-token probabilities, blend with the substrate distance. |
24 | | - |
25 | | -### v0.6 candidate C — substrate-attention follow-ups |
26 | | - |
27 | | -- Substrate-modulated Q projection. Q hasn't been swapped yet; the V resample recipe (post-projection modulation) may generalize. |
28 | | -- Substrate FF: dampen off-attractor activations in the feed-forward residual. |
29 | | -- Substrate LayerNorm: substrate-distance-weighted variance computation. |
30 | | -- Larger-scale validation: every substrate-attention claim was made at TinyShakespeare scale (1.1MB). Need to verify the stack holds at 10-100MB corpora. |
31 | | - |
32 | | -### Other deferred items |
33 | | - |
34 | | -- **Stateful corpus API** — `omc_corpus_build` returns a handle, `omc_predict_from(handle, prefix, top_k)` reuses it. Saves the corpus-rebuild cost on repeated queries. |
35 | | -- **Streaming queries** — incremental updates as the prefix grows token-by-token. |
36 | | -- **Cross-corpus weighted blending** — give different paths different priority in the ranking. |
37 | | -- **Conversation-aware predict** — `omc_predict(..., context_hash=H)` where H references prior reasoning state, biasing the ranking by which fns the agent has already touched. |
| 27 | +Scaling result (NEXT-7, on CPU): capability rises with addressed content while per-query cost stays |
| 28 | +flat (exact-key O(1) + constant verify). The substrate's scaling axis is content + verify (CPU), not |
| 29 | +parameters (GPU). 267 tests pass (172 lib + 95 integration, incl. `tests/substrate_v18.rs`). |
38 | 30 |
|
39 | 31 | --- |
40 | 32 |
|
41 | | -## v0.7+ candidates |
42 | | - |
43 | | -### Substrate-attention follow-ups |
44 | | - |
45 | | -- Substrate-modulated Q projection. Q hasn't been swapped yet; the V resample recipe (post-projection modulation) may generalize. |
46 | | -- Substrate FF: dampen off-attractor activations in the feed-forward residual. |
47 | | -- Substrate LayerNorm: substrate-distance-weighted variance computation. |
48 | | -- Larger-scale validation: every substrate-attention claim was made at TinyShakespeare scale (1.1MB). Need to verify the stack holds at 10-100MB corpora. |
49 | | - |
50 | | -### Beyond (rough) |
51 | | - |
52 | | -### Transformerless LLM |
53 | | - |
54 | | -The substrate-attention components stack to −8.94% inside one block. The path forward is a top-to-bottom harmonic-only architecture trained competitively. Open: how to handle non-integer-coherent quantities at this scale (the substrate metric only applies to integer-valued quantities, per the rule derived from the HBit-gate falsification). |
55 | | - |
56 | | -### JIT path expansion |
57 | | - |
58 | | -- AVX-512 widening — blocked on array-processing OMC fns to fill the wider lanes. |
59 | | -- JIT for float-returning harmonic primitives — `returns_float` dispatch flag mirroring `returns_array_int`. |
60 | | -- JIT for dict ops — currently pure tree-walk for string-keyed data; the L1 array-of-hashed-int rewrite avoided this for hot paths. |
61 | | - |
62 | | -### Tooling polish |
63 | | - |
64 | | -- Improved formatter (`--fmt`) — preserve comments, configurable line width. |
65 | | -- LSP improvements: completion (uses the v0.3 predict engine), hover with substrate signature. |
66 | | -- VS Code extension: snippet library, inline hint UI for the heal pass. |
| 33 | +## Next — grounded, no new compute required |
| 34 | + |
| 35 | +- **Synthesis coverage (Phase 4.2):** extend `gen_omc` to the remaining run-safe constructs |
| 36 | + (`for` over expressions, `try`/`match`, nested blocks). The valid-by-construction guarantee is in; |
| 37 | + this widens what it can emit. |
| 38 | +- **In-core grammar derivation (Phase 4.1):** derive the generator's operator/keyword/construct set |
| 39 | + from the live AST/parser at build time (the Python `derive_grammar.py` already does this at the |
| 40 | + toolchain level; bring it in-core so the generator can't drift from the language). |
| 41 | +- **Assistant unification (Phase 0.1):** make the verified `SubstrateLM` (always-valid,
| 42 | + correct-with-intent) the assistant's canonical generator; demote the FibRec neural net to an explicit cold-start prior.
| 43 | + |
| 44 | +## Next — needs compute (model / GPU / training) |
| 45 | + |
| 46 | +- **Inference-time weight compression at scale (Phase 5.1):** the address-bucketed / Zeckendorf bet |
| 47 | + ("big model in small memory"). Small-scale is settled (address-bucketed sharing is real ~4× free / |
| 48 | + 14× @ 85%; the φ-vs-modulo advantage is null). Needs a real model to test at scale. |
| 49 | +- **Track-B science (Phase 5.4):** compositionality-coherence field and weight-substrate views — |
| 50 | + derived, not yet trained. |
| 51 | +- **Substrate-generator generalization:** raising held-out (compose-beyond-coverage) correctness — |
| 52 | + CPU-scalable in principle (grammar-gen + verify), bounded by generator quality. |
| 53 | + |
| 54 | +## Horizon — the dual-band dream |
| 55 | + |
| 56 | +- **Value-granular skip:** today the dual band is a coherence monitor + exact-memo router (α is always |
| 57 | + computed as ground truth). A strict speedup *from the gate* is safe only on smooth domains |
| 58 | + (measured: interpolation works for smooth functions, not discrete ones) — pursue it opt-in there. |
| 59 | +- **Kernel / microcode HBit:** the long-horizon goal — α/β as a hardware-level dual band, short-cutting |
| 60 | + computation through resonance/dissonance at the CPU level. The JIT already packs both bands into |
| 61 | + SSE2 `<2 x i64>` and elides branches by harmony; pushing that into a true microcoded skip is the |
| 62 | + frontier this whole substrate program is paving toward. |
67 | 63 |
|
68 | 64 | --- |
69 | 65 |
|
70 | | -## Done (linked to chapter releases) |
71 | | - |
72 | | -| Chapter | Key shipped items | |
73 | | -|---|---| |
74 | | -| [v0.5-substrate-memory](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.5-substrate-memory) | `omc_memory_store/recall/list/stats` + filesystem persistence + **10.61× LLM context-budget reduction** measured on a 20-turn agent task | |
75 | | -| [v0.4-substrate-context](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.4-substrate-context) | `omc_compress_context` / `omc_decompress` tools + `format=codec` thumbnails + directory ingest + measured 1.85×-2.81× LLM context-budget reduction | |
76 | | -| [v0.3.1-symbolic-compression](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.3.1-symbolic-compression) | `omc_predict` gains `format=hash`/`signature`/`full` (3.8× compression default) + `omc_fetch_by_hash` for on-demand recovery | |
77 | | -| [v0.3-symbolic-prediction](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.3-symbolic-prediction) | `omc_predict_files(paths, prefix, top_k)` returns ranked provenance-tracked continuations from a content-addressed corpus | |
78 | | -| [v0.2-ergonomics](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.2-ergonomics) | `+=` / `-=` / `*=` / `/=` / `%=`, `len`/`range`/`getenv`/`to_hex`/`parse_int`, negative array indexing, did-you-mean, traced errors, 11 heal classes | |
79 | | -| [v0.1-substrate-attention](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.1-substrate-attention) | Substrate-K + S-MOD softmax + substrate-V resample → −8.94% val on TinyShakespeare | |
80 | | -| [v0.0.6-prometheus](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.6-prometheus) | Tape autograd, AdamW, Embedding, LayerNorm, multi-block transformer, first substrate-K wins | |
81 | | -| [v0.0.5-codec-kernel-protocol](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.5-codec-kernel-protocol) | Substrate codec, `omc-kernel`, `omc-grep`, OMC-PROTOCOL v1, substrate-aware tokenizer | |
82 | | -| [v0.0.4-jit-and-dual-band](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.4-jit-and-dual-band) | LLVM JIT, dual-band SSE2 codegen, harmony-gated branch elision, array support | |
83 | | -| [v0.0.3-substrate-and-stdlib](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.3-substrate-and-stdlib) | Heal pass, substrate-routed search family, stdlib expansion, `--check` / `--fmt` | |
84 | | -| [v0.0.2-language-core](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.2-language-core) | Parser, two-engine interpreter, HInt, bytecode VM, self-hosting fixpoint | |
85 | | -| V0.0.1 | Genesis: circuit evolution engine, FFI, Unity/Unreal bindings | |
86 | | - |
87 | | -`ROADMAP.json` is preserved for archaeology — it captured the state through v0.0.4. This file supersedes it as the canonical forward plan. |
| 66 | +## Method (unchanged) |
| 67 | + |
| 68 | +Every architectural change is a **pre-registered A/B**, reported even when it loses. Substrate goes |
| 69 | +where it provably helps — identity / addressing / positions, attenuable — and stays off the |
| 70 | +learned-float scoring path, where it was falsified. Validity is guaranteed by construction; correctness |
| 71 | +is verified by execution. Honest limits travel with every claim. |
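
For readers unfamiliar with the "Addressable heap" row in the shipped table, the idea behind dedup and cheap semantic equality can be sketched in a few lines of Python. This is an illustrative toy, not OMC's implementation: the `ContentStore` class, its method names, and the use of SHA-256 over canonical JSON are all assumptions made for the example, and they say nothing about how `cas_put/get/has` or `haddr` actually compute addresses.

```python
import hashlib
import json


class ContentStore:
    """Toy content-addressed heap (hypothetical; not OMC's actual API)."""

    def __init__(self):
        self._heap = {}  # address -> value

    @staticmethod
    def address(value):
        # Canonical serialization: sorted keys, fixed separators, so that
        # structurally equal values always produce the same bytes.
        canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def put(self, value):
        # Storing an already-present value is a no-op: this is the dedup.
        addr = self.address(value)
        self._heap.setdefault(addr, value)
        return addr

    def get(self, addr):
        return self._heap[addr]

    def has(self, addr):
        return addr in self._heap

    def __len__(self):
        return len(self._heap)


store = ContentStore()
a = store.put({"x": 1, "y": [2, 3]})
b = store.put({"y": [2, 3], "x": 1})  # same content, different key order
```

Here `a == b` holds and the store contains a single entry. Once two values have been addressed, testing whether they are equal reduces to comparing two fixed-length strings, which is the sense in which equality becomes constant-time in this sketch; computing the address itself still costs time proportional to the value's size.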