
Commit 559965c

The Architect and claude committed
docs: refresh ROADMAP.md for the v1.8 substrate-into-core reality
Was stale at "v0.6-fibtier-memory". Now reflects shipped v1.8.x primitives (table), the CPU scaling result, and an honest forward path split into: grounded / no-new-compute, needs-compute (model/GPU/training), and the dual-band horizon (value-granular skip, kernel/microcode HBit). Method section restated (pre-registered A/Bs, substrate on identity not float-scoring, validity by construction + correctness by execution).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 1561d35 commit 559965c

1 file changed

Lines changed: 57 additions & 73 deletions


ROADMAP.md

@@ -1,87 +1,71 @@
 # OMC Roadmap
 
-Current chapter: **v0.6-fibtier-memory** (shipped 2026-05-17).
-Next chapter: GPU Prometheus scaffold (in flight). The six-chapter symbolic-context arc (v0.3 → v0.6) has landed.
+Current release: **v1.8.5** (2026-05-30). The **substrate-into-core** arc has landed: the proven
+φ-substrate discoveries are now first-class language primitives (content-addressing, an addressable
+heap, persistent `@memo`, locality similarity, verify-gated self-modification, correct-by-construction
+synthesis, and HBit dual-band real at the Value level). See [SUBSTRATE.md](SUBSTRATE.md) for the full
+reference with verified numbers, and `experiments/transformerless_lm/SUBSTRATE_INTEGRATION_ROADMAP.md`
+for the phase plan + evidence ledger.
 
-See [CHANGELOG.md](CHANGELOG.md) and [GitHub Releases](https://github.com/RandomCoder-lab/OMC/releases) for the chapter-by-chapter history of how OMC got here. This file describes what's on the path going forward.
+See [CHANGELOG.md](CHANGELOG.md) and [GitHub Releases](https://github.com/RandomCoder-lab/OMC/releases)
+for how OMC got here. This file describes what's on the path forward.
 
 ---
 
-## Post-v0.5 candidates (none committed yet)
+## Shipped (v1.8.x) — substrate into the core language
 
-### v0.6 candidate A — fibtier-bounded memory
+| Area | Primitives | Verified |
+|---|---|---|
+| Content-addressing | `haddr`, `haddr_face`, `haddr_distance` | face χ² ≈ 9 (uniform) |
+| Addressable heap | `cas_put/get/has`, `value_addr`, `value_hash`, `same_value` | dedup + O(1) semantic equality |
+| Memoization | `@memo` (transparent, persistent across runs, body-aware) | `fib(90)` instant; cross-process |
+| Similarity | `locality_fp/sim/nearest`, `nearest_fn`, `call_nearest` | recall 0.99 vs φ 0.02 |
+| Self-modification | `fn_swap_verified`, `fns_on_face` | verify-gated accept/rollback |
+| Synthesis | `gen_omc`, `gen_at` | parse/run 1.000 over 300 seeds |
+| Dual-band (Value-level) | `phi_shadow`, `bands`, `harmony`, `value_divergence`, `@dualband`, `hbit_*`, `band_*` | β rides through arithmetic; α always exact |
 
-v0.5 ships substrate-keyed memory but the store grows unbounded. Long-running agents need pruning. Wire fibtier's tier-bounded eviction into `MemoryStore`:
-
-- Each namespace gets a tier-state file alongside the index
-- Stores cascade into higher tiers via the fibtier fold mechanism
-- Old entries get summarized/aggregated as they fold upward
-- Bounded total entries across all tiers (default ~4180 = Fib(18))
-
-### v0.6 candidate B — Prometheus rerank pass
-
-The substrate-ranked predict candidates can be reranked by a learned probability overlay. Train a small Prometheus model on the corpus, score top-k candidates' next-token probabilities, blend with the substrate distance.
-
-### v0.6 candidate C — substrate-attention follow-ups
-
-- Substrate-modulated Q projection. Q hasn't been swapped yet; the V resample recipe (post-projection modulation) may generalize.
-- Substrate FF: dampen off-attractor activations in the feed-forward residual.
-- Substrate LayerNorm: substrate-distance-weighted variance computation.
-- Larger-scale validation: every substrate-attention claim was made at TinyShakespeare scale (1.1MB). Need to verify the stack holds at 10-100MB corpora.
-
-### Other deferred items
-
-- **Stateful corpus API** — `omc_corpus_build` returns a handle, `omc_predict_from(handle, prefix, top_k)` reuses it. Saves the corpus-rebuild cost on repeated queries.
-- **Streaming queries** — incremental updates as the prefix grows token-by-token.
-- **Cross-corpus weighted blending** — give different paths different priority in the ranking.
-- **Conversation-aware predict** — `omc_predict(..., context_hash=H)` where H references prior reasoning state, biasing the ranking by which fns the agent has already touched.
+Scaling result (NEXT-7, on CPU): capability rises with addressed content while per-query cost stays
+flat (exact-key O(1) + constant verify). The substrate's scaling axis is content + verify (CPU), not
+parameters (GPU). 267 tests pass (172 lib + 95 integration, incl. `tests/substrate_v18.rs`).
 
 ---
 
-## v0.7+ candidates
-
-### Substrate-attention follow-ups
-
-- Substrate-modulated Q projection. Q hasn't been swapped yet; the V resample recipe (post-projection modulation) may generalize.
-- Substrate FF: dampen off-attractor activations in the feed-forward residual.
-- Substrate LayerNorm: substrate-distance-weighted variance computation.
-- Larger-scale validation: every substrate-attention claim was made at TinyShakespeare scale (1.1MB). Need to verify the stack holds at 10-100MB corpora.
-
-### Beyond (rough)
-
-### Transformerless LLM
-
-The substrate-attention components stack to −8.94% inside one block. The path forward is a top-to-bottom harmonic-only architecture trained competitively. Open: how to handle non-integer-coherent quantities at this scale (the substrate metric only applies to integer-valued quantities, per the rule derived from the HBit-gate falsification).
-
-### JIT path expansion
-
-- AVX-512 widening — blocked on array-processing OMC fns to fill the wider lanes.
-- JIT for float-returning harmonic primitives — `returns_float` dispatch flag mirroring `returns_array_int`.
-- JIT for dict ops — currently pure tree-walk for string-keyed data; the L1 array-of-hashed-int rewrite avoided this for hot paths.
-
-### Tooling polish
-
-- Improved formatter (`--fmt`) — preserve comments, configurable line width.
-- LSP improvements: completion (uses the v0.3 predict engine), hover with substrate signature.
-- VS Code extension: snippet library, inline hint UI for the heal pass.
+## Next — grounded, no new compute required
+
+- **Synthesis coverage (Phase 4.2):** extend `gen_omc` to the remaining run-safe constructs
+(`for` over expressions, `try`/`match`, nested blocks). The valid-by-construction guarantee is in;
+this widens what it can emit.
+- **In-core grammar derivation (Phase 4.1):** derive the generator's operator/keyword/construct set
+from the live AST/parser at build time (the Python `derive_grammar.py` already does this at the
+toolchain level; bring it in-core so the generator can't drift from the language).
+- **Assistant unification (Phase 0.1):** make the verified `SubstrateLM` (always-valid, correct-with-
+intent) the assistant's canonical generator; demote the FibRec neural net to an explicit cold-start prior.
+
+## Next — needs compute (model / GPU / training)
+
+- **Inference-time weight compression at scale (Phase 5.1):** the address-bucketed / Zeckendorf bet
+("big model in small memory"). Small-scale is settled (address-bucketed sharing is real ~4× free /
+14× @ 85%; the φ-vs-modulo advantage is null). Needs a real model to test at scale.
+- **Track-B science (Phase 5.4):** compositionality-coherence field and weight-substrate views —
+derived, not yet trained.
+- **Substrate-generator generalization:** raising held-out (compose-beyond-coverage) correctness —
+CPU-scalable in principle (grammar-gen + verify), bounded by generator quality.
+
+## Horizon — the dual-band dream
+
+- **Value-granular skip:** today the dual band is a coherence monitor + exact-memo router (α is always
+computed as ground truth). A strict speedup *from the gate* is safe only on smooth domains
+(measured: interpolation works for smooth functions, not discrete ones) — pursue it opt-in there.
+- **Kernel / microcode HBit:** the long-horizon goal — α/β as a hardware-level dual band, short-cutting
+computation through resonance/dissonance at the CPU level. The JIT already packs both bands into
+SSE2 `<2 x i64>` and elides branches by harmony; pushing that into a true microcoded skip is the
+frontier this whole substrate program is paving toward.
 
 ---
 
-## Done (linked to chapter releases)
-
-| Chapter | Key shipped items |
-|---|---|
-| [v0.5-substrate-memory](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.5-substrate-memory) | `omc_memory_store/recall/list/stats` + filesystem persistence + **10.61× LLM context-budget reduction** measured on a 20-turn agent task |
-| [v0.4-substrate-context](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.4-substrate-context) | `omc_compress_context` / `omc_decompress` tools + `format=codec` thumbnails + directory ingest + measured 1.85×-2.81× LLM context-budget reduction |
-| [v0.3.1-symbolic-compression](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.3.1-symbolic-compression) | `omc_predict` gains `format=hash`/`signature`/`full` (3.8× compression default) + `omc_fetch_by_hash` for on-demand recovery |
-| [v0.3-symbolic-prediction](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.3-symbolic-prediction) | `omc_predict_files(paths, prefix, top_k)` returns ranked provenance-tracked continuations from a content-addressed corpus |
-| [v0.2-ergonomics](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.2-ergonomics) | `+=` / `-=` / `*=` / `/=` / `%=`, `len`/`range`/`getenv`/`to_hex`/`parse_int`, negative array indexing, did-you-mean, traced errors, 11 heal classes |
-| [v0.1-substrate-attention](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.1-substrate-attention) | Substrate-K + S-MOD softmax + substrate-V resample → −8.94% val on TinyShakespeare |
-| [v0.0.6-prometheus](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.6-prometheus) | Tape autograd, AdamW, Embedding, LayerNorm, multi-block transformer, first substrate-K wins |
-| [v0.0.5-codec-kernel-protocol](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.5-codec-kernel-protocol) | Substrate codec, `omc-kernel`, `omc-grep`, OMC-PROTOCOL v1, substrate-aware tokenizer |
-| [v0.0.4-jit-and-dual-band](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.4-jit-and-dual-band) | LLVM JIT, dual-band SSE2 codegen, harmony-gated branch elision, array support |
-| [v0.0.3-substrate-and-stdlib](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.3-substrate-and-stdlib) | Heal pass, substrate-routed search family, stdlib expansion, `--check` / `--fmt` |
-| [v0.0.2-language-core](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.0.2-language-core) | Parser, two-engine interpreter, HInt, bytecode VM, self-hosting fixpoint |
-| V0.0.1 | Genesis: circuit evolution engine, FFI, Unity/Unreal bindings |
-
-`ROADMAP.json` is preserved for archaeology — it captured the state through v0.0.4. This file supersedes it as the canonical forward plan.
+## Method (unchanged)
+
+Every architectural change is a **pre-registered A/B**, reported even when it loses. Substrate goes
+where it provably helps — identity / addressing / positions, attenuable — and stays off the
+learned-float scoring path, where it was falsified. Validity is guaranteed by construction; correctness
+is verified by execution. Honest limits travel with every claim.
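
Reviewer note on the "Addressable heap" row of the new Shipped table (dedup + O(1) semantic equality): the idea can be sketched in Python, the language of the `derive_grammar.py` toolchain script. This is a minimal illustration, not OMC's implementation. The names `cas_put`, `cas_get`, `cas_has`, `value_addr`, and `same_value` are borrowed from the table, but their signatures, the `ContentStore` class, the use of SHA-256, and the canonical-JSON encoding are all assumptions made for the example.

```python
import hashlib
import json


def _canonical(value):
    # Assumed canonical form: sorted keys and fixed separators, so that
    # structurally equal values always serialize to identical text.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


class ContentStore:
    """Toy content-addressed heap: each value is keyed by a hash of its content."""

    def __init__(self):
        self._heap = {}  # address (hex digest) -> canonical JSON text

    @staticmethod
    def value_addr(value):
        # Hypothetical addressing scheme: SHA-256 of the canonical encoding.
        return hashlib.sha256(_canonical(value).encode("utf-8")).hexdigest()

    def cas_put(self, value):
        addr = self.value_addr(value)
        # Dedup: putting an equal value again reuses the existing entry.
        self._heap.setdefault(addr, _canonical(value))
        return addr

    def cas_get(self, addr):
        return json.loads(self._heap[addr])

    def cas_has(self, addr):
        return addr in self._heap

    def __len__(self):
        return len(self._heap)


def same_value(addr_a, addr_b):
    # With values already addressed, equality is a fixed-size digest comparison.
    return addr_a == addr_b


store = ContentStore()
a = store.cas_put({"name": "fib", "args": [90]})
b = store.cas_put({"args": [90], "name": "fib"})  # same content, different key order
c = store.cas_put({"name": "fib", "args": [91]})
```

Here `a` and `b` resolve to one stored entry, so the store holds two values after three puts. Computing an address still costs time proportional to the value's size; only the comparison of two existing addresses is constant-size in this sketch.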

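Reviewer note on the "Self-modification" row (verify-gated accept/rollback): the gate can be sketched the same way. The name `fn_swap_verified` comes from the table, but the signature below, the dictionary registry, and the list of `(args, expected)` checks are hypothetical stand-ins; the diff does not show how OMC expresses the verification step.

```python
def fn_swap_verified(registry, name, candidate, checks):
    """Install `candidate` under `name` only if every check passes.

    `checks` is a list of (args, expected) pairs. On any mismatch or
    exception the previous function is restored and False is returned.
    """
    previous = registry[name]
    registry[name] = candidate  # tentative install
    try:
        ok = all(registry[name](*args) == expected for args, expected in checks)
    except Exception:
        ok = False
    if not ok:
        registry[name] = previous  # rollback to the last verified version
    return ok


def slow_square(x):
    # Reference behaviour for non-negative integers: add x to itself x times.
    return sum(x for _ in range(x))


def fast_square(x):
    return x * x


def broken_square(x):
    return x + x  # agrees with squaring only at 0 and 2


registry = {"square": slow_square}
checks = [((0,), 0), ((3,), 9), ((7,), 49)]

accepted = fn_swap_verified(registry, "square", fast_square, checks)
rejected = fn_swap_verified(registry, "square", broken_square, checks)
```

After both calls, `accepted` is `True`, `rejected` is `False`, and the registry still holds `fast_square`. The guarantee is only as strong as the checks: a candidate that passes them but is wrong elsewhere would be accepted.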