Releases: RandomCoder-lab/OMC
v0.0.2 — Language core: parser, VM, self-hosting
WHAT CHANGED
- Phase A-G: HFloat, phi.X modules, pragmas, types, HSingularity as
  Value variant, stdlib + conformance tests, triple-quoted strings,
  imports, real module resolution.
- Phase H-M: bytecode VM with tree-walk parity, bitwise ops, optimizer
  (constant folding + peephole), resonance caching, typed HIR.
- Phase N: Phi-Field LLM kernel demo with OMNIweights.
- Phase O: ONN self-healing primitives (Fibonacci alignment auto-repair).
- Phase P-U: bytecode disassembler, VM inline cache, source positions,
  criterion bench suite.
- Phase V (V.1 → V.9b): self-hosting lexer → parser → codegen → SELF-HOSTING
  FIXPOINT (OMC compiles its own compiler) → bytecode bootstrap
  fixpoint → UTF-8 safety → gen2 == gen3 byte-identical.
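The constant-folding half of the Phase H-M optimizer can be pictured with a minimal sketch over a toy expression tree. The `Expr` type and `fold` function are hypothetical illustrations, not OMC's typed HIR, and the peephole pass is not shown.

```rust
/// Toy expression tree (hypothetical; not OMC's typed HIR).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Bottom-up constant folding plus two algebraic identities (x + 0, x * 1).
fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (fold(*a), fold(*b)) {
            // Both operands known at compile time: evaluate now.
            (Expr::Int(x), Expr::Int(y)) => Expr::Int(x.wrapping_add(y)),
            // Additive identity.
            (Expr::Int(0), other) | (other, Expr::Int(0)) => other,
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Int(x), Expr::Int(y)) => Expr::Int(x.wrapping_mul(y)),
            // Multiplicative identity.
            (Expr::Int(1), other) | (other, Expr::Int(1)) => other,
            (a, b) => Expr::Mul(Box::new(a), Box::new(b)),
        },
        // Literals and variables are already in normal form.
        leaf => leaf,
    }
}
```

Folding children first means identities can fire on subtrees that only become constant after folding, e.g. `x + (1 + -1)` reduces to `x`.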
WHY IT MATTERS
This chapter is the foundation: a working language with two execution
engines (tree-walk interpreter and bytecode VM) that produce identical
results, a self-hosting compiler that is reflexively stable
(gen2 == gen3), HInt as the substrate primitive carrying φ-resonance at
construction, and conformance tests locking down the semantics.
NOW POSSIBLE
- Write OMC programs that exercise both engines and get identical
  results.
- The compiler can recompile itself indefinitely without drift.
- Everything that comes next (JIT, substrate algorithms, ML framework)
  builds on this and assumes the core is stable.
See CHANGELOG.md#v0.0.2-language-core for the chapter index.
v0.8 — Substrate-Q: 4th attention component, -16.7% cumulative
THE PATH
v0.1 shipped K+S-MOD+V stacked for -8.94%. Q was the obvious fourth
component. The first attempt (Q1 = same post-projection resample as V)
lost on 3 seeds: substrate-V's recipe doesn't generalize to Q.
Following the user's hint that "Possible outcomes may relate to
different integral pieces to phi_pi_fib", a broader sweep tested
Q3-Q6, each with a different substrate operation. Q6 (phi_pi
log-distance scaling) wins decisively.
6-SEED Q6 CONFIRMATION
mean: Q0 3.128 vs Q6 2.748 (-12.15%, 6/6 seeds beat baseline)
THE RECIPE
log_d = log(|q|+1) / (π · ln φ)
modulation = exp(-γ · log_d) with γ=0.5
q_full = (x @ W_q) * modulation
Smooth magnitude regularizer keyed on phi_pi_fib structure.
NOT a snap-to-attractor — that's V's recipe and breaks Q.
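A minimal Rust sketch of the recipe, assuming `log` is the natural logarithm, that the modulation is applied element-wise, and that `q` inside `log_d` is the raw projection `x @ W_q` (the matmul itself is omitted). `q6_scale` is a hypothetical name, not an OMC or Prometheus API.

```rust
use std::f64::consts::PI;

/// Golden ratio φ, defined locally rather than relying on a std constant.
const PHI: f64 = 1.618_033_988_749_895;

/// Q6 log-distance scaling applied element-wise to projected queries.
fn q6_scale(q: &[f64], gamma: f64) -> Vec<f64> {
    let denom = PI * PHI.ln(); // π · ln φ
    q.iter()
        .map(|&qi| {
            let log_d = (qi.abs() + 1.0).ln() / denom;
            let modulation = (-gamma * log_d).exp();
            // Sign is preserved; only the magnitude is damped.
            qi * modulation
        })
        .collect()
}
```

Since exp(-γ · ln(|q|+1) / (π ln φ)) = (|q|+1)^(-γ/(π ln φ)), the modulation is a smooth power-law damping of magnitude; with γ = 0.5 the exponent is about -0.33, and at |q| = φ^π - 1 the value log_d is exactly 1.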
PRINCIPLE
- snap-to-attractor: helps quantities being AGGREGATED (V, K)
- log-distance scaling: helps quantities that STEER (Q)
CUMULATIVE STACK
L0 vanilla: 3.301
L1-MH + S-MOD α=1.0: 3.084
+ V1 (v0.1): 3.006
+ Q6 (v0.8): 2.748 (-16.7% vs L0)
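The stack percentages are relative changes in the validation number (lower is better) against L0. A small sketch of that arithmetic; the function names are illustrative only.

```rust
/// Relative change of `value` against `baseline`, in percent (negative = improvement).
fn rel_change_pct(baseline: f64, value: f64) -> f64 {
    (value - baseline) / baseline * 100.0
}

/// Each cumulative stack level compared against L0 = 3.301.
fn stack_deltas() -> Vec<(&'static str, f64)> {
    let l0 = 3.301;
    [
        ("L1-MH + S-MOD", 3.084),
        ("L1-MH + S-MOD + V1", 3.006),
        ("L1-MH + S-MOD + V1 + Q6", 2.748),
    ]
    .iter()
    .map(|&(name, value)| (name, rel_change_pct(l0, value)))
    .collect()
}
```

This gives about -6.57% for S-MOD, -8.94% with V1 (the v0.1 figures), and about -16.75% with Q6, which the heading reports as -16.7%.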
NOW POSSIBLE
- 4 substrate-attention components stack at TinyShakespeare scale
- Different phi_pi_fib operations for different roles in attention
- Production attention block: K from CRT-Fib + S-MOD softmax +
V resample + Q log-distance scaling
NOT IN v0.8
- OMC-side cross-validation (needs tape_abs + tape_log)
- Larger-scale verification (TinyShakespeare 1.1MB only)
- γ tuning
See CHANGELOG.md#v0.8-substrate-q + SUBSTRATE_Q_WINS.md.
v0.7 — GPU scaffold: omnimcode-gpu + 4.04× on RX 580 via wgpu Vulkan
ARCHITECTURE
- omnimcode-gpu crate (new):
  - ComputeBackend trait (one method: matmul)
  - CpuBackend (always-available ground truth)
  - WgpuBackend (feature wgpu): Vulkan/Metal/DX12/OpenGL
  - pick_backend(): feature + OMC_GPU_BACKEND env override
  - Naive WGSL matmul (16x16 workgroup, no tiling)
- 11/11 tests pass, including wgpu parity on real GPU
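A CPU-only sketch of the backend shape described above. The `matmul` signature (row-major `f32` slices with explicit `m`, `k`, `n`), `pick_backend_for`, and `workgroups_per_dim` are assumptions for illustration, not the omnimcode-gpu API; the wgpu path is left out.

```rust
use std::env;

/// Hypothetical trait shape: row-major (m×k) · (k×n) → (m×n).
trait ComputeBackend {
    fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32>;
}

/// Always-available reference implementation.
struct CpuBackend;

impl ComputeBackend for CpuBackend {
    fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        assert_eq!(a.len(), m * k, "lhs shape mismatch");
        assert_eq!(b.len(), k * n, "rhs shape mismatch");
        let mut c = vec![0.0f32; m * n];
        // i-p-j loop order keeps the inner loop walking rows of b contiguously.
        for i in 0..m {
            for p in 0..k {
                let a_ip = a[i * k + p];
                for j in 0..n {
                    c[i * n + j] += a_ip * b[p * n + j];
                }
            }
        }
        c
    }
}

/// Selection from an explicit choice. The real crate also consults the
/// `wgpu` cargo feature; this CPU-only sketch defaults to CPU.
fn pick_backend_for(choice: Option<&str>) -> Box<dyn ComputeBackend> {
    match choice {
        Some("cpu") | None => Box::new(CpuBackend),
        Some(other) => {
            eprintln!("backend '{other}' not available in this sketch; using CPU");
            Box::new(CpuBackend)
        }
    }
}

/// Env-driven entry point honoring OMC_GPU_BACKEND.
fn pick_backend() -> Box<dyn ComputeBackend> {
    pick_backend_for(env::var("OMC_GPU_BACKEND").ok().as_deref())
}

/// Workgroups per dimension for a 16x16 workgroup over an n×n output.
fn workgroups_per_dim(n: usize) -> usize {
    (n + 15) / 16
}
```

Keeping the CPU path as ground truth lets any GPU backend be checked by comparing its output against `CpuBackend` on the same inputs. With a 16x16 workgroup, a 1024x1024 output is dispatched as 64 × 64 workgroups.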
MEASURED ON AMD RX 580 (RADV POLARIS10 / Vulkan)
64x64: 0.23x (overhead dominates)
128x128: 0.83x (~crossover)
256x256: 2.24x
512x512: 3.39x
1024x1024: 4.04x
WHY WGPU OVER ROCM
- Official ROCm dropped Polaris (gfx803) at version 4.0
- Unofficial Polaris ROCm builds are fragile (Ollama "gets fussy")
- wgpu via Vulkan works out of the box on RADV driver
- Trait ready for ROCm/CUDA/Metal plug-ins when on supported hw
NOW POSSIBLE
- Prometheus tape_matmul can route through this backend (v0.8 work)
- Cross-vendor GPU compute via one trait + feature flag
- The user's existing hardware actually gets GPU acceleration
without driver pain
WHAT'S NOT IN v0.7
- Prometheus integration (next chapter)
- GPU backward pass
- Tiled/shared-memory kernels
- f16/bf16
See CHANGELOG.md#v0.7-gpu-scaffold + omnimcode-gpu/README.md.
v0.6 — Fibtier-memory: bounded eviction with hash-recoverable evictions
WHAT CHANGED
- MemoryStore::max_entries_per_namespace: Option
- FIBTIER_DEFAULT_SIZES mirrors fibtier.omc
- FIBTIER_DEFAULT_MAX_ENTRIES = 232 (sum of first 10 tier sizes)
- OMC_MEMORY_MAX_ENTRIES env var (0 = unbounded)
- evict_to_cap(namespace, keep) helper
- Index-only eviction — body files stay on disk so recall(hash)
still works for entries that fell out of the chronological list
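Index-only eviction can be sketched with an in-memory stand-in: the chronological index shrinks, the bodies do not, so recall by hash keeps working. The struct and method shapes are hypothetical, and evicting oldest-first is an assumption based on entries "falling out of the chronological list".

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for MemoryStore's index plus on-disk bodies.
#[derive(Default)]
struct FibtierIndex {
    /// Chronological index per namespace, oldest first.
    index: HashMap<String, Vec<u64>>,
    /// Bodies keyed by content hash; eviction never removes these.
    bodies: HashMap<u64, String>,
}

impl FibtierIndex {
    fn store(&mut self, namespace: &str, hash: u64, text: &str) {
        self.index.entry(namespace.to_string()).or_default().push(hash);
        self.bodies.insert(hash, text.to_string());
    }

    /// Drop the oldest index entries until `keep` remain.
    /// Returns (dropped, kept), mirroring omc_memory_evict's fields.
    fn evict_to_cap(&mut self, namespace: &str, keep: usize) -> (usize, usize) {
        let list = self.index.entry(namespace.to_string()).or_default();
        let dropped = list.len().saturating_sub(keep);
        list.drain(..dropped);
        (dropped, list.len())
    }

    /// Recall by hash still works after the entry leaves the index.
    fn recall(&self, hash: u64) -> Option<&str> {
        self.bodies.get(&hash).map(String::as_str)
    }

    fn list(&self, namespace: &str) -> &[u64] {
        self.index.get(namespace).map(Vec::as_slice).unwrap_or(&[])
    }
}
```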
NEW MCP TOOL
- omc_memory_evict(namespace, keep) → {namespace, dropped, kept}
- omc_memory_stats now includes fibtier_cap
TESTS
32/32 MCP integration tests pass. 15/15 memory module unit tests.
WHY IT MATTERS
A 100-turn agent session now uses BOUNDED memory rather than the
10MB+ it would otherwise accumulate. The default 232-entry cap
covers ~hour-long conversations; v0.5's 10x context compression
holds across arbitrarily long sessions as a result.
HONEST FRAMING
Index-only eviction, not full deletion. Long-running agents would
benefit from external file cleanup. v0.6.1 candidate: physical
eviction with optional cold-storage archival.
See CHANGELOG.md#v0.6-fibtier-memory for the chapter.
v0.5 — Substrate-memory: 10.61× LLM context-budget reduction (target hit)
WHAT CHANGED
- New module omnimcode-core/src/memory.rs:
  - MemoryStore { root }: filesystem at ~/.omc/memory//.txt
  - store / recall / list / stats
  - Namespace sanitization (alphanumeric + _-) prevents path traversal
  - OMC_MEMORY_ROOT env for isolation
- Four new MCP tools:
  - omc_memory_store(text, namespace?) → {content_hash, namespace, bytes}
  - omc_memory_recall(content_hash, namespace?) → {found, text, bytes}
  - omc_memory_list(namespace?, limit?) → entries with preview, no body
  - omc_memory_stats(namespace?) → diagnostics
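A minimal sketch of content-addressed store/recall with namespace sanitization. It assumes OMC's tokenizer::fnv1a_64 is standard 64-bit FNV-1a, restricts "alphanumeric" to ASCII, and keeps entries in memory instead of on disk; `MemoryStoreSketch` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Standard 64-bit FNV-1a (offset basis 0xcbf29ce484222325, prime 0x100000001b3).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Keep only ASCII alphanumerics, '_' and '-', so "../" cannot escape the root.
/// (A real store would also need a policy for an empty result.)
fn sanitize_namespace(ns: &str) -> String {
    ns.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_' || *c == '-')
        .collect()
}

/// In-memory stand-in for the filesystem-backed store.
#[derive(Default)]
struct MemoryStoreSketch {
    entries: HashMap<(String, u64), String>,
}

impl MemoryStoreSketch {
    /// Returns (content_hash, namespace, bytes), mirroring omc_memory_store.
    fn store(&mut self, text: &str, namespace: &str) -> (u64, String, usize) {
        let ns = sanitize_namespace(namespace);
        let hash = fnv1a_64(text.as_bytes());
        self.entries.insert((ns.clone(), hash), text.to_string());
        (hash, ns, text.len())
    }

    fn recall(&self, hash: u64, namespace: &str) -> Option<&str> {
        self.entries
            .get(&(sanitize_namespace(namespace), hash))
            .map(String::as_str)
    }
}
```

Because the key is derived from content alone, storing the same text twice in one namespace yields the same hash and a single entry.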
MEASURED COMPRESSION (20-turn agent task, top_k=10, examples/lib)
Baseline (full transcript inline): 869,761 B (100%)
v0.4 only (compressed predict + transcript): 423,030 B (48.6%, 2.06x)
v0.5 full (memory hashes + compressed): 82,008 B ( 9.4%, 10.61x)
Baseline grows quadratically; v0.5 grows linearly. Crossover at
turn ~5, 10x by turn 20.
WHY IT COMPOSES
The substrate's identity primitive (tokenizer::fnv1a_64) is shared
across all chapters — v0.3 predict, v0.3.1 fetch, v0.4
compress/decompress, v0.5 memory. An LLM agent mixes tools freely;
no tool needs to know which other tool produced a hash. That's
what makes the 10x win COMPOSE across the chapters instead of
being an isolated effect.
NOW POSSIBLE
- LLM agents can run multi-turn conversations at ~10% of baseline
  context budget.
- Each turn's content survives MCP process restart (filesystem
  persistence), so agents can be paused and resumed without losing
  substrate-keyed state.
- Different conversation threads stay isolated via namespaces.
HONEST FRAMING
- The 10x is the COMBINED v0.4 + v0.5 stack. Either alone tops out
  at 2-3x.
- The win scales with conversation length: at 5 turns v0.5 is at
  parity, and 10x kicks in around turn 15+.
- Memory grows unbounded, so long-running agents need pruning
  (v0.6 candidate: wire fibtier's tier-bounded eviction).
TESTS
27/27 MCP integration + 10/10 memory unit tests.
See CHANGELOG.md#v0.5-substrate-memory +
experiments/substrate_context/FINDING_v05.md for the chapter.
v0.4 — Substrate-context: symbolic compression end-to-end, 2-3× LLM budget reduction
WHAT CHANGED
- omc_compress_context(text, every_n?): substrate-keyed codec
  payload for arbitrary OMC source.
- omc_decompress(paths, codec | canonical_hash): generalization
  of omc_fetch_by_hash. Recovers source via library lookup
  against the corpus (alpha-rename invariant).
- omc_predict format=codec: bounded substrate thumbnail (≤16
  sampled tokens + canonical hash). Sits between signature
  (text-only) and full (everything).
- paths can now be DIRECTORIES, recursively walked for *.omc
  files. Cross-corpus blending: ["examples/lib"] ingests 320
  fns across 16 files.
- Hash unification: omc_predict's canonical_hash and
  omc_compress_context's content_hash use the same primitive
  (tokenizer::code_hash) and are interchangeable.
MEASURED COMPRESSION
10-task representative LLM workflow against examples/lib (320 fns):
top_k=5, 1 fetch: 14142 B → 6864 B (2.06x smaller)
top_k=10, 1 fetch: 27828 B → 10318 B (2.70x smaller)
top_k=20, 1 fetch: 39902 B → 14188 B (2.81x smaller)
The win grows with browse depth: per-candidate cost stays at the
substrate floor (~50 B for the hash), and bodies are paid for only
when the agent commits to fetching them.
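Why the ratio grows with browse depth can be shown with a deliberately simple cost model. The function names and the flat `hash_bytes` / `body_bytes` parameters are hypothetical, and the model ignores fixed per-task overhead (prompts, tool framing), so it shows the trend rather than reproducing the measured ratios.

```rust
/// Bytes in context when every candidate's body is inlined.
fn inline_bytes(top_k: usize, body_bytes: usize) -> usize {
    top_k * body_bytes
}

/// Bytes in context when candidates are browsed by hash and `fetched` bodies are pulled.
fn browse_bytes(top_k: usize, fetched: usize, hash_bytes: usize, body_bytes: usize) -> usize {
    top_k * hash_bytes + fetched * body_bytes
}

/// Inline cost divided by browse-then-fetch cost.
fn compression_ratio(top_k: usize, fetched: usize, hash_bytes: usize, body_bytes: usize) -> f64 {
    inline_bytes(top_k, body_bytes) as f64
        / browse_bytes(top_k, fetched, hash_bytes, body_bytes) as f64
}
```

Each extra candidate adds only `hash_bytes` on the browse side but a full body on the inline side, so as `top_k` grows the ratio approaches `body_bytes / hash_bytes`.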
WHY IT MATTERS
Three primitives already in OMC compose without modification:
canonicalize (alpha-rename invariance), tokenizer::encode +
code_hash (substrate-aware identity), and the substrate codec from
v0.0.5 (library-lookup recovery). v0.4 wires them through the
MCP surface so an LLM client has them as first-class tools.
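Alpha-rename invariance means two functions that differ only in bound names share one identity. A minimal sketch over a pre-split token stream; `canonicalize` here is a hypothetical stand-in, not OMC's implementation, and it renames every non-keyword identifier (including the function name), which may differ from what OMC does.

```rust
use std::collections::HashMap;

/// Replace each non-keyword identifier with v0, v1, ... in order of first appearance.
fn canonicalize(tokens: &[&str], keywords: &[&str]) -> Vec<String> {
    let mut names: HashMap<&str, usize> = HashMap::new();
    tokens
        .iter()
        .map(|&tok| {
            let starts_ident = tok
                .chars()
                .next()
                .map_or(false, |c| c.is_alphabetic() || c == '_');
            if starts_ident && !keywords.contains(&tok) {
                // Reuse the id if this name was seen before.
                let next_id = names.len();
                let id = *names.entry(tok).or_insert(next_id);
                format!("v{id}")
            } else {
                // Keywords, literals and punctuation pass through unchanged.
                tok.to_string()
            }
        })
        .collect()
}
```

Hashing the canonical token list then yields an identity that survives renaming.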
NOW POSSIBLE
- LLM agents can hold 20 candidate continuations in context for
  the byte cost previously required for 7 full bodies.
- Branching is free at the context-budget level: agents can
  explore wider without burning their window.
- Cross-corpus queries (project + stdlib + registry) cost the
  same as single-file queries because hashes are global.
- The LLM "remembers" arbitrary code chunks via omc_compress_context
  and gets them back losslessly via library lookup.
HONEST LIMITS
The original ask was 10% of the context budget (~10x). The
structural ceiling for hash-browse + on-demand-fetch alone is
closer to 3x; the 10x claim requires v0.5 (substrate-keyed
conversation memory). v0.4 ships the primitives; v0.5 wires
conversation transcripts through the same substrate.
TESTS
20/20 MCP integration tests pass.
See CHANGELOG.md#v0.4-substrate-context +
experiments/substrate_context/FINDING.md for the full chapter.
v0.3 — Symbolic prediction: substrate-indexed code completion
This chapter synthesizes two earlier substrates, tokenizer::encode
(symbol stream) and canonical_hash + attractor_distance (substrate
metric), into one primitive that LLM agents (and humans) can use
while writing OMC to ask "what could come next here?". Each result
carries a substrate-distance score and a pointer back to the source
function it came from. Branching is first-class: every result is a
viable continuation.
WHAT CHANGED
- New module omnimcode-core/src/predict.rs (~370 lines):
  - CorpusEntry { fn_name, source, file, symbol_stream,
    canonical_hash, attractor }
  - PrefixTrie: each node accumulates corpus indices whose stream
    passes through it
  - CodeCorpus: entries + trie; ingest_fn and ingest_file
  - predict_continuations(corpus, prefix_source, top_k)
  - Ranking: (longest prefix match desc, smallest substrate
    distance asc, corpus index asc)
- Two new builtins:
  - omc_predict_files(paths_array, prefix_source, top_k) → array
    of dicts (stateless)
  - omc_corpus_size(paths_array) → int (diagnostic)
- Result dict fields: fn_name, source, file, canonical_hash,
  attractor, prefix_match_len, substrate_distance, query_attractor.
- 10 Rust unit tests + 11 OMC end-to-end tests.
WIN CONDITION (verified)
Prefix fn prom_linear_ against the 70-fn Prometheus corpus
returns exactly the three prom_linear_* fns ranked by substrate
distance. Wider prefix fn prom_attention_ surfaces 5 attention
fns with substrate distances ~3 orders of magnitude tighter than
the linear namespace — substrate distance reflects code-shape
similarity inside a namespace.
WHY IT MATTERS
Three primitives already in OMC — canonicalize (alpha-rename
invariance), tokenizer::encode (substrate-aware symbol stream),
code_hash (substrate-routed identity) — combine without modification.
The trie is a 50-line data structure on top. No embedding model,
no neural inference. Deterministic: same corpus + same prefix
→ same top-k, every run.
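The trie-plus-ranking idea fits in a few dozen lines. This is a hypothetical reconstruction, not predict.rs: symbol streams are pre-tokenized `&str` slices, and substrate distances are passed in by the caller rather than computed from canonical_hash and attractor_distance.

```rust
use std::collections::HashMap;

/// Each node records the corpus indices whose symbol stream passes through it.
#[derive(Default)]
struct PrefixTrie {
    children: HashMap<String, PrefixTrie>,
    entries: Vec<usize>,
}

impl PrefixTrie {
    fn insert(&mut self, stream: &[&str], idx: usize) {
        let mut node = self;
        node.entries.push(idx);
        for sym in stream {
            node = node.children.entry(sym.to_string()).or_default();
            node.entries.push(idx);
        }
    }
}

/// Rank by (prefix match length desc, distance asc, corpus index asc).
/// Returns (index, match_len, distance) for the top `top_k` entries.
fn rank(trie: &PrefixTrie, prefix: &[&str], distance: &[f64], top_k: usize) -> Vec<(usize, usize, f64)> {
    let mut best: HashMap<usize, usize> = HashMap::new();
    let mut node = trie;
    for &i in &node.entries {
        best.insert(i, 0);
    }
    for (depth, sym) in prefix.iter().enumerate() {
        match node.children.get(*sym) {
            Some(child) => {
                node = child;
                // Deeper nodes overwrite shallower match lengths.
                for &i in &node.entries {
                    best.insert(i, depth + 1);
                }
            }
            None => break,
        }
    }
    let mut ranked: Vec<(usize, usize, f64)> =
        best.into_iter().map(|(i, len)| (i, len, distance[i])).collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.2.total_cmp(&b.2)).then(a.0.cmp(&b.0)));
    ranked.truncate(top_k);
    ranked
}
```

The final tie-break on corpus index is what makes the output fully deterministic even when two entries share a match length and a distance.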
NOW POSSIBLE
- An LLM agent can query "what previous code came next at this
  shape?" as a single tool call.
- Branching is first-class: each result is a viable continuation.
- Provenance is content-addressed: every suggestion includes its
  source file path AND its canonical hash, so a downstream agent
  can verify integrity by recomputing the hash.
- The corpus is just file paths; no index-build step, no
  maintenance overhead.
TESTS
223 Rust pass, 1087/1087 OMC pass (was 213/1076).
DEFERRED
- Prometheus rerank pass (structural substrate ranking + learned
  probability overlay)
- Stateful corpus API for repeated queries
- MCP tool surface
- Streaming + cross-corpus blending
See CHANGELOG.md#v0.3-symbolic-prediction +
experiments/symbolic_prediction/FINDING.md for the chapter detail.
v0.2 — Ergonomics: OMC becomes forgiving
WHAT CHANGED
- Python-idiom builtins: len() polymorphic over array/string/dict/null;
  range(start, end, step) with negative step; getenv(name, default);
  to_hex / from_hex round-trip; parse_int / parse_float aliases.
- Negative array indexing (Python-style): xs[-1], arr_get(xs, -1),
  arr_set(xs, -1, v) all work. Out-of-bounds errors name the array,
  report its length, and hint at safe_arr_get for wrap-around.
- Compound assignment: +=, -=, *=, /=, %= desugared at parse time.
- For-loop iterables expanded: for k in dict iterates keys; for c in
  string iterates chars. Anything else errors instead of silently
  no-op'ing.
- Self-healing pass: two new classes, null_arith (null + 5 → 0 + 5)
  and if_numeric (if 0 flagged as a constant branch), for 11 heal
  classes total.
- Did-you-mean for undefined variables (substrate-bucketed close-name
  lookup over the current scope).
- Cross-container hints: arr_get(some_dict, k) suggests dict_get;
  symmetric for dict_get(arr, k).
- Parser hints:
  - h = 1 → "'h' is a reserved keyword; try hval".
  - if x = 5 → "did you mean ==?".
  - Friendlier unexpected-token messages.
- Runtime errors carry call-stack traces in the CLI.
- Type-mismatch errors report the received type with a hint.
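The negative-indexing rule and its error message can be sketched as follows. The message wording and the modulo semantics assumed for safe_arr_get's wrap-around are illustrative, not OMC's exact behavior.

```rust
/// Resolve a possibly negative index Python-style: -1 is the last element.
fn resolve_index(name: &str, len: usize, idx: i64) -> Result<usize, String> {
    let len_i = len as i64;
    let resolved = if idx < 0 { idx + len_i } else { idx };
    if (0..len_i).contains(&resolved) {
        Ok(resolved as usize)
    } else {
        // Name the array, report its length, point at the wrap-around helper.
        Err(format!(
            "index {idx} out of bounds for array '{name}' (length {len}); \
             try safe_arr_get for wrap-around"
        ))
    }
}

/// Assumed wrap-around semantics for safe_arr_get: index modulo length.
fn wrap_index(len: usize, idx: i64) -> Option<usize> {
    if len == 0 {
        None
    } else {
        Some(idx.rem_euclid(len as i64) as usize)
    }
}
```

`rem_euclid` is used instead of `%` because it always returns a non-negative result, so `-1` wraps to the last element.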
WHY IT MATTERS
The most common bites a Python user hit on first contact — cryptic
{:?} token names in parser errors, no +=, silent no-op for-loops over
dicts, undefined-variable errors with no suggestion — are gone. The
language now lives up to its "forgiving by default" pitch instead of
just promising it.
NOW POSSIBLE
- A new user can write OMC reaching for Python intuitions (len(d),
  range(0, 10, 2), x += 1, for key in scores) and have it Just Work.
- Runtime errors are debuggable from the message alone, including the
  call chain.
- Mistakes surface at the right layer (parser vs heal-pass vs runtime).
TESTS
+29 new Rust tests, +28 new OMC tests. Final: 213 Rust pass,
1073/1076 OMC pass.
See CHANGELOG.md#v0.2-ergonomics for the chapter index.
v0.1 — Substrate attention: K + S-MOD + V stack to -8.94% val on TinyShakespeare
The substrate-attention thesis — that the K matrix, attention
softmax, and V projection can each be replaced by substrate-derived
alternatives that match or beat learned components — finally lands
as a STACK. None of these wins are individually new; the chapter's
point is that they stack inside one transformer block at real scale.
WHAT CHANGED
- Substrate-K (1462d45, SUBSTRATE_K_FINDING.md): replace learned W_K
  with CRT-Fibonacci positional table. K structurally pre-built; Q
  and V stay learned. -6.3% val at multi-head TinyShakespeare scale
  (2/3 seeds), ~10% fewer attention parameters.
- S-MOD softmax (761180f, SUBSTRATE_SOFTMAX_FINDING.md): replace
  softmax(s) with softmax(s) × 1/(1 + α·attractor_distance(s)),
  then renormalize. Off-attractor weights dampened. 3-seed α sweep
  found α=1.0 wins -6.57% vs vanilla softmax.
- Substrate-V resample (1080da2, SUBSTRATE_V_FINDING.md): apply
  substrate_resample(x @ W_v) to V post-projection (W_v stays
  learned). Off-attractor V-magnitudes dampened. -2.52% on top of
  L1-MH + S-MOD (3/3 seeds).
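A minimal sketch of the S-MOD formula, assuming attractor_distance is applied element-wise to the raw scores. The distance function is a caller-supplied closure; OMC's attractor_distance is not reproduced here.

```rust
/// Numerically stable softmax (subtracts the max before exponentiating).
fn softmax(s: &[f64]) -> Vec<f64> {
    let max = s.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = s.iter().map(|&x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// S-MOD: damp each weight by 1 / (1 + α·d(s_i)), then renormalize to sum to 1.
fn s_mod_softmax(s: &[f64], alpha: f64, attractor_distance: impl Fn(f64) -> f64) -> Vec<f64> {
    let damped: Vec<f64> = softmax(s)
        .iter()
        .zip(s)
        .map(|(&p, &x)| p / (1.0 + alpha * attractor_distance(x)))
        .collect();
    let total: f64 = damped.iter().sum();
    damped.iter().map(|w| w / total).collect()
}
```

Because of the renormalization only relative distances matter: a constant distance leaves the softmax unchanged, while weights on high-distance (off-attractor) scores lose mass to low-distance ones.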
CUMULATIVE RESULT
L0 (vanilla softmax + learned V): 3.301
L1-MH + S-MOD α=1.0: 3.084
L1-MH + S-MOD α=1.0 + V1 (production): 3.006 = -8.94% val
WHY IT MATTERS
Each substrate replacement is a MODULATION, not a wholesale swap of
the learned projection. The substrate composes with task learning
instead of replacing it. The opposite recipe (substrate-V with no
learned W_v and no S-MOD) lost decisively the day prior. The
principle: substrate modulation works when applied to a quantity
that already has integer-coherent structure; substrate replacement
of learned projections does not.
NOW POSSIBLE
- Substrate-aware attention is the production default in Prometheus.
- Three substrate-component wins now stack in a single transformer
  block on real data (TinyShakespeare 1.1MB).
- Future component swaps (Q, FF, layernorm) are measured against this
  stacked baseline rather than vanilla.
- Cross-runtime parity: every result reproduced in pure-OMC
  Prometheus AND PyTorch.
See CHANGELOG.md#v0.1-substrate-attention for the chapter index.
V0.0.1
OMNIcode is a native, standalone Rust implementation of a harmonic computing language designed for genetic circuit evolution. It compiles to a single 509 KB portable binary with zero external dependencies, making it
ideal for embedded systems and game engines. Tiers 1-4 complete: circuit evaluation, optimization, Fibonacci search, and LRU caching. 49/49 tests passing. Real benchmarks show 50-230× speedup over Python DEAP depending
on circuit complexity.
GitHub Repository Description (60 chars max)
Fast native circuit evolution. Zero deps. 509 KB binary.
For Technical Audiences
OMNIcode is a zero-dependency genetic algorithm framework for evolving Boolean circuits. Written in pure Rust with LTO optimization, it delivers 4.64M fitness evaluations/second with no interpreter overhead. It features
constant folding, algebraic simplification, and O(log_φ n) search. Portable across Linux/Unix systems via a single static binary.
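The O(log_φ n) figure is characteristic of Fibonacci search, which splits a sorted range at Fibonacci-number offsets instead of midpoints. A textbook version over a sorted slice, offered as an illustration of the technique rather than OMNIcode's own search code:

```rust
/// Fibonacci search over a sorted slice. Each probe shrinks the candidate
/// window to a smaller Fibonacci number, giving O(log_φ n) comparisons.
fn fibonacci_search(arr: &[i64], target: i64) -> Option<usize> {
    let n = arr.len();
    if n == 0 {
        return None;
    }
    // fib is the smallest Fibonacci number >= n; fib1 and fib2 precede it.
    let (mut fib2, mut fib1) = (0usize, 1usize);
    let mut fib = fib2 + fib1;
    while fib < n {
        fib2 = fib1;
        fib1 = fib;
        fib = fib2 + fib1;
    }
    // Everything at or before `offset` has been ruled out (-1 = nothing yet).
    let mut offset: isize = -1;
    while fib > 1 {
        let i = std::cmp::min((offset + fib2 as isize) as usize, n - 1);
        if arr[i] < target {
            // Target lies after i: move the window down one Fibonacci step.
            fib = fib1;
            fib1 = fib2;
            fib2 = fib - fib1;
            offset = i as isize;
        } else if arr[i] > target {
            // Target lies before i: move the window down two Fibonacci steps.
            fib = fib2;
            fib1 -= fib2;
            fib2 = fib - fib1;
        } else {
            return Some(i);
        }
    }
    // At most one unchecked candidate remains, just after `offset`.
    let last = (offset + 1) as usize;
    if fib1 != 0 && last < n && arr[last] == target {
        Some(last)
    } else {
        None
    }
}
```

Unlike binary search, the probe positions come from additions and subtractions of Fibonacci numbers, with no division; the `last < n` guard covers targets larger than every element.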
For Game Developers
OMNIcode - Embeddable circuit evolution engine for game AI and procedural generation. 509 KB binary, zero dependencies, no runtime overhead. Evolve logic circuits 50-230× faster than Python frameworks. Perfect for game
mods, tools, and offline processing pipelines.
For Researchers
A benchmarked implementation of genetic programming for Boolean circuits, with real Criterion data confirming native performance advantages. Honest documentation, transparent limitations, reproducible results. Suitable
for research and experimentation; not production-grade.
Full Changelog: https://github.com/RandomCoder-lab/OMC/commits/V0.0.1