v0.3.1 — Symbolic compression: 3.8× smaller predict default + omc_fetch_by_hash
WHAT CHANGED
- omc_predict gains a format parameter:
  - hash (NEW DEFAULT, ~50 bytes/suggestion): fn_name + file +
    canonical_hash + prefix_match_len + substrate_distance
  - signature (~100 bytes): adds the fn signature line
  - full: complete source (previous default behavior)
- omc_fetch_by_hash(paths, canonical_hash): companion tool.
  Recovers a function body by its alpha-rename-invariant canonical hash.
  Returns {found, fn_name, file, source} or {found: false}.
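As a rough illustration of what the three formats carry per suggestion, here is a minimal Python sketch. The field names come from the list above; the `Candidate` type, the field types, the example values and the exact key layout of each format (in particular, that `full` keeps the identity fields) are assumptions, not the server's actual response schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Candidate:
    """One predicted fn. Field names mirror the release note; types are assumed."""
    fn_name: str
    file: str
    canonical_hash: str
    prefix_match_len: int
    substrate_distance: float
    signature: str
    source: str


def render(c: Candidate, fmt: str = "hash") -> dict:
    """Shape one suggestion for the given format (hash is the default)."""
    base = {
        "fn_name": c.fn_name,
        "file": c.file,
        "canonical_hash": c.canonical_hash,
        "prefix_match_len": c.prefix_match_len,
        "substrate_distance": c.substrate_distance,
    }
    if fmt == "hash":
        return base
    if fmt == "signature":
        return {**base, "signature": c.signature}
    if fmt == "full":
        # Assumption: full keeps the identity fields and adds the complete source.
        return {**base, "source": c.source}
    raise ValueError(f"unknown format: {fmt!r}")


# Hypothetical example values; none of these come from a real corpus.
demo = Candidate(
    fn_name="prom_attention_demo",
    file="prometheus.omc",
    canonical_hash="h_demo_01",  # placeholder, not a real hash
    prefix_match_len=15,         # len("prom_attention_")
    substrate_distance=0.12,     # placeholder value
    signature="prom_attention_demo(q, k, v)",
    source="<complete source text of prom_attention_demo, typically many lines>",
)

for fmt in ("hash", "signature", "full"):
    print(fmt, len(json.dumps(render(demo, fmt))), "bytes")
```

The point of the sketch is the shape, not the byte counts: the hash format drops everything except what is needed to identify and rank a candidate, and the body is recovered later by hash.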
MEASURED COMPRESSION
Same query (fn prom_attention_, top_k=5) against prometheus.omc,
percentages relative to format=full:
  format=hash        1253 bytes   26.2%   (3.8x smaller)
  format=signature   1622 bytes   33.9%
  format=full        4783 bytes   100%    (v0.3 behavior)
The ratio widens on longer fns: at top_k=5 over fns averaging 60
lines, format=hash is ~10x smaller than format=full.
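The percentages and the 3.8x figure follow directly from the byte counts. A quick Python check, using only the numbers in the table above:

```python
# Byte counts from the measurement above: same query, top_k=5, prometheus.omc.
MEASURED_BYTES = {"hash": 1253, "signature": 1622, "full": 4783}


def share_of_full(fmt: str) -> float:
    """Response size as a percentage of format=full, rounded to 0.1."""
    return round(100 * MEASURED_BYTES[fmt] / MEASURED_BYTES["full"], 1)


def shrink_factor(fmt: str) -> float:
    """How many times smaller than format=full, rounded to 0.1."""
    return round(MEASURED_BYTES["full"] / MEASURED_BYTES[fmt], 1)


for fmt, size in MEASURED_BYTES.items():
    print(f"format={fmt:<10} {size:>5} bytes  {share_of_full(fmt):>5}%  {shrink_factor(fmt)}x smaller")
```

The same arithmetic gives the ratio the table leaves implicit: format=signature is about 2.9x smaller than format=full.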
WHY IT MATTERS
Canonical hash is alpha-rename invariant, so recovery via
omc_fetch_by_hash works even if the fn was renamed after the predict
call. The LLM workflow becomes: predict cheaply (format=hash), reason
over candidates, then fetch only the body it commits to using.
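The predict-then-fetch loop depends on the hash surviving renames. Below is a minimal sketch of that idea in Python, using Python's own `ast` module as a stand-in for an OMC parser (an assumption: the actual canonical hash works on OMC source, and its exact normalization rules are not described here). It renames a function's own name, parameters and assigned locals to positional placeholders before hashing, while free names such as builtins keep their identity. `canonical_hash` and `fetch_by_hash` are hypothetical local analogs, not the MCP tools' code.

```python
# Hypothetical sketch: Python's ast stands in for OMC's parser.
import ast
import hashlib


def _bound_names(fn: ast.FunctionDef) -> set[str]:
    """Names bound by fn itself: its own name, its parameters, and assigned locals."""
    bound = {fn.name}
    for node in ast.walk(fn):
        if isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
    return bound


class _AlphaRenamer(ast.NodeTransformer):
    """Replace bound names with _v0, _v1, ... in first-seen order; free names are kept."""

    def __init__(self, bound: set[str]):
        self.bound = bound
        self.mapping: dict[str, str] = {}

    def _canon(self, name: str) -> str:
        if name not in self.bound:
            return name  # globals and builtins keep their identity (len != sum)
        if name not in self.mapping:
            self.mapping[name] = f"_v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._canon(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._canon(node.id)
        return node


def canonical_hash(fn_source: str) -> str:
    """Hash one top-level function so that alpha-equivalent versions collide."""
    fn = ast.parse(fn_source).body[0]
    if not isinstance(fn, ast.FunctionDef):
        raise ValueError("expected a single top-level def")
    renamed = _AlphaRenamer(_bound_names(fn)).visit(fn)
    # ast.dump omits line/column info by default, so layout and comments do not matter.
    return hashlib.sha256(ast.dump(renamed).encode()).hexdigest()[:16]


def fetch_by_hash(sources: dict[str, str], wanted: str) -> dict:
    """Scan {file: source text} for a top-level def whose canonical hash matches."""
    for file, text in sources.items():
        for node in ast.parse(text).body:
            if isinstance(node, ast.FunctionDef):
                src = ast.get_source_segment(text, node)
                if src is not None and canonical_hash(src) == wanted:
                    return {"found": True, "fn_name": node.name, "file": file, "source": src}
    return {"found": False}


before = "def scale(xs, k):\n    out = [x * k for x in xs]\n    return out\n"
after = "def rescale(values, factor):\n    result = [v * factor for v in values]\n    return result\n"

h = canonical_hash(before)                # hash taken at "predict" time
hit = fetch_by_hash({"ops.py": after}, h)  # fn renamed since; lookup still matches
print(hit["found"], hit["fn_name"])
```

Keeping free names intact is the key design choice: if `len` and `sum` were renamed like locals, `def f(x): return len(x)` and `def f(x): return sum(x)` would collide, which is a false match rather than alpha-equivalence.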
Branching is now ~free at the context-budget level: 50 candidates
fit in the LLM's mind for the cost of 6-7 full bodies.
NOW POSSIBLE
- LLM agents can hold 5-10x more candidate fns "in mind" per query.
- Repeated browsing across a corpus stays cheap.
- The substrate's content-addressed identity becomes a first-class
  context-compression mechanism.
TESTS
13/13 MCP integration tests pass; 231 Rust tests pass; 1087/1087 OMC tests pass.
DEFERRED TO v0.4
- Wire the substrate codec (omc_codec_encode, 10-50x ratio) into the
  predict response path for full library-lookup compression.
- Substrate-keyed conversation memory via fibtier.
- omc_compress_context(text) MCP tool.
- Cross-corpus blending.
See CHANGELOG.md#v0.3.1-symbolic-compression for the chapter index.