
v0.3.1 — Symbolic compression: 3.8× smaller predict default + omc_fetch_by_hash


RandomCoder-lab released this on 17 May at 17:43.

WHAT CHANGED

  • omc_predict gains a format parameter that controls how much of each
    suggestion is returned:
    • hash (NEW DEFAULT, ~50 bytes/suggestion): fn_name + file +
      canonical_hash + prefix_match_len + substrate_distance
    • signature (~100 bytes/suggestion): the hash fields plus the fn's
      signature line
    • full: the complete source (the previous default behavior)
  • omc_fetch_by_hash(paths, canonical_hash): companion tool that
    recovers a function body from its alpha-rename-invariant canonical
    hash. Returns {found, fn_name, file, source} on a match, or
    {found: false} otherwise.
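To make the three formats and the companion tool concrete, here is a minimal in-memory sketch in Python. It is not the MCP server's code: FnRecord, render, and the list-of-records argument (standing in for paths) are hypothetical; "full" is assumed to keep the hash fields; and the hash is a plain SHA-256 of the source, which, unlike the real canonical hash, is not alpha-rename invariant.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FnRecord:
    """Hypothetical index entry for one function in a corpus."""
    fn_name: str
    file: str
    source: str  # full source; the first line is treated as the signature

    @property
    def canonical_hash(self) -> str:
        # Stand-in: plain SHA-256 of the source text. Unlike the real
        # canonical hash, this is NOT alpha-rename invariant.
        return hashlib.sha256(self.source.encode()).hexdigest()[:16]


def render(rec, prefix_match_len, substrate_distance, fmt="hash"):
    """Shape one suggestion according to the format parameter."""
    out = {
        "fn_name": rec.fn_name,
        "file": rec.file,
        "canonical_hash": rec.canonical_hash,
        "prefix_match_len": prefix_match_len,
        "substrate_distance": substrate_distance,
    }
    if fmt == "signature":
        out["signature"] = rec.source.splitlines()[0]
    elif fmt == "full":
        out["source"] = rec.source  # assumption: full keeps the hash fields too
    elif fmt != "hash":
        raise ValueError(f"unknown format: {fmt!r}")
    return out


def fetch_by_hash(records, canonical_hash):
    """Return {found, fn_name, file, source}, or {found: False} on a miss."""
    for rec in records:
        if rec.canonical_hash == canonical_hash:
            return {"found": True, "fn_name": rec.fn_name,
                    "file": rec.file, "source": rec.source}
    return {"found": False}


if __name__ == "__main__":
    # Hypothetical corpus entry; the function name and body are made up.
    corpus = [FnRecord("prom_attention_score", "prometheus.omc",
                       "fn prom_attention_score(q, k) {\n  return dot(q, k)\n}")]
    cheap = render(corpus[0], prefix_match_len=15, substrate_distance=0.0)
    print(cheap)  # hash format: five small fields, no body
    print(fetch_by_hash(corpus, cheap["canonical_hash"])["source"])
```

The caller keeps only the small hash records while it reasons and calls fetch_by_hash for the one candidate it commits to, so full bodies enter the context only on demand.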

MEASURED COMPRESSION
Same query, fn prom_attention_ with top_k=5, against prometheus.omc:

  format      bytes   share of full
  hash         1253   26.2%  (3.8x smaller)
  signature    1622   33.9%
  full         4783   100%   (v0.3 behavior)

The ratio widens for longer fns: with top_k=5 over fns averaging 60
lines, compression reaches ~10x.
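A quick check of the table's arithmetic, plus a rough model of why the ratio widens. The model rests on an assumption the release does not state: each hash-format suggestion costs a roughly fixed number of bytes (estimated here as 1253/5 from the measured run), while a full-format suggestion costs about its body size, with metadata overhead ignored. The function names are illustrative only.

```python
# Byte counts measured for the same query, top_k=5, against prometheus.omc.
MEASURED = {"hash": 1253, "signature": 1622, "full": 4783}
TOP_K = 5


def compression_table(measured):
    """Return {format: (percent of full, times smaller than full)}."""
    full = measured["full"]
    return {
        fmt: (round(100 * size / full, 1), round(full / size, 1))
        for fmt, size in measured.items()
    }


# Assumption: each hash-format suggestion has a roughly fixed size,
# estimated from the measured run (about 250.6 bytes).
HASH_BYTES_PER_SUGGESTION = MEASURED["hash"] / TOP_K


def projected_ratio(avg_body_bytes, hash_bytes=HASH_BYTES_PER_SUGGESTION):
    """Full-vs-hash size ratio when a full suggestion costs about its body size.
    top_k cancels out, so the ratio depends only on average body length."""
    return avg_body_bytes / hash_bytes


if __name__ == "__main__":
    for fmt, (pct, smaller) in compression_table(MEASURED).items():
        print(f"{fmt:<10} {MEASURED[fmt]:>5} B  {pct:5.1f}%  {smaller:.1f}x smaller")
    # Under this model, doubling the average body size doubles the ratio.
    for body in (1000, 2000, 4000):
        print(body, round(projected_ratio(body), 1))
```

Under this model the ratio grows linearly with average body length, which is consistent with the wider gap reported for 60-line fns.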

WHY IT MATTERS
The canonical hash is alpha-rename invariant, so recovery via
omc_fetch_by_hash works even if the fn was renamed after the predict
call. The LLM workflow becomes: predict cheaply (format=hash), reason
over the candidates, and fetch only the body it commits to using.
Branching is now nearly free at the context-budget level: 50 candidates
fit in the LLM's context window for the cost of 6-7 full bodies.
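One common way to get alpha-rename invariance is to replace bound names with positional placeholders before hashing. The release does not describe OMC's canonicalization, so the following is only a minimal sketch of the idea on Python source, using the standard ast module: the function's own name, its parameters, and locally assigned names are renamed in a deterministic traversal order, while free names such as called globals are kept, because changing them changes behavior. The canonical_hash name here is a hypothetical Python stand-in, not OMC's implementation.

```python
import ast
import hashlib


def canonical_hash(src: str) -> str:
    """Hash one Python function so that renaming the function, its
    parameters, or its local variables does not change the result.
    Sketch only: nested def names and global/nonlocal declarations
    are not handled."""
    fn = ast.parse(src).body[0]
    if not isinstance(fn, ast.FunctionDef):
        raise ValueError("expected a single function definition")

    # Assign placeholders to bound names in ast.walk order, which depends
    # only on the tree's shape, so alpha-equivalent functions agree.
    names = {fn.name: "_fn"}  # recursive calls refer to the function's own name
    for node in ast.walk(fn):
        if isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
        else:
            continue
        if name not in names:
            names[name] = f"v{len(names)}"

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node):
            node.id = names.get(node.id, node.id)  # free names stay as-is
            return node

        def visit_arg(self, node):
            node.arg = names.get(node.arg, node.arg)
            self.generic_visit(node)  # also rename inside annotations
            return node

    fn = Renamer().visit(fn)
    fn.name = "_fn"
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return hashlib.sha256(ast.dump(fn).encode()).hexdigest()
```

Because only bound names are renamed, two functions that call different globals (len versus abs, say) still hash differently, which is the behavior a content-addressed lookup needs.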

NOW POSSIBLE

  • LLM agents can hold 5-10x more candidate fns "in mind" per query.
  • Repeated browsing across a corpus stays cheap.
  • The substrate's content-addressed identity becomes a first-class
    context-compression mechanism.

TESTS
13/13 MCP integration tests pass; 231 Rust tests pass; 1087/1087 OMC tests pass.

DEFERRED TO v0.4

  • Wire the substrate codec (omc_codec_encode, 10-50x ratio) into the
    predict response path for full library-lookup compression.
  • Substrate-keyed conversation memory via fibtier.
  • omc_compress_context(text) MCP tool.
  • Cross-corpus blending.

See CHANGELOG.md#v0.3.1-symbolic-compression for the chapter index.