Commit 4032571

docs: README — document the web-native addressed LM + lossless web compression
- experiments/transformerless_lm/README.md: new lead section for the web-native LM (WHAT/HOW oracles, heal→resolve, realize, create, self-improve, 45% lossless compression), CRT-PE kept as the measured origin
- main README: repo-layout entry updated to reflect the addressed-LM line

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 3612c13 commit 4032571

2 files changed

Lines changed: 50 additions & 2 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -291,7 +291,7 @@ If you're trying to understand how OMC got here, **read the [GitHub Releases](ht
 | `omnimcode-gdextension/` | Godot 4 GDExtension binding |
 | `omnimcode-python/` | Python bindings via PyO3 |
 | `experiments/prometheus_parity/` | Substrate-attention A/B harness — pure OMC vs PyTorch |
-| `experiments/transformerless_lm/` | PyTorch CRT-PE vs sinusoidal training |
+| `experiments/transformerless_lm/` | CRT-PE vs sinusoidal training, **and** a web-native *addressed* LM (no token-prediction model — execution-over-the-web + lossless 45% web compression). See its `README.md`. |
 | `experiments/hybrid_llm/` | Per-component substrate substitution experiments |
 | `experiments/substrate_primitives/` | Substrate vs native vs OMC search benchmarks |
 | `examples/lib/` | `prometheus.omc`, `fibtier.omc`, `substrate.omc`, `harmonic_anomaly`, np/pd/sklearn/torch interop wrappers |

experiments/transformerless_lm/README.md

Lines changed: 49 additions & 1 deletion
@@ -1,4 +1,52 @@
-# Transformerless LM — first end-to-end measurement
+# Transformerless LM
+
+This directory began as a transformer **positional-encoding** experiment (CRT-PE, below) and grew into a
+fully **web-native addressed language model** — an LM with *no token-prediction model at all*. Both lines
+live here; the web-native LM is the current frontier, the CRT-PE result is its measured origin.
+
+---
+
+## The web-native addressed LM (current line)
+
+**Thesis:** *addressing is execution.* Concepts are addressed nodes in a knowledge web (passages + weighted
+co-occurrence edges); a sentence "executes" by traversing its concept-addresses, and a thought is *created*
+by recombining distant addresses across sources. No weights are trained for generation — the web is the model.
+
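The thesis paragraph can be illustrated with a small sketch. It is not the code in `langexec.py`: the toy passages, the constants `HUB_DAMPING` and `MISSING_EDGE`, and the damping formula itself are assumptions made for illustration, since the README does not say how hub damping is computed. The sketch builds co-occurrence edges from passages, weights them by PMI minus a penalty for high-degree nodes, and scores a thought by traversing its consecutive concepts.

```python
import math
from collections import Counter
from itertools import combinations

MISSING_EDGE = -5.0  # assumed penalty when two concepts never co-occur
HUB_DAMPING = 0.5    # assumed strength of the hub penalty

# Toy corpus: each passage is a list of concept strings (illustrative only).
TOY_PASSAGES = [
    ["enzyme", "protein", "catalyst"],
    ["enzyme", "catalyst", "reaction"],
    ["protein", "enzyme", "reaction"],
    ["river", "delta", "sediment"],
    ["river", "sediment", "erosion"],
    ["delta", "river", "erosion"],
]

def build_web(passages):
    """Count concept and co-occurrence frequencies, one vote per passage."""
    node_count, pair_count = Counter(), Counter()
    for passage in passages:
        concepts = sorted(set(passage))
        node_count.update(concepts)
        pair_count.update(combinations(concepts, 2))  # keys are sorted pairs
    degree = Counter()
    for a, b in pair_count:
        degree[a] += 1
        degree[b] += 1
    return node_count, pair_count, degree, len(passages)

def edge_weight(a, b, web):
    """PMI of the edge (a, b), reduced for edges that touch a hub."""
    node_count, pair_count, degree, n = web
    joint = pair_count.get((a, b) if a < b else (b, a), 0)
    if joint == 0:
        return MISSING_EDGE
    pmi = math.log(joint * n / (node_count[a] * node_count[b]))
    return pmi - HUB_DAMPING * math.log1p(max(degree[a], degree[b]))

def resolve(thought, web):
    """Mean edge weight along the traversal of consecutive concepts."""
    steps = list(zip(thought, thought[1:]))
    if not steps:
        return 0.0
    return sum(edge_weight(a, b, web) for a, b in steps) / len(steps)
```

With `web = build_web(TOY_PASSAGES)`, a thought drawn from one topic, such as `["enzyme", "catalyst", "reaction"]`, scores higher than a cross-topic word salad, such as `["enzyme", "delta", "catalyst"]`, because the salad hits missing edges at every step.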
+Two oracles, both derived from the web and both empirically validated (held-out):
+
+| layer | question | how | measured |
+|---|---|---|---|
+| **WHAT** (`langexec.py`) | does a thought *resolve*? | traverse concept-addresses over **hub-damped PMI** edges | AUC **0.91–0.98** real-vs-salad (survives a common-word steelman) |
+| **HOW** (`fluency.py`) | does it read like the web speaks? | transition stats counted from the corpus (agnostic, trigram) | AUC **0.86** fluent-vs-scrambled, held-out |
+
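To make the HOW row and the AUC column concrete, here is a minimal sketch of a counted trigram fluency score and a pairwise AUC. It is an illustration under stated assumptions, not the contents of `fluency.py`: word-level tokens, add-one smoothing, and the function names `train_trigrams`, `fluency`, and `auc` are all choices made for this example.

```python
import math
from collections import Counter

def train_trigrams(sentences):
    """Count trigram transitions and their bigram contexts from a corpus."""
    tri, bi, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            bi[tuple(toks[i - 2:i])] += 1
    return tri, bi, len(vocab)

def fluency(sentence, model):
    """Mean log-probability per token under add-one-smoothed trigrams."""
    tri, bi, vocab_size = model
    toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for i in range(2, len(toks)):
        ctx = tuple(toks[i - 2:i])
        logp += math.log((tri[ctx + (toks[i],)] + 1) / (bi[ctx] + vocab_size))
    return logp / (len(toks) - 2)

def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

A held-out evaluation in this style would train on one part of the corpus, score unseen sentences as positives and scrambled copies of them as negatives, and report `auc(positives, negatives)`. An AUC of 0.5 means the score cannot tell the two apart, and 1.0 means it always ranks the fluent sentence higher.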
+
+Built on those:
+- `thinkloop.py` — heal a faulting thought *up* the resolve gradient to a coherent fixpoint.
+- `realize.py` — resolved concepts → fluent grounded sentence (template / compose / **hybrid**).
+- `engine.py` + `agent.py` — a router (recall | relate | decline) and tool-use (exact char-counting,
+arithmetic, cross-source bridging). Frame-word detection is **agnostic** (interrogative-context, not a
+hand-coded keyword list).
+- `create.py` — recombine *distant* concepts across sources into a thought no single passage states,
+3-gated (coherence + support + meaning) and graded WARRANTED/SUGGESTIVE/WEAK with its weakest link shown.
+- `selfimprove.py` — write-don't-train of self-*verified* thoughts (a MAPE-K loop); measured cold→warm
+improvement (mean confidence 0.48→0.79, instant-recalls 0→16/20 on a fixed probe set).
+- `webmind.py` — the unified mind. `python3 webmind.py --demo` (showcase) · `--report` · `--think "…"`.
+
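The "heal up the resolve gradient to a fixpoint" step in the list above can be read as a greedy ascent. The sketch below is one possible reading, not the algorithm in `thinkloop.py`: the function name `heal`, the `score` and `candidates` callables, and the `max_steps` guard are hypothetical and exist only for illustration.

```python
def heal(thought, score, candidates, max_steps=20):
    """Swap one concept at a time while the score strictly improves.

    score(thought) -> float             (higher means the thought resolves better)
    candidates(thought, i) -> iterable  (replacement concepts for position i)
    Stops at a fixpoint: a full pass in which no single swap improves the score.
    """
    current = list(thought)
    best = score(current)
    for _ in range(max_steps):
        improved = False
        for i in range(len(current)):
            for concept in candidates(current, i):
                trial = current[:i] + [concept] + current[i + 1:]
                trial_score = score(trial)
                if trial_score > best:
                    current, best, improved = trial, trial_score, True
        if not improved:
            break  # fixpoint reached
    return current, best
```

Because a swap is accepted only when it strictly raises the score, the loop cannot cycle. `max_steps` is a safety bound for large candidate sets, not part of the fixpoint condition.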
+**Lossless 45% compression** (`compress_web.py` + `finalize_compressed.py`): the LM was asked how to
+compress *itself*; it surfaced dictionary-encoding + entropy-coding, which we applied — node-string
+**interning** (TEXT→int32) + **zlib** passages → **9.78 GB → 5.33 GB, verified lossless**. Presented back
+to all readers via SQLite **views** (`kdb.py` registers an `unzip` fn + detects the schema), so nothing
+else changed. The interned INT index fits in cache, which in turn unblocks fast corpus folding
+(`ghost_fold.py` reconnected 17.7k orphaned nodes; `se_fold_i.py` folds science into the compact DB).
+
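The interning, zlib, and view arrangement described above can be sketched with the standard-library `sqlite3` and `zlib` modules. This is a minimal illustration, not the schema used by `compress_web.py` or `kdb.py`: the table names `strings`, `edges_i`, and `passages_z`, the view names, and the helper functions are hypothetical. Only the `unzip` function name comes from the README.

```python
import sqlite3
import zlib

# Hypothetical compact schema: strings are interned to integer ids, passages
# are stored as zlib blobs, and views present the original text-shaped tables.
SCHEMA = """
CREATE TABLE strings (id INTEGER PRIMARY KEY, text TEXT UNIQUE NOT NULL);
CREATE TABLE edges_i (src INTEGER NOT NULL, dst INTEGER NOT NULL, weight REAL NOT NULL);
CREATE TABLE passages_z (id INTEGER PRIMARY KEY, body BLOB NOT NULL);
CREATE VIEW edges AS
  SELECT s.text AS src, d.text AS dst, e.weight
  FROM edges_i e
  JOIN strings s ON s.id = e.src
  JOIN strings d ON d.id = e.dst;
CREATE VIEW passages AS
  SELECT id, unzip(body) AS body FROM passages_z;
"""

def open_compact(path=":memory:"):
    """Open a database, register unzip() for the views, and create the schema."""
    con = sqlite3.connect(path)
    con.create_function("unzip", 1, lambda blob: zlib.decompress(blob).decode("utf-8"))
    con.executescript(SCHEMA)
    return con

def intern(con, text):
    """Return the integer id for a string, inserting it on first sight."""
    con.execute("INSERT OR IGNORE INTO strings(text) VALUES (?)", (text,))
    return con.execute("SELECT id FROM strings WHERE text = ?", (text,)).fetchone()[0]

def add_edge(con, src, dst, weight):
    con.execute("INSERT INTO edges_i VALUES (?, ?, ?)",
                (intern(con, src), intern(con, dst), weight))

def add_passage(con, passage_id, body):
    con.execute("INSERT INTO passages_z VALUES (?, ?)",
                (passage_id, zlib.compress(body.encode("utf-8"), 9)))
```

Readers that query `edges` and `passages` see text, as they did before compression, provided they open the database through a connection that has registered `unzip`. A lossless check in this style would compare every row read through the views against the original tables.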
+Honest limits: fluency is a trigram floor (a small neural model lifts it); recall can answer with a full
+passage span; coverage is corpus-bound (science is ~1% of the current web — folding more is the lever, not
+hand-tuning). Nothing here claims to rival a frontier LLM at open generation — the claim is a **grounded,
+hallucination-resistant, self-calibrating** knowledge-and-synthesis engine that runs on near-zero compute.
+See `MORNING.md` for an orientation written for a returning reader.
+
+---
+
+## Origin: CRT-PE — first end-to-end measurement

 **The headline:** the harmonic CRT-PE substitution beats the standard sinusoidal-PE transformer on a tiny char-level LM with **mean −19.9% validation loss across 5 seeds**, winning 4 of 5 seeds. This is the first end-to-end empirical evidence that the harmonic substrate substitutions identified by the experiments-0–12 series carry over to a real LM training task.
