You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat(research): web-native addressed LM — language executes over the knowledge web
A transformerless LM built entirely on the addressed knowledge web — no token-prediction
model anywhere. In experiments/transformerless_lm/:
- langexec.py — the EXECUTION oracle: a sentence "executes" by traversing its concept-
addresses over hub-damped (PMI) weighted edges; resolves iff coherent (AUC 0.91–0.98
vs word-salad; survives a common-word steelman via PMI not raw edge count)
- fluency.py — the HOW oracle: fluency learned from the web's own transitions (trigram,
AUC 0.86 held-out); thinkloop.py — heal a faulting thought up the resolve gradient
- realize.py — concepts → fluent grounded sentence (template / compose / hybrid)
- engine.py / agent.py — recall|relate|decline router + tool-use (exact charcount,
compute, cross-source bridge); agnostic frame-word detection (interrogative-context)
- create.py — recombine DISTANT concepts across sources, 3-gate (coherence+support+meaning)
- selfimprove.py — write-don't-train of self-verified thoughts; webmind.py — unified mind
- compress_web.py + finalize_compressed.py + kdb shim — LOSSLESS 45% web compression
(node→int interning + zlib passages), presented via SQLite views so readers are unchanged
- ghost_fold.py — reconnect 17.7k ghost nodes; se_fold_i.py — interned science fold
- extract_chat.py — fold the project's own dialogue so the web holds its genesis
Method discipline throughout: ground in real code, gate every claim, report honestly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Far stronger than web.py (fields-only, weak) — the dictionary IS the connective tissue. PERSISTED:
603
+
KnowledgeWeb.save/load, .kwebcache (gitignored), reload ~1s vs ~3min build. Adding a new field = append a
604
+
corpus + rebuild (or incrementally extend). The agnostic substrate scales to "all knowledge piece by
605
+
piece" — each field a data layer, none hardcoded. HONEST scope: 6,000-node subset; field corpora are
606
+
single public-domain books (not full textbooks); meaning is distributional (definitional+domain), not
607
+
understanding. It's the structure of knowledge made navigable — a different cognition than a human mind
608
+
(unbiased, exhaustive, grounded, but no leap-beyond-data, no qualia) — the complement to a reasoner.
609
+
610
+
-[growth]rebuild#1 over 75 texts/13 fields: 6,000 nodes, 6.95M edges (vs 1.66M at 8 books), saved .kwebcache 301MB. Multi-hop cross-field chains added (deep_connect): war→law→justice (religion+science), light→meaning→truth (science+religion). Honest: denser web = shallower paths + some generic bridges; broader not always sharper.
611
+
612
+
-[growth] NO-CAP accumulation (user): ingest --seq (unlimited sequential Gutenberg, subject auto-labeled from metadata); soft cap removed. HARD limit = DISK (3.7GB free; ~2.5GB held by old *.pt indexes NOT deleted). Disk guards baked into ingest (stop <1.2GB) + kweb (skip rebuild <1.5GB) so growth never crashes the box. Generic connections embraced as valid (human-like association). Loop continues seq-ingest + periodic rebuild until disk-guard or user stop.
613
+
614
+
-[growth] INCREMENTAL "stack then integrate" built (user insight): kweb.add_field appends a field's passages+edges to the saved web in O(new text) — no retrain (6,000 dict nodes are fixed, vectors stay valid). stack.py = incremental driver (tracks .kwebcache/stacked.json, adds only new library texts, re-saves). Full kweb --rebuild becomes RARE (only to refresh cross-verification/embedding). Growth cost: O(new) not O(total). Disk freed to 42G (user removed old *.pt indexes).
# Good morning — the web-native MIND speaks, thinks, uses tools, and improved itself overnight
2
+
3
+
You asked me to use the new way-of-speaking to make the LM **improve itself without human intervention**,
4
+
and have it **speaking, thinking, and using tools** by the time you wake up. It does. No token-prediction
5
+
model anywhere — everything runs over the addressed knowledge web.
6
+
7
+
## Run these two things first
8
+
9
+
```bash
10
+
cd~/OMC/experiments/transformerless_lm
11
+
python3 webmind.py --report # what it learned overnight (instant, reads the ledger + store)
12
+
python3 webmind.py --ab # COLD vs WARM proof it improved itself (~3 min)
13
+
python3 webmind.py --demo # ~90s: showcases all four capabilities in one run
14
+
python3 webmind.py # talk to it yourself (REPL)
15
+
python3 webmind.py --think "how do war and disease relate"# one multi-step reasoning chain
16
+
```
17
+
18
+
## The proof it improved itself (cold vs warm A/B, measured)
19
+
20
+
Same 20 relate-questions, answered with **no memory** (cold — must re-derive each multi-hop bridge) vs
21
+
with the **overnight-accumulated verified memory** (warm — instant recall of what it reasoned out):
22
+
23
+
```
24
+
mean confidence : COLD 0.48 -> WARM 0.79
25
+
instant recalls : COLD 0/20 -> WARM 16/20
26
+
```
27
+
28
+
That delta *is* the self-improvement: bridges it once derived slowly, it now answers instantly and with
29
+
higher confidence. (`webmind.py --ab` reproduces it.) The four questions that didn't change routed to
30
+
single-topic recall both ways — shown honestly, not hidden.
31
+
32
+
## What got built (all new tonight, all tested)
33
+
34
+
| file | capability | what it does |
35
+
|---|---|---|
36
+
|`agent.py`|**TOOLS**| addresses each query to the right tool: `charcount` (exact letter-counting — what token-LLMs get wrong), `compute` (arithmetic), `relate` (cross-source bridge), `recall` (single topic), `memory` (recall a self-verified thought) |
37
+
|`selfimprove.py`|**SELF-IMPROVEMENT**| the engine self-probes, reasons out multi-hop connections, **gates** them, and records the verified ones — then recalls them instantly. Ran all night. |
0 commit comments