| 1 | +# Wiki Compiler |
| 2 | + |
| 3 | +> The compiled-view-on-demand layer for Open Brain: a recipe you can run daily, weekly, or on demand to turn structured thoughts and graph data into regenerable wiki artifacts. |
| 4 | + |
| 5 | +## What This Is |
| 6 | + |
| 7 | +This recipe is the composition layer Nate described in the video. |
| 8 | + |
| 9 | +It does **not** replace the Open Brain database. It sits on top of it and produces a browsable synthesis layer: |
| 10 | + |
| 11 | +- the SQL database stays the source of truth |
| 12 | +- the graph and wiki pages are generated artifacts |
| 13 | +- if a wiki page is wrong, you fix the underlying data and regenerate |
| 14 | +- the wiki is never the canonical store |
| 15 | + |
| 16 | +That gives you the Karpathy-style "readable compiled understanding" layer without turning markdown pages into the system of record. |
| 17 | + |
| 18 | +## What It Does |
| 19 | + |
| 20 | +`compile-wiki.mjs` orchestrates the graph/wiki stack that now exists on `main`: |
| 21 | + |
| 22 | +1. Triggers the **entity extraction worker** so new thoughts become entities, links, and evidence rows. |
| 23 | +2. Runs the **typed edge classifier** so the system can capture reasoning relations like `supports`, `contradicts`, and `supersedes`. |
| 24 | +3. Batch-generates **entity wiki pages** from linked thoughts and graph edges. |
| 25 | +4. Generates **topic wiki pages** from the core `thoughts` table. |
| 26 | +5. Optionally backfills **Gmail thread wiki pages**. |
| 27 | + |
| 28 | +The result is a compiled wiki directory plus a manifest of what ran. |
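
The ordered phases and the manifest described above can be pictured with a small sketch. This is not the code in `compile-wiki.mjs`: the `runPhases` helper, the phase object shape, and the manifest fields (`startedAt`, `phases`, `status`) are assumptions made for illustration. The sketch also assumes a failed phase is recorded and later phases still run, which the real wrapper may or may not do.

```javascript
// Hypothetical sketch of an ordered phase runner that produces a manifest.
// Phase objects look like { name, skip?, run }, where run is an async function.
async function runPhases(phases) {
  const manifest = { startedAt: new Date().toISOString(), phases: [] };
  for (const phase of phases) {
    if (phase.skip) {
      manifest.phases.push({ name: phase.name, status: "skipped" });
      continue;
    }
    try {
      await phase.run();
      manifest.phases.push({ name: phase.name, status: "ok" });
    } catch (err) {
      // Assumption: record the failure and keep going so later phases still run.
      const message = err instanceof Error ? err.message : String(err);
      manifest.phases.push({ name: phase.name, status: "failed", error: message });
    }
  }
  manifest.finishedAt = new Date().toISOString();
  return manifest;
}
```

Recording a status per phase is what lets a scheduled run be audited afterwards without re-reading the logs.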
| 29 | + |
| 30 | +## Why This Matches The Promise |
| 31 | + |
| 32 | +This is the bridge between "Open Brain as durable structured memory" and "LLM wiki as compiled understanding." |
| 33 | + |
| 34 | +It gives you: |
| 35 | + |
| 36 | +- **compiled views on demand** via a single wrapper command |
| 37 | +- **scheduled compilation** via cron, Claude Code scheduled tasks, or any job runner |
| 38 | +- **graph-backed synthesis** because entity extraction + typed edges land before wiki generation |
| 39 | +- **filterable synthesis** because the underlying source is SQL, not raw files |
| 40 | +- **regenerable outputs** because the database remains authoritative |
| 41 | + |
| 42 | +That is the architecture Nate described: structured capture first, compiled wiki second. |
| 43 | + |
| 44 | +## Contributor Credit |
| 45 | + |
| 46 | +This recipe is intentionally a **composition layer over contributor work by Alan Shurafa**. The underlying graph/wiki components it orchestrates were authored in the following merged contributions: |
| 47 | + |
| 48 | +- [#197](https://github.com/NateBJones-Projects/OB1/pull/197) — entity extraction schema |
| 49 | +- [#199](https://github.com/NateBJones-Projects/OB1/pull/199) — entity extraction worker |
| 50 | +- [#208](https://github.com/NateBJones-Projects/OB1/pull/208) — typed reasoning edges + classifier |
| 51 | +- [#213](https://github.com/NateBJones-Projects/OB1/pull/213) — entity wiki pages |
| 52 | +- [#222](https://github.com/NateBJones-Projects/OB1/pull/222) — wiki synthesis |
| 53 | + |
| 54 | +This wrapper recipe packages those pieces into one reproducible workflow. It is not a rewrite of Alan's work. |
| 55 | + |
| 56 | +## Architecture |
| 57 | + |
| 58 | +```text |
| 59 | +Open Brain thoughts (SQL) |
| 60 | +        | |
| 61 | +        v |
| 62 | +entity-extraction trigger + worker |
| 63 | +        | |
| 64 | +        v |
| 65 | +entities / thought_entities / edges |
| 66 | +        | |
| 67 | +        +--------------------+ |
| 68 | +        |                    | |
| 69 | +        v                    v |
| 70 | +typed-edge-classifier    entity-wiki |
| 71 | +        |                    | |
| 72 | +        v                    v |
| 73 | +thought_edges        per-entity wiki pages |
| 74 | +        | |
| 75 | +        +--------------------+ |
| 76 | +        |                    | |
| 77 | +        v                    v |
| 78 | +wiki-synthesis       Gmail thread wikis |
| 79 | +        | |
| 80 | +        v |
| 81 | +compiled-wiki/ + compile-manifest.json |
| 81 | +compiled-wiki/ + compile-manifest.json |
| 82 | +``` |
| 83 | + |
| 84 | +## Prerequisites |
| 85 | + |
| 86 | +- A working Open Brain install |
| 87 | +- The merged graph/wiki stack on `main` |
| 88 | +- Node.js 18+ |
| 89 | +- A valid `.env.local` or shell env for the underlying recipes |
| 90 | +- A deployed `entity-extraction-worker` Edge Function if you want the wrapper to trigger extraction automatically |
| 91 | + |
| 92 | +### Required environment |
| 93 | + |
| 94 | +At minimum, the downstream recipes expect: |
| 95 | + |
| 96 | +```text |
| 97 | +OPEN_BRAIN_URL |
| 98 | +OPEN_BRAIN_SERVICE_KEY |
| 99 | +LLM_API_KEY |
| 100 | +``` |
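
A quick pre-flight check for the three variables listed above might look like the sketch below. The `missingEnv` helper is hypothetical and not part of the recipe. It only inspects the environment object it is given (by default `process.env`), so it does not load `.env.local` itself.

```javascript
// Hypothetical pre-flight check; variable names come from the list above.
const REQUIRED_ENV = ["OPEN_BRAIN_URL", "OPEN_BRAIN_SERVICE_KEY", "LLM_API_KEY"];

// Returns the names that are unset or blank in the given environment object.
function missingEnv(env = process.env, required = REQUIRED_ENV) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

const missing = missingEnv();
if (missing.length > 0) {
  console.error(`Missing required environment variables: ${missing.join(", ")}`);
}
```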
| 101 | + |
| 102 | +Additional environment varies by phase: |
| 103 | + |
| 104 | +- `LLM_MODEL`, `LLM_BASE_URL` for wiki synthesis |
| 105 | +- `ANTHROPIC_API_KEY` for typed-edge classification |
| 106 | +- `EMBEDDING_API_KEY` if you want semantic expansion or thought-mode entity dossiers |
| 107 | +- `MCP_ACCESS_KEY` or `ENTITY_EXTRACTION_MCP_ACCESS_KEY` if you want this wrapper to trigger the entity extraction worker automatically |
| 108 | +- `ENTITY_EXTRACTION_WORKER_URL` if you do not want the wrapper to derive it from `OPEN_BRAIN_URL` |
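
The fallback between the explicit worker settings and the values derived from `OPEN_BRAIN_URL` can be sketched as follows. Everything here is illustrative: `resolveExtractionWorker` is a hypothetical helper, the `/functions/v1/entity-extraction-worker` path is an assumed Supabase-style Edge Function route, and the order in which the two access keys are preferred is also an assumption. Check `compile-wiki.mjs` for the actual derivation.

```javascript
// Assumption: a Supabase-style Edge Function path. The real wrapper may differ.
const ASSUMED_WORKER_PATH = "/functions/v1/entity-extraction-worker";

// Returns { url, key } when extraction can be triggered, or null when the
// caller should skip the extraction phase.
function resolveExtractionWorker(env) {
  // Assumption: the extraction-specific key wins over the general MCP key.
  const key = env.ENTITY_EXTRACTION_MCP_ACCESS_KEY || env.MCP_ACCESS_KEY;
  let url = env.ENTITY_EXTRACTION_WORKER_URL;
  if (!url && env.OPEN_BRAIN_URL) {
    // Strip trailing slashes before appending the assumed path.
    url = env.OPEN_BRAIN_URL.replace(/\/+$/, "") + ASSUMED_WORKER_PATH;
  }
  if (!url || !key) return null;
  return { url, key };
}
```

Returning `null` rather than throwing matches the documented behavior that the wrapper only "tries" to trigger extraction and can skip the phase.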
| 109 | + |
| 110 | +## Install |
| 111 | + |
| 112 | +Nothing extra to install. This recipe is a wrapper around merged in-repo scripts. |
| 113 | + |
| 114 | +Run it from the Open Brain repo root so the downstream recipes can read your root `.env.local`: |
| 115 | + |
| 116 | +```bash |
| 117 | +node recipes/wiki-compiler/compile-wiki.mjs --help |
| 118 | +``` |
| 119 | + |
| 120 | +Done when: the help text prints and the referenced component scripts exist. |
| 121 | + |
| 122 | +## On-Demand Usage |
| 123 | + |
| 124 | +### 1. Standard compile pass |
| 125 | + |
| 126 | +This is the default "build the compiled understanding layer" run: |
| 127 | + |
| 128 | +```bash |
| 129 | +node recipes/wiki-compiler/compile-wiki.mjs |
| 130 | +``` |
| 131 | + |
| 132 | +By default this: |
| 133 | + |
| 134 | +- tries to trigger entity extraction |
| 135 | +- runs typed-edge classification |
| 136 | +- generates entity wiki pages |
| 137 | +- generates the built-in `autobiography` topic wiki |
| 138 | +- writes outputs under `./compiled-wiki/` |
| 139 | + |
| 140 | +### 2. Dry run |
| 141 | + |
| 142 | +Use this before your first real compile: |
| 143 | + |
| 144 | +```bash |
| 145 | +node recipes/wiki-compiler/compile-wiki.mjs --dry-run |
| 146 | +``` |
| 147 | + |
| 148 | +### 3. Compile entity pages only |
| 149 | + |
| 150 | +```bash |
| 151 | +node recipes/wiki-compiler/compile-wiki.mjs \ |
| 152 | +  --skip-extraction \ |
| 153 | +  --skip-edges \ |
| 154 | +  --skip-topic-wiki |
| 155 | +``` |
| 156 | + |
| 157 | +### 4. Compile topic pages only |
| 158 | + |
| 159 | +```bash |
| 160 | +node recipes/wiki-compiler/compile-wiki.mjs \ |
| 161 | +  --skip-extraction \ |
| 162 | +  --skip-edges \ |
| 163 | +  --skip-entity-wiki \ |
| 164 | +  --topic autobiography |
| 165 | +``` |
| 166 | + |
| 167 | +### 5. Include Gmail thread wikis |
| 168 | + |
| 169 | +```bash |
| 170 | +node recipes/wiki-compiler/compile-wiki.mjs --gmail --gmail-limit 10 |
| 171 | +``` |
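
One way to read the flags used in the examples above is as a phase selection: the four default phases run unless a `--skip-*` flag turns one off, and the Gmail phase runs only when `--gmail` is passed. The sketch below models that reading. The `selectPhases` helper and the phase labels are hypothetical, and value-taking flags such as `--topic` and `--gmail-limit` are deliberately ignored.

```javascript
// Hypothetical mapping from CLI flags to the phases that would run.
// Flag names come from the examples above; phase labels are illustrative.
function selectPhases(argv) {
  const flags = new Set(argv);
  const phases = [
    { name: "entity-extraction", enabled: !flags.has("--skip-extraction") },
    { name: "typed-edges", enabled: !flags.has("--skip-edges") },
    { name: "entity-wiki", enabled: !flags.has("--skip-entity-wiki") },
    { name: "topic-wiki", enabled: !flags.has("--skip-topic-wiki") },
    { name: "gmail-wiki", enabled: flags.has("--gmail") }, // opt-in phase
  ];
  return phases.filter((phase) => phase.enabled).map((phase) => phase.name);
}

// Example: the "entity pages only" invocation from section 3.
console.log(selectPhases(["--skip-extraction", "--skip-edges", "--skip-topic-wiki"]));
```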
| 172 | + |
| 173 | +## Scheduling |
| 174 | + |
| 175 | +This is designed to run on demand **or** on a schedule. |
| 176 | + |
| 177 | +### Daily light compile |
| 178 | + |
| 179 | +```bash |
| 180 | +node recipes/wiki-compiler/compile-wiki.mjs \ |
| 181 | +  --edge-limit 25 \ |
| 182 | +  --entity-batch-limit 15 \ |
| 183 | +  --topic autobiography \ |
| 184 | +  --gmail \ |
| 185 | +  --gmail-limit 5 |
| 186 | +``` |
| 187 | + |
| 188 | +### Weekly deep compile |
| 189 | + |
| 190 | +```bash |
| 191 | +node recipes/wiki-compiler/compile-wiki.mjs \ |
| 192 | +  --edge-limit 150 \ |
| 193 | +  --entity-batch-limit 75 \ |
| 194 | +  --topic autobiography \ |
| 195 | +  --gmail \ |
| 196 | +  --re-evaluate |
| 197 | +``` |
| 198 | + |
| 199 | +### Cron example |
| 200 | + |
| 201 | +```cron |
| 202 | +0 6 * * * cd /path/to/OB1 && /usr/bin/env node recipes/wiki-compiler/compile-wiki.mjs --edge-limit 25 --entity-batch-limit 15 --topic autobiography >> logs/wiki-compiler.log 2>&1 |
| 203 | +``` |
| 204 | + |
| 205 | +### Agent-driven schedule |
| 206 | + |
| 207 | +If you prefer Claude Code or Codex-style scheduled runs, point the agent at this command: |
| 208 | + |
| 209 | +```bash |
| 210 | +node recipes/wiki-compiler/compile-wiki.mjs --edge-limit 25 --entity-batch-limit 15 --topic autobiography |
| 211 | +``` |
| 212 | + |
| 213 | +The important contract is the same regardless of scheduler: |
| 214 | + |
| 215 | +- write new information into SQL first |
| 216 | +- regenerate the compiled wiki from the source tables |
| 217 | +- do not manually edit generated wiki pages |
| 218 | + |
| 219 | +## Output |
| 220 | + |
| 221 | +By default the wrapper writes to: |
| 222 | + |
| 223 | +```text |
| 224 | +compiled-wiki/ |
| 225 | +  entities/               # entity-wiki output when using file mode |
| 226 | +  topics/                 # wiki-synthesis topic output |
| 227 | +  compile-manifest.json   # run summary and phase statuses |
| 228 | +``` |
| 229 | + |
| 230 | +Other outputs are owned by the underlying recipes: |
| 231 | + |
| 232 | +- `data/wiki-synthesis-state.jsonl` for Gmail thread synthesis state |
| 233 | +- `public.thought_edges` for typed reasoning links |
| 234 | +- `public.entities`, `public.edges`, `public.thought_entities` for graph extraction |
| 235 | + |
| 236 | +## Important Design Rule |
| 237 | + |
| 238 | +Do **not** treat generated wiki pages as the source of truth. |
| 239 | + |
| 240 | +This recipe is built around the opposite rule: |
| 241 | + |
| 242 | +- capture raw facts and source material in Open Brain first |
| 243 | +- compile the wiki from that source |
| 244 | +- regenerate instead of manually patching summaries |
| 245 | + |
| 246 | +That is what prevents wiki drift. |
| 247 | + |
| 248 | +## Expected Outcome |
| 249 | + |
| 250 | +After a successful run you should have: |
| 251 | + |
| 252 | +- a refreshed entity graph fed by structured Open Brain data |
| 253 | +- reasoning edges in `thought_edges` |
| 254 | +- per-entity wiki pages |
| 255 | +- topic synthesis pages |
| 256 | +- optionally Gmail thread wiki artifacts |
| 257 | +- a manifest recording which phases ran and whether they succeeded |
| 258 | + |
| 259 | +At that point you can browse the compiled layer like a wiki while keeping SQL as the authority underneath it. |
| 260 | + |
| 261 | +## Troubleshooting |
| 262 | + |
| 263 | +**Issue: entity extraction phase skips itself** |
| 264 | +Solution: set `ENTITY_EXTRACTION_WORKER_URL` plus `ENTITY_EXTRACTION_MCP_ACCESS_KEY`, or define `OPEN_BRAIN_URL` plus `MCP_ACCESS_KEY` so the wrapper can derive the worker endpoint. |
| 265 | + |
| 266 | +**Issue: typed-edge classification fails** |
| 267 | +Solution: confirm `ANTHROPIC_API_KEY` is available and that the `typed-reasoning-edges` schema is installed on the target brain. |
| 268 | + |
| 269 | +**Issue: entity pages fail with missing relation/table errors** |
| 270 | +Solution: confirm the entity extraction schema and worker have been applied and allowed to populate `entities`, `edges`, and `thought_entities`. |
| 271 | + |
| 272 | +**Issue: Gmail wiki backfill fails** |
| 273 | +Solution: confirm your email thoughts use the expected Gmail metadata shape and that `thought_edges` exists. |
| 274 | + |
| 275 | +**Issue: topic synthesis writes the wrong pages** |
| 276 | +Solution: pass explicit `--topic` and `--scope key=value` flags so the wrapper only runs the slice you want. |