Commit e06aa9d

[recipes] Add wiki compiler orchestration recipe
1 parent 2507c13 commit e06aa9d

4 files changed

Lines changed: 715 additions & 0 deletions

recipes/README.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Step-by-step builds that add a new capability to your Open Brain. Follow the ins
| [Email History Import](email-history-import/) | Pull your Gmail archive into searchable thoughts |
| [ChatGPT Conversation Import](chatgpt-conversation-import/) | Ingest your ChatGPT data export |
| [Daily Digest](daily-digest/) | Automated summary of recent thoughts via email or Slack |
| [Wiki Compiler](wiki-compiler/) | Compiles graph-backed entity pages and topic synthesis into a regenerable wiki layer you can run on demand or on a schedule |
| [Work Operating Model Activation](work-operating-model-activation/) | Interview-driven workflow that stores how you actually work and generates agent-ready operating files |
| [World Model Diagnostic Activation](world-model-diagnostic-activation/) | Lightweight activation path for a 20-minute world-model diagnostic that uses the base OB1 connector and a direct-paste fallback |
| [Research-to-Decision Workflow](research-to-decision-workflow/) | Compose canonical skills into operator and investor paths for analysis, synthesis, meetings, and decision documents |

recipes/wiki-compiler/README.md

Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
# Wiki Compiler

> The "compiled views on demand" layer for Open Brain: a recipe that turns structured thoughts and graph data into regenerable wiki artifacts you can build daily, weekly, or on demand.

## What This Is

This recipe is the composition layer Nate described in the video.

It does **not** replace the Open Brain database. It sits on top of it and produces a browsable synthesis layer:

- the SQL database stays the source of truth
- the graph and wiki pages are generated artifacts
- if a wiki page is wrong, you fix the underlying data and regenerate
- the wiki is never the canonical store

That gives you the Karpathy-style "readable compiled understanding" layer without turning markdown pages into the system of record.

## What It Does

`compile-wiki.mjs` orchestrates the graph/wiki stack that now exists on `main`:

1. Triggers the **entity extraction worker** so new thoughts become entities, links, and evidence rows.
2. Runs the **typed edge classifier** so the system can capture reasoning relations like `supports`, `contradicts`, and `supersedes`.
3. Batch-generates **entity wiki pages** from linked thoughts and graph edges.
4. Generates **topic wiki pages** from the core `thoughts` table.
5. Optionally backfills **Gmail thread wiki pages**.

The result is a compiled wiki directory plus a manifest of what ran.
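
The five steps above follow a common orchestration pattern: run phases in order, honor skip flags, and record a status per phase in a manifest. The sketch below illustrates that pattern under stated assumptions. The `runPhases` helper, the phase names, and the manifest fields are hypothetical and are not the actual internals of `compile-wiki.mjs`.

```javascript
// Hypothetical sketch of the phase-runner pattern; not the actual compile-wiki.mjs code.
// Each phase is { name, skip?, run: async () => detail }.
async function runPhases(phases, { dryRun = false } = {}) {
  const manifest = { startedAt: new Date().toISOString(), dryRun, phases: [] };

  for (const phase of phases) {
    if (phase.skip) {
      // Mirrors flags such as --skip-edges: the phase is recorded but not executed.
      manifest.phases.push({ name: phase.name, status: "skipped" });
    } else if (dryRun) {
      // Dry run: record what would run without calling it.
      manifest.phases.push({ name: phase.name, status: "planned" });
    } else {
      try {
        const detail = await phase.run();
        manifest.phases.push({ name: phase.name, status: "ok", detail });
      } catch (err) {
        // Assumption: a failed phase is recorded and later phases still run.
        const error = err instanceof Error ? err.message : String(err);
        manifest.phases.push({ name: phase.name, status: "failed", error });
      }
    }
  }

  manifest.finishedAt = new Date().toISOString();
  return manifest;
}

// Example: the five phases in the order listed above, with Gmail left off.
runPhases([
  { name: "entity-extraction", run: async () => "triggered" },
  { name: "typed-edges", run: async () => { throw new Error("ANTHROPIC_API_KEY is not set"); } },
  { name: "entity-wiki", run: async () => "pages written" },
  { name: "topic-wiki", run: async () => "autobiography" },
  { name: "gmail-wiki", skip: true, run: async () => "not run" },
]).then((manifest) => console.log(JSON.stringify(manifest, null, 2)));
```

Recording failures instead of throwing lets a scheduled run still produce a manifest. Whether the real wrapper continues after a failed phase is not stated in this README.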

## Why This Matches The Promise

This is the bridge between "Open Brain as durable structured memory" and "LLM wiki as compiled understanding."

It gives you:

- **compiled views on demand** via a single wrapper command
- **scheduled compilation** via cron, Claude Code scheduled tasks, or any job runner
- **graph-backed synthesis** because entity extraction and typed-edge classification run before wiki generation
- **filterable synthesis** because the underlying source is SQL, not raw files
- **regenerable outputs** because the database remains authoritative

That is the architecture Nate described: structured capture first, compiled wiki second.

## Contributor Credit

This recipe is intentionally a **composition layer over contributor work by Alan Shurafa**. The underlying graph/wiki components it orchestrates were authored in the following merged contributions:

- [#197](https://github.com/NateBJones-Projects/OB1/pull/197): entity extraction schema
- [#199](https://github.com/NateBJones-Projects/OB1/pull/199): entity extraction worker
- [#208](https://github.com/NateBJones-Projects/OB1/pull/208): typed reasoning edges + classifier
- [#213](https://github.com/NateBJones-Projects/OB1/pull/213): entity wiki pages
- [#222](https://github.com/NateBJones-Projects/OB1/pull/222): wiki synthesis

This wrapper recipe packages those pieces into one reproducible workflow. It is not a rewrite of Alan's work.

## Architecture

```text
Open Brain thoughts (SQL)
            |
            v
entity-extraction trigger + worker
            |
            v
entities / thought_entities / edges
            |
            +----------------------+
            |                      |
            v                      v
  typed-edge-classifier       entity-wiki
            |                      |
            v                      v
      thought_edges      per-entity wiki pages
            |
            +----------------------+
            |                      |
            v                      v
     wiki-synthesis       Gmail thread wikis
            |
            v
compiled-wiki/ + compile-manifest.json
```
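
The diagram is a dependency graph: a phase can only run once the data it reads exists. The sketch below derives a valid run order with Kahn's algorithm, using edges read straight off the diagram as drawn. The phase identifiers are illustrative labels, not real flags or script names.

```javascript
// Dependencies read off the architecture diagram: each key lists the phases it needs first.
// Identifiers are illustrative labels, not real script names or CLI flags.
const dependsOn = {
  "entity-extraction": [],
  "typed-edge-classifier": ["entity-extraction"],
  "entity-wiki": ["entity-extraction"],
  "wiki-synthesis": ["typed-edge-classifier"],
  "gmail-thread-wikis": ["typed-edge-classifier"],
};

// Kahn's algorithm: repeatedly emit every phase whose dependencies are all satisfied.
function runOrder(deps) {
  const remaining = new Map(Object.entries(deps).map(([k, v]) => [k, new Set(v)]));
  const order = [];
  while (remaining.size > 0) {
    const ready = [...remaining.keys()].filter((k) => remaining.get(k).size === 0);
    if (ready.length === 0) throw new Error("cycle detected in phase dependencies");
    for (const phase of ready) {
      order.push(phase);
      remaining.delete(phase);
      // Mark this phase as satisfied for everything still waiting on it.
      for (const pending of remaining.values()) pending.delete(phase);
    }
  }
  return order;
}

console.log(runOrder(dependsOn).join(" -> "));
// entity-extraction -> typed-edge-classifier -> entity-wiki -> wiki-synthesis -> gmail-thread-wikis
```

The derived order matches the numbered list under "What It Does", which is the point of running extraction and edge classification first.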

## Prerequisites

- A working Open Brain install
- The merged graph/wiki stack on `main`
- Node.js 18+
- A valid `.env.local` or shell environment for the underlying recipes
- A deployed `entity-extraction-worker` Edge Function if you want the wrapper to trigger extraction automatically

### Required environment

At minimum, the downstream recipes expect:

```text
OPEN_BRAIN_URL
OPEN_BRAIN_SERVICE_KEY
LLM_API_KEY
```

Additional variables depend on the phase:

- `LLM_MODEL`, `LLM_BASE_URL` for wiki synthesis
- `ANTHROPIC_API_KEY` for typed-edge classification
- `EMBEDDING_API_KEY` if you want semantic expansion or thought-mode entity dossiers
- `MCP_ACCESS_KEY` or `ENTITY_EXTRACTION_MCP_ACCESS_KEY` if you want this wrapper to trigger the entity extraction worker automatically
- `ENTITY_EXTRACTION_WORKER_URL` if you do not want the wrapper to derive it from `OPEN_BRAIN_URL`
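
A preflight check can catch missing variables before a scheduled run spends API calls. The sketch below is hypothetical: the variable names come from the lists above, but the mapping of phases to variables and the `missingEnv` helper are assumptions, not part of the recipe.

```javascript
// Hypothetical preflight check. Variable names come from this README; which phase
// needs which variable is an assumption based on the lists above.
const REQUIRED_ALWAYS = ["OPEN_BRAIN_URL", "OPEN_BRAIN_SERVICE_KEY", "LLM_API_KEY"];

// Each inner array is an "any of" group: one set variable satisfies it.
const REQUIRED_BY_PHASE = {
  "entity-extraction": [["MCP_ACCESS_KEY", "ENTITY_EXTRACTION_MCP_ACCESS_KEY"]],
  "typed-edges": [["ANTHROPIC_API_KEY"]],
};

// Returns human-readable problems; an empty list means the listed phases can start.
function missingEnv(env, phases) {
  const problems = REQUIRED_ALWAYS.filter((name) => !env[name]);
  for (const phase of phases) {
    for (const group of REQUIRED_BY_PHASE[phase] ?? []) {
      if (!group.some((name) => env[name])) problems.push(`${phase}: ${group.join(" or ")}`);
    }
  }
  return problems;
}

const example = {
  OPEN_BRAIN_URL: "https://your-project.supabase.co",
  OPEN_BRAIN_SERVICE_KEY: "service-key",
  LLM_API_KEY: "llm-key",
};
console.log(missingEnv(example, ["entity-extraction", "typed-edges"]));
```

In a real run you would pass `process.env` instead of `example`. Because the entity extraction phase skips itself when it is not configured, you may prefer to treat that phase's missing key as a warning rather than an error.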

## Install

Nothing extra to install. This recipe is a wrapper around merged in-repo scripts.

Run it from the Open Brain repo root so the downstream recipes can read your root `.env.local`:

```bash
node recipes/wiki-compiler/compile-wiki.mjs --help
```

Done when: the help text prints and the referenced component scripts exist.

## On-Demand Usage

### 1. Standard compile pass

This is the default "build the compiled understanding layer" run:

```bash
node recipes/wiki-compiler/compile-wiki.mjs
```

By default this:

- tries to trigger entity extraction
- runs typed-edge classification
- generates entity wiki pages
- generates the built-in `autobiography` topic wiki
- writes outputs under `./compiled-wiki/`
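
The defaults above, combined with the flags used in the examples in this README, amount to a small run plan. The parser below is a hypothetical sketch that makes those defaults explicit; it is not the real `compile-wiki.mjs` argument handling, it covers only the flags shown in the usage and scheduling examples (not `--scope`), and treating repeated `--topic` flags as a list is an assumption.

```javascript
// Hypothetical parser for the flags shown in this README; a sketch of the documented
// defaults, not the real compile-wiki.mjs argument handling.
function parsePlan(argv) {
  const plan = {
    dryRun: false,
    extraction: true, // "tries to trigger entity extraction"
    edges: true, // "runs typed-edge classification"
    entityWiki: true, // "generates entity wiki pages"
    topicWiki: true,
    topics: [],
    gmail: false, // Gmail thread wikis are opt-in
    gmailLimit: null,
    edgeLimit: null,
    entityBatchLimit: null,
    reEvaluate: false,
    outDir: "./compiled-wiki/",
  };

  // Reads the value after a numeric flag and rejects anything but a positive integer.
  const intAfter = (i, flag) => {
    const value = Number(argv[i + 1]);
    if (!Number.isInteger(value) || value < 1) throw new Error(`${flag} expects a positive integer`);
    return value;
  };

  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    switch (flag) {
      case "--dry-run": plan.dryRun = true; break;
      case "--skip-extraction": plan.extraction = false; break;
      case "--skip-edges": plan.edges = false; break;
      case "--skip-entity-wiki": plan.entityWiki = false; break;
      case "--skip-topic-wiki": plan.topicWiki = false; break;
      case "--gmail": plan.gmail = true; break;
      case "--re-evaluate": plan.reEvaluate = true; break;
      case "--gmail-limit": plan.gmailLimit = intAfter(i++, flag); break;
      case "--edge-limit": plan.edgeLimit = intAfter(i++, flag); break;
      case "--entity-batch-limit": plan.entityBatchLimit = intAfter(i++, flag); break;
      case "--topic": {
        const topic = argv[++i];
        if (!topic || topic.startsWith("--")) throw new Error("--topic expects a name");
        plan.topics.push(topic); // assumption: repeated --topic flags accumulate
        break;
      }
      default:
        throw new Error(`unknown flag: ${flag}`);
    }
  }

  // Documented default: the built-in autobiography topic when no --topic is given.
  if (plan.topicWiki && plan.topics.length === 0) plan.topics.push("autobiography");
  return plan;
}

// The "Compile entity pages only" example.
console.log(parsePlan(["--skip-extraction", "--skip-edges", "--skip-topic-wiki"]));
```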

### 2. Dry run

Use this before your first real compile:

```bash
node recipes/wiki-compiler/compile-wiki.mjs --dry-run
```

### 3. Compile entity pages only

```bash
node recipes/wiki-compiler/compile-wiki.mjs \
  --skip-extraction \
  --skip-edges \
  --skip-topic-wiki
```

### 4. Compile topic pages only

```bash
node recipes/wiki-compiler/compile-wiki.mjs \
  --skip-extraction \
  --skip-edges \
  --skip-entity-wiki \
  --topic autobiography
```

### 5. Include Gmail thread wikis

```bash
node recipes/wiki-compiler/compile-wiki.mjs --gmail --gmail-limit 10
```

## Scheduling

This is designed to run on demand **or** on a schedule.

### Daily light compile

```bash
node recipes/wiki-compiler/compile-wiki.mjs \
  --edge-limit 25 \
  --entity-batch-limit 15 \
  --topic autobiography \
  --gmail \
  --gmail-limit 5
```

### Weekly deep compile

```bash
node recipes/wiki-compiler/compile-wiki.mjs \
  --edge-limit 150 \
  --entity-batch-limit 75 \
  --topic autobiography \
  --gmail \
  --re-evaluate
```
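
If you would rather keep a single scheduled entry than maintain separate daily and weekly jobs, a small wrapper can pick the profile by weekday. This is a sketch: the flag sets are copied from the two examples above, while running the deep pass on Sundays is an arbitrary assumption.

```javascript
// Flag sets copied from the daily and weekly examples above.
const DAILY = ["--edge-limit", "25", "--entity-batch-limit", "15", "--topic", "autobiography", "--gmail", "--gmail-limit", "5"];
const WEEKLY = ["--edge-limit", "150", "--entity-batch-limit", "75", "--topic", "autobiography", "--gmail", "--re-evaluate"];

// Assumption: the deep pass runs on Sundays (local time, getDay() === 0).
function compileArgs(date = new Date()) {
  return date.getDay() === 0 ? WEEKLY : DAILY;
}

const args = compileArgs();
console.log(["node", "recipes/wiki-compiler/compile-wiki.mjs", ...args].join(" "));
```

Hand the resulting argument array to whatever job runner you use; the compile itself is unchanged.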
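
### Cron example

```cron
0 6 * * * cd /path/to/OB1 && /usr/bin/env node recipes/wiki-compiler/compile-wiki.mjs --edge-limit 25 --entity-batch-limit 15 --topic autobiography >> logs/wiki-compiler.log 2>&1
```

This runs the light compile every day at 06:00 in the cron daemon's time zone and appends all output to `logs/wiki-compiler.log`. Create the `logs/` directory first; the shell will not create it for the redirect.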

### Agent-driven schedule

If you prefer Claude Code- or Codex-style scheduled runs, point the agent at:

```bash
node recipes/wiki-compiler/compile-wiki.mjs --edge-limit 25 --entity-batch-limit 15 --topic autobiography
```

The important contract is the same regardless of scheduler:

- write new information into SQL first
- regenerate the compiled wiki from the source tables
- do not manually edit generated wiki pages

## Output

By default the wrapper writes to:

```text
compiled-wiki/
  entities/               # entity-wiki output when using file mode
  topics/                 # wiki-synthesis topic output
  compile-manifest.json   # run summary and phase statuses
```

Other outputs are owned by the underlying recipes:

- `data/wiki-synthesis-state.jsonl` for Gmail thread synthesis state
- `public.thought_edges` for typed reasoning links
- `public.entities`, `public.edges`, `public.thought_entities` for graph extraction

## Important Design Rule

Do **not** treat generated wiki pages as the source of truth.

This recipe is built around the opposite rule:

- capture raw facts and source material in Open Brain first
- compile the wiki from that source
- regenerate instead of manually patching summaries

That is what prevents wiki drift.
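
The recipe does not document a guard against hand edits, but one way to enforce this rule is to record a hash of each page at generation time and compare before regenerating. Any page whose hash no longer matches was edited by hand, and that edit should move into SQL before it is overwritten. The sketch below is hypothetical; it uses 32-bit FNV-1a over UTF-16 code units, which is enough to spot changes but is not a security hash, and the page paths are placeholders.

```javascript
// Hypothetical drift check; not part of the recipe.

// 32-bit FNV-1a over UTF-16 code units: cheap change detection, not for security.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied mod 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// recorded: { path: hash } saved at generation time; current: { path: contents } on disk now.
// Returns pages that were generated earlier but no longer match their recorded hash.
function findHandEdits(recorded, current) {
  return Object.keys(current).filter(
    (path) => path in recorded && recorded[path] !== fnv1a(current[path]),
  );
}

const recorded = { "entities/example.md": fnv1a("# Example\n") };
const onDisk = { "entities/example.md": "# Example\nManually added note.\n" };
console.log(findHandEdits(recorded, onDisk)); // pages to reconcile with SQL before regenerating
```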

## Expected Outcome

After a successful run you should have:

- a refreshed entity graph fed by structured Open Brain data
- reasoning edges in `thought_edges`
- per-entity wiki pages
- topic synthesis pages
- optionally Gmail thread wiki artifacts
- a manifest recording which phases ran and whether they succeeded

At that point you can browse the compiled layer like a wiki while keeping SQL as the authority underneath it.
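
For scheduled runs, the manifest is the natural place to check whether a compile succeeded. This README does not document the manifest's fields, so the sketch below assumes a shape of `{ phases: [{ name, status }] }` with `"ok"` and `"failed"` statuses; check your own `compile-manifest.json` and adjust the field names.

```javascript
// Assumed manifest shape: { phases: [{ name: string, status: string }] }.
// The field names and status values are hypothetical; check your compile-manifest.json.
function summarizeManifest(manifest) {
  const phases = Array.isArray(manifest?.phases) ? manifest.phases : [];
  const failed = phases.filter((p) => p.status === "failed").map((p) => p.name);
  const ran = phases.filter((p) => p.status === "ok").map((p) => p.name);
  // A run counts as healthy only if something ran and nothing failed.
  return { ok: failed.length === 0 && ran.length > 0, ran, failed };
}

const sample = {
  phases: [
    { name: "entity-extraction", status: "skipped" },
    { name: "typed-edges", status: "ok" },
    { name: "entity-wiki", status: "failed" },
  ],
};
const summary = summarizeManifest(sample);
console.log(summary.ok ? "compile OK" : `compile failed: ${summary.failed.join(", ")}`);
// prints "compile failed: entity-wiki"
```

In a cron job you could load the real file with `JSON.parse(readFileSync("compiled-wiki/compile-manifest.json", "utf8"))` from `node:fs` and set `process.exitCode = 1` when `ok` is false, so the scheduler's failure alerting fires.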

## Troubleshooting

**Issue: entity extraction phase skips itself**
Solution: set `ENTITY_EXTRACTION_WORKER_URL` plus `ENTITY_EXTRACTION_MCP_ACCESS_KEY`, or define `OPEN_BRAIN_URL` plus `MCP_ACCESS_KEY` so the wrapper can derive the worker endpoint.
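
The resolution order in this solution can be sketched as follows. This is an illustration, not the wrapper's code: it assumes the extraction-specific variables take precedence over the shared ones, and that `OPEN_BRAIN_URL` is a Supabase project URL whose Edge Functions are served under `/functions/v1/<function-name>`. The wrapper's actual derivation may differ.

```javascript
// Sketch of worker endpoint resolution; precedence and URL derivation are assumptions.
function resolveExtractionWorker(env) {
  // Assumption: the extraction-specific key wins over the shared MCP key.
  const key = env.ENTITY_EXTRACTION_MCP_ACCESS_KEY || env.MCP_ACCESS_KEY;
  let url = env.ENTITY_EXTRACTION_WORKER_URL;
  if (!url && env.OPEN_BRAIN_URL) {
    // Assumption: OPEN_BRAIN_URL is the Supabase project URL and the worker is
    // deployed as the `entity-extraction-worker` Edge Function.
    url = new URL("/functions/v1/entity-extraction-worker", env.OPEN_BRAIN_URL).toString();
  }
  // Without both a URL and a key, the extraction phase skips itself.
  return url && key ? { url, key } : null;
}

const worker = resolveExtractionWorker({
  OPEN_BRAIN_URL: "https://your-project.supabase.co",
  MCP_ACCESS_KEY: "example-key",
});
console.log(worker ? worker.url : "extraction skipped");
// prints "https://your-project.supabase.co/functions/v1/entity-extraction-worker"
```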

**Issue: typed-edge classification fails**
Solution: confirm `ANTHROPIC_API_KEY` is available and that the `typed-reasoning-edges` schema is installed on the target brain.

**Issue: entity pages fail with missing relation/table errors**
Solution: confirm the entity extraction schema and worker have been applied and allowed to populate `entities`, `edges`, and `thought_entities`.

**Issue: Gmail wiki backfill fails**
Solution: confirm your email thoughts use the expected Gmail metadata shape and that `thought_edges` exists.

**Issue: topic synthesis writes the wrong pages**
Solution: pass explicit `--topic` and `--scope key=value` flags so the wrapper only runs the slice you want.
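
Repeated `--scope key=value` flags can be collected into one filter object and validated up front, so a malformed scope fails fast instead of silently widening the slice. The helper below is hypothetical, and the scope keys that wiki synthesis actually accepts are not documented here; `source=journal` is a placeholder.

```javascript
// Hypothetical helper: collects repeated `--scope key=value` flags into one filter object.
// Splits on the first "=", so values may themselves contain "=".
function parseScopes(argv) {
  const scope = {};
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] !== "--scope") continue;
    const pair = argv[i + 1] ?? "";
    const eq = pair.indexOf("=");
    if (eq <= 0 || eq === pair.length - 1) throw new Error(`--scope expects key=value, got "${pair}"`);
    scope[pair.slice(0, eq)] = pair.slice(eq + 1);
    i++; // skip the consumed value
  }
  return scope;
}

// `source=journal` is a placeholder scope, not a documented key.
console.log(parseScopes(["--topic", "autobiography", "--scope", "source=journal"]));
// → { source: 'journal' }
```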
