| 1 | +--- |
| 2 | +name: mini-context-graph |
| 3 | +description: | |
| 4 | + A persistent, compounding knowledge base combining Karpathy's LLM Wiki pattern |
| 5 | + with a structured knowledge graph. Ingest documents once — the LLM writes wiki |
| 6 | + pages, extracts entities/relations into the graph, and stores raw content for |
| 7 | + evidence retrieval. Knowledge accumulates and cross-references; it is never |
| 8 | + re-derived from scratch. |
| 9 | +--- |
| 10 | + |
| 11 | +# Mini Context Graph Skill |
| 12 | + |
| 13 | +## The Core Idea |
| 14 | + |
| 15 | +Standard RAG rediscovers knowledge from scratch on every query. This skill instead keeps three persistent layers that compound with every ingest: |
| 16 | + |
| 17 | +1. **Wiki layer**: The LLM writes and maintains persistent markdown pages (summaries, entity pages, topic syntheses). Pages link to each other through `[[wikilinks]]`, so cross-references already exist at query time and the wiki gets richer with every ingest. |
| 18 | +2. **Graph layer**: Entities and relations are extracted once and stored as a navigable knowledge graph. BFS traversal answers structural queries without re-reading sources. |
| 19 | +3. **Raw source layer**: Original documents are stored immutably, split into chunks. Provenance links tie every graph node and edge back to the exact text that supports it. |
| 20 | + |
| 21 | +> The LLM writes; the Python tools handle all bookkeeping. |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Three Layers |
| 26 | + |
| 27 | +| Layer | Where | What the LLM does | What Python does | |
| 28 | +|-------|-------|-------------------|-----------------| |
| 29 | +| **Raw Sources** | `data/documents.json` | Reads (never modifies) | Stores chunks + metadata | |
| 30 | +| **Wiki** | `wiki/` (markdown) | Writes/updates pages | Manages index.md + log.md | |
| 31 | +| **Graph** | `data/graph.json` | Extracts entities + relations | Persists, deduplicates, traverses | |
| 32 | + |
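How the raw source layer could connect chunks to provenance can be sketched as follows. This is a minimal illustration, not the actual `documents_store.py` logic: the sentence-based chunker, the `chunk_id` format, and the helper names `chunk_text` and `find_provenance` are all assumptions made for the example.

```python
import re


def chunk_text(doc_id, raw_content, sentences_per_chunk=2):
    """Split raw content into small sentence-based chunks (hypothetical scheme)."""
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", raw_content)
        if s.strip()
    ]
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        chunks.append({
            "chunk_id": f"{doc_id}#chunk{i // sentences_per_chunk}",  # assumed id format
            "doc_id": doc_id,
            "text": " ".join(sentences[i:i + sentences_per_chunk]),
        })
    return chunks


def find_provenance(chunks, supporting_text):
    """Return the ids of chunks whose text contains the supporting text."""
    needle = supporting_text.lower().rstrip(".")
    return [c["chunk_id"] for c in chunks if needle in c["text"].lower()]


chunks = chunk_text(
    "doc_001",
    "System crashes due to memory leaks. "
    "Memory leaks occur when objects are not released.",
)
provenance = find_provenance(chunks, "System crashes due to memory leaks.")
```

Matching on a case-insensitive substring keeps the lookup simple; a real store might instead record character offsets at ingest time.
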
| 33 | +--- |
| 34 | + |
| 35 | +## ⚡ Quick Start for Agents |
| 36 | + |
| 37 | +```python |
| 38 | +from scripts.contextgraph import ContextGraphSkill |
| 39 | +from scripts.tools import wiki_store |
| 40 | + |
| 41 | +skill = ContextGraphSkill() |
| 42 | + |
| 43 | +# ===== INGEST WITH FULL RAG + WIKI ===== |
| 44 | +# 1. Read references/ingestion.md and references/ontology.md first |
| 45 | +# 2. Extract entities and relations (LLM reasoning step) |
| 46 | +entities = [ |
| 47 | + {"name": "memory leak", "type": "issue", "supporting_text": "Memory leaks occur when objects are not released."}, |
| 48 | + {"name": "system crash", "type": "issue", "supporting_text": "System crashes due to memory leaks."}, |
| 49 | +] |
| 50 | +relations = [ |
| 51 | + {"source": "memory leak", "target": "system crash", "type": "causes", |
| 52 | + "confidence": 1.0, "supporting_text": "System crashes due to memory leaks."}, |
| 53 | +] |
| 54 | + |
| 55 | +result = skill.ingest_with_content( |
| 56 | + doc_id="doc_001", |
| 57 | + title="System Crash Analysis", |
| 58 | + source="/docs/incident_report.pdf", |
| 59 | + raw_content="System crashes due to memory leaks. Memory leaks occur when objects are not released.", |
| 60 | + entities=entities, |
| 61 | + relations=relations, |
| 62 | +) |
| 63 | +# result = {"doc_id": "doc_001", "chunk_count": 1, "nodes_added": 2, "edges_added": 1} |
| 64 | + |
| 65 | +# 3. Write a wiki summary page for this document |
| 66 | +wiki_store.write_page( |
| 67 | + category="summary", |
| 68 | + title="System Crash Analysis Summary", |
| 69 | + content="""--- |
| 70 | +title: System Crash Analysis |
| 71 | +source_document: doc_001 |
| 72 | +tags: [summary, incident] |
| 73 | +--- |
| 74 | + |
| 75 | +# System Crash Analysis |
| 76 | + |
| 77 | +**Source:** incident_report.pdf |
| 78 | + |
| 79 | +## Key Claims |
| 80 | + |
| 81 | +- [[memory-leak]] causes [[system-crash]] (confidence: 1.0) |
| 82 | + |
| 83 | +## Entities |
| 84 | + |
| 85 | +- [[memory-leak]] (issue) |
| 86 | +- [[system-crash]] (issue) |
| 87 | +""", |
| 88 | + summary="Incident report: memory leaks cause system crashes.", |
| 89 | +) |
| 90 | + |
| 91 | +# ===== QUERY WITH EVIDENCE ===== |
| 92 | +result = skill.query_with_evidence("Why does the system crash?") |
| 93 | +# Returns: {"query": ..., "subgraph": ..., "supporting_documents": [...], "evidence_chain": ...} |
| 94 | + |
| 95 | +# ===== WIKI SEARCH (read wiki before answering) ===== |
| 96 | +pages = wiki_store.search_wiki("memory leak") |
| 97 | +# Returns: [{slug, category, path, snippet}, ...] |
| 98 | +``` |
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +## Operations |
| 103 | + |
| 104 | +### Ingest |
| 105 | + |
| 106 | +When a user provides a new document: |
| 107 | + |
| 108 | +1. Read `references/ingestion.md` — entity/relation extraction rules. |
| 109 | +2. Read `references/ontology.md` — type normalization rules. |
| 110 | +3. Extract entities and relations using your LLM reasoning. |
| 111 | +4. Call `skill.ingest_with_content(...)` — stores raw content + chunks + graph nodes + provenance. |
| 112 | +5. **Write a wiki summary page** using `wiki_store.write_page(category="summary", ...)`. |
| 113 | +6. **Update entity pages** — for each new/updated entity, write or update `wiki_store.write_page(category="entity", ...)`. |
| 114 | +7. **Update topic pages** if the document touches an existing synthesis topic. |
| 115 | +8. Expect a single document ingest to touch 3–10 wiki pages in total. |
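
Steps 5 and 6 could be driven by a small helper that renders one entity page from the extracted entities and relations. A minimal sketch: `slugify` and `render_entity_page` are hypothetical helpers, and the page layout simply mirrors the summary page in the Quick Start. The resulting string would be passed as `content` to `wiki_store.write_page(category="entity", ...)`.

```python
import re


def slugify(name):
    """Turn an entity name into a wikilink slug, e.g. 'memory leak' -> 'memory-leak'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def render_entity_page(entity, relations, doc_id):
    """Render markdown for one entity page, listing the relations that mention it."""
    name = entity["name"]
    lines = [
        "---",
        f"title: {name}",
        f"tags: [entity, {entity['type']}]",
        "---",
        "",
        f"# {name}",
        "",
        "## Relations",
        "",
    ]
    for rel in relations:
        if name in (rel["source"], rel["target"]):
            lines.append(
                f"- [[{slugify(rel['source'])}]] {rel['type']} "
                f"[[{slugify(rel['target'])}]] "
                f"(confidence: {rel['confidence']}, source: {doc_id})"
            )
    return "\n".join(lines) + "\n"


entity = {"name": "memory leak", "type": "issue"}
relations = [
    {"source": "memory leak", "target": "system crash",
     "type": "causes", "confidence": 1.0},
]
page = render_entity_page(entity, relations, "doc_001")
```

Recording the `doc_id` next to each claim keeps the entity page traceable when later documents add or contradict relations.
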
| 116 | + |
| 117 | +### Query |
| 118 | + |
| 119 | +When a user asks a question: |
| 120 | + |
| 121 | +1. **Check the wiki first** — `wiki_store.search_wiki(query)` to find relevant pages. Read them. |
| 122 | +2. If the wiki has a good answer, synthesize from wiki pages (fast path). |
| 123 | +3. If deeper graph traversal is needed, call `skill.query_with_evidence(query)`. |
| 124 | +4. Return the answer with evidence citations from `supporting_documents`. |
| 125 | +5. If the answer is valuable, file it back as a new wiki topic page. |
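
The wiki-first routing in the steps above can be sketched as a small function. The search and graph functions are passed in as parameters so the block runs on its own; `answer_query`, the stand-in callables, and the "at least one hit counts as a good answer" rule are assumptions for illustration, not part of the skill's API.

```python
def answer_query(query, search_wiki, query_with_evidence, min_hits=1):
    """Route a question: wiki fast path first, graph traversal as fallback."""
    pages = search_wiki(query)
    if len(pages) >= min_hits:
        # Fast path: the agent would now read these pages and synthesize an answer.
        return {"path": "wiki", "pages": [p["slug"] for p in pages]}
    # Fallback: graph traversal plus supporting source chunks.
    result = query_with_evidence(query)
    return {"path": "graph", "citations": result.get("supporting_documents", [])}


# Stand-in callables (assumptions) so the sketch runs without the real stores.
def fake_search(query):
    if "memory" in query:
        return [{"slug": "memory-leak", "category": "entity"}]
    return []


def fake_graph(query):
    return {"query": query, "subgraph": {}, "supporting_documents": ["doc_001"]}


fast = answer_query("memory leak", fake_search, fake_graph)
slow = answer_query("Why does the system crash?", fake_search, fake_graph)
```

In practice the decision of whether the wiki "has a good answer" is an LLM judgment; the hit-count threshold here is only a placeholder for that step.
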
| 126 | + |
| 127 | +### Lint |
| 128 | + |
| 129 | +Periodically health-check the wiki: |
| 130 | + |
| 131 | +```python |
| 132 | +from scripts.tools import wiki_store |
| 133 | +issues = wiki_store.lint_wiki() |
| 134 | +# Returns: {orphan_pages, missing_pages, broken_wikilinks, isolated_pages} |
| 135 | +``` |
| 136 | + |
| 137 | +Have the LLM review and fix the reported issues: broken links, orphan pages, stale claims, and missing cross-references. See `references/lint.md` for the full lint workflow. |
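
To make the lint categories concrete, here is a rough sketch of how broken wikilinks and orphan pages might be detected. It is not the implementation of `wiki_store.lint_wiki()`: the in-memory `pages` mapping, the helper names, and the definition of an orphan as a page with no inbound links are assumptions.

```python
import re

# Captures the target of [[target]], [[target|label]] or [[target#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_broken_wikilinks(pages):
    """pages maps slug -> markdown text; returns {slug: [missing link targets]}."""
    broken = {}
    for slug, text in pages.items():
        targets = {t.strip() for t in WIKILINK.findall(text)}
        missing = sorted(targets - set(pages))
        if missing:
            broken[slug] = missing
    return broken


def find_orphan_pages(pages):
    """Return slugs that no other page links to (assumed orphan definition)."""
    linked = set()
    for slug, text in pages.items():
        linked |= {t.strip() for t in WIKILINK.findall(text)} - {slug}
    return sorted(set(pages) - linked)


pages = {
    "system-crash-analysis": "- [[memory-leak]] causes [[system-crash]]",
    "memory-leak": "Leads to [[system-crash]].",
}
broken = find_broken_wikilinks(pages)
orphans = find_orphan_pages(pages)
```

Here both pages link to a `system-crash` page that was never written, which is the kind of gap the entity-page step of an ingest is meant to close.
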
| 138 | + |
| 139 | +--- |
| 140 | + |
| 141 | +## Ingestion Constraints |
| 142 | + |
| 143 | +- ❌ Do NOT hallucinate entities not present in the text |
| 144 | +- ❌ Do NOT add relations without explicit textual evidence |
| 145 | +- ❌ Do NOT add edges with confidence < 0.6 |
| 146 | +- ✅ Provide `supporting_text` for every entity and relation — this enables provenance |
| 147 | +- ✅ Write a wiki summary page for every ingested document |
| 148 | +- ✅ Update existing entity pages when new information arrives |
| 149 | +- ✅ Flag contradictions in wiki pages when new data conflicts with old claims |
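
Several of these constraints can be checked mechanically before calling `skill.ingest_with_content(...)`. The validator below is a sketch under stated assumptions: `validate_extraction` is a hypothetical helper, and testing whether an entity name occurs in the source text is only a crude proxy for "not hallucinated".

```python
MIN_CONFIDENCE = 0.6  # threshold stated in the constraints above


def validate_extraction(raw_content, entities, relations):
    """Return a list of problems; an empty list means the batch passes."""
    problems = []
    text = raw_content.lower()
    names = {e["name"] for e in entities}

    for e in entities:
        if not e.get("supporting_text", "").strip():
            problems.append(f"entity {e['name']!r}: missing supporting_text")
        if e["name"].lower() not in text:
            problems.append(f"entity {e['name']!r}: name not found in source text")

    for r in relations:
        label = f"relation {r['source']!r} -> {r['target']!r}"
        if not r.get("supporting_text", "").strip():
            problems.append(f"{label}: missing supporting_text")
        if r.get("confidence", 0.0) < MIN_CONFIDENCE:
            problems.append(f"{label}: confidence below {MIN_CONFIDENCE}")
        if r["source"] not in names or r["target"] not in names:
            problems.append(f"{label}: endpoint is not a declared entity")
    return problems


raw = "System crashes due to memory leaks. Memory leaks occur when objects are not released."
entities = [
    {"name": "memory leak", "type": "issue", "supporting_text": "memory leaks"},
    {"name": "system crash", "type": "issue", "supporting_text": "System crashes"},
]
good = [{"source": "memory leak", "target": "system crash", "type": "causes",
         "confidence": 1.0, "supporting_text": "System crashes due to memory leaks."}]
bad = [{"source": "memory leak", "target": "disk failure", "type": "causes",
        "confidence": 0.4, "supporting_text": ""}]

ok_problems = validate_extraction(raw, entities, good)
bad_problems = validate_extraction(raw, entities, bad)
```

The wiki-side rules (summary page, entity updates, contradiction flags) still depend on the LLM and are not covered by a check like this.
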
| 150 | + |
| 151 | +--- |
| 152 | + |
| 153 | +## Retrieval Constraints |
| 154 | + |
| 155 | +- 🔒 Traversal depth MUST NOT exceed 2 (config: `MAX_GRAPH_DEPTH`) |
| 156 | +- 🔒 Only edges with confidence ≥ 0.6 (config: `MIN_CONFIDENCE`) |
| 157 | +- 🔒 Maximum 50 nodes returned (config: `MAX_NODES`) |
| 158 | +- ❌ Do NOT fabricate nodes or edges not in the graph |
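
The three limits above combine naturally in a bounded breadth-first search. The sketch below shows one way to enforce them; it is not the code in `retrieval_engine.py`, and the `bounded_bfs` name, the edge-dict shape, and the choice to follow only outgoing edges are assumptions.

```python
from collections import deque

MAX_GRAPH_DEPTH = 2
MIN_CONFIDENCE = 0.6
MAX_NODES = 50


def bounded_bfs(edges, seeds, max_depth=MAX_GRAPH_DEPTH,
                min_confidence=MIN_CONFIDENCE, max_nodes=MAX_NODES):
    """Expand breadth-first from the seed nodes under depth, confidence and size limits.

    edges: list of dicts with "source", "target" and "confidence" keys.
    Returns (node names in discovery order, edges included in the subgraph).
    """
    # Index only edges that meet the confidence threshold.
    adjacency = {}
    for e in edges:
        if e["confidence"] >= min_confidence:
            adjacency.setdefault(e["source"], []).append(e)

    visited = list(dict.fromkeys(seeds))[:max_nodes]  # de-duplicate, keep order
    seen = set(visited)
    used = []
    queue = deque((node, 0) for node in visited)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # never expand beyond the depth limit
        for e in adjacency.get(node, []):
            target = e["target"]
            if target not in seen:
                if len(visited) >= max_nodes:
                    return visited, used  # node budget exhausted
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
            used.append(e)
    return visited, used


edges = [
    {"source": "a", "target": "b", "confidence": 0.9},
    {"source": "b", "target": "c", "confidence": 0.8},
    {"source": "c", "target": "d", "confidence": 0.9},  # depth 3 from "a"
    {"source": "a", "target": "e", "confidence": 0.5},  # below threshold
]
nodes, subgraph_edges = bounded_bfs(edges, ["a"])
```

Because only nodes and edges that exist in the stored graph are ever returned, a traversal of this shape cannot fabricate structure, which is the point of the last constraint.
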
| 159 | + |
| 160 | +--- |
| 161 | + |
| 162 | +## Full Python API Reference |
| 163 | + |
| 164 | +| Method | Purpose | When to Use | |
| 165 | +|--------|---------|-------------| |
| 166 | +| `skill.ingest_with_content(doc_id, title, source, raw_content, entities, relations)` | Full RAG ingest: raw docs + graph + provenance | Every new document | |
| 167 | +| `skill.add_node(name, node_type)` | Add single entity (no provenance) | Quick additions without a source doc | |
| 168 | +| `skill.add_edge(source_name, target_name, relation, confidence)` | Add single relation | Quick additions without a source doc | |
| 169 | +| `skill.query(query)` | Graph-only retrieval → subgraph | Structural queries | |
| 170 | +| `skill.query_with_evidence(query)` | Graph + provenance → subgraph + source chunks | Queries requiring citations | |
| 171 | +| `wiki_store.write_page(category, title, content, summary)` | Write/update a wiki page | After every ingest; after answering queries | |
| 172 | +| `wiki_store.read_page(category, title)` | Read a wiki page | Before answering; for cross-referencing | |
| 173 | +| `wiki_store.search_wiki(query)` | Keyword search across wiki | Fast path before graph traversal | |
| 174 | +| `wiki_store.list_pages(category)` | List all wiki pages | Getting an overview | |
| 175 | +| `wiki_store.get_log(last_n)` | Read recent operations | Understanding wiki history | |
| 176 | +| `wiki_store.lint_wiki()` | Health check | Periodic maintenance | |
| 177 | +| `documents_store.list_documents()` | List all ingested raw sources | Audit / provenance checking | |
| 178 | +| `documents_store.search_chunks(query)` | Chunk-level search | Finding specific evidence | |
| 179 | + |
| 180 | +--- |
| 181 | + |
| 182 | +## Design Philosophy |
| 183 | + |
| 184 | +> "The wiki is a persistent, compounding artifact. The cross-references are already there. The synthesis already reflects everything you've read." — Karpathy |
| 185 | + |
| 186 | +| Layer | What Happens | Who Owns It | |
| 187 | +|-------|-----------|-------------| |
| 188 | +| **LLM Reasoning** | Extraction, synthesis, writing wiki pages | Agent (.md guidance files) | |
| 189 | +| **Wiki Persistence** | Index, log, file I/O | `wiki_store.py` | |
| 190 | +| **Graph Persistence** | Dedup, index, BFS traverse | `graph_store.py`, `retrieval_engine.py` | |
| 191 | +| **Raw Source Storage** | Immutable docs + chunks + provenance | `documents_store.py` | |
| 192 | + |
| 193 | +The human curates sources and asks questions. The LLM writes the wiki, extracts the graph, and answers with citations. Python handles all bookkeeping. |
| 194 | + |