Wiki Knowledge Base

Knowledge that compounds. A structured knowledge system where AI agents actively maintain, improve, and compound understanding over time — not just retrieve it.

The Problem We're Solving

Every AI application hits the same wall: knowledge doesn't accumulate.

RAG retrieves the same documents every time. Chat history gets compressed and forgotten. The agent answers the same question differently each session. Nothing compounds.

Karpathy named this in April 2026: "LLMs re-derive knowledge from scratch on every query. There's no accumulation." His LLM Wiki proposed a shift — compile knowledge at write-time, not retrieve at query-time — and his GitHub Gist got 14,000 stars in a week.

Garry Tan built gbrain around the same insight: every entity gets a Markdown file with a compiled truth (current best understanding) on top and an append-only timeline (evidence chain) on the bottom. When truth and evidence conflict, evidence wins. Structure defeats hallucination.

We started from these ideas. Then we spent three months running a production knowledge base — 58 pages created, automated scripts built, schema versions iterated — and learned what actually works and what doesn't.

This project is what we learned.

What Makes Wiki KB Different

1. Born from Production, Not Theory

Most knowledge management projects are designed first, deployed never. We ran Wiki KB in production for months with a real AI agent (Hermes) maintaining knowledge from daily conversations, research sessions, and project work. The v4 architecture is the result of cutting everything that sounded good but didn't produce value:

| We built | What happened | Decision |
|----------|---------------|----------|
| Dream Cycle (LLM audit) | Cron ran but produced surface-level feedback. Real quality came from targeted wiki_review on individual pages. | Killed. Manual review > automated audit. |
| memory_to_wiki sync | OpenViking search API doesn't index memories. Script ran but wrote nothing. | Killed. Fix the upstream first. |
| auto_index (knowledge graph) | graph.json generated but nothing consumed it. Relations in Markdown wikilinks are sufficient. | Killed. YAGNI. |
| 9 page types | Agents confused about routing. `tool` vs `entity` vs `project`: same thing, different label. | 3 types. concept, entity, person. |
| Cron pipeline (7 jobs) | Only 2 STARTED entries in cron.log after weeks. | No cron. Agent triggers on demand. |

The schema IS the product. Fewer types means the agent classifies correctly more often. Fewer scripts means less maintenance. Fewer cron jobs means less dead code.

2. Compiled Truth + Timeline — Structure Beats Prompts

Every page has two layers:

┌─────────────────────────────┐
│  COMPILED TRUTH (rewritable) │  ← Current best understanding
│  Executive Summary           │     Rewritten as a whole, never appended
│  Key Facts                   │
│  Relations                   │
├─────────────────────────────┤
│  TIMELINE (append-only)      │  ← Evidence chain
│  - 2026-04-12 | Event...     │     Timestamped, sourced, immutable
│  - 2026-04-15 | Update...    │
└─────────────────────────────┘

When the summary contradicts the timeline, timeline wins. This isn't a prompt engineering trick — it's a structural constraint that makes AI-generated knowledge auditable and self-correcting.

Quality gates enforce this: wiki_review promotes a page from draft to active only when the summary is ≥50 chars and key_facts has ≥2 items. In our v4 refactoring, the pass rate went from about 14% (8 of 58 pages) to 100% (40 of 40).
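A minimal sketch of such a gate, using the two thresholds stated above. The `Page` class, its field names, and the `review` function are hypothetical illustrations, not the actual wiki_review implementation.

```python
from dataclasses import dataclass, field

MIN_SUMMARY_CHARS = 50  # threshold stated in the README
MIN_KEY_FACTS = 2       # threshold stated in the README


@dataclass
class Page:
    """Hypothetical in-memory page; the field names are assumptions."""
    title: str
    summary: str = ""
    key_facts: list[str] = field(default_factory=list)
    status: str = "draft"


def review(page: Page) -> list[str]:
    """Promote draft -> active when the page passes; return the failures otherwise."""
    problems = []
    if len(page.summary.strip()) < MIN_SUMMARY_CHARS:
        problems.append(f"summary shorter than {MIN_SUMMARY_CHARS} chars")
    if len(page.key_facts) < MIN_KEY_FACTS:
        problems.append(f"fewer than {MIN_KEY_FACTS} key facts")
    if not problems and page.status == "draft":
        page.status = "active"
    return problems
```

Returning the list of failures instead of a bare boolean gives the agent a concrete reason to act on, which is the point of pass/fail criteria.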

3. Three Types. Not Nine.

Previous versions had concept, entity, person, project, meeting, idea, comparison, query, tool. The agent constantly misrouted pages. The real insight: the type system should reduce cognitive load, not increase it.

| Type | Directory | What goes here |
|------|-----------|----------------|
| `concept` | `concepts/` | Everything that's not a concrete entity: frameworks, analyses, methodologies, comparisons, meeting notes, ideas. The default. |
| `entity` | `entities/` | Things with clear boundaries: products, tools, platforms, organizations. Anything the entity registry can manage. |
| `person` | `people/` | People. Read-only: agents don't auto-create person pages. |

Simple rule: if you can register it in the entity registry, it's an entity. Otherwise, it's a concept.
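The rule can be sketched as a small routing function. The directory names come from the table above; the function names, the slug scheme, and the fallback of unknown types to `concept` are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Type -> directory mapping, taken from the table above.
TYPE_TO_DIR = {"concept": "concepts", "entity": "entities", "person": "people"}


def classify(registrable: bool, is_person: bool = False) -> str:
    """Apply the simple rule: people, then registrable things, else concept."""
    if is_person:
        return "person"
    return "entity" if registrable else "concept"


def page_path(title: str, page_type: str) -> PurePosixPath:
    """Route a page to its directory; unknown types fall back to concepts/."""
    # Hypothetical slug scheme: lowercase, non-alphanumerics collapsed to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    directory = TYPE_TO_DIR.get(page_type, TYPE_TO_DIR["concept"])
    return PurePosixPath(directory) / f"{slug}.md"
```

With a legacy type such as `meeting`, `page_path` lands in `concepts/`, which matches the idea that concept is the default.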

4. Graceful Degradation Everywhere

Production taught us that dependencies fail: OpenViking goes down, Python on the NAS breaks, MCP sessions expire. Every component has a fallback:

Search: OpenViking semantic search → local file search (automatic, zero config)
Auth: API key set → Bearer auth. Unset → open access (safe for LAN)
Session: MCP StreamableHTTP idle timeout → 24 hours (monkey-patched)
Quality: OpenViking returns garbage "Untitled" results → filter + fall back to local

The system is designed to be maximally useful with minimal dependencies. Docker + Markdown files = everything works. OpenViking = search gets smarter. That's it.
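The search fallback described above might look roughly like the sketch below. The `search` function and its callable parameters are hypothetical stand-ins for the OpenViking client and the local file search; the real server may structure this differently.

```python
from typing import Callable, Optional

SearchFn = Callable[[str], list[dict]]


def search(query: str, semantic: Optional[SearchFn], local: SearchFn) -> list[dict]:
    """Try the semantic backend first, then fall back to local file search."""
    if semantic is not None:
        try:
            # Drop garbage hits that have no real title before trusting the backend.
            hits = [h for h in semantic(query) if h.get("title", "Untitled") != "Untitled"]
            if hits:
                return hits
        except Exception:
            # Assumption: any backend failure (down, timeout, bad payload)
            # should degrade to local search instead of surfacing an error.
            pass
    return local(query)
```

Passing `semantic=None` models the minimal Docker-only setup, where local search is the only path.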

5. MCP Native — Knowledge as a Tool

Wiki KB is a standard MCP Server with 15 tools. Any MCP-compatible agent can call it directly — no SDK, no adapter, no framework lock-in. The agent doesn't "query a knowledge base" — it uses knowledge tools alongside every other tool it has.

Agent tool belt: web_search, file_edit, terminal, ... wiki_search, wiki_create, wiki_update

This means switching from Claude to GPT to an open-source model costs nothing — your knowledge stays, the tools stay, only the model changes.

Architecture

┌─────────────────────────────────────────────────┐
│                   AI Agent                       │
│  (Claude / GPT / GLM / any MCP-compatible)      │
│                                                  │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │ wiki_    │  │ wiki_    │  │ entity_       │  │
│  │ search   │  │ create   │  │ resolve       │  │
│  │ wiki_get │  │ wiki_    │  │ entity_register│  │
│  │ wiki_    │  │ update   │  │ entity_list   │  │
│  │ list     │  │ wiki_    │  │ entity_merge  │  │
│  │ wiki_    │  │ review   │  │               │  │
│  │ health   │  │ wiki_    │  │               │  │
│  │ wiki_    │  │ undo     │  │               │  │
│  │ stats    │  │ wiki_log │  │               │  │
│  └────┬─────┘  └────┬─────┘  └───────┬───────┘  │
│       │              │                │          │
└───────┼──────────────┼────────────────┼──────────┘
        │         MCP (HTTP)            │
        ▼              ▼                ▼
┌─────────────────────────────────────────────────┐
│              Wiki KB MCP Server                   │
│                                                  │
│  ┌──────────────┐  ┌──────────────────────────┐  │
│  │   Markdown   │  │     Entity Registry      │  │
│  │   Wiki Pages │  │  (ID + Alias Index)      │  │
│  │              │  │                          │  │
│  │ concepts/    │  │  openviking → ent_a1b2c3 │  │
│  │ entities/    │  │  ov → ent_a1b2c3         │  │
│  │ people/      │  │  hermes → ent_d4e5f6     │  │
│  └──────┬───────┘  └──────────────────────────┘  │
│         │                                        │
│    ┌────┴────┐                                   │
│    │  Git    │  Auto-commit on every write       │
│    └─────────┘                                   │
└─────────────────┬───────────────────────────────┘
                  │ (optional)
                  ▼
┌─────────────────────────────────────────────────┐
│              OpenViking                          │
│         Semantic Search Backend                  │
│    (vector search + local fallback)              │
└─────────────────────────────────────────────────┘

Quick Start

Minimal (just Docker)

git clone https://github.com/SonicBotMan/wiki-kb.git
cd wiki-kb
cp .env.example .env
docker compose up -d --build

# Verify
curl -s -X POST http://localhost:8764/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}'

That's it. Wiki CRUD and file search work immediately. No LLM API key, no vector database, no external services.
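For scripted checks, the same handshake can be sent from Python with only the standard library. This is a sketch that mirrors the curl command above; the function names are illustrative, and treating HTTP 200 as "healthy" is an assumption.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8764/mcp"  # default port from the Quick Start


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build the same JSON-RPC initialize call that the curl command sends."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "test", "version": "1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def server_responds(url: str = MCP_URL, timeout: float = 5.0) -> bool:
    """Return True if the server answers the handshake with HTTP 200."""
    try:
        with urllib.request.urlopen(build_initialize_request(url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, refused connections, and timeouts are OSError subclasses
        return False
```

Calling `server_responds()` after `docker compose up` gives a quick yes/no without parsing the response body.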

With Semantic Search (+ OpenViking)

Add to .env:

OPENVIKING_ENDPOINT=http://localhost:1933
OPENVIKING_API_KEY=your-key
OPENVIKING_ACCOUNT=hermes
OPENVIKING_USER=default

Search upgrades from substring matching to vector semantic search with automatic local fallback.

MCP Tools (15)

Knowledge Operations (10)

| Tool | What it does |
|------|--------------|
| `wiki_search` | Semantic search (OpenViking) with automatic local fallback |
| `wiki_get` | Read full page: summary, facts, relations, timeline |
| `wiki_create` | Create page → auto-route directory → register entity → draft |
| `wiki_update` | Update a section (summary / facts / relations) |
| `wiki_append_timeline` | Add timestamped, sourced event to timeline |
| `wiki_list` | List pages with type/status filtering |
| `wiki_health` | System health: registry, disk, OpenViking connectivity |
| `wiki_review` | Quality gate: promotes draft → active (summary ≥50 chars, facts ≥2) |
| `wiki_undo` | Revert last N auto-commits (only `[wiki-brain]` prefixed) |
| `wiki_log` | View commit history |

Entity Registry (4)

| Tool | What it does |
|------|--------------|
| `entity_resolve` | Name/alias → entity (fuzzy matching) |
| `entity_register` | Register entity with ID + aliases |
| `entity_list` | List/filter entities |
| `entity_merge` | Deduplicate: merge entity A into B |
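To make the alias index concrete, here is a small in-memory sketch of register, resolve, and merge. The `EntityRegistry` class below is hypothetical and is not the code in entity_registry.py; the use of `difflib` and the 0.8 cutoff for fuzzy matching are assumptions.

```python
import difflib
from typing import Iterable, Optional


class EntityRegistry:
    """Hypothetical alias index: normalized name or alias -> entity ID."""

    def __init__(self) -> None:
        self.aliases: dict[str, str] = {}

    def register(self, entity_id: str, name: str, aliases: Iterable[str] = ()) -> None:
        """Index the canonical name and every alias under one entity ID."""
        for label in (name, *aliases):
            self.aliases[label.strip().lower()] = entity_id

    def resolve(self, query: str, cutoff: float = 0.8) -> Optional[str]:
        """Exact match first, then the closest alias above the similarity cutoff."""
        key = query.strip().lower()
        if key in self.aliases:
            return self.aliases[key]
        close = difflib.get_close_matches(key, list(self.aliases), n=1, cutoff=cutoff)
        return self.aliases[close[0]] if close else None

    def merge(self, source_id: str, target_id: str) -> None:
        """Repoint every alias of source_id at target_id (deduplication)."""
        for alias, entity_id in list(self.aliases.items()):
            if entity_id == source_id:
                self.aliases[alias] = target_id
```

Registering `OpenViking` with the alias `ov` makes both names resolve to the same ID, as in the architecture diagram.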

System (1)

| Tool | What it does |
|------|--------------|
| `wiki_stats` | Page counts, type breakdown, registry stats |

Page Format

---
title: Hermes Agent
type: entity
status: active
created: 2026-04-12
updated: 2026-04-18
tags: [ai-agent, self-evolving]
---

# Hermes Agent

## Executive Summary
Self-evolving AI agent framework. Open-source CLI agent that improves its own
skills through conversation-driven learning and scheduled autonomous tasks.

## Key Facts
- 65k+ GitHub stars, active community
- Skill system: procedural memory that compounds over sessions
- Memory: built-in persistent memory + external providers (OpenViking, Mem0)
- MCP native: extensible via Model Context Protocol servers

## Relations
| Relation | Target | Description |
|----------|--------|-------------|
| uses | [[openviking]] | Semantic search backend |
| related | [[gbrain]] | Inspired compiled truth pattern |

---

## Timeline

- **2026-04-12** | Page created
  [Source: wiki_mcp_server]
- **2026-04-18** | v4 refactoring: 58→40 pages, 9→3 types
  [Source: Hermes session]

The Rules

| Zone | Rule | Why |
|------|------|-----|
| Executive Summary | Rewrite as a whole | Current best understanding, not a changelog |
| Key Facts | Structured, referenceable | Agents can cite individual facts |
| Relations | Typed wikilinks | `uses`, `part-of`, `contrasts`, `evolved-from` |
| `---` separator | Never remove | The line between truth and evidence |
| Timeline | Append only, never edit | Evidence chain must be immutable |
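The separator and append-only rules can be sketched as two helpers. This assumes a page laid out as in the Page Format example, where the last `---` line divides compiled truth from the timeline; the function names are illustrative and not part of the server.

```python
SEPARATOR = "\n---\n"


def split_page(text: str) -> tuple[str, str]:
    """Return (compiled_truth, timeline), split at the last `---` line."""
    head, sep, tail = text.rpartition(SEPARATOR)
    if not sep or "## Timeline" not in tail:
        raise ValueError("page is missing the truth/timeline separator")
    return head, tail


def append_timeline(text: str, date: str, event: str, source: str) -> str:
    """Add one sourced entry at the end; existing entries are never touched."""
    head, timeline = split_page(text)
    entry = f"- **{date}** | {event}\n  [Source: {source}]\n"
    return head + SEPARATOR + timeline.rstrip("\n") + "\n" + entry
```

Because `append_timeline` only concatenates, everything above the new entry stays byte-for-byte identical, which is what keeps the evidence chain auditable.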

Directory Structure

wiki-kb/                          # Code repository
├── scripts/                      # MCP server + utilities (bind-mounted)
│   ├── wiki_mcp_server.py        # 15 MCP tools
│   ├── wiki_config.py            # Centralized configuration
│   ├── wiki_utils.py             # Frontmatter, relations, path routing
│   ├── entity_registry.py        # Entity ID + alias management
│   ├── wiki_health_monitor.py    # Health monitoring
│   ├── wiki_quality_check.py     # Page quality validation
│   ├── wiki-to-notion.py         # Optional Notion sync
│   ├── wiki-backup.sh            # Backup script
│   └── wiki-cron-wrapper.sh      # Cron job wrapper
├── tests/                        # Unit tests
├── Dockerfile
├── docker-compose.yml
├── SCHEMA.md                     # Full schema specification (v4)
└── README.md

wiki/                             # Data (bind-mounted into container)
├── concepts/                     # Default type — frameworks, analyses, ideas
├── entities/                     # Products, tools, platforms, orgs
├── people/                       # People (read-only)
├── system/                       # System pages (wiki-health)
├── raw/                          # Source materials (excluded from search)
├── logs/                         # Runtime logs
└── registry.json                 # Entity registry

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `WIKI_ROOT` | No | Wiki data root inside container (default: `/data`) |
| `MCP_PORT` | No | Server port (default: `8764`) |
| `MCP_API_KEY` | No | Auth key. Unset = open access (safe for LAN) |
| `OPENVIKING_ENDPOINT` | No | Enables semantic search |
| `OPENVIKING_API_KEY` | No | OpenViking auth |
| `NOTION_API_KEY` | No | Enables Notion sync |
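A sketch of how these variables and their defaults could be read, including the "unset key means open access" rule. The `WikiConfig` class, `load_config`, and `is_authorized` are hypothetical names and do not describe wiki_config.py; treating an empty string as unset is also an assumption.

```python
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WikiConfig:
    wiki_root: str
    mcp_port: int
    mcp_api_key: Optional[str]
    openviking_endpoint: Optional[str]
    openviking_api_key: Optional[str]
    notion_api_key: Optional[str]

    @property
    def semantic_search_enabled(self) -> bool:
        return self.openviking_endpoint is not None


def load_config(env: Mapping[str, str] = os.environ) -> WikiConfig:
    """Read the variables from the table, applying the documented defaults."""
    def optional(name: str) -> Optional[str]:
        return env.get(name) or None  # assumption: empty string counts as unset

    return WikiConfig(
        wiki_root=env.get("WIKI_ROOT", "/data"),
        mcp_port=int(env.get("MCP_PORT", "8764")),
        mcp_api_key=optional("MCP_API_KEY"),
        openviking_endpoint=optional("OPENVIKING_ENDPOINT"),
        openviking_api_key=optional("OPENVIKING_API_KEY"),
        notion_api_key=optional("NOTION_API_KEY"),
    )


def is_authorized(config: WikiConfig, authorization_header: Optional[str]) -> bool:
    """No key configured means open access; otherwise require a Bearer token."""
    if config.mcp_api_key is None:
        return True
    expected = f"Bearer {config.mcp_api_key}".encode()
    return hmac.compare_digest((authorization_header or "").encode(), expected)
```

Passing a plain dict to `load_config` instead of `os.environ` keeps the function easy to exercise in unit tests.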

What We Learned (Design Rationale)

Why 3 types instead of 9

We started with 9 types (concept, entity, person, project, meeting, idea, comparison, query, tool). The agent constantly misrouted: is a "product comparison" a comparison or a concept? Is a "project update" a project or entity? Every ambiguous page was a classification error.

3 types eliminate ambiguity: things you can register in the entity registry → entity. People → person. Everything else → concept. Classification accuracy went from ~70% to ~95%.

Why we killed Dream Cycle

Dream Cycle was supposed to be the killer feature — an LLM that audits all pages nightly, detects contradictions, fills gaps. In practice, the cron job either didn't run (dead pipeline) or produced surface-level feedback ("this page could use more detail"). The real quality improvement came from wiki_review — targeted, on-demand quality checks with concrete pass/fail criteria.

Why no cron pipeline

We had 7 cron jobs. After weeks, only 2 had ever started. The agent triggers wiki operations on demand during conversations — that's when context is available and decisions are meaningful. Scheduled automation without context produces noise.

Why data stays local

We initially pushed wiki data to a private GitHub repo for "backup." But the NAS has RAID, the data contains personal information, and adding a remote just meant another thing to maintain. Git history on the NAS provides rollback. That's enough.

Why MCP and not a REST API

MCP makes knowledge a first-class tool in the agent's tool belt — not an external service it has to "call." The agent uses wiki_search the same way it uses web_search or file_edit. This changes the mental model from "query a database" to "use knowledge as a tool."

Inspired By

Karpathy's LLM Wiki — The original insight: compile knowledge at write-time, not retrieve at query-time
Garry Tan's gbrain — Compiled Truth + Timeline pattern, Dream Cycle automation, production-hardened at scale (10,000+ pages)
LLM Wiki v2 — Memory lifecycle, knowledge graph, hybrid search extensions
Meta-Skill Methodology (元Skill方法论): "good things are distilled from practice, not designed up front". Knowledge compounds through summarization, not design.

License

MIT
