feat(knowledge-base): self-curating knowledge base (OKF pages + folder missions) #2455

Draft
nicoloboschi wants to merge 3 commits into main from feat/knowledge-pages-okf



@nicoloboschi

Collaborator

Draft / WIP — opening for review of the design and structure. A few items below are still open.

What

A server-side Knowledge Base: a hierarchy of folders and pages over mental models, with a mission-driven curator that maintains pages automatically, plus a control-plane UI (file-tree + constellation). Pages project to the Open Knowledge Format (markdown + YAML frontmatter).

Model

  • Folders carry a mission. Pages are backed by a mental model (mental_model_id) and marked managed (curator-owned) vs pinned (human).
  • Curator (per folder, independent): after each consolidation (and on folder/mission creation), it reads the new memories since the folder was last curated (a delta — not a recall) and emits ops: create_page / merge_pages / delete_page / create_subfolder (bounded: depth ≤ 3, ≤ 8 sub-folders).
  • Runs as an async operation (curate_folder task), same machinery as the mental-model refresh — no manual trigger.
  • Curator pages use trigger {mode: delta, fact_types: [observation], exclude_mental_models: true, refresh_after_consolidation: true} so they synthesize from consolidated observations and stay current.
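The bounded op plan described above can be sketched as a filter applied before any op is executed. This is a minimal illustration, not the code in engine/knowledge_curator.py: the `Folder` type, the `filter_ops` name, the dict shape of an op, and the reading of the bounds (top-level folders at depth 1, at most 8 direct sub-folders per folder) are all assumptions.

```python
from dataclasses import dataclass

MAX_DEPTH = 3        # "depth ≤ 3" from the description
MAX_SUBFOLDERS = 8   # "≤ 8 sub-folders" from the description
ALLOWED_OPS = {"create_page", "merge_pages", "delete_page", "create_subfolder"}


@dataclass
class Folder:
    """Hypothetical view of a folder; field names are not from the PR."""
    depth: int            # assumed: top-level folders have depth 1
    subfolder_count: int  # existing direct sub-folders


def filter_ops(folder: Folder, ops: list[dict]) -> list[dict]:
    """Keep only ops that are known and stay within the folder bounds."""
    accepted = []
    subfolders = folder.subfolder_count
    for op in ops:
        kind = op.get("op")
        if kind not in ALLOWED_OPS:
            continue  # unknown op emitted by the LLM: drop it
        if kind == "create_subfolder":
            too_deep = folder.depth + 1 > MAX_DEPTH
            too_many = subfolders + 1 > MAX_SUBFOLDERS
            if too_deep or too_many:
                continue
            subfolders += 1
        accepted.append(op)
    return accepted
```

Filtering before applying keeps the LLM plan advisory: an out-of-bounds or unrecognized op is dropped rather than failing the whole run.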

Changes

Server (hindsight-api-slim)

  • New knowledge_pages table (PG + Oracle) + mission / managed / last_curated_at columns; added to BACKUP_TABLES; partial unique index on (folder, page-name) to make concurrent curator runs dedup-safe.
  • api/okf.py — OKF serializer (frontmatter + body, index.md/log.md, tag/folder constellation graph).
  • engine/knowledge_curator.py — the curator (LLM op plan + safe apply).
  • MemoryEngine — folder/page CRUD, tree, curate, async submit + worker handler.
  • /v1/default/banks/{bank}/knowledge-base/* endpoints (tree, folders, pages, nodes, graph, export).
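As a rough picture of what "frontmatter + body" means for a single page, the sketch below renders a markdown document with a YAML header. It is not the serializer in api/okf.py: the function name and the frontmatter keys (`title`, `tags`, `managed`) are placeholders, and the actual OKF field set is not given in this description.

```python
import json


def render_okf_page(title: str, tags: list[str], managed: bool, body: str) -> str:
    """Render a page as YAML frontmatter followed by a markdown body."""
    lines = ["---"]
    # YAML 1.2 is designed as a superset of JSON, so a JSON string literal
    # is a valid double-quoted YAML scalar; json.dumps handles the escaping.
    lines.append(f"title: {json.dumps(title)}")
    lines.append("tags: [" + ", ".join(json.dumps(t) for t in tags) + "]")
    lines.append(f"managed: {'true' if managed else 'false'}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.strip() + "\n"


if __name__ == "__main__":
    print(render_okf_page("Anna", ["people"], True, "Anna works on X."))
```

Using only the standard library keeps the sketch self-contained; a real serializer would more likely rely on a YAML library.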

Control plane

  • knowledge-base-view.tsx — folder/file tree (create folder/page, missions, delete) + constellation toggle, OKF page panel, OKF bundle export. Sidebar entry + proxies + i18n (all locales).

Generated — OpenAPI + SDK clients + docs-skill regenerated.

Tests

  • test_okf.py (unit), test_knowledge_base.py (HTTP: tree/move/cascade/graph/export), test_knowledge_curator.py (deterministic apply ops + dedup guard), test_knowledge_curator_e2e.py (hs_llm_core, full pipeline).

Open items (why draft)

  • Page content quality with a weak model: the agentic reflect step of the mental-model refresh occasionally recalls poorly, which produces thin or placeholder content. The observation-only + exclude_mental_models trigger mitigates this; a forced recall+synthesize path is the likely follow-up.
  • Off-mission page creation is reduced (prompt + per-folder new-memory delta) but still LLM-judgment-dependent.
  • Curator concurrency is dedup-safe at the DB level (unique index); the per-folder in-process lock is best-effort.
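To illustrate why a unique index makes concurrent curator runs dedup-safe even when the in-process lock is only best-effort, here is a small sketch using SQLite as a stand-in for PG/Oracle. The columns `parent_id`, `kind`, and `name` follow the commit message; the index name, the `WHERE kind = 'page'` predicate, and the `create_page` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE knowledge_pages (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,
        kind TEXT NOT NULL,      -- 'folder' or 'page'
        name TEXT NOT NULL
    );
    -- Partial unique index: uniqueness is enforced only for rows matching WHERE.
    CREATE UNIQUE INDEX uq_page_name_per_folder
        ON knowledge_pages (parent_id, name)
        WHERE kind = 'page';
    """
)


def create_page(parent_id: int, name: str) -> bool:
    """Insert a page; return False if another run already created it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO knowledge_pages (parent_id, kind, name) "
                "VALUES (?, 'page', ?)",
                (parent_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate (folder, name): treat as already created
```

The second writer loses at the database and can treat the conflict as "page already exists", so correctness does not depend on the lock being held.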

…r missions)

Server-side knowledge base: a hierarchy of folders and pages over mental
models, projected to the Open Knowledge Format, with a mission-driven curator
that maintains pages automatically after each consolidation.

- knowledge_pages table (PG + Oracle): parent_id tree, kind folder/page,
  mission, managed, last_curated_at; partial unique index on (folder, name)
  for concurrency-safe dedup; added to BACKUP_TABLES.
- api/okf.py: OKF serializer (frontmatter + body, index/log, constellation graph).
- engine/knowledge_curator.py: folder curator (LLM op plan + safe apply); reads
  new memories since last curation (delta, not recall); ops create/merge/delete
  page + spawn sub-folder (bounded depth<=3, <=8). Runs as an async curate_folder
  task on folder/mission create and after consolidation. Curator pages use an
  observation-only delta trigger with exclude_mental_models.
- MemoryEngine: folder/page CRUD, tree, curate, async submit + worker handler.
- /v1/default/banks/{bank}/knowledge-base/* endpoints.
- Control plane: knowledge-base tree view + constellation toggle, missions,
  OKF page panel + bundle export; proxies, client, sidebar, i18n.
- Tests: okf unit, knowledge-base HTTP, curator apply + dedup guard, hs_llm_core e2e.
- Regenerated OpenAPI + SDK clients + docs-skill.

Add @vectorize-io/hindsight-fs, a CLI under hindsight-tools/ that mirrors a
Hindsight bank's mental models as real markdown files (YAML frontmatter + body)
in a local directory, refreshed from the API on an interval. Once mounted,
ordinary shell tools (ls, cat, grep, find, ...) work against current memory.

- Pull-based sync engine: full list each tick, write changed/new/tampered
  files, skip unchanged (content-hashed), prune deleted models. Atomic writes;
  a transient API error never wipes the mirror.
- One-way mirror enforced two ways: files are read-only (0444) so agent edits
  fail with EACCES, plus a tamper-revert backstop that compares on-disk bytes
  and overwrites drift on the next pass. --writable opts out.
- Commands: mount/start/stop/restart/sync/status/list/logs/unmount. Background
  daemon via detached process + pidfile; per-mount config is remembered.
- status doubles as a healthcheck: --json report and a non-zero exit when the
  mount is dead/failed/stale (--stale-after overrides the threshold).
- Tests: unit (sync engine, frontmatter, health) + e2e that spawns the real
  CLI against a mock API and exercises real bash commands. 26 tests.
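One sync pass as described above (write changed/new/tampered files, skip unchanged, prune deleted, atomic read-only writes) could look roughly like the following. This is a Python sketch of the idea only, not the CLI's implementation: `write_atomic`, `sync_once`, the flat `{filename: bytes}` input, and hashing the on-disk bytes on every pass are all assumptions.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, data: bytes, mode: int = 0o444) -> None:
    """Write via a temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.chmod(tmp, mode)    # read-only, matching the 0444 in the description
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def sync_once(root: Path, remote: dict[str, bytes]) -> dict[str, int]:
    """One pass: write new/changed/tampered files, skip unchanged, prune deleted."""
    stats = {"written": 0, "skipped": 0, "pruned": 0}
    for name, data in remote.items():
        target = root / name
        if target.is_file():
            on_disk = hashlib.sha256(target.read_bytes()).digest()
            if on_disk == hashlib.sha256(data).digest():
                stats["skipped"] += 1
                continue  # bytes match: nothing to do
        write_atomic(target, data)  # new, changed, or locally tampered
        stats["written"] += 1
    for existing in root.glob("*.md"):
        if existing.name not in remote:
            existing.unlink()  # model no longer exists upstream
            stats["pruned"] += 1
    return stats
```

In this sketch the caller is expected to invoke `sync_once` only after a successful fetch, so a failed API call leaves the mirror untouched instead of pruning everything.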
…dels

Re-point hindsight-fs at the knowledge base so it projects a bank's folder/page
hierarchy as nested directories + .md files, instead of a flat list of mental
models.

- client: fetch GET /knowledge-base/tree + /export (two calls, any bank size)
  and join by page id; replaces the paginated mental-models list.
- format: planMirror() walks the tree into folder dirs + page files at nested
  paths (slug per segment, collision-safe); pages render the page's OKF doc.
- sync: create folder dirs, write pages at nested paths, prune removed pages and
  emptied folders; state keyed by relative path + tracked dirs.
- config/cli: drop the mental-model `detail` flag; `list` prints folders+pages;
  help/README updated. Tests rewritten for the tree/export model.
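The tree walk behind planMirror() can be sketched as below, written in Python for illustration. The node shape (`kind`, `name`, `id`, `children`), the slug rule, and the `-2`, `-3` suffix scheme for collisions are assumptions; only the idea of "slug per segment, collision-safe" comes from the commit message.

```python
import re


def slugify(name: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', never return empty."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "untitled"


def plan_mirror(node: dict, prefix: str = "") -> tuple[list[str], dict[str, str]]:
    """Walk a folder node; return (directories, {file path: page id})."""
    dirs: list[str] = []
    files: dict[str, str] = {}
    used: set[str] = set()  # slugs already taken among this node's children
    for child in node.get("children", []):
        base = slugify(child["name"])
        slug, n = base, 2
        while slug in used:  # collision: append -2, -3, ...
            slug = f"{base}-{n}"
            n += 1
        used.add(slug)
        if child["kind"] == "folder":
            path = prefix + slug
            dirs.append(path)
            sub_dirs, sub_files = plan_mirror(child, path + "/")
            dirs.extend(sub_dirs)
            files.update(sub_files)
        else:
            files[prefix + slug + ".md"] = child["id"]
    return dirs, files
```

Keying the result by relative path mirrors the "state keyed by relative path + tracked dirs" note: the sync step can diff this plan against what is on disk to decide what to write and what to prune.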

Verified live against a bank's knowledge base: the `people` folder mirrors to
people/anna.md + people/marco.md with OKF frontmatter.