Commit 6df4d79

explore(agent-wiki): add example wikis (companion to #268, rebased on main)
The four benchmark-derived example wikis built by the agent-wiki skills: wiki-twobatch {base, skills, both, pruned}. Rebuilt on merged main so the diff is wikis-only and conflict-free (the prior companion branch was based on the pre-merge code branch).

Refreshed against #268's merged fixes:
- regenerated each wiki via the merged builder's `catalog`
- refreshed every wiki's AGENTS.md from the updated `_default_agents.md` seed (now lists the per-section index.md entries)

Also fixes a builder bug the wikis surfaced: render-summary emitted `key_turns` bullets verbatim, so a truncated tool command ending in a space left a trailing-whitespace line (git diff --check). Now rstrips each key-turn; stripped the 30 already-generated summary pages to match.

Generated artifacts — provenance back-links shown as trajectories/<id>.json.
1 parent 3e26154 commit 6df4d79

254 files changed

Lines changed: 10275 additions & 1 deletion

explorations/agent-wiki/skills/scripts/build_agent_wiki.py

Lines changed: 3 additions & 1 deletion
@@ -426,7 +426,9 @@ def _render_summary_md(summary: dict, recalled: list[dict], arc_slug: str = "",
         body.append("## Key turns")
         body.append("")
         for kt in key_turns:
-            body.append(f"- {kt}")
+            # rstrip: a truncated tool command can end in whitespace, which
+            # otherwise leaves a trailing-whitespace line (git diff --check).
+            body.append(f"- {str(kt).rstrip()}")
         body.append("")
     if recalled:
         body.append("## Recalled guidelines")
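To make the effect of the fix concrete, here is a minimal standalone sketch of the key-turn loop. The function name `render_key_turns` and the sample inputs are hypothetical; only the `str(kt).rstrip()` expression is taken from the hunk above.

```python
def render_key_turns(key_turns):
    """Render key turns as Markdown bullets, mirroring the patched loop."""
    body = ["## Key turns", ""]
    for kt in key_turns:
        # str() tolerates non-string entries; rstrip() drops the trailing
        # whitespace that a truncated tool command can leave behind.
        body.append(f"- {str(kt).rstrip()}")
    body.append("")
    return body


# Hypothetical inputs: a command truncated after a space, a tab, a non-string.
lines = render_key_turns(["ran `git status` ", "edited README\t", 42])
trailing = [line for line in lines if line != line.rstrip()]
print(trailing)  # lines that `git diff --check` would still flag
```

One edge case worth keeping in mind: an empty key turn would still render as `- `, which ends in a space, so callers may want to skip empty entries before rendering.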
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
+# AGENTS.md — how an agent should read this wiki
+
+This wiki is **evidence-grounded guidelines distilled from agent
+trajectories**. Every page links back to the trajectory it came from, so any
+recommendation is auditable and revisable.
+
+You — the agent — should consult this wiki **once you know the task or
+sub-task you are about to do**. Not at session start (too vague), not as a
+last resort when stuck (too late). The right moment is after the user states
+their request and you've decided what task family it belongs to, before you
+start writing code.
+
+## When to read me
+
+Trigger conditions, any one of which should prompt a wiki check:
+
+- You're about to author non-trivial code in a problem space the wiki has
+  documented (build a CLI, parse a structured file format, automate a
+  browser flow, design a TUI, run an experiment, ship a PR through review).
+- The user mentions a topic that resembles entries in `_index.jsonl`'s
+  `tags` or `trigger` fields.
+- You're about to make an architectural choice (mode-as-subcommand vs
+  options, env-var vs flag, cluster duplicates vs leave-as-is).
+- A sub-task has been identified (you're now in the middle of a
+  multi-step plan and the next step has its own narrow scope).
+
+Don't read for trivial tasks (typo fix, single-line refactor) or topics
+clearly outside the wiki's scope (the corpus is finite — see
+`guidelines/index.md` for the topical surface).
+
+## Structure
+
+The wiki has four top-level sections, all under the wiki root:
+
+```
+<wiki-root>/
+├── AGENTS.md  ← this file
+├── index.md  ← human-friendly overview
+├── _config.yaml  ← taxonomy: tags, clusters, tasks, family overrides
+├── _index.jsonl  ← agent retrieval index (one row per page)
+├── summaries/
+│   ├── index.md  ← section index (regenerated by catalog)
+│   ├── <session_id>.md  ← single summary per session
+│   └── <session_id>__<arc-slug>.md  ← multi-arc session split
+├── guidelines/
+│   ├── index.md  ← section index (regenerated by catalog)
+│   ├── <slug>__<gid>.md  ← atomic guideline (one rule); `<gid>` matches the `id:` frontmatter
+│   ├── <slug>__cluster.md  ← themed aggregator (recall-preferred)
+│   └── _id_index.json  ← guideline id → relpath
+├── skills/
+│   ├── index.md  ← section index (regenerated by catalog)
+│   ├── <slug>/SKILL.md  ← callable workflow page (recall-preferred over guidelines)
+│   ├── <slug>/scripts/<file>  ← optional supporting scripts (run via Bash)
+│   └── _id_index.json  ← skill slug → relpath
+└── tasks/
+    ├── index.md  ← section index (regenerated by catalog)
+    ├── <slug>__task.md  ← cross-session comparison
+    └── <slug>__subtask.md  ← per-session workstream
+```
+
+**Filename suffixes are the navigation contract.** A page's role is decided
+by its suffix; the wiki's tooling and other agents rely on it. Don't edit
+the suffix.
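The suffix contract described above can be sketched as a small classifier. This is an illustration rather than the builder's own logic: the function name `page_role`, the role labels it returns, and the assumption that `<gid>` is 12 lowercase hex characters (inferred from the example ids in this commit) are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: <gid> is the 12-hex-char content hash, in lowercase.
_GID_SUFFIX = re.compile(r"__[0-9a-f]{12}\.md$")


def page_role(relpath: str) -> str:
    """Map a wiki-relative path to a page role using only its name."""
    path = PurePosixPath(relpath)
    name = path.name
    section = path.parts[0] if len(path.parts) > 1 else ""
    if name == "index.md":
        return "index"
    if name.endswith("__cluster.md"):
        return "cluster"
    if name.endswith("__subtask.md"):
        return "subtask"
    if name.endswith("__task.md"):
        return "task"
    if name == "SKILL.md":
        return "skill"
    if section == "guidelines" and _GID_SUFFIX.search(name):
        return "guideline"
    if section == "summaries" and name.endswith(".md"):
        return "summary"
    return "other"


print(page_role("guidelines/multi-subproject-workspace-conventions__cluster.md"))
```

Because the role is derived from the name alone, renaming a page's suffix would silently change how a classifier like this treats it, which is presumably why the file asks agents not to edit it.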
+
+## The retrieval index — read this first
+
+`_index.jsonl` has one JSON object per line, one line per
+guideline/cluster/skill/task/subtask page. The schema:
+
+```json
+{
+  "kind": "guideline" | "cluster" | "skill" | "task" | "subtask",
+  "id": "<12-hex-char content hash, OR cluster:<slug>, OR skill:<slug>, OR task:<slug>, OR subtask:<slug>>",
+  "title": "<short title>",
+  "tags": ["...", "..."],
+  "trigger": "<situational context when this applies — empty for clusters and tasks>",
+  "summary": "<one-paragraph snippet, ≤240 chars>",
+  "link": "<relative path inside the wiki>",
+  "cluster": "<slug if this guideline is a cluster member, else null>",
+  "superseded_by": "<cluster page name when this atomic is part of a cluster>",
+  "priority": "<\"high\" on cluster rows>",
+  "members": ["<id>", "..."] // on cluster rows
+}
+```
+
+Rows are sorted **clusters first, then skills, then atomic guidelines, then
+tasks**. Cluster pages are *aggregators* — when a cluster matches your
+query, it references its member atomic guidelines; you usually don't need
+to read the members directly unless you want the original wording or its
+source trajectory.
+
+**Skills** (`kind: "skill"`) live at `<wiki>/skills/<slug>/SKILL.md`.
+They're callable workflow pages: a structured Overview / When To Use /
+Workflow / (optional) supporting scripts under `<slug>/scripts/`. When a
+skill row matches your task, prefer it over a same-trigger guideline —
+the SKILL.md tells you exactly what to do (and may point at sibling
+scripts you can run via Bash). Skills are **recall-preferred over
+guidelines** because they're directly executable; an atomic guideline is
+free-text advice you have to interpret.
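As a rough illustration of the index format described above, the sketch below parses JSONL text into rows and groups them by `kind`. The helper names `load_index` and `by_kind` and the sample rows are hypothetical; only the field names and the five `kind` values come from the schema.

```python
import json

KINDS = ("cluster", "skill", "guideline", "task", "subtask")


def load_index(text: str) -> list[dict]:
    """Parse JSONL text into rows, skipping blank lines."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        row = json.loads(line)
        if row.get("kind") not in KINDS:
            raise ValueError(f"line {lineno}: unknown kind {row.get('kind')!r}")
        rows.append(row)
    return rows


def by_kind(rows: list[dict]) -> dict[str, list[dict]]:
    """Group rows by kind, keeping file order within each group."""
    grouped = {kind: [] for kind in KINDS}
    for row in rows:
        grouped[row["kind"]].append(row)
    return grouped


# Hypothetical three-row index, for illustration only.
sample = "\n".join([
    json.dumps({"kind": "cluster", "id": "cluster:cli-conventions", "tags": ["cli"]}),
    json.dumps({"kind": "skill", "id": "skill:scaffold-cli", "tags": ["cli"]}),
    json.dumps({"kind": "guideline", "id": "474bb2ba1076", "tags": ["cli", "ux"]}),
])
grouped = by_kind(load_index(sample))
print({kind: len(rows) for kind, rows in grouped.items()})
```

In practice the text would come from reading `<wiki-root>/_index.jsonl`; the sketch takes a string so it can run without a wiki on disk.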
+
+## How to retrieve (advisory)
+
+There's no mandated scoring algorithm. A reasonable recipe:
+
+1. **Parse the user's request + your current task plan** for keywords +
+   topical tags.
+2. **Read `_index.jsonl`** end-to-end. It's small (typically 50–200 rows).
+3. **Filter** rows whose `tags` overlap your topical tags, OR whose
+   `trigger` substring-matches your task description.
+4. **Prefer cluster pages** when both a cluster and its members match —
+   the cluster gives you the consolidated rule plus links down. Each
+   member's `superseded_by:` field tells you which cluster supersedes it.
+5. **Read the top 2–5** matches (clusters + standalone atomics not
+   superseded by any matched cluster). For each, follow the `link` and
+   read the page body.
+6. **Decide** which guidelines apply to your current task. State them
+   briefly to the user before acting if helpful, especially when a
+   guideline overrides what they asked for.
+
+Your judgment is the scoring function. Don't open every page the index lists.
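Steps 3 to 5 of the recipe above can be sketched as a filter over already-parsed index rows. This is one possible reading, not a mandated algorithm: the name `shortlist`, the keyword-in-trigger interpretation of "substring-matches", the case-insensitive comparison, and the sample rows are all assumptions.

```python
# Reading order taken from the index's stated sort: clusters, skills,
# atomic guidelines, tasks (subtasks placed last as an assumption).
ORDER = {"cluster": 0, "skill": 1, "guideline": 2, "task": 3, "subtask": 4}


def shortlist(rows, tags, keywords, limit=5):
    """Filter rows, drop atomics superseded by a matched cluster, cap the list."""
    wanted = {t.lower() for t in tags}
    words = [k.lower() for k in keywords]

    def matches(row):
        row_tags = {t.lower() for t in row.get("tags") or []}
        trigger = (row.get("trigger") or "").lower()
        return bool(wanted & row_tags) or any(w in trigger for w in words)

    hits = [r for r in rows if matches(r)]
    matched_clusters = {r["id"] for r in hits if r["kind"] == "cluster"}
    kept = [
        r for r in hits
        if not (r["kind"] == "guideline"
                and f"cluster:{r.get('cluster')}" in matched_clusters)
    ]
    kept.sort(key=lambda r: ORDER.get(r["kind"], len(ORDER)))  # stable sort
    return kept[:limit]


# Hypothetical rows, for illustration only.
rows = [
    {"kind": "guideline", "id": "aaaaaaaaaaaa", "tags": ["cli"],
     "trigger": "choosing between subcommands and flags", "cluster": "cli-conventions"},
    {"kind": "guideline", "id": "bbbbbbbbbbbb", "tags": ["exif"],
     "trigger": "parsing JPEG metadata", "cluster": None},
    {"kind": "skill", "id": "skill:scaffold-cli", "tags": ["cli"],
     "trigger": "starting a new CLI"},
    {"kind": "cluster", "id": "cluster:cli-conventions", "tags": ["cli", "conventions"],
     "trigger": ""},
]
picked = shortlist(rows, tags=["cli", "ux"], keywords=["subcommand"])
print([r["id"] for r in picked])
```

The sketch only narrows the reading list; deciding which of the shortlisted pages actually apply remains a judgment call, as the recipe says.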
+
+## Provenance
+
+Every page links back to its source. When you cite a guideline in your
+response or stake a non-trivial decision on one, the chain to follow is:
+
+```
+guideline.md
+  ↓ frontmatter `related_summary:`
+summaries/<session_id>[.md or __<arc>.md]
+  ↓ frontmatter `sources:` (normalized JSON path + raw transcript path)
+trajectories/<session_id>.json
+  ↓ source.transcript_path
+~/.claude/projects/.../<session_id>.jsonl
+```
+
+Cluster pages list their member atomic guidelines in their frontmatter
+`members:` list and in the body's "## Members" section. Each member has
+its own provenance — clusters don't replace member-level provenance, they
+aggregate it.
+
+## Worked example
+
+User asks: *"I'm building a CLI tool with two modes (read and write)
+plus a bunch of options. Should each mode be a subcommand or a flag?"*
+
+Procedure:
+
+1. **Task tags**: `cli`, `ux`, `architecture`, `subcommands`.
+2. **Read `_index.jsonl`**. Filter for any row tagged `cli`, `ux`, or
+   `workspace`.
+3. Top hits (hypothetical):
+   - `cluster:multi-subproject-workspace-conventions` (priority high; tags
+     include `workspace`, `cli`, `conventions`).
+   - `474bb2ba1076` "Promote a feature mode to a top-level flag, not an
+     option" (atomic; tags include `cli`, `ux`, `workspace`).
+4. **Prefer the cluster** — it consolidates several conventions including
+   the mode-as-subcommand rule. Read
+   `guidelines/multi-subproject-workspace-conventions__cluster.md`.
+5. **Decide**: the cluster answers the user's question. Promote each mode to a
+   subcommand; demote everything else to options under it.
+6. **Cite**: respond with the recommendation and (optionally) link the
+   cluster page.
+
+Total wiki content read: ~3 KB (one cluster page, plus a glance at one
+atomic). Not a session-start preload; consult on-demand once the task is
+clear.
+
+## Bootstrapping notes
+
+If `AGENTS.md` does not exist in a wiki, run
+`uv run python explorations/agent-wiki/skills/scripts/build_agent_wiki.py
+--wiki-root <wiki-root> catalog` — the bootstrap pass copies the template
+in. After bootstrap, this file is yours to edit; subsequent catalog runs
+do not overwrite an existing `AGENTS.md`.
+
+## Skill wrapper
+
+`agent-wiki:agent-wiki-consult` is a thin wrapper that asks the agent to
+follow this file's recipe against a given wiki root. Use the skill when
+you want a one-step "consult the wiki" entry point; read this file
+directly when you want to understand the contract.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+schema_version: 1
+
+# Tags applied to atomic guideline pages, keyed by stable 12-hex content id
+# (the `id:` frontmatter on each guideline page; mirrors `_id_index.json`).
+tags:
+  guideline: {}
+  # When you author guidelines, add entries like:
+  #   04474b0794e6: [exif, stdlib, fallback, minimal-env]
+
+# Themed groupings of related atomic guidelines. Members listed here get
+# `cluster:` and `superseded_by:` frontmatter pointing at the cluster page.
+clusters: {}
+#   exif-stdlib-fallback:
+#     title: EXIF stdlib parser fallback
+#     description: |
+#       When system EXIF tools and Python EXIF libraries are all unavailable,
+#       parse the JPEG bytes directly with stdlib `struct`.
+#     takeaway: |
+#       If the first one or two metadata tools fail, switch to a direct
+#       stdlib parse.
+#     members: [04474b0794e6, de04f5adde2e, 4746bf445108, 88989680a36a]
+#     tags: [exif, stdlib, fallback, minimal-env]
+
+# Cross-trajectory comparison pages: one per task family. The `family_match`
+# rules classify summaries; sessions named in `session_family_overrides`
+# override the rules.
+tasks: {}
+#   extract-focal-length:
+#     title: Extract focal length from JPEG EXIF
+#     family: focal-length
+#     family_match:
+#       goal_substring: [focal length]
+#     intro: |
+#       Question template: *what focal length was used to take @sample.jpg?*
+#     findings: |
+#       ...
+#     tags: [exif, focal-length, comparison]
+
+# Optional: pin a session to a specific task family / trial / condition when
+# the family_match rules are insufficient.
+session_family_overrides: {}
+#   00000000-0000-0000-0000-000000000000: {family: image-dims, trial: 0, condition: claude_md_strong}
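The interaction between `family_match` rules and `session_family_overrides` described in the config comments can be sketched as follows. This is an assumed reading, not the builder's implementation: the function name `classify_family`, the case-insensitive substring test, and the first-match-wins order are hypothetical, and the dictionaries simply mirror the commented-out examples above as already-parsed data.

```python
def classify_family(session_id, goal, tasks, overrides):
    """Return the task family for a summary, or None if nothing matches."""
    pinned = overrides.get(session_id)
    if pinned and "family" in pinned:
        return pinned["family"]  # an explicit override beats the rules
    goal_lc = goal.lower()
    for spec in tasks.values():
        needles = spec.get("family_match", {}).get("goal_substring", [])
        # Assumption: substrings are compared case-insensitively.
        if any(needle.lower() in goal_lc for needle in needles):
            return spec.get("family")
    return None


# Data mirroring the commented examples in the config.
tasks = {
    "extract-focal-length": {
        "family": "focal-length",
        "family_match": {"goal_substring": ["focal length"]},
    }
}
overrides = {
    "00000000-0000-0000-0000-000000000000": {
        "family": "image-dims", "trial": 0, "condition": "claude_md_strong",
    }
}
goal = "What focal length was used to take @sample.jpg?"
print(classify_family("some-session", goal, tasks, overrides))
print(classify_family("00000000-0000-0000-0000-000000000000", goal, tasks, overrides))
```

Checking the override first keeps the pinning rule simple to reason about: a session listed in `session_family_overrides` never reaches the substring rules.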
