Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion explorations/agent-wiki/skills/scripts/build_agent_wiki.py
Original file line number Diff line number Diff line change
Expand Up @@ -426,7 +426,9 @@ def _render_summary_md(summary: dict, recalled: list[dict], arc_slug: str = "",
body.append("## Key turns")
body.append("")
for kt in key_turns:
body.append(f"- {kt}")
# rstrip: a truncated tool command can end in whitespace, which
# otherwise leaves a trailing-whitespace line (git diff --check).
body.append(f"- {str(kt).rstrip()}")
body.append("")
if recalled:
body.append("## Recalled guidelines")
Expand Down
182 changes: 182 additions & 0 deletions explorations/agent-wiki/wikis/wiki-twobatch-both/AGENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
# AGENTS.md — how an agent should read this wiki

This wiki is **evidence-grounded guidelines distilled from agent
trajectories**. Every page links back to the trajectory it came from, so any
recommendation is auditable and revisable.

You — the agent — should consult this wiki **once you know the task or
sub-task you are about to do**. Not at session start (too vague), not as a
last resort when stuck (too late). The right moment is after the user states
their request and you've decided what task family it belongs to, before you
start writing code.

## When to read me

Trigger conditions, any one of which should prompt a wiki check:

- You're about to author non-trivial code in a problem space the wiki has
documented (build a CLI, parse a structured file format, automate a
browser flow, design a TUI, run an experiment, ship a PR through review).
- The user mentions a topic that resembles entries in `_index.jsonl`'s
`tags` or `trigger` fields.
- You're about to make an architectural choice (mode-as-subcommand vs
options, env-var vs flag, cluster duplicates vs leave-as-is).
- A sub-task has been identified (you're now in the middle of a
multi-step plan and the next step has its own narrow scope).

Don't read for trivial tasks (typo fix, single-line refactor) or topics
clearly outside the wiki's scope (the corpus is finite — see
`guidelines/index.md` for the topical surface).

## Structure

The wiki has three top-level sections, all under the wiki root:

```
<wiki-root>/
├── AGENTS.md ← this file
├── index.md ← human-friendly overview
├── _config.yaml ← taxonomy: tags, clusters, tasks, family overrides
├── _index.jsonl ← agent retrieval index (one row per page)
├── summaries/
│ ├── index.md ← section index (regenerated by catalog)
│ ├── <session_id>.md ← single summary per session
│ └── <session_id>__<arc-slug>.md ← multi-arc session split
├── guidelines/
│ ├── index.md ← section index (regenerated by catalog)
│ ├── <slug>__<gid>.md ← atomic guideline (one rule); `<gid>` matches the `id:` frontmatter
│ ├── <slug>__cluster.md ← themed aggregator (recall-preferred)
│ └── _id_index.json ← guideline id → relpath
├── skills/
│ ├── index.md ← section index (regenerated by catalog)
│ ├── <slug>/SKILL.md ← callable workflow page (recall-preferred over guidelines)
│ ├── <slug>/scripts/<file> ← optional supporting scripts (run via Bash)
│ └── _id_index.json ← skill slug → relpath
└── tasks/
├── index.md ← section index (regenerated by catalog)
├── <slug>__task.md ← cross-session comparison
└── <slug>__subtask.md ← per-session workstream
```

**Filename suffixes are the navigation contract.** A page's role is decided
by its suffix; the wiki's tooling and other agents rely on it. Don't edit
the suffix.

## The retrieval index — read this first

`_index.jsonl` has one JSON object per line, one line per
guideline/cluster/skill/task/subtask page. The schema:

```json
{
"kind": "guideline" | "cluster" | "skill" | "task" | "subtask",
"id": "<12-hex-char content hash, OR cluster:<slug>, OR skill:<slug>, OR task:<slug>, OR subtask:<slug>>",
"title": "<short title>",
"tags": ["...", "..."],
"trigger": "<situational context when this applies — empty for clusters and tasks>",
"summary": "<one-paragraph snippet, ≤240 chars>",
"link": "<relative path inside the wiki>",
"cluster": "<slug if this guideline is a cluster member, else null>",
"superseded_by": "<cluster page name when this atomic is part of a cluster>",
"priority": "<\"high\" on cluster rows>",
"members": ["<id>", "..."] // on cluster rows
}
```

Rows are sorted **clusters first, then skills, then atomic guidelines, then
tasks**. Cluster pages are *aggregators* — when a cluster matches your
query, it references its member atomic guidelines; you usually don't need
to read the members directly unless you want the original wording or its
source trajectory.

**Skills** (`kind: "skill"`) live at `<wiki>/skills/<slug>/SKILL.md`.
They're callable workflow pages: a structured Overview / When To Use /
Workflow / (optional) supporting scripts under `<slug>/scripts/`. When a
skill row matches your task, prefer it over a same-trigger guideline —
the SKILL.md tells you exactly what to do (and may point at sibling
scripts you can run via Bash). Skills are **recall-preferred over
guidelines** because they're directly executable; an atomic guideline is
free-text advice you have to interpret.

## How to retrieve (advisory)

There's no mandated scoring algorithm. A reasonable recipe:

1. **Parse the user's request + your current task plan** for keywords +
topical tags.
2. **Read `_index.jsonl`** end-to-end. It's small (typically 50–200 rows).
3. **Filter** rows whose `tags` overlap your topical tags, OR whose
`trigger` substring-matches your task description.
4. **Prefer cluster pages** when both a cluster and its members match —
the cluster gives you the consolidated rule plus links down. Each
member's `superseded_by:` field tells you which cluster supersedes it.
5. **Read the top 2–5** matches (clusters + standalone atomics not
superseded by any matched cluster). For each, follow the `link` and
read the page body.
6. **Decide** which guidelines apply to your current task. State them
briefly to the user before acting if helpful, especially when a
guideline overrides what they asked for.

Your judgment is the scoring function. Don't read every row.

## Provenance

Every page links back to its source. When you cite a guideline in your
response or stake a non-trivial decision on one, the chain to follow is:

```
guideline.md
↓ frontmatter `related_summary:`
summaries/<session_id>[.md or __<arc>.md]
↓ frontmatter `sources:` (normalized JSON path + raw transcript path)
trajectories/<session_id>.json
↓ source.transcript_path
~/.claude/projects/.../<session_id>.jsonl
```

Cluster pages list their member atomic guidelines in their frontmatter
`members:` list and in the body's "## Members" section. Each member has
its own provenance — clusters don't replace member-level provenance, they
aggregate it.

## Worked example

User asks: *"I'm building a CLI tool with two modes (read and write)
plus a bunch of options. Should each mode be a subcommand or a flag?"*

Procedure:

1. **Task tags**: `cli`, `ux`, `architecture`, `subcommands`.
2. **Read `_index.jsonl`**. Filter for any row tagged `cli`, `ux`, or
`workspace`.
3. Top hits (hypothetical):
- `cluster:multi-subproject-workspace-conventions` (priority high; tags
include `workspace`, `cli`, `conventions`).
- `474bb2ba1076` "Promote a feature mode to a top-level flag, not an
option" (atomic; tags include `cli`, `ux`, `workspace`).
4. **Prefer the cluster** — it consolidates several conventions including
the mode-as-subcommand rule. Read
`guidelines/multi-subproject-workspace-conventions__cluster.md`.
5. **Decide**: this confirms the user's question — promote each mode to a
subcommand; demote everything else to options under it.
6. **Cite**: respond with the recommendation and (optionally) link the
cluster page.

Total wiki tokens read: ~3 KB (one cluster page, plus a glance at one
atomic). Not a session-start preload; consult on-demand once the task is
clear.

## Bootstrapping notes

If `AGENTS.md` does not exist in a wiki, run
`uv run python explorations/agent-wiki/skills/scripts/build_agent_wiki.py
--wiki-root <wiki-root> catalog` — the bootstrap pass copies the template
in. After bootstrap, this file is yours to edit; subsequent catalog runs
do not overwrite an existing `AGENTS.md`.

## Skill wrapper

`agent-wiki:agent-wiki-consult` is a thin wrapper that asks the agent to
follow this file's recipe against a given wiki root. Use the skill when
you want a one-step "consult the wiki" entry point; read this file
directly when you want to understand the contract.
42 changes: 42 additions & 0 deletions explorations/agent-wiki/wikis/wiki-twobatch-both/_config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
schema_version: 1

# Tags applied to atomic guideline pages, keyed by stable 12-hex content id
# (the `id:` frontmatter on each guideline page; mirrors `_id_index.json`).
tags:
guideline: {}
# When you author guidelines, add entries like:
# 04474b0794e6: [exif, stdlib, fallback, minimal-env]

# Themed groupings of related atomic guidelines. Members listed here get
# `cluster:` and `superseded_by:` frontmatter pointing at the cluster page.
clusters: {}
# exif-stdlib-fallback:
# title: EXIF stdlib parser fallback
# description: |
# When system EXIF tools and Python EXIF libraries are all unavailable,
# parse the JPEG bytes directly with stdlib `struct`.
# takeaway: |
# If the first one or two metadata tools fail, switch to a direct
# stdlib parse.
# members: [04474b0794e6, de04f5adde2e, 4746bf445108, 88989680a36a]
# tags: [exif, stdlib, fallback, minimal-env]

# Cross-trajectory comparison pages: one per task family. The `family_match`
# rules classify summaries; sessions named in `session_family_overrides`
# override the rules.
tasks: {}
# extract-focal-length:
# title: Extract focal length from JPEG EXIF
# family: focal-length
# family_match:
# goal_substring: [focal length]
# intro: |
# Question template: *what focal length was used to take @sample.jpg?*
# findings: |
# ...
# tags: [exif, focal-length, comparison]

# Optional: pin a session to a specific task family / trial / condition when
# the family_match rules are insufficient.
session_family_overrides: {}
# 00000000-0000-0000-0000-000000000000: {family: image-dims, trial: 0, condition: claude_md_strong}
Loading
Loading