Commit 3e26154
authored
explore(agent-wiki): trajectory-derived wiki — skills, builder, experiments (#268)
* explore(agent-wiki): self-contained, public-safe agent-wiki exploration
Adds explorations/agent-wiki/ — the agent-wiki skill family, builder, design
+ schema docs, the wiki-helps experiment reports, and benchmark-derived
example wikis, all under one tree suitable for a public PR.
Contents:
- skills/ 7 agent-wiki skills + build_agent_wiki.py (reference copy,
not plugin-wired)
- docs/ design.md + schema.md
- experiments/ RESULTS-SUMMARY + twobatch comparison reports +
pruned-index-hypothesis; metrics/ rollups (no raw
transcripts); harness/ runner + compare scripts
- wikis/ wiki-terminalbench-bob + the twobatch arms
(base / skills / both / pruned-corrected)
Public-safety scrub:
- Excluded all raw per-trial sandbox transcripts (kept only metric
rollups + narrative reports).
- Excluded wikis built from internal corpora (procedural-design,
consult-meta, iterative, retroactive, simple-claude, test-paired,
claude) and the build-pattern comparison that ran on them; §3-4 of
RESULTS-SUMMARY reduced to a portable-finding note.
- Rewrote all source-path frontmatter to the generic
trajectories/<session-id>.json form; genericized internal example
names and the benchmark-data dir convention in skills/docs.
- Leak gate (benchmark-data / internal corpus + wiki names / org paths)
passes with zero hits across the tree.
Branched off main; diff touches only explorations/agent-wiki/. Builder
catalog + comparison scripts verified runnable from the new location.
* explore(agent-wiki): drop wiki-terminalbench-bob example
Removes the terminal-bench example wiki from the exploration. Repoints the
README reading-order + layout to wiki-twobatch-skills, fixes the docs that
attributed worked examples to it (schema.md now points at the wiki-twobatch
arms; example index rows retagged), and corrects stale relative links the
docs carried from the original tree (../plugin-source → ../skills,
../WIKIS.md removed, ../experiments/wiki-build-comparison.md → RESULTS-SUMMARY
§3–4, design.md/schema.md cross-links to renamed filenames). Skill example
paths (consult, ingest) repointed off the removed wiki.
Remaining wikis: wiki-twobatch {base, skills, both, pruned}. All intra-doc
relative links resolve; leak gate clean.
* fix(explorations): make CI green for the agent-wiki exploration
CI (ruff, mypy, detect-secrets) was scanning explorations/agent-wiki/ as
project source — the first content under explorations/ to carry .py files
and high-entropy identifiers. Fixes, scoped so generated example artifacts
are treated like the already-excluded plugin-source/ and examples/ trees:
- ruff: lint + format fixes in the harness scripts + builder; exclude the
generated wiki scripts (explorations/agent-wiki/wikis/) via extend-exclude.
- mypy: add explorations/agent-wiki/wikis/ to exclude; add file-local
`# mypy: ignore-errors` to the exploration harness + the builder (a
verbatim copy of the mypy-excluded plugin-source/ original).
- detect-secrets: exclude explorations/agent-wiki/ in the pre-commit hook
and .secrets.baseline — the 53 findings are 12-hex guideline content
hashes and session-id UUIDs, not secrets.
No example-wiki content changed (scripts keep their original names).
Fixes failing CI checks: check-formatting, check-linting, check-typing,
tekton/pr-code-checks/code-detect-secrets.
* explore(agent-wiki): move example wikis to a follow-up PR
Drops explorations/agent-wiki/wikis/ (253 generated files, ~10k lines) from
this PR so the diff is the reviewable surface — skills, builder, docs, and
the experiment reports/harness (~34 files). The example wikis are machine-
generated output; bundling them buried the code and appears to have made
CodeRabbit skip deep review (summary only, zero inline findings).
The wikis land in a stacked follow-up PR. README/docs still reference
wikis/wiki-twobatch-* by path; those links resolve once the follow-up
merges. Root-config excludes (ruff/mypy/detect-secrets) are kept — the
detect-secrets exclude still covers example content hashes in docs/schema.md,
and the wiki excludes become live again when the follow-up lands.
* fix(agent-wiki): address PR review findings
P1 — fresh catalog bootstrap crash: cmd_catalog now creates summaries/,
guidelines/, tasks/, skills/ before any index writer runs. A `catalog`
on a bare wiki-root no longer FileNotFounds on summaries/index.md.
P1 — skill docs referenced non-existent paths: repointed all 23
build_agent_wiki.py invocations and the normalizer reference from
plugin-source/… and scripts/… to the in-tree
explorations/agent-wiki/skills/scripts/ and …/experiments/harness/ paths
(across the 7 skills + _default_agents.md).
P1 — harness reproducibility: experiment_wiki_consult.py is marked
REFERENCE ONLY (it needs project-level sandbox assets — docker image,
demo workspace, hint plugin, _format_samples — not shipped here); the
tasks-file path now resolves to the checked-in harness/wiki_consult_tasks.yaml.
README's "reproduce" wording split into re-runnable compare scripts vs the
reference-only A/B runner.
P2 — render-cluster --archive-members broke member links: archive members
BEFORE rendering the cluster page, and resolve each member to its real
location — sibling in guidelines/, or ../_archived/<name>.md when archived.
Links and titles now resolve in both modes.
P2 — README described moved-out wikis: the example wikis live in the
companion PR; README layout/reading-order/scope updated accordingly.
Also: stripped trailing EOF blank lines in twobatch-comparison.md and
twobatch-skills-comparison.md (git diff --check).
* fix(agent-wiki): address CodeRabbit review on the split-down diff
CodeRabbit re-reviewed the focused (code-only) PR and flagged 7 items; 3 were
already fixed by the prior commit (REPO_ROOT, tasks_file path, build-script
path — CodeRabbit confirmed resolved). The remaining 4:
- [major] _format_samples import: wrap the deferred import in a clear
RuntimeError explaining it's a project-level sandbox asset absent from this
reference-only runner, instead of a bare ImportError.
- [minor] median was durs[n//2] — wrong for even-length trial lists; now
averages the two middle values for even n (default --trials 3 unaffected).
- [minor] typo "byes" -> "bytes" in RESULTS-SUMMARY.md.
- [minor] _default_agents.md Structure tree: add the per-section index.md
entries (summaries/guidelines/skills/tasks) the catalog regenerates.
* fix(agent-wiki): address review feedback from visahak
1. Harness REPO_ROOT resolved to explorations/agent-wiki (parents[2]) instead
of the repo root, so the reference A/B runner couldn't find project assets
(demo/workspace, platform-integrations/, tests/e2e/_wiki_hint_plugin). The
script moved from tests/e2e/ (where parents[2] was the root) down two levels
to experiments/harness/; REPO_ROOT is now parents[4] (the real repo root),
matching the documented "run from the full project" usage.
2. detect-secrets exclude was over-broad (^explorations/agent-wiki/), disabling
the secret gate over all hand-written code/docs/harness there. Narrowed to
only the generated example-wiki tree and the schema doc's worked examples
(^explorations/agent-wiki/wikis/ + docs/schema.md) — the only paths whose
12-hex content hashes / session UUIDs trip the high-entropy detector. This
mirrors the ruff/mypy scoping (wikis/ only). Applied in both
.pre-commit-config.yaml and .secrets.baseline.
* chore(agent-wiki): remove REVIEW-FINDINGS.md working note
Accidentally added in the prior commit by `git add -A`; this is a local
review-notes scratch file, not part of the exploration.
* fix(agent-wiki): harness --out-root tolerant of absolute paths
experiment_wiki_consult.py rendered the summary footer with
runs_path.relative_to(REPO_ROOT) / transcripts_dir.relative_to(REPO_ROOT),
which raised ValueError at the very end of a run when --out-root pointed at
an absolute path outside the repo. Added a _display_path() helper that returns
the repo-relative form when the path is under REPO_ROOT and the absolute path
otherwise. In-repo out-roots still render relative; external ones no longer
crash.
(The other open finding in the review notes — over-broad detect-secrets
exclude — was already narrowed to wikis/ + docs/schema.md in d0e0850.)1 parent 6de3712 commit 3e26154
34 files changed
Lines changed: 8314 additions & 7 deletions
File tree
- explorations/agent-wiki
- docs
- experiments
- harness
- metrics
- skills
- agent-wiki-consolidate-guidelines
- agent-wiki-consult
- agent-wiki-extract-guidelines
- agent-wiki-ingest
- agent-wiki-summarize
- agent-wiki-synthesize-skill
- agent-wiki-tasks
- scripts
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
49 | 49 | | |
50 | 50 | | |
51 | 51 | | |
52 | | - | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
53 | 58 | | |
54 | 59 | | |
55 | 60 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | | - | |
| 6 | + | |
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
| |||
156 | 156 | | |
157 | 157 | | |
158 | 158 | | |
| 159 | + | |
159 | 160 | | |
160 | | - | |
| 161 | + | |
161 | 162 | | |
162 | | - | |
163 | | - | |
| 163 | + | |
164 | 164 | | |
165 | 165 | | |
166 | 166 | | |
| |||
223 | 223 | | |
224 | 224 | | |
225 | 225 | | |
226 | | - | |
| 226 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
0 commit comments