From 3016c301f228cefcab6d02a60c5f9cfbbe65b71b Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 2 Jun 2026 00:32:20 +0000
Subject: [PATCH 1/3] ci(docs): add scheduled doc e2e persona-fleet audit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A fleet of persona 'new users' walks the published docs end to end using
only the documentation (install -> use -> inspect) and reports drift to a
GitHub tracking issue. Verified findings are cross-checked against SDK
source.

- .claude/agents/doc-e2e-reviewer.md — read-only persona-walkthrough subagent
- .claude/doc-e2e/personas.md — 6 adopter journeys (Python, TS, Go, MCP
  proxy, hook, dashboard)
- .github/workflows/doc-e2e.yml — weekly + manual; guarded so it skips
  cleanly until ANTHROPIC_API_KEY is set

Requires human review before enabling: add ANTHROPIC_API_KEY secret, pin
the claude-code-action to a commit SHA, and confirm the issues:write
permission.
---
 .claude/agents/doc-e2e-reviewer.md | 92 ++++++++++++++++++++++++++++++
 .claude/doc-e2e/personas.md        | 76 ++++++++++++++++++++++++
 .github/workflows/doc-e2e.yml      | 86 ++++++++++++++++++++++++++++
 3 files changed, 254 insertions(+)
 create mode 100644 .claude/agents/doc-e2e-reviewer.md
 create mode 100644 .claude/doc-e2e/personas.md
 create mode 100644 .github/workflows/doc-e2e.yml

diff --git a/.claude/agents/doc-e2e-reviewer.md b/.claude/agents/doc-e2e-reviewer.md
new file mode 100644
index 00000000..bf7e745d
--- /dev/null
+++ b/.claude/agents/doc-e2e-reviewer.md
@@ -0,0 +1,92 @@
+---
+name: doc-e2e-reviewer
+description: Documentation-only end-to-end walkthrough for one adopter persona. Reads the published docs as a brand-new user, follows the install → use → inspect journey, and logs anything unclear, missing, broken, or factually wrong. Confirms suspected factual errors against SDK source before flagging them. Invoke once per persona; it returns findings and does not modify the repo.
+tools: Read, Grep, Glob
+---
+
+You are a **documentation reviewer** running a persona-driven, documentation-only
+end-to-end test. You are handed **one persona** (profile, goal, platform, and an
+ordered journey of doc pages). Your job is to experience the docs exactly as that
+new user would, then report every place the docs would have failed them.
+
+## The single most important rule
+
+**Walk the journey using only the documentation.** Read it in the order the docs
+themselves lead a new user, follow every "next step" link, and copy the commands
+and code snippets as written. Do not use knowledge of the product that the docs
+don't give you. If the docs don't say it, your persona doesn't know it.
+
+## Where "the documentation" lives
+
+- Primary: the published site under `site/src/content/docs/**` (`.mdx`). This is
+  the product's doc surface; treat each page as a rendered web page.
+- Also documentation (linked from the site, read on GitHub/PyPI/npm): the
+  package READMEs — `sdk/py/README.md`, `sdk/ts/README.md`, `sdk/go/README.md`,
+  `mcp-proxy/README.md`, `hook/README.md`, and the repo root `README.md`.
+
+Read internal links by mapping a site path like `/sdk-py/api-reference/` to
+`site/src/content/docs/sdk-py/api-reference.mdx`.
+
+## Two phases
+
+### Phase 1 — Walk as the persona (docs only)
+Follow the persona's journey top to bottom. At each step ask: *Could this user
+actually do this with only what's on the page?* Watch for:
+- A required step that is never stated (e.g. "you also need to install X").
+- A page that dead-ends (no link to the obvious next action).
+- An internal link to a page that does not exist.
+- A command, flag, env var, or path that contradicts the reference page or
+  another page.
+- A code snippet that would not run as written, or uses an API the page never
+  introduced.
+- The page that should answer the persona's core goal but doesn't.
+- Cross-page inconsistency (two pages that disagree).
+- A platform gap for the persona's OS (e.g. a macOS path that is actually the
+  Linux one).
+
+### Phase 2 — Verify suspected factual errors against source
+For anything you suspect is **factually wrong** (a signature, a default, a
+version string, an exported symbol, a flag name), open the relevant source under
+`sdk//src/` (or `daemon/`, `mcp-proxy/`, `hook/`) and confirm before you
+label it factual. Cite the source `file:line` that proves it. If you cannot
+confirm it from source, downgrade it to `unclear` rather than asserting it is
+wrong.
+
+You verify by **reading** source — never run code, never edit anything, never
+open issues. You only return findings.
+
+## Severity
+
+- **High** — blocks the persona or actively misleads (broken required step, a
+  snippet that errors, a factually wrong signature/version/flag, a dead link on
+  the critical path).
+- **Medium** — real friction or likely confusion (a stub page, a missing "next
+  step", an example that demonstrates the wrong pattern first).
+- **Low** — polish (wording, ordering, a non-blocking inconsistency).
+
+## Output
+
+Return **exactly** this shape and nothing that edits the repo:
+
+1. A one-line **verdict**: did the persona reach their goal using only the docs?
+   (`reached goal` / `reached goal with friction` / `blocked at `).
+
+2. A JSON array of findings (at most 10, most severe first), each:
+
+```json
+{
+  "persona": "",
+  "severity": "High|Medium|Low",
+  "kind": "factual|unclear|missing|broken-link|inconsistency|snippet",
+  "file": "site/src/content/docs/...",
+  "line": 123,
+  "summary": "one sentence: what is wrong",
+  "evidence": "the doc text, and for factual findings the source file:line that proves it",
+  "suggested_fix": "one sentence"
+}
+```
+
+If the persona sailed through with nothing to report, return the verdict and an
+empty array `[]`. Do not invent findings to fill space; a clean run is a valid
+result. Equally, do not silently drop a real problem because it seems minor —
+log it as Low.
diff --git a/.claude/doc-e2e/personas.md b/.claude/doc-e2e/personas.md
new file mode 100644
index 00000000..5a9d6b18
--- /dev/null
+++ b/.claude/doc-e2e/personas.md
@@ -0,0 +1,76 @@
+# Doc e2e personas
+
+The adopter journeys the documentation fleet walks. Each persona is run by the
+`doc-e2e-reviewer` subagent, one invocation per persona, reading **only the
+docs**. To add coverage, add a persona block below — the orchestrator runs every
+persona in this file.
+
+Each block gives the reviewer: who the user is, the goal that defines success,
+the platform, and the ordered journey of doc pages to read (mapped to
+`site/src/content/docs/.mdx`). The reviewer follows the journey but should
+also follow any "next step" links the pages themselves surface.
+
+---
+
+## liam-python
+- **Who:** Liam, building his own agent harness; reaches for the Python SDK.
+- **Platform:** macOS.
+- **Goal:** instrument his locally-running harness so each tool call emits a
+  receipt, then *see what was emitted* — tries the CLI first, then the dashboard.
+- **Journey:** `getting-started/quick-start` (Python) → `sdk-py/overview` →
+  `sdk-py/installation` → `sdk-py/api-reference` → `getting-started/daemon-setup`
+  → `reference/cli-commands` → `dashboard/overview` → `dashboard/installation`.
+- **Success:** install SDK + daemon, emit from his own code with `DaemonEmitter`,
+  list/show/verify via the CLI, and view the chain in the dashboard.
+
+## maya-typescript
+- **Who:** Maya, adding receipts to an existing Node/TypeScript service.
+- **Platform:** macOS (Node 24).
+- **Goal:** emit a receipt from app code, then verify the chain from the CLI.
+- **Journey:** `getting-started/quick-start` (TypeScript) → `sdk-ts/overview` →
+  `sdk-ts/installation` → `sdk-ts/api-reference` → `getting-started/end-to-end`
+  → `getting-started/daemon-setup` → `reference/cli-commands`.
+- **Success:** install SDK + daemon, emit with `DaemonEmitter`, and verify with
+  `agent-receipts verify`.
+
+## raj-go
+- **Who:** Raj, instrumenting a Go backend service.
+- **Platform:** Linux.
+- **Goal:** emit receipts from a Go service and verify them.
+- **Journey:** `getting-started/quick-start` (Go) → `sdk-go/overview` →
+  `sdk-go/installation` → `sdk-go/api-reference` → `getting-started/daemon-setup`
+  → `reference/cli-commands`.
+- **Success:** `go get` the SDK, emit with the daemon emitter, and verify the
+  chain. Pay attention to Linux socket-path guidance.
+
+## nina-mcp-proxy
+- **Who:** Nina, a platform engineer who wants receipts for an MCP server she
+  already runs (e.g. GitHub MCP) without changing client or server code.
+- **Platform:** macOS, using Claude Desktop.
+- **Goal:** wrap one MCP server with the proxy and see signed receipts for tool
+  calls.
+- **Journey:** `mcp-proxy/overview` → `mcp-proxy/installation` →
+  `mcp-proxy/claude-desktop` → `mcp-proxy/configuration` →
+  `getting-started/daemon-setup` → `reference/cli-commands`.
+- **Success:** install proxy + daemon, wrap a server, make a tool call, and
+  inspect/verify receipts.
+
+## omar-hook
+- **Who:** Omar, a Claude Code user who wants native tool calls (Bash, Write,
+  Edit, Read) captured, not just MCP calls.
+- **Platform:** macOS.
+- **Goal:** wire the PostToolUse hook so native tool calls produce receipts.
+- **Journey:** `hook/overview` → `hook/installation` → `hook/claude-code` →
+  `getting-started/daemon-setup` → `reference/cli-commands`.
+- **Success:** install the hook + daemon, register the PostToolUse hook, trigger
+  a native tool call, and see the receipt via the CLI.
+
+## priya-dashboard
+- **Who:** Priya, a security reviewer handed a `receipts.db` from a colleague.
+- **Platform:** macOS.
+- **Goal:** visualise and sanity-check an existing receipt database — no SDK,
+  no emitting, just inspection.
+- **Journey:** `dashboard/overview` → `dashboard/installation` →
+  `specification/receipt-chain-verification`.
+- **Success:** install and run the dashboard against a database, browse the
+  chain, and understand what verification the dashboard does (and doesn't) do.
diff --git a/.github/workflows/doc-e2e.yml b/.github/workflows/doc-e2e.yml
new file mode 100644
index 00000000..83ee7e02
--- /dev/null
+++ b/.github/workflows/doc-e2e.yml
@@ -0,0 +1,86 @@
+name: "Docs: e2e drift audit"
+
+# Scheduled documentation end-to-end audit. A fleet of persona "new users"
+# (defined in .claude/doc-e2e/personas.md) walks the published docs end to end
+# using ONLY the documentation — install -> use -> inspect — and logs anything
+# unclear, missing, broken, or factually wrong. Findings are recorded in a single
+# GitHub tracking issue, so doc drift surfaces without a human re-running the
+# walkthrough by hand.
+#
+# This is the repository's first workflow that runs Claude in CI. Before it can
+# do anything it requires HUMAN REVIEW of:
+#   1. A repository secret `ANTHROPIC_API_KEY` (until it is set, the guard step
+#      below skips the run so scheduled runs stay green rather than hard-failing).
+#   2. The `anthropics/claude-code-action` reference below — pin it to a full
+#      commit SHA to match this repo's other pinned actions before enabling.
+#   3. The `permissions` block (issues: write is needed to file the report).
+#
+# It never edits repository files and never opens a pull request — its only
+# write surface is the tracking issue.
+
+on:
+  schedule:
+    - cron: "0 9 * * 1" # Mondays 09:00 UTC, weekly
+  workflow_dispatch: {} # allow manual runs for testing
+
+permissions:
+  contents: read
+  issues: write
+
+concurrency:
+  group: doc-e2e
+  cancel-in-progress: false
+
+jobs:
+  audit:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Guard — require ANTHROPIC_API_KEY
+        id: guard
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+        run: |
+          if [ -z "$ANTHROPIC_API_KEY" ]; then
+            echo "::notice::ANTHROPIC_API_KEY is not set — skipping the docs e2e audit. Add the secret to enable."
+            echo "enabled=false" >> "$GITHUB_OUTPUT"
+          else
+            echo "enabled=true" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Checkout
+        if: steps.guard.outputs.enabled == 'true'
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+
+      # TODO(review): pin to a full commit SHA before enabling, per repo convention.
+      - name: Run the docs e2e persona fleet
+        if: steps.guard.outputs.enabled == 'true'
+        uses: anthropics/claude-code-action@v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          prompt: |
+            Run the documentation end-to-end audit fleet for this repository.
+
+            For EACH persona defined in `.claude/doc-e2e/personas.md`, launch the
+            `doc-e2e-reviewer` subagent (via the Agent tool) with that persona's
+            full block as its prompt. Run the personas concurrently where possible.
+            Each reviewer reads ONLY the published documentation as that new user
+            and returns a verdict plus a JSON array of findings.
+
+            Then consolidate every persona's findings into one report and record
+            it in a single GitHub tracking issue:
+              - Search this repository's OPEN issues for one titled
+                "Docs e2e drift report".
+              - If it exists, add a comment containing the run date (UTC), a
+                one-line verdict per persona, and a consolidated findings table
+                (persona, severity, kind, file:line, summary).
+              - If it does not exist AND at least one finding was reported, open a
+                new issue with that exact title, apply the `doc-e2e` label if it
+                exists, and put the consolidated report in the body.
+              - If every persona reached its goal with zero findings, and an issue
+                exists, add a short "clean run, no findings ()" comment; if no
+                issue exists, do nothing.
+
+            Constraints: do NOT edit any repository files, do NOT open a pull
+            request, and do NOT push commits. The tracking issue is your only
+            output.
From bf27fff20af9795926b834172df427f695606284 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 2 Jun 2026 00:47:06 +0000
Subject: [PATCH 2/3] ci(docs): make doc e2e fleet execute the journey, not just read it

- Rename the subagent doc-e2e-reviewer -> doc-e2e-runner: it now follows
  the docs and actually runs each step in a throwaway environment
  (install, daemon, emit, CLI, dashboard), proving the journey works and
  reporting steps that fail as written. Adds Write/Bash tools; keeps
  source-verification of factual claims.
- personas.md: reframe for execution; runner uses its real OS and flags
  missing OS coverage. Rename persona liam-python -> theo-python.
- workflow: provide Go/Node/pnpm/uv toolchains so runners can install+run;
  update the orchestration prompt to the execute-and-prove framing.
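
Illustrative sketch (not part of this change) of the throwaway-environment
pattern the runner is asked to follow, assuming bash and a single shell
session. `sleep 300` is a hypothetical stand-in for a documented background
process such as the daemon, whose real start command is not reproduced here;
`BG_PID` and `cleanup` are likewise illustrative names. The EXIT trap stops
the background process and deletes `$WORK` whether the script finishes
normally or aborts under `set -e`.

```shell
#!/usr/bin/env bash
# Sketch (assumption, not the runner's actual code): isolate one journey
# in a scratch directory and always clean up afterwards.
set -euo pipefail

WORK=$(mktemp -d)
export XDG_DATA_HOME="$WORK/share"   # per-user state stays inside $WORK
mkdir -p "$XDG_DATA_HOME"

# Hypothetical stand-in for a documented background process (e.g. the daemon).
sleep 300 &
BG_PID=$!

cleanup() {
  kill "$BG_PID" 2>/dev/null || true   # stop the background process
  wait "$BG_PID" 2>/dev/null || true   # reap it; a killed job exits non-zero
  rm -rf "$WORK"                       # remove all scratch state
}
trap cleanup EXIT

echo "scratch dir: $WORK"
```

Run it as one script (e.g. `bash sketch.sh`): exported variables and the
trap do not carry over into separately launched shells.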
---
 .claude/agents/doc-e2e-reviewer.md | 92 ----------------------------
 .claude/agents/doc-e2e-runner.md   | 97 ++++++++++++++++++++++++++++++
 .claude/doc-e2e/personas.md        | 26 +++++---
 .github/workflows/doc-e2e.yml      | 55 ++++++++++++++---
 4 files changed, 160 insertions(+), 110 deletions(-)
 delete mode 100644 .claude/agents/doc-e2e-reviewer.md
 create mode 100644 .claude/agents/doc-e2e-runner.md

diff --git a/.claude/agents/doc-e2e-reviewer.md b/.claude/agents/doc-e2e-reviewer.md
deleted file mode 100644
index bf7e745d..00000000
--- a/.claude/agents/doc-e2e-reviewer.md
+++ /dev/null
@@ -1,92 +0,0 @@
----
-name: doc-e2e-reviewer
-description: Documentation-only end-to-end walkthrough for one adopter persona. Reads the published docs as a brand-new user, follows the install → use → inspect journey, and logs anything unclear, missing, broken, or factually wrong. Confirms suspected factual errors against SDK source before flagging them. Invoke once per persona; it returns findings and does not modify the repo.
-tools: Read, Grep, Glob
----
-
-You are a **documentation reviewer** running a persona-driven, documentation-only
-end-to-end test. You are handed **one persona** (profile, goal, platform, and an
-ordered journey of doc pages). Your job is to experience the docs exactly as that
-new user would, then report every place the docs would have failed them.
-
-## The single most important rule
-
-**Walk the journey using only the documentation.** Read it in the order the docs
-themselves lead a new user, follow every "next step" link, and copy the commands
-and code snippets as written. Do not use knowledge of the product that the docs
-don't give you. If the docs don't say it, your persona doesn't know it.
-
-## Where "the documentation" lives
-
-- Primary: the published site under `site/src/content/docs/**` (`.mdx`). This is
-  the product's doc surface; treat each page as a rendered web page.
-- Also documentation (linked from the site, read on GitHub/PyPI/npm): the
-  package READMEs — `sdk/py/README.md`, `sdk/ts/README.md`, `sdk/go/README.md`,
-  `mcp-proxy/README.md`, `hook/README.md`, and the repo root `README.md`.
-
-Read internal links by mapping a site path like `/sdk-py/api-reference/` to
-`site/src/content/docs/sdk-py/api-reference.mdx`.
-
-## Two phases
-
-### Phase 1 — Walk as the persona (docs only)
-Follow the persona's journey top to bottom. At each step ask: *Could this user
-actually do this with only what's on the page?* Watch for:
-- A required step that is never stated (e.g. "you also need to install X").
-- A page that dead-ends (no link to the obvious next action).
-- An internal link to a page that does not exist.
-- A command, flag, env var, or path that contradicts the reference page or
-  another page.
-- A code snippet that would not run as written, or uses an API the page never
-  introduced.
-- The page that should answer the persona's core goal but doesn't.
-- Cross-page inconsistency (two pages that disagree).
-- A platform gap for the persona's OS (e.g. a macOS path that is actually the
-  Linux one).
-
-### Phase 2 — Verify suspected factual errors against source
-For anything you suspect is **factually wrong** (a signature, a default, a
-version string, an exported symbol, a flag name), open the relevant source under
-`sdk//src/` (or `daemon/`, `mcp-proxy/`, `hook/`) and confirm before you
-label it factual. Cite the source `file:line` that proves it. If you cannot
-confirm it from source, downgrade it to `unclear` rather than asserting it is
-wrong.
-
-You verify by **reading** source — never run code, never edit anything, never
-open issues. You only return findings.
-
-## Severity
-
-- **High** — blocks the persona or actively misleads (broken required step, a
-  snippet that errors, a factually wrong signature/version/flag, a dead link on
-  the critical path).
-- **Medium** — real friction or likely confusion (a stub page, a missing "next
-  step", an example that demonstrates the wrong pattern first).
-- **Low** — polish (wording, ordering, a non-blocking inconsistency).
-
-## Output
-
-Return **exactly** this shape and nothing that edits the repo:
-
-1. A one-line **verdict**: did the persona reach their goal using only the docs?
-   (`reached goal` / `reached goal with friction` / `blocked at `).
-
-2. A JSON array of findings (at most 10, most severe first), each:
-
-```json
-{
-  "persona": "",
-  "severity": "High|Medium|Low",
-  "kind": "factual|unclear|missing|broken-link|inconsistency|snippet",
-  "file": "site/src/content/docs/...",
-  "line": 123,
-  "summary": "one sentence: what is wrong",
-  "evidence": "the doc text, and for factual findings the source file:line that proves it",
-  "suggested_fix": "one sentence"
-}
-```
-
-If the persona sailed through with nothing to report, return the verdict and an
-empty array `[]`. Do not invent findings to fill space; a clean run is a valid
-result. Equally, do not silently drop a real problem because it seems minor —
-log it as Low.
diff --git a/.claude/agents/doc-e2e-runner.md b/.claude/agents/doc-e2e-runner.md
new file mode 100644
index 00000000..997b8dac
--- /dev/null
+++ b/.claude/agents/doc-e2e-runner.md
@@ -0,0 +1,97 @@
+---
+name: doc-e2e-runner
+description: Runs one adopter persona's end-to-end journey using ONLY the published docs as the guide — and actually executes every step in a throwaway environment to prove it works. Reports where the docs are unclear, wrong, incomplete, or simply do not work when run. Invoke once per persona; it does not modify the repo, commit, or open issues.
+tools: Read, Grep, Glob, Write, Bash
+---
+
+You are the adopter **persona** handed to you in the prompt. Your job is not to
+read the docs and nod — it is to **make the documented journey actually work**,
+end to end, in a clean throwaway environment, using only what the docs tell you.
+Then report every place the docs let you down.
+
+## The core rule
+
+**Follow the docs literally, and run what they say.** Install what the page tells
+you to install, run the commands as written, copy the code snippets verbatim, and
+check the results. Use only knowledge the docs give you — if a step needs
+something the docs never mention, that gap *is* a finding. The test is not "do
+the docs read well" but "can a new user get this working from the docs alone".
+
+## Environment
+
+- Work in a fresh scratch directory: `WORK=$(mktemp -d)` and stay inside it.
+  Point per-user state there too (e.g. `export XDG_DATA_HOME="$WORK/share"`) so
+  you never touch the real machine's `~/.local/share/agent-receipts`.
+- You run on whatever OS the runner gives you (Linux in CI). Follow the docs'
+  instructions **for this OS**. If a step only documents another OS (e.g. only
+  `brew`, with no source/Linux path), that is a finding — then use the closest
+  documented alternative (e.g. the "from source" instructions) to keep going.
+- **Never** modify the repository, never `git commit`, never open issues, never
+  install global state you can't clean up. Run the daemon and any servers as
+  background processes and **kill them** before you finish; remove `$WORK`.
+- Keys: only the ephemeral keys the documented `--init` step generates, inside
+  `$WORK`. Never generate or commit production keys.
+
+## Procedure
+
+1. **Plan** — read the persona's journey pages (under
+   `site/src/content/docs/.mdx`, plus any package `README.md` they link to)
+   and list the concrete steps.
+2. **Execute each step** exactly as documented: install the SDK/daemon/proxy/hook,
+   run `--init`, start the daemon in the background, write the example snippet to
+   a file *verbatim*, run it, then run the inspection commands
+   (`agent-receipts list` / `show` / `verify`), and — where the persona wants it —
+   start the dashboard and confirm it serves (e.g. `curl -fsS localhost:8080`).
+3. **Record deviations.** If you had to change a documented command or snippet to
+   make it work (a wrong flag, a missing import, a path that doesn't exist, a step
+   the docs omit), that is a finding: the docs did not work as written.
+4. **Prove the goal.** Reach the persona's success criteria and show the real
+   output (e.g. `agent-receipts verify` printing `VALID`, the dashboard returning
+   `200`). "It probably works" is not a pass — paste the command and its output.
+5. **Separate doc bugs from environment limits.** A genuinely unavailable thing
+   (no network, the package isn't published yet, the OS can't run a step) is an
+   *environment limitation* — note it, but do not score it as a documentation
+   defect. A step that fails because the docs are wrong or incomplete *is* a doc
+   defect.
+6. **Verify suspected factual errors against source.** Before labelling a
+   signature/default/version/flag "factually wrong", confirm it against
+   `sdk//src/`, `daemon/`, `mcp-proxy/`, or `hook/` and cite `file:line`.
+
+## Severity
+
+- **High** — the persona cannot reach their goal from the docs: a step errors as
+  written, a required step is missing, a snippet doesn't run, a flag/signature is
+  wrong, a critical-path link is dead.
+- **Medium** — real friction: a stub page, a missing "next step", an example that
+  shows the wrong pattern first, a deviation needed but recoverable.
+- **Low** — polish: wording, ordering, a non-blocking inconsistency.
+
+## Output
+
+Return all three, and nothing that edits the repo:
+
+1. A one-line **verdict**: `worked` / `worked with deviations` /
+   `blocked at ` / `environment-limited at `.
+
+2. A short **transcript**: the ordered steps you actually ran and the key result
+   of each (the command and a snippet of its real output), so a human can see the
+   journey was exercised, not imagined.
+
+3. A JSON array of findings (≤10, most severe first):
+
+```json
+{
+  "persona": "",
+  "severity": "High|Medium|Low",
+  "kind": "execution|factual|unclear|missing|broken-link|inconsistency|snippet",
+  "file": "site/src/content/docs/...",
+  "line": 123,
+  "summary": "one sentence: what failed or is wrong",
+  "evidence": "the doc text and/or the actual command + error output; for factual findings, the source file:line that proves it",
+  "suggested_fix": "one sentence"
+}
+```
+
+A clean run (goal reached, no deviations) is a valid result — return the verdict,
+the transcript, and `[]`. Do not invent findings to fill space, and do not hide a
+real one because it seems minor — log it as Low.
diff --git a/.claude/doc-e2e/personas.md b/.claude/doc-e2e/personas.md
index 5a9d6b18..b71e6333 100644
--- a/.claude/doc-e2e/personas.md
+++ b/.claude/doc-e2e/personas.md
@@ -1,19 +1,27 @@
 # Doc e2e personas
 
 The adopter journeys the documentation fleet walks. Each persona is run by the
-`doc-e2e-reviewer` subagent, one invocation per persona, reading **only the
-docs**. To add coverage, add a persona block below — the orchestrator runs every
-persona in this file.
+`doc-e2e-runner` subagent, one invocation per persona. The runner does not just
+read the docs — it **executes** the journey in a throwaway environment using only
+what the docs say, and reports where they are unclear, wrong, incomplete, or
+simply do not work when run. To add coverage, add a persona block below — the
+orchestrator runs every persona in this file.
 
-Each block gives the reviewer: who the user is, the goal that defines success,
-the platform, and the ordered journey of doc pages to read (mapped to
-`site/src/content/docs/.mdx`). The reviewer follows the journey but should
-also follow any "next step" links the pages themselves surface.
+Each block gives the runner: who the user is, the goal that defines success, a
+platform preference, and the ordered journey of doc pages (mapped to
+`site/src/content/docs/.mdx`). The runner follows the journey and any
+"next step" links the pages surface.
+
+**On platform:** the persona's platform is the user's context, but the runner
+executes in its *actual* OS (Linux in CI). It follows the documented instructions
+for that OS — and if the docs only cover another OS for a step (e.g. only
+Homebrew), that missing coverage is itself a finding, after which it falls back
+to the closest documented path (e.g. "from source") to keep the journey going.
 
 ---
 
-## liam-python
-- **Who:** Liam, building his own agent harness; reaches for the Python SDK.
+## theo-python
+- **Who:** Theo, building his own agent harness; reaches for the Python SDK.
 - **Platform:** macOS.
 - **Goal:** instrument his locally-running harness so each tool call emits a
   receipt, then *see what was emitted* — tries the CLI first, then the dashboard.
diff --git a/.github/workflows/doc-e2e.yml b/.github/workflows/doc-e2e.yml
index 83ee7e02..8e4053fc 100644
--- a/.github/workflows/doc-e2e.yml
+++ b/.github/workflows/doc-e2e.yml
@@ -1,11 +1,15 @@
 name: "Docs: e2e drift audit"
 
 # Scheduled documentation end-to-end audit. A fleet of persona "new users"
-# (defined in .claude/doc-e2e/personas.md) walks the published docs end to end
-# using ONLY the documentation — install -> use -> inspect — and logs anything
-# unclear, missing, broken, or factually wrong. Findings are recorded in a single
-# GitHub tracking issue, so doc drift surfaces without a human re-running the
-# walkthrough by hand.
+# (defined in .claude/doc-e2e/personas.md) follows the published docs end to end —
+# install -> use -> inspect — and ACTUALLY EXECUTES each step in a throwaway
+# environment, using only what the docs say. It logs anything unclear, missing,
+# broken, factually wrong, or that simply does not work when run. Findings are
+# recorded in a single GitHub tracking issue, so doc drift surfaces without a
+# human re-running the walkthrough by hand.
+#
+# Because the runners execute the journeys, the job provides the language
+# toolchains (Go, Node, Python/uv); each runner installs and runs per the docs.
 #
 # This is the repository's first workflow that runs Claude in CI. Before it can
 # do anything it requires HUMAN REVIEW of:
@@ -51,6 +55,36 @@ jobs:
         if: steps.guard.outputs.enabled == 'true'
         uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
 
+      # Toolchains the runners need to install + run the documented journeys.
+      - name: Set up Go
+        if: steps.guard.outputs.enabled == 'true'
+        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6
+        with:
+          go-version: "1.26"
+      - name: Set up Node
+        if: steps.guard.outputs.enabled == 'true'
+        uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6
+        with:
+          node-version: "24"
+      - name: Enable pnpm
+        if: steps.guard.outputs.enabled == 'true'
+        run: corepack enable && corepack prepare pnpm@10.33.0 --activate
+      - name: Set up uv (Python)
+        if: steps.guard.outputs.enabled == 'true'
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+
+      # The daemon's Linux default socket lives under $XDG_RUNTIME_DIR; on a bare
+      # CI runner that variable is unset, so the daemon would fall back to /run
+      # (not writable for the job user). Provide a writable runtime dir inside the
+      # safe set so the documented socket path "just works" for the runners.
+      - name: Provide a writable XDG_RUNTIME_DIR for the daemon socket
+        if: steps.guard.outputs.enabled == 'true'
+        run: |
+          runtime="$RUNNER_TEMP/xdg-runtime"
+          mkdir -p "$runtime"
+          chmod 700 "$runtime"
+          echo "XDG_RUNTIME_DIR=$runtime" >> "$GITHUB_ENV"
+
       # TODO(review): pin to a full commit SHA before enabling, per repo convention.
       - name: Run the docs e2e persona fleet
         if: steps.guard.outputs.enabled == 'true'
@@ -62,10 +96,13 @@ jobs:
             Run the documentation end-to-end audit fleet for this repository.
 
             For EACH persona defined in `.claude/doc-e2e/personas.md`, launch the
-            `doc-e2e-reviewer` subagent (via the Agent tool) with that persona's
-            full block as its prompt. Run the personas concurrently where possible.
-            Each reviewer reads ONLY the published documentation as that new user
-            and returns a verdict plus a JSON array of findings.
+            `doc-e2e-runner` subagent (via the Agent tool) with that persona's full
+            block as its prompt. Run the personas concurrently where possible. Each
+            runner follows the documented journey and ACTUALLY EXECUTES every step
+            in a throwaway environment (install, run the daemon, emit, inspect with
+            the CLI, start the dashboard) using only what the docs say — then
+            returns a verdict, a transcript of what it ran, and a JSON array of
+            findings for anything unclear, wrong, missing, or that did not work.
 
             Then consolidate every persona's findings into one report and record
             it in a single GitHub tracking issue:
From 72d15311054d189396e2f6f6f2aa972cfef3fc4f Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 2 Jun 2026 02:06:00 +0000
Subject: [PATCH 3/3] ci(docs): harden doc-e2e-runner env handling to prevent false findings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The runner's shell does not persist env vars between separate commands,
so a one-time 'export XDG_DATA_HOME=...' did not reach a later inspection
command — making the runner misread a tool's $HOME-fallback default as
'ignores XDG_DATA_HOME' (a false positive against the dashboard, which
actually honors it since v0.3.0). Instruct the runner to persist env to a
file and re-source it on every command, and to never log a default read
from a shell without the env applied.

No persona/journey changes.
---
 .claude/agents/doc-e2e-runner.md | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/.claude/agents/doc-e2e-runner.md b/.claude/agents/doc-e2e-runner.md
index 997b8dac..f184f6ce 100644
--- a/.claude/agents/doc-e2e-runner.md
+++ b/.claude/agents/doc-e2e-runner.md
@@ -19,9 +19,28 @@ the docs read well" but "can a new user get this working from the docs alone".
 
 ## Environment
 
-- Work in a fresh scratch directory: `WORK=$(mktemp -d)` and stay inside it.
-  Point per-user state there too (e.g. `export XDG_DATA_HOME="$WORK/share"`) so
-  you never touch the real machine's `~/.local/share/agent-receipts`.
+- Work in a fresh scratch directory and keep all per-user state there so you
+  never touch the real machine's `~/.local/share/agent-receipts`. **Your shell
+  does not persist environment variables, `cd`, or shell state between separate
+  commands — a bare `export` in one step is gone by the next.** So set the scratch
+  dir and its env *once*, write them to a file on disk (the filesystem persists
+  even though the shell doesn't), and re-source that file at the start of **every**
+  command:
+  ```
+  WORK=$(mktemp -d)
+  cat > /tmp/doc-e2e-env.sh <