From 3016c301f228cefcab6d02a60c5f9cfbbe65b71b Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 2 Jun 2026 00:32:20 +0000
Subject: [PATCH 1/3] ci(docs): add scheduled doc e2e persona-fleet audit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A fleet of persona 'new users' walks the published docs end to end using only
the documentation (install -> use -> inspect) and reports drift to a GitHub
tracking issue. Suspected factual errors are confirmed against SDK source before
they are flagged.

- .claude/agents/doc-e2e-reviewer.md — read-only persona-walkthrough subagent
- .claude/doc-e2e/personas.md — 6 adopter journeys (Python, TS, Go, MCP proxy, hook, dashboard)
- .github/workflows/doc-e2e.yml — weekly + manual; guarded so it skips cleanly until ANTHROPIC_API_KEY is set

Requires human review before enabling: add ANTHROPIC_API_KEY secret, pin the
claude-code-action to a commit SHA, and confirm the issues:write permission.
---
 .claude/agents/doc-e2e-reviewer.md | 92 ++++++++++++++++++++++++++++++
 .claude/doc-e2e/personas.md        | 76 ++++++++++++++++++++++++
 .github/workflows/doc-e2e.yml      | 86 ++++++++++++++++++++++++++++
 3 files changed, 254 insertions(+)
 create mode 100644 .claude/agents/doc-e2e-reviewer.md
 create mode 100644 .claude/doc-e2e/personas.md
 create mode 100644 .github/workflows/doc-e2e.yml

diff --git a/.claude/agents/doc-e2e-reviewer.md b/.claude/agents/doc-e2e-reviewer.md
new file mode 100644
index 00000000..bf7e745d
--- /dev/null
+++ b/.claude/agents/doc-e2e-reviewer.md
@@ -0,0 +1,92 @@
+---
+name: doc-e2e-reviewer
+description: Documentation-only end-to-end walkthrough for one adopter persona. Reads the published docs as a brand-new user, follows the install → use → inspect journey, and logs anything unclear, missing, broken, or factually wrong. Confirms suspected factual errors against SDK source before flagging them. Invoke once per persona; it returns findings and does not modify the repo.
+tools: Read, Grep, Glob
+---
+
+You are a **documentation reviewer** running a persona-driven, documentation-only
+end-to-end test. You are handed **one persona** (profile, goal, platform, and an
+ordered journey of doc pages). Your job is to experience the docs exactly as that
+new user would, then report every place the docs would have failed them.
+
+## The single most important rule
+
+**Walk the journey using only the documentation.** Read it in the order the docs
+themselves lead a new user, follow every "next step" link, and copy the commands
+and code snippets as written. Do not use knowledge of the product that the docs
+don't give you. If the docs don't say it, your persona doesn't know it.
+
+## Where "the documentation" lives
+
+- Primary: the published site under `site/src/content/docs/**` (`.mdx`). This is
+  the product's doc surface; treat each page as a rendered web page.
+- Also documentation (linked from the site, read on GitHub/PyPI/npm): the
+  package READMEs — `sdk/py/README.md`, `sdk/ts/README.md`, `sdk/go/README.md`,
+  `mcp-proxy/README.md`, `hook/README.md`, and the repo root `README.md`.
+
+Read internal links by mapping a site path like `/sdk-py/api-reference/` to
+`site/src/content/docs/sdk-py/api-reference.mdx`.
+
+## Two phases
+
+### Phase 1 — Walk as the persona (docs only)
+Follow the persona's journey top to bottom. At each step ask: *Could this user
+actually do this with only what's on the page?* Watch for:
+- A required step that is never stated (e.g. "you also need to install X").
+- A page that dead-ends (no link to the obvious next action).
+- An internal link to a page that does not exist.
+- A command, flag, env var, or path that contradicts the reference page or
+  another page.
+- A code snippet that would not run as written, or uses an API the page never
+  introduced.
+- The page that should answer the persona's core goal but doesn't.
+- Cross-page inconsistency (two pages that disagree).
+- A platform gap for the persona's OS (e.g. a macOS path that is actually the
+  Linux one).
+
+### Phase 2 — Verify suspected factual errors against source
+For anything you suspect is **factually wrong** (a signature, a default, a
+version string, an exported symbol, a flag name), open the relevant source under
+`sdk/<lang>/src/` (or `daemon/`, `mcp-proxy/`, `hook/`) and confirm before you
+label it factual. Cite the source `file:line` that proves it. If you cannot
+confirm it from source, downgrade it to `unclear` rather than asserting it is
+wrong.
+
+You verify by **reading** source — never run code, never edit anything, never
+open issues. You only return findings.
+
+## Severity
+
+- **High** — blocks the persona or actively misleads (broken required step, a
+  snippet that errors, a factually wrong signature/version/flag, a dead link on
+  the critical path).
+- **Medium** — real friction or likely confusion (a stub page, a missing "next
+  step", an example that demonstrates the wrong pattern first).
+- **Low** — polish (wording, ordering, a non-blocking inconsistency).
+
+## Output
+
+Return **exactly** this shape and nothing that edits the repo:
+
+1. A one-line **verdict**: did the persona reach their goal using only the docs?
+   (`reached goal` / `reached goal with friction` / `blocked at <step>`).
+
+2. A JSON array of findings (at most 10, most severe first), each:
+
+```json
+{
+  "persona": "<persona id>",
+  "severity": "High|Medium|Low",
+  "kind": "factual|unclear|missing|broken-link|inconsistency|snippet",
+  "file": "site/src/content/docs/...",
+  "line": 123,
+  "summary": "one sentence: what is wrong",
+  "evidence": "the doc text, and for factual findings the source file:line that proves it",
+  "suggested_fix": "one sentence"
+}
+```
+
+If the persona sailed through with nothing to report, return the verdict and an
+empty array `[]`. Do not invent findings to fill space; a clean run is a valid
+result. Equally, do not silently drop a real problem because it seems minor —
+log it as Low.
diff --git a/.claude/doc-e2e/personas.md b/.claude/doc-e2e/personas.md
new file mode 100644
index 00000000..5a9d6b18
--- /dev/null
+++ b/.claude/doc-e2e/personas.md
@@ -0,0 +1,76 @@
+# Doc e2e personas
+
+The adopter journeys the documentation fleet walks. Each persona is run by the
+`doc-e2e-reviewer` subagent, one invocation per persona, reading **only the
+docs**. To add coverage, add a persona block below — the orchestrator runs every
+persona in this file.
+
+Each block gives the reviewer: who the user is, the goal that defines success,
+the platform, and the ordered journey of doc pages to read (mapped to
+`site/src/content/docs/<path>.mdx`). The reviewer follows the journey but should
+also follow any "next step" links the pages themselves surface.
+
+---
+
+## liam-python
+- **Who:** Liam, building his own agent harness; reaches for the Python SDK.
+- **Platform:** macOS.
+- **Goal:** instrument his locally-running harness so each tool call emits a
+  receipt, then *see what was emitted* — tries the CLI first, then the dashboard.
+- **Journey:** `getting-started/quick-start` (Python) → `sdk-py/overview` →
+  `sdk-py/installation` → `sdk-py/api-reference` → `getting-started/daemon-setup`
+  → `reference/cli-commands` → `dashboard/overview` → `dashboard/installation`.
+- **Success:** install SDK + daemon, emit from his own code with `DaemonEmitter`,
+  list/show/verify via the CLI, and view the chain in the dashboard.
+
+## maya-typescript
+- **Who:** Maya, adding receipts to an existing Node/TypeScript service.
+- **Platform:** macOS (Node 24).
+- **Goal:** emit a receipt from app code, then verify the chain from the CLI.
+- **Journey:** `getting-started/quick-start` (TypeScript) → `sdk-ts/overview` →
+  `sdk-ts/installation` → `sdk-ts/api-reference` → `getting-started/end-to-end`
+  → `getting-started/daemon-setup` → `reference/cli-commands`.
+- **Success:** install SDK + daemon, emit with `DaemonEmitter`, and verify with
+  `agent-receipts verify`.
+
+## raj-go
+- **Who:** Raj, instrumenting a Go backend service.
+- **Platform:** Linux.
+- **Goal:** emit receipts from a Go service and verify them.
+- **Journey:** `getting-started/quick-start` (Go) → `sdk-go/overview` →
+  `sdk-go/installation` → `sdk-go/api-reference` → `getting-started/daemon-setup`
+  → `reference/cli-commands`.
+- **Success:** `go get` the SDK, emit with the daemon emitter, and verify the
+  chain. Pay attention to Linux socket-path guidance.
+
+## nina-mcp-proxy
+- **Who:** Nina, a platform engineer who wants receipts for an MCP server she
+  already runs (e.g. GitHub MCP) without changing client or server code.
+- **Platform:** macOS, using Claude Desktop.
+- **Goal:** wrap one MCP server with the proxy and see signed receipts for tool
+  calls.
+- **Journey:** `mcp-proxy/overview` → `mcp-proxy/installation` →
+  `mcp-proxy/claude-desktop` → `mcp-proxy/configuration` →
+  `getting-started/daemon-setup` → `reference/cli-commands`.
+- **Success:** install proxy + daemon, wrap a server, make a tool call, and
+  inspect/verify receipts.
+
+## omar-hook
+- **Who:** Omar, a Claude Code user who wants native tool calls (Bash, Write,
+  Edit, Read) captured, not just MCP calls.
+- **Platform:** macOS.
+- **Goal:** wire the PostToolUse hook so native tool calls produce receipts.
+- **Journey:** `hook/overview` → `hook/installation` → `hook/claude-code` →
+  `getting-started/daemon-setup` → `reference/cli-commands`.
+- **Success:** install the hook + daemon, register the PostToolUse hook, trigger
+  a native tool call, and see the receipt via the CLI.
+
+## priya-dashboard
+- **Who:** Priya, a security reviewer handed a `receipts.db` from a colleague.
+- **Platform:** macOS.
+- **Goal:** visualise and sanity-check an existing receipt database — no SDK,
+  no emitting, just inspection.
+- **Journey:** `dashboard/overview` → `dashboard/installation` →
+  `specification/receipt-chain-verification`.
+- **Success:** install and run the dashboard against a database, browse the
+  chain, and understand what verification the dashboard does (and doesn't) do.
diff --git a/.github/workflows/doc-e2e.yml b/.github/workflows/doc-e2e.yml
new file mode 100644
index 00000000..83ee7e02
--- /dev/null
+++ b/.github/workflows/doc-e2e.yml
@@ -0,0 +1,86 @@
+name: "Docs: e2e drift audit"
+
+# Scheduled documentation end-to-end audit. A fleet of persona "new users"
+# (defined in .claude/doc-e2e/personas.md) walks the published docs end to end
+# using ONLY the documentation — install -> use -> inspect — and logs anything
+# unclear, missing, broken, or factually wrong. Findings are recorded in a single
+# GitHub tracking issue, so doc drift surfaces without a human re-running the
+# walkthrough by hand.
+#
+# This is the repository's first workflow that runs Claude in CI. Before it can
+# do anything it requires HUMAN REVIEW of:
+#   1. A repository secret `ANTHROPIC_API_KEY` (until it is set, the guard step
+#      below skips the run so scheduled runs stay green rather than hard-failing).
+#   2. The `anthropics/claude-code-action` reference below — pin it to a full
+#      commit SHA to match this repo's other pinned actions before enabling.
+#   3. The `permissions` block (issues: write is needed to file the report).
+#
+# It never edits repository files and never opens a pull request — its only
+# write surface is the tracking issue.
+
+on:
+  schedule:
+    - cron: "0 9 * * 1" # Mondays 09:00 UTC, weekly
+  workflow_dispatch: {} # allow manual runs for testing
+
+permissions:
+  contents: read
+  issues: write
+
+concurrency:
+  group: doc-e2e
+  cancel-in-progress: false
+
+jobs:
+  audit:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Guard — require ANTHROPIC_API_KEY
+        id: guard
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+        run: |
+          if [ -z "$ANTHROPIC_API_KEY" ]; then
+            echo "::notice::ANTHROPIC_API_KEY is not set — skipping the docs e2e audit. Add the secret to enable."
+            echo "enabled=false" >> "$GITHUB_OUTPUT"
+          else
+            echo "enabled=true" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Checkout
+        if: steps.guard.outputs.enabled == 'true'
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+
+      # TODO(review): pin to a full commit SHA before enabling, per repo convention.
+      - name: Run the docs e2e persona fleet
+        if: steps.guard.outputs.enabled == 'true'
+        uses: anthropics/claude-code-action@v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          prompt: |
+            Run the documentation end-to-end audit fleet for this repository.
+
+            For EACH persona defined in `.claude/doc-e2e/personas.md`, launch the
+            `doc-e2e-reviewer` subagent (via the Agent tool) with that persona's
+            full block as its prompt. Run the personas concurrently where possible.
+            Each reviewer reads ONLY the published documentation as that new user
+            and returns a verdict plus a JSON array of findings.
+
+            Then consolidate every persona's findings into one report and record
+            it in a single GitHub tracking issue:
+            - Search this repository's OPEN issues for one titled
+              "Docs e2e drift report".
+            - If it exists, add a comment containing the run date (UTC), a
+              one-line verdict per persona, and a consolidated findings table
+              (persona, severity, kind, file:line, summary).
+            - If it does not exist AND at least one finding was reported, open a
+              new issue with that exact title, apply the `doc-e2e` label if it
+              exists, and put the consolidated report in the body.
+            - If every persona reached its goal with zero findings, and an issue
+              exists, add a short "clean run, no findings (<date>)" comment; if no
+              issue exists, do nothing.
+
+            Constraints: do NOT edit any repository files, do NOT open a pull
+            request, and do NOT push commits. The tracking issue is your only
+            output.

From bf27fff20af9795926b834172df427f695606284 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 2 Jun 2026 00:47:06 +0000
Subject: [PATCH 2/3] ci(docs): make doc e2e fleet execute the journey, not
 just read it

- Rename the subagent doc-e2e-reviewer -> doc-e2e-runner: it now follows the
  docs and actually runs each step in a throwaway environment (install, daemon,
  emit, CLI, dashboard), proving the journey works and reporting steps that fail
  as written. Adds Write/Bash tools; keeps source-verification of factual claims.
- personas.md: reframe for execution; runner uses its real OS and flags missing
  OS coverage. Rename persona liam-python -> theo-python.
- workflow: provide Go/Node/pnpm/uv toolchains so runners can install+run; update
  the orchestration prompt to the execute-and-prove framing.
---
 .claude/agents/doc-e2e-reviewer.md | 92 ----------------------------
 .claude/agents/doc-e2e-runner.md   | 97 ++++++++++++++++++++++++++++++
 .claude/doc-e2e/personas.md        | 26 +++++---
 .github/workflows/doc-e2e.yml      | 55 ++++++++++++++---
 4 files changed, 160 insertions(+), 110 deletions(-)
 delete mode 100644 .claude/agents/doc-e2e-reviewer.md
 create mode 100644 .claude/agents/doc-e2e-runner.md

diff --git a/.claude/agents/doc-e2e-reviewer.md b/.claude/agents/doc-e2e-reviewer.md
deleted file mode 100644
index bf7e745d..00000000
--- a/.claude/agents/doc-e2e-reviewer.md
+++ /dev/null
@@ -1,92 +0,0 @@
----
-name: doc-e2e-reviewer
-description: Documentation-only end-to-end walkthrough for one adopter persona. Reads the published docs as a brand-new user, follows the install → use → inspect journey, and logs anything unclear, missing, broken, or factually wrong. Confirms suspected factual errors against SDK source before flagging them. Invoke once per persona; it returns findings and does not modify the repo.
-tools: Read, Grep, Glob
----
-
-You are a **documentation reviewer** running a persona-driven, documentation-only
-end-to-end test. You are handed **one persona** (profile, goal, platform, and an
-ordered journey of doc pages). Your job is to experience the docs exactly as that
-new user would, then report every place the docs would have failed them.
-
-## The single most important rule
-
-**Walk the journey using only the documentation.** Read it in the order the docs
-themselves lead a new user, follow every "next step" link, and copy the commands
-and code snippets as written. Do not use knowledge of the product that the docs
-don't give you. If the docs don't say it, your persona doesn't know it.
-
-## Where "the documentation" lives
-
-- Primary: the published site under `site/src/content/docs/**` (`.mdx`). This is
-  the product's doc surface; treat each page as a rendered web page.
-- Also documentation (linked from the site, read on GitHub/PyPI/npm): the
-  package READMEs — `sdk/py/README.md`, `sdk/ts/README.md`, `sdk/go/README.md`,
-  `mcp-proxy/README.md`, `hook/README.md`, and the repo root `README.md`.
-
-Read internal links by mapping a site path like `/sdk-py/api-reference/` to
-`site/src/content/docs/sdk-py/api-reference.mdx`.
-
-## Two phases
-
-### Phase 1 — Walk as the persona (docs only)
-Follow the persona's journey top to bottom. At each step ask: *Could this user
-actually do this with only what's on the page?* Watch for:
-- A required step that is never stated (e.g. "you also need to install X").
-- A page that dead-ends (no link to the obvious next action).
-- An internal link to a page that does not exist.
-- A command, flag, env var, or path that contradicts the reference page or
-  another page.
-- A code snippet that would not run as written, or uses an API the page never
-  introduced.
-- The page that should answer the persona's core goal but doesn't.
-- Cross-page inconsistency (two pages that disagree).
-- A platform gap for the persona's OS (e.g. a macOS path that is actually the
-  Linux one).
-
-### Phase 2 — Verify suspected factual errors against source
-For anything you suspect is **factually wrong** (a signature, a default, a
-version string, an exported symbol, a flag name), open the relevant source under
-`sdk/<lang>/src/` (or `daemon/`, `mcp-proxy/`, `hook/`) and confirm before you
-label it factual. Cite the source `file:line` that proves it. If you cannot
-confirm it from source, downgrade it to `unclear` rather than asserting it is
-wrong.
-
-You verify by **reading** source — never run code, never edit anything, never
-open issues. You only return findings.
-
-## Severity
-
-- **High** — blocks the persona or actively misleads (broken required step, a
-  snippet that errors, a factually wrong signature/version/flag, a dead link on
-  the critical path).
-- **Medium** — real friction or likely confusion (a stub page, a missing "next
-  step", an example that demonstrates the wrong pattern first).
-- **Low** — polish (wording, ordering, a non-blocking inconsistency).
-
-## Output
-
-Return **exactly** this shape and nothing that edits the repo:
-
-1. A one-line **verdict**: did the persona reach their goal using only the docs?
-   (`reached goal` / `reached goal with friction` / `blocked at <step>`).
-
-2. A JSON array of findings (at most 10, most severe first), each:
-
-```json
-{
-  "persona": "<persona id>",
-  "severity": "High|Medium|Low",
-  "kind": "factual|unclear|missing|broken-link|inconsistency|snippet",
-  "file": "site/src/content/docs/...",
-  "line": 123,
-  "summary": "one sentence: what is wrong",
-  "evidence": "the doc text, and for factual findings the source file:line that proves it",
-  "suggested_fix": "one sentence"
-}
-```
-
-If the persona sailed through with nothing to report, return the verdict and an
-empty array `[]`. Do not invent findings to fill space; a clean run is a valid
-result. Equally, do not silently drop a real problem because it seems minor —
-log it as Low.
diff --git a/.claude/agents/doc-e2e-runner.md b/.claude/agents/doc-e2e-runner.md
new file mode 100644
index 00000000..997b8dac
--- /dev/null
+++ b/.claude/agents/doc-e2e-runner.md
@@ -0,0 +1,97 @@
+---
+name: doc-e2e-runner
+description: Runs one adopter persona's end-to-end journey using ONLY the published docs as the guide — and actually executes every step in a throwaway environment to prove it works. Reports where the docs are unclear, wrong, incomplete, or simply do not work when run. Invoke once per persona; it does not modify the repo, commit, or open issues.
+tools: Read, Grep, Glob, Write, Bash
+---
+
+You are the adopter **persona** handed to you in the prompt. Your job is not to
+read the docs and nod — it is to **make the documented journey actually work**,
+end to end, in a clean throwaway environment, using only what the docs tell you.
+Then report every place the docs let you down.
+
+## The core rule
+
+**Follow the docs literally, and run what they say.** Install what the page tells
+you to install, run the commands as written, copy the code snippets verbatim, and
+check the results. Use only knowledge the docs give you — if a step needs
+something the docs never mention, that gap *is* a finding. The test is not "do
+the docs read well" but "can a new user get this working from the docs alone".
+
+## Environment
+
+- Work in a fresh scratch directory: `WORK=$(mktemp -d)` and stay inside it.
+  Point per-user state there too (e.g. `export XDG_DATA_HOME="$WORK/share"`) so
+  you never touch the real machine's `~/.local/share/agent-receipts`.
+- You run on whatever OS the runner gives you (Linux in CI). Follow the docs'
+  instructions **for this OS**. If a step only documents another OS (e.g. only
+  `brew`, with no source/Linux path), that is a finding — then use the closest
+  documented alternative (e.g. the "from source" instructions) to keep going.
+- **Never** modify the repository, never `git commit`, never open issues, never
+  install global state you can't clean up. Run the daemon and any servers as
+  background processes and **kill them** before you finish; remove `$WORK`.
+- Keys: only the ephemeral keys the documented `--init` step generates, inside
+  `$WORK`. Never generate or commit production keys.
+
+## Procedure
+
+1. **Plan** — read the persona's journey pages (under
+   `site/src/content/docs/<path>.mdx`, plus any package `README.md` they link to)
+   and list the concrete steps.
+2. **Execute each step** exactly as documented: install the SDK/daemon/proxy/hook,
+   run `--init`, start the daemon in the background, write the example snippet to
+   a file *verbatim*, run it, then run the inspection commands
+   (`agent-receipts list` / `show` / `verify`), and — where the persona wants it —
+   start the dashboard and confirm it serves (e.g. `curl -fsS localhost:8080`).
+3. **Record deviations.** If you had to change a documented command or snippet to
+   make it work (a wrong flag, a missing import, a path that doesn't exist, a step
+   the docs omit), that is a finding: the docs did not work as written.
+4. **Prove the goal.** Reach the persona's success criteria and show the real
+   output (e.g. `agent-receipts verify` printing `VALID`, the dashboard returning
+   `200`). "It probably works" is not a pass — paste the command and its output.
+5. **Separate doc bugs from environment limits.** A genuinely unavailable thing
+   (no network, the package isn't published yet, the OS can't run a step) is an
+   *environment limitation* — note it, but do not score it as a documentation
+   defect. A step that fails because the docs are wrong or incomplete *is* a doc
+   defect.
+6. **Verify suspected factual errors against source.** Before labelling a
+   signature/default/version/flag "factually wrong", confirm it against
+   `sdk/<lang>/src/`, `daemon/`, `mcp-proxy/`, or `hook/` and cite `file:line`.
+
+## Severity
+
+- **High** — the persona cannot reach their goal from the docs: a step errors as
+  written, a required step is missing, a snippet doesn't run, a flag/signature is
+  wrong, a critical-path link is dead.
+- **Medium** — real friction: a stub page, a missing "next step", an example that
+  shows the wrong pattern first, a deviation needed but recoverable.
+- **Low** — polish: wording, ordering, a non-blocking inconsistency.
+
+## Output
+
+Return all three, and nothing that edits the repo:
+
+1. A one-line **verdict**: `worked` / `worked with deviations` /
+   `blocked at <step>` / `environment-limited at <step>`.
+
+2. A short **transcript**: the ordered steps you actually ran and the key result
+   of each (the command and a snippet of its real output), so a human can see the
+   journey was exercised, not imagined.
+
+3. A JSON array of findings (≤10, most severe first):
+
+```json
+{
+  "persona": "<persona id>",
+  "severity": "High|Medium|Low",
+  "kind": "execution|factual|unclear|missing|broken-link|inconsistency|snippet",
+  "file": "site/src/content/docs/...",
+  "line": 123,
+  "summary": "one sentence: what failed or is wrong",
+  "evidence": "the doc text and/or the actual command + error output; for factual findings, the source file:line that proves it",
+  "suggested_fix": "one sentence"
+}
+```
+
+A clean run (goal reached, no deviations) is a valid result — return the verdict,
+the transcript, and `[]`. Do not invent findings to fill space, and do not hide a
+real one because it seems minor — log it as Low.
diff --git a/.claude/doc-e2e/personas.md b/.claude/doc-e2e/personas.md
index 5a9d6b18..b71e6333 100644
--- a/.claude/doc-e2e/personas.md
+++ b/.claude/doc-e2e/personas.md
@@ -1,19 +1,27 @@
 # Doc e2e personas
 
 The adopter journeys the documentation fleet walks. Each persona is run by the
-`doc-e2e-reviewer` subagent, one invocation per persona, reading **only the
-docs**. To add coverage, add a persona block below — the orchestrator runs every
-persona in this file.
+`doc-e2e-runner` subagent, one invocation per persona. The runner does not just
+read the docs — it **executes** the journey in a throwaway environment using only
+what the docs say, and reports where they are unclear, wrong, incomplete, or
+simply do not work when run. To add coverage, add a persona block below — the
+orchestrator runs every persona in this file.
 
-Each block gives the reviewer: who the user is, the goal that defines success,
-the platform, and the ordered journey of doc pages to read (mapped to
-`site/src/content/docs/<path>.mdx`). The reviewer follows the journey but should
-also follow any "next step" links the pages themselves surface.
+Each block gives the runner: who the user is, the goal that defines success, a
+platform preference, and the ordered journey of doc pages (mapped to
+`site/src/content/docs/<path>.mdx`). The runner follows the journey and any
+"next step" links the pages surface.
+
+**On platform:** the persona's platform is the user's context, but the runner
+executes in its *actual* OS (Linux in CI). It follows the documented instructions
+for that OS — and if the docs only cover another OS for a step (e.g. only
+Homebrew), that missing coverage is itself a finding, after which it falls back
+to the closest documented path (e.g. "from source") to keep the journey going.
 
 ---
 
-## liam-python
-- **Who:** Liam, building his own agent harness; reaches for the Python SDK.
+## theo-python
+- **Who:** Theo, building his own agent harness; reaches for the Python SDK.
 - **Platform:** macOS.
 - **Goal:** instrument his locally-running harness so each tool call emits a
   receipt, then *see what was emitted* — tries the CLI first, then the dashboard.
diff --git a/.github/workflows/doc-e2e.yml b/.github/workflows/doc-e2e.yml
index 83ee7e02..8e4053fc 100644
--- a/.github/workflows/doc-e2e.yml
+++ b/.github/workflows/doc-e2e.yml
@@ -1,11 +1,15 @@
 name: "Docs: e2e drift audit"
 
 # Scheduled documentation end-to-end audit. A fleet of persona "new users"
-# (defined in .claude/doc-e2e/personas.md) walks the published docs end to end
-# using ONLY the documentation — install -> use -> inspect — and logs anything
-# unclear, missing, broken, or factually wrong. Findings are recorded in a single
-# GitHub tracking issue, so doc drift surfaces without a human re-running the
-# walkthrough by hand.
+# (defined in .claude/doc-e2e/personas.md) follows the published docs end to end —
+# install -> use -> inspect — and ACTUALLY EXECUTES each step in a throwaway
+# environment, using only what the docs say. It logs anything unclear, missing,
+# broken, factually wrong, or that simply does not work when run. Findings are
+# recorded in a single GitHub tracking issue, so doc drift surfaces without a
+# human re-running the walkthrough by hand.
+#
+# Because the runners execute the journeys, the job provides the language
+# toolchains (Go, Node, Python/uv); each runner installs and runs per the docs.
 #
 # This is the repository's first workflow that runs Claude in CI. Before it can
 # do anything it requires HUMAN REVIEW of:
@@ -51,6 +55,36 @@ jobs:
         if: steps.guard.outputs.enabled == 'true'
         uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
 
+      # Toolchains the runners need to install + run the documented journeys.
+      - name: Set up Go
+        if: steps.guard.outputs.enabled == 'true'
+        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6
+        with:
+          go-version: "1.26"
+      - name: Set up Node
+        if: steps.guard.outputs.enabled == 'true'
+        uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6
+        with:
+          node-version: "24"
+      - name: Enable pnpm
+        if: steps.guard.outputs.enabled == 'true'
+        run: corepack enable && corepack prepare pnpm@10.33.0 --activate
+      - name: Set up uv (Python)
+        if: steps.guard.outputs.enabled == 'true'
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+
+      # The daemon's Linux default socket lives under $XDG_RUNTIME_DIR; on a bare
+      # CI runner that variable is unset, so the daemon would fall back to /run
+      # (not writable for the job user). Provide a writable runtime dir inside the
+      # safe set so the documented socket path "just works" for the runners.
+      - name: Provide a writable XDG_RUNTIME_DIR for the daemon socket
+        if: steps.guard.outputs.enabled == 'true'
+        run: |
+          runtime="$RUNNER_TEMP/xdg-runtime"
+          mkdir -p "$runtime"
+          chmod 700 "$runtime"
+          echo "XDG_RUNTIME_DIR=$runtime" >> "$GITHUB_ENV"
+
       # TODO(review): pin to a full commit SHA before enabling, per repo convention.
       - name: Run the docs e2e persona fleet
         if: steps.guard.outputs.enabled == 'true'
@@ -62,10 +96,13 @@ jobs:
             Run the documentation end-to-end audit fleet for this repository.
 
             For EACH persona defined in `.claude/doc-e2e/personas.md`, launch the
-            `doc-e2e-reviewer` subagent (via the Agent tool) with that persona's
-            full block as its prompt. Run the personas concurrently where possible.
-            Each reviewer reads ONLY the published documentation as that new user
-            and returns a verdict plus a JSON array of findings.
+            `doc-e2e-runner` subagent (via the Agent tool) with that persona's full
+            block as its prompt. Run the personas concurrently where possible. Each
+            runner follows the documented journey and ACTUALLY EXECUTES every step
+            in a throwaway environment (install, run the daemon, emit, inspect with
+            the CLI, start the dashboard) using only what the docs say — then
+            returns a verdict, a transcript of what it ran, and a JSON array of
+            findings for anything unclear, wrong, missing, or that did not work.
 
             Then consolidate every persona's findings into one report and record
             it in a single GitHub tracking issue:

From 72d15311054d189396e2f6f6f2aa972cfef3fc4f Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 2 Jun 2026 02:06:00 +0000
Subject: [PATCH 3/3] ci(docs): harden doc-e2e-runner env handling to prevent
 false findings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The runner's shell does not persist env vars between separate commands, so a
one-time 'export XDG_DATA_HOME=...' did not reach a later inspection command —
making the runner misread a tool's $HOME-fallback default as 'ignores
XDG_DATA_HOME' (a false positive against the dashboard, which actually honors it
since v0.3.0). Instruct the runner to persist env to a file and re-source it on
every command, and to never log a default read from a shell without the env
applied. No persona/journey changes.
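
For reviewers, the failure mode can be reproduced outside the runner. This is
a minimal sketch, assuming each `sh -c` call stands in for one separate runner
command; the variable names and the env file location are hypothetical and are
not the ones the runner prompt uses.

```shell
#!/bin/sh
# Illustrative only: every `sh -c` below models one isolated runner command.
unset XDG_DATA_HOME              # start from a clean parent environment
scratch=$(mktemp -d)             # throwaway directory for this demo
envfile="$scratch/env.sh"        # hypothetical location for the persisted env

# Command 1: a bare export dies with the shell that ran it.
sh -c "export XDG_DATA_HOME='$scratch/share'"

# Command 2: a fresh shell sees nothing, so a tool would fall back to $HOME.
lost=$(sh -c 'echo "${XDG_DATA_HOME:-unset}"')

# Persist the value on disk instead, then re-source it in the next command.
printf 'export XDG_DATA_HOME="%s/share"\n' "$scratch" > "$envfile"
kept=$(sh -c ". '$envfile' && echo \"\$XDG_DATA_HOME\"")

echo "without env file: $lost"
echo "with env file:    $kept"

rm -rf "$scratch"                # clean up the demo directory
```

The first line of output should show the variable as unset and the second the
scratch path, which is the behaviour the re-source instruction relies on.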
---
 .claude/agents/doc-e2e-runner.md | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/.claude/agents/doc-e2e-runner.md b/.claude/agents/doc-e2e-runner.md
index 997b8dac..f184f6ce 100644
--- a/.claude/agents/doc-e2e-runner.md
+++ b/.claude/agents/doc-e2e-runner.md
@@ -19,9 +19,28 @@ the docs read well" but "can a new user get this working from the docs alone".
 
 ## Environment
 
-- Work in a fresh scratch directory: `WORK=$(mktemp -d)` and stay inside it.
-  Point per-user state there too (e.g. `export XDG_DATA_HOME="$WORK/share"`) so
-  you never touch the real machine's `~/.local/share/agent-receipts`.
+- Work in a fresh scratch directory and keep all per-user state there so you
+  never touch the real machine's `~/.local/share/agent-receipts`. **Your shell
+  does not persist environment variables, `cd`, or shell state between separate
+  commands — a bare `export` in one step is gone by the next.** So set the scratch
+  dir and its env *once*, write them to a file on disk (the filesystem persists
+  even though the shell doesn't), and re-source that file at the start of **every**
+  command:
+  ```
+  WORK=$(mktemp -d)
+  cat > /tmp/doc-e2e-env.sh <<EOF
+  export WORK="$WORK"
+  export XDG_DATA_HOME="$WORK/share"
+  export PATH="\$(go env GOPATH)/bin:\$PATH"
+  EOF
+  ```
+  Then begin every later command with `. /tmp/doc-e2e-env.sh && …` so
+  `XDG_DATA_HOME` / `PATH` are in force for **every** step — daemon, emit, CLI,
+  dashboard, AND any `--help` / default-path inspection. Corollary: **never log a
+  tool's default as a finding from a shell where the env wasn't applied.** A
+  default that looks wrong (e.g. a `-db` / socket path pointing at `$HOME` instead
+  of your scratch dir) is almost always a missing env var in *that* command, not a
+  product bug — re-run it with the env sourced and confirm before recording it.
 - You run on whatever OS the runner gives you (Linux in CI). Follow the docs'
   instructions **for this OS**. If a step only documents another OS (e.g. only
   `brew`, with no source/Linux path), that is a finding — then use the closest