fix: address dogfood run issues from first CI execution (#540)

patricklafrance · web-flow · commit a5a98125e4a7 · 2026-02-27T21:15:06.000-05:00
diff --git a/.github/prompts/dogfood.md b/.github/prompts/dogfood.md
@@ -6,7 +6,7 @@
 
 ## Task
 
-Run an exploratory QA session on the endpoints sample app using the `/dogfood` skill, then report findings as a GitHub issue.
+Run an exploratory QA session on the endpoints sample app using the agent-browser dogfood skill, then report findings as a GitHub issue.
 
 ### Step 1 — Start servers
 
@@ -16,26 +16,25 @@ Start the endpoint servers in the background and wait for them to be ready:
 pnpm serve-endpoints > /tmp/endpoints-serve.log 2>&1 &
 ```
 
-Wait for both servers to be ready:
+Wait for both servers to be ready (single command to save a turn):
 
 ```bash
-curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080
-curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081
+curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080 && curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081
 ```
 
 If either curl command fails, run `cat /tmp/endpoints-serve.log` for diagnostics and stop.
 
-### Step 2 — Run the dogfood skill
+### Step 2 — Run the dogfood session
 
-Invoke the `/dogfood` skill with these parameters:
+Read and follow the skill instructions at `node_modules/agent-browser/skills/dogfood/SKILL.md` with these parameters:
 - **Target URL**: `http://localhost:8080`
 - **Session name**: `endpoints`
 - **Output directory**: `/tmp/dogfood-output`
 - **Auth credentials**: username `temp`, password `temp`
 
 ### Step 3 — Known noise (IGNORE these)
 
-Tell the skill to ignore:
+Apply these ignore rules during the dogfood session:
 - `/federated-tabs/failing` — intentionally throws to exercise error boundaries. The error boundary UI is expected.
 - MSW (Mock Service Worker) console warnings — expected in this mock-data app.
 - React warnings and deprecation notices — only flag actual JS errors/exceptions.
@@ -47,19 +46,27 @@ After the skill completes, read the generated report at `/tmp/dogfood-output/rep
 - **If the report contains zero issues**: end with "DOGFOOD PASSED — no issues found" and stop.
 - **If the report contains issues**:
 
-  1. **Identify referenced evidence** — Using the Grep tool, find all `screenshots/*.png` and `videos/*.webm` paths referenced in the report. Only these files should be uploaded (ignore exploration screenshots not cited in the report).
+  1. **Identify referenced evidence** — Using the Grep tool (parameter is `path`, not `file_path`), find all `screenshots/*.png` and `videos/*.webm` paths referenced in the report. Only these files should be uploaded (ignore exploration screenshots not cited in the report).
 
   2. **Push evidence to the `dogfood-evidence` branch** — Evidence is stored in date-stamped directories (`YYYY-MM-DD/screenshots/`, `YYYY-MM-DD/videos/`) so each run's evidence persists and old issue links keep working.
-     - Configure git auth:
+     - Configure git auth (use `--global` so it applies to any repo or clone):
        ```bash
-       git config user.name "github-actions[bot]"
-       git config user.email "github-actions[bot]@users.noreply.github.com"
+       git config --global user.name "github-actions[bot]"
+       git config --global user.email "github-actions[bot]@users.noreply.github.com"
        git remote set-url origin "https://x-access-token:${GH_TOKEN}@github.com/workleap/wl-squide.git"
        ```
      - Fetch or create the `dogfood-evidence` branch:
        - If it exists: `git fetch origin dogfood-evidence && git checkout -B dogfood-evidence origin/dogfood-evidence`
        - If not: `git checkout --orphan dogfood-evidence && git rm -rf . 2>/dev/null || true`
-     - **Prune old evidence** — Delete any date directories older than 60 days using `find` and `rm`, then commit the deletions (if any).
+     - **Prune old evidence** — This step is mandatory even if you expect nothing to prune. Delete date directories older than 60 days, then commit the deletions if any files were removed:
+       ```bash
+       CUTOFF=$(date -d '60 days ago' +%Y-%m-%d)
+       for d in ./[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/; do
+         name=$(basename "$d")
+         [[ "$name" < "$CUTOFF" ]] && rm -rf "$d"
+       done
+       git add -A && git diff --cached --quiet || git commit -m "Prune evidence older than 60 days"
+       ```
      - **Add new evidence** — Create `YYYY-MM-DD/screenshots/` and `YYYY-MM-DD/videos/`, copy only the referenced files, stage, commit, push.
 
   3. **Rewrite evidence paths** in the report — Replace relative asset paths with absolute GitHub URLs so images render in the issue. First, capture today's date into a variable, then use it in the sed replacements:
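
A minimal sketch of the path rewrite described in step 3, assuming the report cites assets as `screenshots/...` or `./screenshots/...` (and likewise for `videos/`) inside Markdown link parentheses. The sample report, the `WORKDIR` and `BASE_URL` names, and the `sed` expression itself are illustrative assumptions, not the commands from the prompt file. The URL shape follows the `raw.githubusercontent.com` pattern documented in `ci-cd.md`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for /tmp/dogfood-output/report.md.
WORKDIR=$(mktemp -d)
REPORT="$WORKDIR/report.md"
cat > "$REPORT" <<'EOF'
### ISSUE-001
![Broken layout](screenshots/issue-001.png)
Repro video: [clip](./videos/issue-001-repro.webm)
EOF

# Capture today's date once so every rewritten URL points at the same directory.
TODAY=$(date +%Y-%m-%d)
BASE_URL="https://raw.githubusercontent.com/workleap/wl-squide/dogfood-evidence/${TODAY}"

# Rewrite "(screenshots/" and "(./screenshots/" (same for videos) to absolute URLs.
# Writing to a temp file avoids the GNU/BSD differences of `sed -i`.
sed -E "s#\((\./)?(screenshots|videos)/#(${BASE_URL}/\2/#g" "$REPORT" > "$REPORT.tmp"
mv "$REPORT.tmp" "$REPORT"

cat "$REPORT"
```

Capturing the date in a variable first keeps the URLs aligned with the date directory pushed to the evidence branch, even if the run crosses midnight between steps.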
diff --git a/.github/prompts/update-agent-docs.md b/.github/prompts/update-agent-docs.md
@@ -34,7 +34,7 @@ The prompt includes a mode indicator (e.g., `mode: full-audit` or `mode: last-co
 
 For `push` triggers, read the diff and determine which documentation files are affected by the changes. For `workflow_dispatch` triggers, review all documentation files against the current codebase.
 
-Also check whether recent changes contain architectural decisions that lack an ADR. Read existing ADRs in `agent-docs/adr/` first. If you find a new pattern, a replaced dependency, an infrastructure change, or a choice between viable approaches — and no existing ADR covers it — write a new ADR following the process in `agent-docs/adr/README.md`. Set its status to `proposed`.
+Also check whether recent changes contain architectural decisions that lack an ADR. Read existing ADRs in `agent-docs/adr/` first. If you find a new pattern, a replaced dependency, an infrastructure change, or a choice between viable approaches — and no existing ADR covers it — write a new ADR following the process in `agent-docs/adr/README.md`. Set its status to `proposed`. When writing the ADR, follow the ADR vs docs boundary rules below — record the decision rationale, not operational details.
 
 ### Context
 
@@ -84,6 +84,20 @@ When documenting Squide:
 - ONLY modify files under `agent-docs/` and `AGENTS.md` at the root. Modifying files outside this set will cause an infinite workflow loop.
 - Do NOT modify `CLAUDE.md`.
 
+### ADR vs docs boundary
+
+ADRs record **why** a decision was made (the problem, the alternatives, the chosen option, and the trade-offs accepted). Operational details about **how** the decision is implemented belong in `agent-docs/docs/`.
+
+- **Belongs in an ADR:** the problem that motivated the decision, options evaluated, which option was chosen and why, architectural trade-offs accepted.
+- **Belongs in `agent-docs/docs/`:** file paths and storage locations, URL rewriting patterns, CLI commands and flags, permissions and access controls, step-by-step operational procedures, server start/build commands.
+
+Examples:
+
+- **Good ADR sentence:** "Evidence is stored on an orphan branch so GitHub issue links remain stable across runs. See [ci-cd.md](../docs/references/ci-cd.md) for operational details."
+- **Bad ADR sentence:** "Evidence files are pushed to the `dogfood-evidence` branch using `git push --force`, and URLs are rewritten from `./screenshots/` to `https://raw.githubusercontent.com/...`."
+
+**Never put operational details (commands, paths, configs, permissions, URL patterns) into an ADR.** State the decision and its rationale, then link to the relevant `agent-docs/docs/` file for implementation specifics. Operational content in ADRs drifts from the actual implementation and misleads agents into following stale procedures instead of reading the source of truth.
+
 ### AGENTS.md requirements
 
 AGENTS.md must stay between 80–150 lines. It must contain:
diff --git a/.github/workflows/dogfood.yml b/.github/workflows/dogfood.yml
@@ -59,7 +59,7 @@ jobs:
           show_full_output: true
           claude_args: >-
             --max-turns 200
-            --allowedTools Read,Glob,Grep,Skill,Bash(agent-browser:*),Bash(mkdir:*),Bash(cp:*),Bash(sleep:*),Bash(cat:*),Bash(gh:*),Bash(date:*),Bash(pnpm:*),Bash(curl:*),Bash(git:*),Bash(sed:*),Bash(rm:*),Bash(find:*),Bash(ls:*)
+            --allowedTools Read,Glob,Grep,Bash(agent-browser:*),Bash(mkdir:*),Bash(cp:*),Bash(sleep:*),Bash(cat:*),Bash(gh:*),Bash(date:*),Bash(pnpm:*),Bash(curl:*),Bash(git:*),Bash(sed:*),Bash(rm:*),Bash(find:*),Bash(ls:*)
         env:
           # Required by gh CLI to create issues for dogfood findings.
           GH_TOKEN: ${{ github.token }}
diff --git a/agent-docs/adr/0008-environment-variables-on-runtime.md b/agent-docs/adr/0008-environment-variables-on-runtime.md
@@ -22,7 +22,7 @@ Variables are provided at initialization time via the `environmentVariables` opt
 
 TypeScript type safety is achieved through module augmentation: consumers declare their variable names and types by augmenting the empty `EnvironmentVariables` interface exported from `@squide/env-vars`. This gives compile-time checking on `useEnvironmentVariable("apiUrl")` — both the key name and the return type are validated.
 
-Evidence: `packages/env-vars/src/EnvironmentVariablesPlugin.ts` creates the plugin and stores variables in `EnvironmentVariablesRegistry`. `packages/env-vars/src/EnvironmentVariablesRegistry.ts` implements the duplicate-key detection logic. `packages/firefly/src/initializeFirefly.ts` (lines 167-184) always instantiates the plugin. The TypeScript module augmentation pattern is documented in `docs/reference/runtime/runtime-class.md`.
+Evidence: `packages/env-vars/src/EnvironmentVariablesPlugin.ts` creates the plugin and stores variables in `EnvironmentVariablesRegistry`. `packages/env-vars/src/EnvironmentVariablesRegistry.ts` implements the duplicate-key detection logic. `packages/firefly/src/initializeFirefly.ts` (lines 167-184) always instantiates the plugin. The TypeScript module augmentation pattern is documented in `docs/reference/runtime/FireflyRuntime.md`.
 
 ## Consequences
 
diff --git a/agent-docs/adr/0015-lean-yml-markdown-prompt-pattern.md b/agent-docs/adr/0015-lean-yml-markdown-prompt-pattern.md
@@ -30,7 +30,7 @@ Seven agent workflows follow this pattern, each with a 1:1 matching prompt file:
 | `update-agent-docs.yml` | `update-agent-docs.md` (156 lines) | Multi-step with subagent coherence validation, creates PRs |
 | `update-dependencies.yml` | `update-dependencies.md` (164 lines) | Most complex: validation loop with browser testing, changeset creation |
 | `smoke-test.yml` | `smoke-test.md` (42 lines) | PR-triggered: starts dev server, runs browser smoke test on endpoints app |
-| `dogfood.yml` | `dogfood.md` (37 lines) | Scheduled: starts dev server, runs exploratory QA via `/dogfood` skill, files issues |
+| `dogfood.yml` | `dogfood.md` | Scheduled: runs exploratory QA via agent-browser dogfood skill, uploads evidence, files issues |
 
 The remaining workflows (`ci.yml`, `pr-pkg.yml`, `changeset.yml`, `retype-action.yml`) are traditional CI pipelines without agents. `claude.yml` is a generic claude-code-action step without a dedicated prompt file (used for ad-hoc PR interactions).
 
diff --git a/agent-docs/adr/0031-ai-driven-browser-qa-in-ci.md b/agent-docs/adr/0031-ai-driven-browser-qa-in-ci.md
@@ -24,7 +24,7 @@ Two separate QA needs were identified:
 Option 2, with the two use cases split into separate workflows:
 
 - **`smoke-test.yml` + `smoke-test.md`** (ADR-0015 lean YML + prompt pattern): triggered on PRs to main affecting packages or the endpoints app. The agent navigates a fixed list of pages, captures `agent-browser snapshot -i` (text) and `agent-browser console` output, and ends with `SMOKE TEST PASSED` or `SMOKE TEST FAILED`. Max 50 turns. No screenshots.
-- **`dogfood.yml` + `dogfood.md`** (same pattern): triggered on a monthly schedule (15th of each month) and on-demand. Invokes the `/dogfood` agent skill for exploratory QA. Max 200 turns. Files a GitHub issue if issues are found, stops silently if none.
+- **`dogfood.yml` + `dogfood.md`** (same pattern): triggered on a monthly schedule (15th of each month) and on-demand. Runs an `agent-browser` dogfood session following the skill instructions in `SKILL.md` against a production-like build (`pnpm serve-endpoints`). Max 200 turns. Evidence (screenshots, videos) is persisted on a Git orphan branch for linkability from GitHub issues. Files a GitHub issue if issues are found, stops silently if none.
 
 Evidence: `.github/workflows/smoke-test.yml`, `.github/workflows/dogfood.yml`, `.github/prompts/smoke-test.md`, `.github/prompts/dogfood.md`.
 
@@ -33,5 +33,6 @@ Evidence: `.github/workflows/smoke-test.yml`, `.github/workflows/dogfood.yml`, `
 - No test code to maintain — the smoke test is defined as a page list in `smoke-test.md`. Adding a new page means adding one line to the prompt file.
 - The dogfood session can discover issues outside the fixed page list, providing broader coverage.
 - AI-driven tests are less deterministic than scripted tests. False positives (AI misreads the UI) are possible but expected to be rare for binary PASS/FAIL outcomes.
-- Both workflows use `agent-browser install --with-deps` and start the dev server before the agent runs, adding setup time (~2 min for smoke test, ~5 min for dogfood including QA time).
+- Both workflows run `agent-browser install --with-deps`. The smoke test starts the dev server (`pnpm dev-endpoints`); dogfood builds and serves a production-like build (`pnpm serve-endpoints`), which adds build time (~5–10 min for dogfood, including build and QA).
+- Dogfood evidence is stored on the `dogfood-evidence` orphan branch so GitHub issue links remain stable across runs. See [ci-cd.md](../docs/references/ci-cd.md#dogfood-workflow) for operational details.
 - Dogfood findings are filed as GitHub issues for human triage — the workflow does not block PRs or deployments.
diff --git a/agent-docs/docs/quality/testing.md b/agent-docs/docs/quality/testing.md
@@ -20,9 +20,9 @@ pnpm turbo run test --filter=@squide/core
 
 Use **agent-browser** (see `.agents/skills/agent-browser/`) to validate sample apps. It is installed as a workspace devDependency. A build alone is not sufficient — you must start the dev server and verify pages in a real browser.
 
-### Endpoints sample (`pnpm dev-endpoints`)
+### Endpoints sample (local dev: `pnpm dev-endpoints`, CI dogfood: `pnpm serve-endpoints`)
 
-1. Start the dev server in the background: `pnpm dev-endpoints`
+1. Start the server in the background: `pnpm dev-endpoints` (dev server, for local validation) or `pnpm serve-endpoints` (production-like build, used by the CI dogfood workflow)
 2. The app listens on port **8080**. Wait for it to be ready: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080`
 3. The app has a mock login page. Use username `temp` and password `temp` to authenticate.
 4. Navigate to each page and verify it renders without errors:
diff --git a/agent-docs/docs/references/ci-cd.md b/agent-docs/docs/references/ci-cd.md
@@ -17,6 +17,23 @@
 | Smoke Test | `.github/workflows/smoke-test.yml` | PRs to main (packages, endpoints, workflow changes) | Automated smoke test of endpoints app |
 | Dogfood | `.github/workflows/dogfood.yml` | 15th of month | Exploratory QA of endpoints app |
 
+## Dogfood Workflow
+
+The dogfood workflow (`dogfood.yml` + `dogfood.md`) runs monthly exploratory QA on the endpoints sample app. Key operational details:
+
+**Server**: Uses `pnpm serve-endpoints` (production-like build), not `pnpm dev-endpoints`. This tests built output rather than dev mode.
+
+**Evidence handling**: Screenshots and videos produced during the session are stored on the `dogfood-evidence` orphan branch in date-stamped directories (`YYYY-MM-DD/screenshots/`, `YYYY-MM-DD/videos/`). The workflow:
+1. Fetches or creates the `dogfood-evidence` orphan branch
+2. Prunes evidence directories older than 60 days
+3. Copies only report-referenced screenshots/videos into the date directory
+4. Force-pushes the branch (requires `contents: write` permission)
+5. Rewrites relative asset paths in the report to `https://raw.githubusercontent.com/workleap/wl-squide/dogfood-evidence/YYYY-MM-DD/...` so images render in GitHub issues
+
+**Issue creation**: If the report contains issues, the workflow files a GitHub issue with the rewritten report via `gh issue create` (requires `issues: write` permission). It stops silently if no issues are found.
+
+**Prompt file**: `.github/prompts/dogfood.md` contains the full operational steps.
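
The "copy only report-referenced files" step in the list above can be sketched as follows. This is an illustration under stated assumptions, not the workflow's actual commands: `OUTPUT_DIR` and `BRANCH_DIR` are hypothetical stand-ins for `/tmp/dogfood-output` and a checkout of the `dogfood-evidence` branch, the sample file names are invented, and the `grep` pattern assumes evidence is cited as `screenshots/*.png` or `videos/*.webm`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the dogfood output and the evidence branch checkout.
OUTPUT_DIR=$(mktemp -d)
BRANCH_DIR=$(mktemp -d)
mkdir -p "$OUTPUT_DIR/screenshots" "$OUTPUT_DIR/videos"
touch "$OUTPUT_DIR/screenshots/issue-001.png" \
      "$OUTPUT_DIR/screenshots/exploration-07.png" \
      "$OUTPUT_DIR/videos/issue-001-repro.webm"
printf '%s\n' '![bug](screenshots/issue-001.png)' '[clip](videos/issue-001-repro.webm)' \
  > "$OUTPUT_DIR/report.md"

# Date-stamped target directories, one pair per run.
TODAY=$(date +%Y-%m-%d)
mkdir -p "$BRANCH_DIR/$TODAY/screenshots" "$BRANCH_DIR/$TODAY/videos"

# Copy only the files the report cites; uncited exploration shots stay behind.
grep -oE '(screenshots/[^) ]+\.png|videos/[^) ]+\.webm)' "$OUTPUT_DIR/report.md" | sort -u |
while IFS= read -r rel; do
  cp "$OUTPUT_DIR/$rel" "$BRANCH_DIR/$TODAY/$rel"
done

# List what landed in today's directory.
COPIED=$(cd "$BRANCH_DIR/$TODAY" && find . -type f | sort)
echo "$COPIED"
```

Staging, committing, and pushing would follow in the real workflow; they are left out here so the sketch runs without a Git remote.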
+
 ## CI Pipeline Details
 
 The main CI workflow (`ci.yml`) runs on `ubuntu-latest` with: