Commit a5a9812

fix: address dogfood run issues from first CI execution (#540)
1 parent 81c4dd3 commit a5a9812

8 files changed

Lines changed: 59 additions & 20 deletions

.github/prompts/dogfood.md

Lines changed: 19 additions & 12 deletions
@@ -6,7 +6,7 @@
 
 ## Task
 
-Run an exploratory QA session on the endpoints sample app using the `/dogfood` skill, then report findings as a GitHub issue.
+Run an exploratory QA session on the endpoints sample app using the agent-browser dogfood skill, then report findings as a GitHub issue.
 
 ### Step 1 — Start servers
 
@@ -16,26 +16,25 @@ Start the endpoint servers in the background and wait for them to be ready:
 pnpm serve-endpoints > /tmp/endpoints-serve.log 2>&1 &
 ```
 
-Wait for both servers to be ready:
+Wait for both servers to be ready (single command to save a turn):
 
 ```bash
-curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080
-curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081
+curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080 && curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081
 ```
 
 If either curl command fails, run `cat /tmp/endpoints-serve.log` for diagnostics and stop.
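
Editor's sketch (not part of the commit): the wait-then-diagnose behaviour described above could be wrapped in a small helper when run outside the agent. The names `probe`, `wait_for_servers`, and `LOG_FILE` are hypothetical; the curl flags, ports, and log path are the ones shown in the prompt.

```bash
# Sketch only: probe, wait_for_servers and LOG_FILE are illustrative names,
# not something the repository defines.
LOG_FILE="${LOG_FILE:-/tmp/endpoints-serve.log}"

# Probe one URL with the same curl flags the prompt uses
# (up to 30 retries, 5 seconds apart, retrying on "connection refused").
probe() {
  curl --retry 30 --retry-delay 5 --retry-connrefused \
       --silent --output /dev/null "$1"
}

# Probe each URL in order. On the first failure, print the server log
# (if it exists) to stderr and return non-zero so the caller can stop.
wait_for_servers() {
  local url
  for url in "$@"; do
    if ! probe "$url"; then
      echo "Server not ready: $url" >&2
      if [ -f "$LOG_FILE" ]; then cat "$LOG_FILE" >&2; fi
      return 1
    fi
  done
}
```

A caller might then run `wait_for_servers http://localhost:8080 http://localhost:8081 || exit 1`. The one-line `&&` form in the prompt has the same short-circuit behaviour; the helper only adds the log dump.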
 
-### Step 2 — Run the dogfood skill
+### Step 2 — Run the dogfood session
 
-Invoke the `/dogfood` skill with these parameters:
+Read and follow the skill instructions at `node_modules/agent-browser/skills/dogfood/SKILL.md` with these parameters:
 - **Target URL**: `http://localhost:8080`
 - **Session name**: `endpoints`
 - **Output directory**: `/tmp/dogfood-output`
 - **Auth credentials**: username `temp`, password `temp`
 
 ### Step 3 — Known noise (IGNORE these)
 
-Tell the skill to ignore:
+Apply these ignore rules during the dogfood session:
 - `/federated-tabs/failing` — intentionally throws to exercise error boundaries. The error boundary UI is expected.
 - MSW (Mock Service Worker) console warnings — expected in this mock-data app.
 - React warnings and deprecation notices — only flag actual JS errors/exceptions.
@@ -47,19 +46,27 @@ After the skill completes, read the generated report at `/tmp/dogfood-output/rep
 - **If the report contains zero issues**: end with "DOGFOOD PASSED — no issues found" and stop.
 - **If the report contains issues**:
 
-1. **Identify referenced evidence** — Using the Grep tool, find all `screenshots/*.png` and `videos/*.webm` paths referenced in the report. Only these files should be uploaded (ignore exploration screenshots not cited in the report).
+1. **Identify referenced evidence** — Using the Grep tool (parameter is `path`, not `file_path`), find all `screenshots/*.png` and `videos/*.webm` paths referenced in the report. Only these files should be uploaded (ignore exploration screenshots not cited in the report).
 
 2. **Push evidence to the `dogfood-evidence` branch** — Evidence is stored in date-stamped directories (`YYYY-MM-DD/screenshots/`, `YYYY-MM-DD/videos/`) so each run's evidence persists and old issue links keep working.
-- Configure git auth:
+- Configure git auth (use `--global` so it applies to any repo or clone):
 ```bash
-git config user.name "github-actions[bot]"
-git config user.email "github-actions[bot]@users.noreply.github.com"
+git config --global user.name "github-actions[bot]"
+git config --global user.email "github-actions[bot]@users.noreply.github.com"
 git remote set-url origin "https://x-access-token:${GH_TOKEN}@github.com/workleap/wl-squide.git"
 ```
 - Fetch or create the `dogfood-evidence` branch:
 - If it exists: `git fetch origin dogfood-evidence && git checkout -B dogfood-evidence origin/dogfood-evidence`
 - If not: `git checkout --orphan dogfood-evidence && git rm -rf . 2>/dev/null || true`
-- **Prune old evidence** — Delete any date directories older than 60 days using `find` and `rm`, then commit the deletions (if any).
+- **Prune old evidence** — This step is mandatory even if you expect nothing to prune. Delete date directories older than 60 days, then commit the deletions if any files were removed:
+```bash
+CUTOFF=$(date -d '60 days ago' +%Y-%m-%d)
+for d in ./[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/; do
+name=$(basename "$d")
+[[ "$name" < "$CUTOFF" ]] && rm -rf "$d"
+done
+git add -A && git diff --cached --quiet || git commit -m "Prune evidence older than 60 days"
+```
 - **Add new evidence** — Create `YYYY-MM-DD/screenshots/` and `YYYY-MM-DD/videos/`, copy only the referenced files, stage, commit, push.
 
 3. **Rewrite evidence paths** in the report — Replace relative asset paths with absolute GitHub URLs so images render in the issue. First, capture today's date into a variable, then use it in the sed replacements:

.github/prompts/update-agent-docs.md

Lines changed: 15 additions & 1 deletion
@@ -34,7 +34,7 @@ The prompt includes a mode indicator (e.g., `mode: full-audit` or `mode: last-co
 
 For `push` triggers, read the diff and determine which documentation files are affected by the changes. For `workflow_dispatch` triggers, review all documentation files against the current codebase.
 
-Also check whether recent changes contain architectural decisions that lack an ADR. Read existing ADRs in `agent-docs/adr/` first. If you find a new pattern, a replaced dependency, an infrastructure change, or a choice between viable approaches — and no existing ADR covers it — write a new ADR following the process in `agent-docs/adr/README.md`. Set its status to `proposed`.
+Also check whether recent changes contain architectural decisions that lack an ADR. Read existing ADRs in `agent-docs/adr/` first. If you find a new pattern, a replaced dependency, an infrastructure change, or a choice between viable approaches — and no existing ADR covers it — write a new ADR following the process in `agent-docs/adr/README.md`. Set its status to `proposed`. When writing the ADR, follow the ADR vs docs boundary rules below — record the decision rationale, not operational details.
 
 ### Context
 
@@ -84,6 +84,20 @@ When documenting Squide:
 - ONLY modify files under `agent-docs/` and `AGENTS.md` at the root. Modifying files outside this set will cause an infinite workflow loop.
 - Do NOT modify `CLAUDE.md`.
 
+### ADR vs docs boundary
+
+ADRs record **why** a decision was made (the problem, the alternatives, the chosen option, and the trade-offs accepted). Operational details about **how** the decision is implemented belong in `agent-docs/docs/`.
+
+- **Belongs in an ADR:** the problem that motivated the decision, options evaluated, which option was chosen and why, architectural trade-offs accepted.
+- **Belongs in `agent-docs/docs/`:** file paths and storage locations, URL rewriting patterns, CLI commands and flags, permissions and access controls, step-by-step operational procedures, server start/build commands.
+
+Examples:
+
+- **Good ADR sentence:** "Evidence is stored on an orphan branch so GitHub issue links remain stable across runs. See [ci-cd.md](../docs/references/ci-cd.md) for operational details."
+- **Bad ADR sentence:** "Evidence files are pushed to the `dogfood-evidence` branch using `git push --force`, and URLs are rewritten from `./screenshots/` to `https://raw.githubusercontent.com/...`."
+
+**Never put operational details (commands, paths, configs, permissions, URL patterns) into an ADR.** State the decision and its rationale, then link to the relevant `agent-docs/docs/` file for implementation specifics. Operational content in ADRs drifts from the actual implementation and misleads agents into following stale procedures instead of reading the source of truth.
+
 ### AGENTS.md requirements
 
 AGENTS.md must stay between 80–150 lines. It must contain:

.github/workflows/dogfood.yml

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ jobs:
 show_full_output: true
 claude_args: >-
 --max-turns 200
---allowedTools Read,Glob,Grep,Skill,Bash(agent-browser:*),Bash(mkdir:*),Bash(cp:*),Bash(sleep:*),Bash(cat:*),Bash(gh:*),Bash(date:*),Bash(pnpm:*),Bash(curl:*),Bash(git:*),Bash(sed:*),Bash(rm:*),Bash(find:*),Bash(ls:*)
+--allowedTools Read,Glob,Grep,Bash(agent-browser:*),Bash(mkdir:*),Bash(cp:*),Bash(sleep:*),Bash(cat:*),Bash(gh:*),Bash(date:*),Bash(pnpm:*),Bash(curl:*),Bash(git:*),Bash(sed:*),Bash(rm:*),Bash(find:*),Bash(ls:*)
 env:
 # Required by gh CLI to create issues for dogfood findings.
 GH_TOKEN: ${{ github.token }}

agent-docs/adr/0008-environment-variables-on-runtime.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ Variables are provided at initialization time via the `environmentVariables` opt
 
 TypeScript type safety is achieved through module augmentation: consumers declare their variable names and types by augmenting the empty `EnvironmentVariables` interface exported from `@squide/env-vars`. This gives compile-time checking on `useEnvironmentVariable("apiUrl")` — both the key name and the return type are validated.
 
-Evidence: `packages/env-vars/src/EnvironmentVariablesPlugin.ts` creates the plugin and stores variables in `EnvironmentVariablesRegistry`. `packages/env-vars/src/EnvironmentVariablesRegistry.ts` implements the duplicate-key detection logic. `packages/firefly/src/initializeFirefly.ts` (lines 167-184) always instantiates the plugin. The TypeScript module augmentation pattern is documented in `docs/reference/runtime/runtime-class.md`.
+Evidence: `packages/env-vars/src/EnvironmentVariablesPlugin.ts` creates the plugin and stores variables in `EnvironmentVariablesRegistry`. `packages/env-vars/src/EnvironmentVariablesRegistry.ts` implements the duplicate-key detection logic. `packages/firefly/src/initializeFirefly.ts` (lines 167-184) always instantiates the plugin. The TypeScript module augmentation pattern is documented in `docs/reference/runtime/FireflyRuntime.md`.
 
 ## Consequences
 
agent-docs/adr/0015-lean-yml-markdown-prompt-pattern.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Seven agent workflows follow this pattern, each with a 1:1 matching prompt file:
 | `update-agent-docs.yml` | `update-agent-docs.md` (156 lines) | Multi-step with subagent coherence validation, creates PRs |
 | `update-dependencies.yml` | `update-dependencies.md` (164 lines) | Most complex: validation loop with browser testing, changeset creation |
 | `smoke-test.yml` | `smoke-test.md` (42 lines) | PR-triggered: starts dev server, runs browser smoke test on endpoints app |
-| `dogfood.yml` | `dogfood.md` (37 lines) | Scheduled: starts dev server, runs exploratory QA via `/dogfood` skill, files issues |
+| `dogfood.yml` | `dogfood.md` | Scheduled: runs exploratory QA via agent-browser dogfood skill, uploads evidence, files issues |
 
 The remaining workflows (`ci.yml`, `pr-pkg.yml`, `changeset.yml`, `retype-action.yml`) are traditional CI pipelines without agents. `claude.yml` is a generic claude-code-action step without a dedicated prompt file (used for ad-hoc PR interactions).
 
agent-docs/adr/0031-ai-driven-browser-qa-in-ci.md

Lines changed: 3 additions & 2 deletions
@@ -24,7 +24,7 @@ Two separate QA needs were identified:
 Option 2, with the two use cases split into separate workflows:
 
 - **`smoke-test.yml` + `smoke-test.md`** (ADR-0015 lean YML + prompt pattern): triggered on PRs to main affecting packages or the endpoints app. The agent navigates a fixed list of pages, captures `agent-browser snapshot -i` (text) and `agent-browser console` output, and ends with `SMOKE TEST PASSED` or `SMOKE TEST FAILED`. Max 50 turns. No screenshots.
-- **`dogfood.yml` + `dogfood.md`** (same pattern): triggered on a monthly schedule (15th of each month) and on-demand. Invokes the `/dogfood` agent skill for exploratory QA. Max 200 turns. Files a GitHub issue if issues are found, stops silently if none.
+- **`dogfood.yml` + `dogfood.md`** (same pattern): triggered on a monthly schedule (15th of each month) and on-demand. Runs an `agent-browser` dogfood session following the skill instructions in `SKILL.md` against a production-like build (`pnpm serve-endpoints`). Max 200 turns. Evidence (screenshots, videos) is persisted on a Git orphan branch for linkability from GitHub issues. Files a GitHub issue if issues are found, stops silently if none.
 
 Evidence: `.github/workflows/smoke-test.yml`, `.github/workflows/dogfood.yml`, `.github/prompts/smoke-test.md`, `.github/prompts/dogfood.md`.
 
@@ -33,5 +33,6 @@ Evidence: `.github/workflows/smoke-test.yml`, `.github/workflows/dogfood.yml`, `
 - No test code to maintain — the smoke test is defined as a page list in `smoke-test.md`. Adding a new page means adding one line to the prompt file.
 - The dogfood session can discover issues outside the fixed page list, providing broader coverage.
 - AI-driven tests are less deterministic than scripted tests. False positives (AI misreads the UI) are possible but expected to be rare for binary PASS/FAIL outcomes.
-- Both workflows use `agent-browser install --with-deps` and start the dev server before the agent runs, adding setup time (~2 min for smoke test, ~5 min for dogfood including QA time).
+- Both workflows use `agent-browser install --with-deps`. Smoke test starts the dev server (`pnpm dev-endpoints`); dogfood builds and serves a production-like build (`pnpm serve-endpoints`), adding extra build time (~5–10 min for dogfood including build and QA time).
+- Dogfood evidence is stored on the `dogfood-evidence` orphan branch so GitHub issue links remain stable across runs. See [ci-cd.md](../docs/references/ci-cd.md#dogfood-workflow) for operational details.
 - Dogfood findings are filed as GitHub issues for human triage — the workflow does not block PRs or deployments.

agent-docs/docs/quality/testing.md

Lines changed: 2 additions & 2 deletions
@@ -20,9 +20,9 @@ pnpm turbo run test --filter=@squide/core
 
 Use **agent-browser** (see `.agents/skills/agent-browser/`) to validate sample apps. It is installed as a workspace devDependency. A build alone is not sufficient — you must start the dev server and verify pages in a real browser.
 
-### Endpoints sample (`pnpm dev-endpoints`)
+### Endpoints sample (local dev: `pnpm dev-endpoints`, CI dogfood: `pnpm serve-endpoints`)
 
-1. Start the dev server in the background: `pnpm dev-endpoints`
+1. Start the dev server in the background: `pnpm dev-endpoints` (for local validation) or `pnpm serve-endpoints` (for production-like validation, used by the CI dogfood workflow)
 2. The app listens on port **8080**. Wait for it to be ready: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080`
 3. The app has a mock login page. Use username `temp` and password `temp` to authenticate.
 4. Navigate to each page and verify it renders without errors:

agent-docs/docs/references/ci-cd.md

Lines changed: 17 additions & 0 deletions
@@ -17,6 +17,23 @@
 | Smoke Test | `.github/workflows/smoke-test.yml` | PRs to main (packages, endpoints, workflow changes) | Automated smoke test of endpoints app |
 | Dogfood | `.github/workflows/dogfood.yml` | 15th of month | Exploratory QA of endpoints app |
 
+## Dogfood Workflow
+
+The dogfood workflow (`dogfood.yml` + `dogfood.md`) runs monthly exploratory QA on the endpoints sample app. Key operational details:
+
+**Server**: Uses `pnpm serve-endpoints` (production-like build), not `pnpm dev-endpoints`. This tests built output rather than dev mode.
+
+**Evidence handling**: Screenshots and videos produced during the session are stored on the `dogfood-evidence` orphan branch in date-stamped directories (`YYYY-MM-DD/screenshots/`, `YYYY-MM-DD/videos/`). The workflow:
+1. Fetches or creates the `dogfood-evidence` orphan branch
+2. Prunes evidence directories older than 60 days
+3. Copies only report-referenced screenshots/videos into the date directory
+4. Force-pushes the branch (requires `contents: write` permission)
+5. Rewrites relative asset paths in the report to `https://raw.githubusercontent.com/workleap/wl-squide/dogfood-evidence/YYYY-MM-DD/...` so images render in GitHub issues
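
Editor's sketch (not part of the commit): step 5 of the list above could look roughly like the helper below. `rewrite_evidence_paths` and `BASE_URL` are hypothetical names, and the `./screenshots/` and `./videos/` prefixes are an assumption about how the report cites evidence. The actual sed commands live in `.github/prompts/dogfood.md` and are not shown in this diff.

```bash
# Sketch only: rewrite_evidence_paths and BASE_URL are illustrative names.
# Assumes the report cites evidence as ./screenshots/... and ./videos/...
BASE_URL="https://raw.githubusercontent.com/workleap/wl-squide/dogfood-evidence"

# Print the report ($1) to stdout with relative evidence paths replaced by
# absolute URLs under the date directory ($2, formatted YYYY-MM-DD).
rewrite_evidence_paths() {
  local report="$1" run_date="$2"
  # "|" is used as the sed delimiter so the slashes in the URL need no escaping.
  sed -e "s|\./screenshots/|${BASE_URL}/${run_date}/screenshots/|g" \
      -e "s|\./videos/|${BASE_URL}/${run_date}/videos/|g" \
      "$report"
}
```

With the report path held in a variable such as `$REPORT`, a call might be `rewrite_evidence_paths "$REPORT" "$(date +%Y-%m-%d)" > rewritten.md`. Writing to stdout rather than editing in place avoids the differences between GNU and BSD `sed -i`.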
+
+**Issue creation**: If issues are found, files a GitHub issue with the rewritten report via `gh issue create` (requires `issues: write` permission). Stops silently if no issues.
+
+**Prompt file**: `.github/prompts/dogfood.md` contains the full operational steps.
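
Editor's sketch (not part of the commit): the pruning step works because `YYYY-MM-DD` names are fixed-width, so plain string comparison matches chronological order. The helper below isolates that comparison so it can be tried in a scratch directory. `prune_before` is a hypothetical name, and it uses `[ ... \< ... ]` in place of the prompt's `[[ ... < ... ]]` so it also runs under plain `sh`.

```bash
# Sketch only: prune_before is an illustrative helper, not part of the repository.
# Remove every YYYY-MM-DD directory under root ($1) whose name sorts
# before the cutoff date ($2, also YYYY-MM-DD).
prune_before() {
  local root="$1" cutoff="$2" d name
  for d in "$root"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/; do
    # If the glob matched nothing, the pattern itself is iterated; skip it.
    [ -d "$d" ] || continue
    name=$(basename "$d")
    # Fixed-width ISO dates: string order equals chronological order.
    if [ "$name" \< "$cutoff" ]; then
      rm -rf "$d"
    fi
  done
}
```

On the `ubuntu-latest` runner the cutoff can come from GNU date, as in the prompt: `prune_before . "$(date -d '60 days ago' +%Y-%m-%d)"`. BSD and macOS `date` do not accept `-d` in that form; `date -v-60d +%Y-%m-%d` is the usual equivalent there.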
+
 ## CI Pipeline Details
 
 The main CI workflow (`ci.yml`) runs on `ubuntu-latest` with:
