chore: Improve workflows (#542)

patricklafrance · web-flow · commit 180ca08f1a8a · 2026-02-27T22:50:25.000-05:00
* chore: update documentation and workflows for improved server handling and testing procedures

* fix: ensure successful exit after pruning evidence older than 60 days
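
A note on the prune fix, as a minimal sketch rather than the workflow's actual code: in a shell AND-OR list, `&&` and `||` have equal precedence and associate left to right, so `a && b || c` is parsed as `(a && b) || c`. The helper functions below are hypothetical stand-ins for the `git` commands, used only to show how the braces change what happens when the first command fails.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real git commands (assumptions, not part of this commit).
add_fails()  { return 1; }            # simulates `git add -A` failing
diff_quiet() { return 0; }            # simulates `git diff --cached --quiet` with nothing staged
commit()     { echo "commit ran"; }   # simulates `git commit -m ...`

# Ungrouped: parsed as (add && diff) || commit, so a failed add still reaches commit.
ungrouped=$(add_fails && diff_quiet || commit)

# Grouped: a failed add short-circuits the whole braced list and its status is propagated.
grouped_status=0
grouped=$(add_fails && { diff_quiet || commit; }) || grouped_status=$?

echo "ungrouped output: '${ungrouped}'"
echo "grouped output:   '${grouped}' (status ${grouped_status})"
```

When the first command succeeds, both forms behave the same: the commit is skipped if nothing is staged and runs otherwise.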
diff --git a/.github/prompts/dogfood.md b/.github/prompts/dogfood.md
@@ -3,6 +3,7 @@
 ## Constraints
 - Do NOT read AGENTS.md or agent-docs/
 - Do NOT read the target app's source code
+- Do NOT use the Skill tool — read skill files directly with the Read tool
 
 ## Task
 
@@ -65,9 +66,11 @@ After the skill completes, read the generated report at `/tmp/dogfood-output/rep
          name=$(basename "$d")
          [[ "$name" < "$CUTOFF" ]] && rm -rf "$d"
        done
-       git add -A && git diff --cached --quiet || git commit -m "Prune evidence older than 60 days"
+       git add -A && { git diff --cached --quiet || git commit -m "Prune evidence older than 60 days"; }
+       # Exit 0 in both cases: no changes (quiet succeeds) or changes committed (commit succeeds).
        ```
      - **Add new evidence** — Create `YYYY-MM-DD/screenshots/` and `YYYY-MM-DD/videos/`, copy only the referenced files, stage, commit, push.
+     - **Return to main branch** — `git checkout main`
 
   3. **Rewrite evidence paths** in the report — Replace relative asset paths with absolute GitHub URLs so images render in the issue. First, capture today's date into a variable, then use it in the sed replacements:
      ```bash
diff --git a/.github/prompts/smoke-test.md b/.github/prompts/smoke-test.md
@@ -7,35 +7,54 @@
 - Use ONLY `agent-browser snapshot -i` (text) and `agent-browser console` for verification
 
 ## Task
-Smoke-test the endpoints sample app at http://localhost:8080. Log in, then visit
+Smoke-test the endpoints sample app. Start the servers, log in, then visit
 every page listed below and verify each renders content without JavaScript errors.
 
-## Authentication
-The app has a mock login page. Use username `temp` and password `temp`.
+### Step 1 — Start servers
+
+Start the endpoint servers in the background and wait for them to be ready:
+
+```bash
+pnpm serve-endpoints > /tmp/endpoints-serve.log 2>&1 &
+```
+
+Wait for both servers to be ready (single command to save a turn):
+
+```bash
+curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080 && curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081
+```
+
+If either curl command fails, run `cat /tmp/endpoints-serve.log` for diagnostics and stop with "SMOKE TEST FAILED".
+
+### Step 2 — Validate pages
+
+The app runs at `http://localhost:8080`. It has a mock login page — use username `temp` and password `temp` to authenticate.
+
+Navigate to each page below and verify it renders content without JavaScript errors:
 
-## Pages to validate
 1. `/` (Home page)
 2. `/subscription`
 3. `/federated-tabs`
 4. `/federated-tabs/episodes`
 5. `/federated-tabs/locations`
 6. `/federated-tabs/failing` (expected to show an error boundary — see "Known noise")
 
-## How to verify each page
+For each page:
 1. Navigate to the page
 2. `agent-browser snapshot -i` — confirm meaningful content rendered (not blank/error)
 3. `agent-browser console` — check for JS errors (exceptions, failed assertions)
 
-## Known noise (IGNORE these)
+**Known noise (IGNORE these):**
 - `/federated-tabs/failing` — intentionally throws to exercise error boundaries. The error boundary UI is expected.
 - MSW (Mock Service Worker) console warnings — expected in this mock-data app.
 - React warnings, deprecation notices — only flag actual JS errors/exceptions.
 
-## What counts as a failure
+**What counts as a failure:**
 - A page renders blank or shows an unhandled error (except /federated-tabs/failing)
 - JavaScript exceptions in the console (not warnings)
 - Navigation links that lead nowhere or crash
 
-## Result
+### Result
+
 If all pages pass: end with "SMOKE TEST PASSED".
 If any page fails: end with "SMOKE TEST FAILED" and list which pages failed and why.
diff --git a/.github/prompts/update-dependencies.md b/.github/prompts/update-dependencies.md
@@ -86,8 +86,8 @@ All tests must pass. If a test fails, run the failing package's tests directly (
 
 Use `agent-browser` for all browser interactions in this step. It is installed as a workspace devDependency. Read the locally installed agent skill at `.agents/skills/agent-browser/` to learn the available commands. Do NOT use `agent-browser screenshot` — use only `snapshot` (text) and `console` (errors). Screenshots are binary and cannot be analyzed. Running a build is NOT sufficient — you must start the dev server and validate in a real browser.
 
-1. Start the dev server in the background using the shell `&` operator (do NOT use `run_in_background: true`): `pnpm dev-endpoints > /tmp/endpoints-dev.log 2>&1 &`
-2. The endpoints app listens on port **8080**. Wait for it to be ready — do NOT use `sleep`, do NOT write polling loops, do NOT parse the log file for a URL. Instead, immediately run: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080`
+1. Start the server in the background using the shell `&` operator (do NOT use `run_in_background: true`): `pnpm serve-endpoints > /tmp/endpoints-serve.log 2>&1 &`
+2. The endpoints app listens on port **8080** (host) and **8081** (remote module). Wait for both to be ready — do NOT use `sleep`, do NOT write polling loops, do NOT parse the log file for a URL. Instead, immediately run: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080 && curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081`. If the curl command fails, run `cat /tmp/endpoints-serve.log` for diagnostics.
 3. The app has a mock login page. Use username `temp` and password `temp` to authenticate.
 4. Navigate to the following pages and check that each renders without errors:
    - `/` (Home page)
diff --git a/.github/workflows/dogfood.yml b/.github/workflows/dogfood.yml
@@ -59,7 +59,7 @@ jobs:
           show_full_output: true
           claude_args: >-
             --max-turns 200
-            --allowedTools Read,Glob,Grep,Bash(agent-browser:*),Bash(mkdir:*),Bash(cp:*),Bash(sleep:*),Bash(cat:*),Bash(gh:*),Bash(date:*),Bash(pnpm:*),Bash(curl:*),Bash(git:*),Bash(sed:*),Bash(rm:*),Bash(find:*),Bash(ls:*)
+            --allowedTools Read,Write,Glob,Grep,Bash(agent-browser:*),Bash(mkdir:*),Bash(cp:*),Bash(sleep:*),Bash(cat:*),Bash(gh:*),Bash(date:*),Bash(pnpm:*),Bash(curl:*),Bash(git:*),Bash(sed:*),Bash(rm:*),Bash(find:*),Bash(ls:*)
         env:
           # Required by gh CLI to create issues for dogfood findings.
           GH_TOKEN: ${{ github.token }}
diff --git a/.github/workflows/smoke-test.yml b/.github/workflows/smoke-test.yml
@@ -56,26 +56,13 @@ jobs:
           agent-browser open about:blank
           agent-browser close
 
-      - name: Start endpoints dev server
-        run: pnpm dev-endpoints > /tmp/endpoints-dev.log 2>&1 &
-
-      - name: Wait for dev servers
-        run: |
-          curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080
-          curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081
-
       - name: Run smoke test
         uses: anthropics/claude-code-action@v1
         with:
           prompt: |
             Read and follow the instructions in .github/prompts/smoke-test.md
           anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          show_full_output: true
           claude_args: >-
             --max-turns 50
-            --allowedTools Read,Glob,Grep,Bash(agent-browser:*),Bash(kill:*),Bash(lsof:*),Bash(curl:*),Bash(fuser:*)
-
-      - name: Cleanup
-        if: always()
-        run: |
-          kill $(lsof -t -i:8080) 2>/dev/null || true
-          kill $(lsof -t -i:8081) 2>/dev/null || true
+            --allowedTools Read,Glob,Grep,Bash(agent-browser:*),Bash(pnpm:*),Bash(curl:*),Bash(cat:*),Bash(sleep:*)
diff --git a/agent-docs/adr/0015-lean-yml-markdown-prompt-pattern.md b/agent-docs/adr/0015-lean-yml-markdown-prompt-pattern.md
@@ -29,7 +29,7 @@ Seven agent workflows follow this pattern, each with a 1:1 matching prompt file:
 | `sync-agent-skill.yml` | `sync-agent-skill.md` (134 lines) | Multi-step with subagent validation and version bumping |
 | `update-agent-docs.yml` | `update-agent-docs.md` (156 lines) | Multi-step with subagent coherence validation, creates PRs |
 | `update-dependencies.yml` | `update-dependencies.md` (164 lines) | Most complex: validation loop with browser testing, changeset creation |
-| `smoke-test.yml` | `smoke-test.md` (42 lines) | PR-triggered: starts dev server, runs browser smoke test on endpoints app |
+| `smoke-test.yml` | `smoke-test.md` | PR-triggered: builds and serves endpoints, runs browser smoke test |
 | `dogfood.yml` | `dogfood.md` | Scheduled: runs exploratory QA via agent-browser dogfood skill, uploads evidence, files issues |
 
 The remaining workflows (`ci.yml`, `pr-pkg.yml`, `changeset.yml`, `retype-action.yml`) are traditional CI pipelines without agents. `claude.yml` is a generic claude-code-action step without a dedicated prompt file (used for ad-hoc PR interactions).
diff --git a/agent-docs/adr/0031-ai-driven-browser-qa-in-ci.md b/agent-docs/adr/0031-ai-driven-browser-qa-in-ci.md
@@ -33,6 +33,6 @@ Evidence: `.github/workflows/smoke-test.yml`, `.github/workflows/dogfood.yml`, `
 - No test code to maintain — the smoke test is defined as a page list in `smoke-test.md`. Adding a new page means adding one line to the prompt file.
 - The dogfood session can discover issues outside the fixed page list, providing broader coverage.
 - AI-driven tests are less deterministic than scripted tests. False positives (AI misreads the UI) are possible but expected to be rare for binary PASS/FAIL outcomes.
-- Both workflows use `agent-browser install --with-deps`. Smoke test starts the dev server (`pnpm dev-endpoints`); dogfood builds and serves a production-like build (`pnpm serve-endpoints`), adding extra build time (~5–10 min for dogfood including build and QA time).
+- Both workflows use `agent-browser install --with-deps` and `pnpm serve-endpoints` (production-like build). The dogfood session takes longer (~5–10 min including build and QA time) due to its exploratory nature vs the smoke test's fixed page list.
 - Dogfood evidence is stored on the `dogfood-evidence` orphan branch so GitHub issue links remain stable across runs. See [ci-cd.md](../docs/references/ci-cd.md#dogfood-workflow) for operational details.
 - Dogfood findings are filed as GitHub issues for human triage — the workflow does not block PRs or deployments.
diff --git a/agent-docs/docs/quality/testing.md b/agent-docs/docs/quality/testing.md
@@ -20,10 +20,10 @@ pnpm turbo run test --filter=@squide/core
 
 Use **agent-browser** (see `.agents/skills/agent-browser/`) to validate sample apps. It is installed as a workspace devDependency. A build alone is not sufficient — you must start the dev server and verify pages in a real browser.
 
-### Endpoints sample (local dev: `pnpm dev-endpoints`, CI dogfood: `pnpm serve-endpoints`)
+### Endpoints sample (local dev: `pnpm dev-endpoints`, CI: `pnpm serve-endpoints`)
 
-1. Start the dev server in the background: `pnpm dev-endpoints` (for local validation) or `pnpm serve-endpoints` (for production-like validation, used by the CI dogfood workflow)
-2. The app listens on port **8080**. Wait for it to be ready: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080`
+1. Start the server in the background: `pnpm dev-endpoints` (for local validation) or `pnpm serve-endpoints` (for production-like validation, used by CI workflows)
+2. The app listens on port **8080** (host) and **8081** (remote module). Wait for both to be ready: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080 && curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081`
 3. The app has a mock login page. Use username `temp` and password `temp` to authenticate.
 4. Navigate to each page and verify it renders without errors:
    - `/` (Home page)
diff --git a/agent-docs/docs/references/ci-cd.md b/agent-docs/docs/references/ci-cd.md
@@ -27,7 +27,7 @@ The dogfood workflow (`dogfood.yml` + `dogfood.md`) runs monthly exploratory QA
 1. Fetches or creates the `dogfood-evidence` orphan branch
 2. Prunes evidence directories older than 60 days
 3. Copies only report-referenced screenshots/videos into the date directory
-4. Force-pushes the branch (requires `contents: write` permission)
+4. Pushes the branch (requires `contents: write` permission)
 5. Rewrites relative asset paths in the report to `https://raw.githubusercontent.com/workleap/wl-squide/dogfood-evidence/YYYY-MM-DD/...` so images render in GitHub issues
 
 **Issue creation**: If issues are found, files a GitHub issue with the rewritten report via `gh issue create` (requires `issues: write` permission). Stops silently if no issues.
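
The path rewrite in step 5 could be sketched as follows. This is an illustration under stated assumptions, not the command from `dogfood.md`: it assumes the report links assets as `screenshots/<file>` or `videos/<file>` (optionally prefixed with `./`), and the sample report line is hypothetical.

```shell
#!/usr/bin/env bash
# Capture today's date once so every rewritten URL points at the same date directory.
TODAY=$(date +%Y-%m-%d)
BASE_URL="https://raw.githubusercontent.com/workleap/wl-squide/dogfood-evidence/${TODAY}"

# Hypothetical report line, for illustration only.
sample='![Login error](screenshots/login-error.png) and [recording](./videos/login.webm)'

# Rewrite Markdown link targets that start with screenshots/ or videos/.
# '#' is the sed delimiter so the slashes in the URL need no escaping.
rewritten=$(printf '%s\n' "$sample" \
  | sed -E "s#\]\((\./)?(screenshots|videos)/#](${BASE_URL}/\2/#g")

printf '%s\n' "$rewritten"
```

Anchoring the pattern on `](` limits the rewrite to Markdown link targets, so prose that merely mentions "screenshots/" is left untouched.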