Commit 180ca08

chore: Improve workflows (#542)
* chore: update documentation and workflows for improved server handling and testing procedures
* fix: ensure successful exit after pruning evidence older than 60 days
1 parent a5a9812 commit 180ca08

9 files changed

Lines changed: 42 additions & 33 deletions

.github/prompts/dogfood.md

Lines changed: 4 additions & 1 deletion
@@ -3,6 +3,7 @@
 ## Constraints
 - Do NOT read AGENTS.md or agent-docs/
 - Do NOT read the target app's source code
+- Do NOT use the Skill tool — read skill files directly with the Read tool

 ## Task

@@ -65,9 +66,11 @@ After the skill completes, read the generated report at `/tmp/dogfood-output/rep
 name=$(basename "$d")
 [[ "$name" < "$CUTOFF" ]] && rm -rf "$d"
 done
-git add -A && git diff --cached --quiet || git commit -m "Prune evidence older than 60 days"
+git add -A && { git diff --cached --quiet || git commit -m "Prune evidence older than 60 days"; }
+# Exit 0 in both cases: no changes (quiet succeeds) or changes committed (commit succeeds).
 ```
 - **Add new evidence** — Create `YYYY-MM-DD/screenshots/` and `YYYY-MM-DD/videos/`, copy only the referenced files, stage, commit, push.
+- **Return to main branch**`git checkout main`

 3. **Rewrite evidence paths** in the report — Replace relative asset paths with absolute GitHub URLs so images render in the issue. First, capture today's date into a variable, then use it in the sed replacements:
 ```bash
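Reviewer note on the pruning change above: in the shell, `&&` and `||` have equal precedence and associate left to right, so `a && b || c` is read as `(a && b) || c`. The sketch below illustrates one observable difference the braces make. It uses stand-in functions (`stage`, `no_changes`, `commit` are hypothetical names, not part of the commit) instead of real git commands, so it can be run anywhere.

```bash
#!/usr/bin/env bash
# Stand-ins for the git commands (hypothetical, for illustration only).
stage()      { return "$STAGE_RC"; }   # plays the role of `git add -A`
no_changes() { return "$QUIET_RC"; }   # plays the role of `git diff --cached --quiet`
commit()     { echo "commit ran"; }    # plays the role of `git commit`

# Old form: parsed as (stage && no_changes) || commit.
ungrouped() { stage && no_changes || commit; }
# New form: commit is only considered after staging succeeded.
grouped()   { stage && { no_changes || commit; }; }

# Case 1: staging itself fails.
STAGE_RC=1 QUIET_RC=0
out_ungrouped=$(ungrouped); rc_ungrouped=$?
out_grouped=$(grouped);     rc_grouped=$?
echo "staging fails:   ungrouped='$out_ungrouped' rc=$rc_ungrouped | grouped='$out_grouped' rc=$rc_grouped"

# Case 2: staging succeeds and nothing is staged.
STAGE_RC=0 QUIET_RC=0
out_clean=$(grouped); rc_clean=$?
echo "nothing staged:  grouped='$out_clean' rc=$rc_clean"

# Case 3: staging succeeds and there are staged changes.
STAGE_RC=0 QUIET_RC=1
out_dirty=$(grouped); rc_dirty=$?
echo "changes staged:  grouped='$out_dirty' rc=$rc_dirty"
```

In this sketch the ungrouped form still runs `commit` when staging fails, while the grouped form stops with a non-zero status. When staging succeeds, both forms behave the same: exit 0 with no commit if nothing is staged, and exit 0 after committing otherwise.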

.github/prompts/smoke-test.md

Lines changed: 27 additions & 8 deletions
@@ -7,35 +7,54 @@
 - Use ONLY `agent-browser snapshot -i` (text) and `agent-browser console` for verification

 ## Task
-Smoke-test the endpoints sample app at http://localhost:8080. Log in, then visit
+Smoke-test the endpoints sample app. Start the servers, log in, then visit
 every page listed below and verify each renders content without JavaScript errors.

-## Authentication
-The app has a mock login page. Use username `temp` and password `temp`.
+### Step 1 — Start servers
+
+Start the endpoint servers in the background and wait for them to be ready:
+
+```bash
+pnpm serve-endpoints > /tmp/endpoints-serve.log 2>&1 &
+```
+
+Wait for both servers to be ready (single command to save a turn):
+
+```bash
+curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080 && curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081
+```
+
+If either curl command fails, run `cat /tmp/endpoints-serve.log` for diagnostics and stop with "SMOKE TEST FAILED".
+
+### Step 2 — Validate pages
+
+The app runs at `http://localhost:8080`. It has a mock login page — use username `temp` and password `temp` to authenticate.
+
+Navigate to each page below and verify it renders content without JavaScript errors:

-## Pages to validate
 1. `/` (Home page)
 2. `/subscription`
 3. `/federated-tabs`
 4. `/federated-tabs/episodes`
 5. `/federated-tabs/locations`
 6. `/federated-tabs/failing` (expected to show an error boundary — see "Known noise")

-## How to verify each page
+For each page:
 1. Navigate to the page
 2. `agent-browser snapshot -i` — confirm meaningful content rendered (not blank/error)
 3. `agent-browser console` — check for JS errors (exceptions, failed assertions)

-## Known noise (IGNORE these)
+**Known noise (IGNORE these):**
 - `/federated-tabs/failing` — intentionally throws to exercise error boundaries. The error boundary UI is expected.
 - MSW (Mock Service Worker) console warnings — expected in this mock-data app.
 - React warnings, deprecation notices — only flag actual JS errors/exceptions.

-## What counts as a failure
+**What counts as a failure:**
 - A page renders blank or shows an unhandled error (except /federated-tabs/failing)
 - JavaScript exceptions in the console (not warnings)
 - Navigation links that lead nowhere or crash

-## Result
+### Result
+
 If all pages pass: end with "SMOKE TEST PASSED".
 If any page fails: end with "SMOKE TEST FAILED" and list which pages failed and why.
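Reviewer note on the readiness check in Step 1: the sketch below shows one way the "wait for both servers, then dump the log on failure" flow could be expressed as a script. The helper names `wait_for_servers` and `check_servers` are hypothetical and do not appear in the commit; the curl flags, ports, and log path are taken from the prompt. The `for` loop only iterates over the URLs; retrying is still left to curl's own `--retry` options, so this is not a polling loop.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): succeeds only if every URL answers.
wait_for_servers() {
  local url
  for url in "$@"; do
    curl --retry 30 --retry-delay 5 --retry-connrefused \
         --silent --output /dev/null "$url" || return 1
  done
}

# Hypothetical wrapper: on failure, print the server log and the failure marker.
check_servers() {
  if wait_for_servers "$@"; then
    echo "servers ready"
  else
    cat /tmp/endpoints-serve.log 2>/dev/null
    echo "SMOKE TEST FAILED"
    return 1
  fi
}

# Usage (commented out so that sourcing this sketch has no side effects):
# pnpm serve-endpoints > /tmp/endpoints-serve.log 2>&1 &
# check_servers http://localhost:8080 http://localhost:8081
```

The prompt keeps the check as a single `curl ... && curl ...` command to save an agent turn; a function like this would only make sense if the check moved back into a workflow step or a script.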

.github/prompts/update-dependencies.md

Lines changed: 2 additions & 2 deletions
@@ -86,8 +86,8 @@ All tests must pass. If a test fails, run the failing package's tests directly (

 Use `agent-browser` for all browser interactions in this step. It is installed as a workspace devDependency. Read the locally installed agent skill at `.agents/skills/agent-browser/` to learn the available commands. Do NOT use `agent-browser screenshot` — use only `snapshot` (text) and `console` (errors). Screenshots are binary and cannot be analyzed. Running a build is NOT sufficient — you must start the dev server and validate in a real browser.

-1. Start the dev server in the background using the shell `&` operator (do NOT use `run_in_background: true`): `pnpm dev-endpoints > /tmp/endpoints-dev.log 2>&1 &`
-2. The endpoints app listens on port **8080**. Wait for it to be ready — do NOT use `sleep`, do NOT write polling loops, do NOT parse the log file for a URL. Instead, immediately run: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080`
+1. Start the server in the background using the shell `&` operator (do NOT use `run_in_background: true`): `pnpm serve-endpoints > /tmp/endpoints-serve.log 2>&1 &`
+2. The endpoints app listens on port **8080** (host) and **8081** (remote module). Wait for both to be ready — do NOT use `sleep`, do NOT write polling loops, do NOT parse the log file for a URL. Instead, immediately run: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080 && curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081`. If the curl command fails, run `cat /tmp/endpoints-serve.log` for diagnostics.
 3. The app has a mock login page. Use username `temp` and password `temp` to authenticate.
 4. Navigate to the following pages and check that each renders without errors:
 - `/` (Home page)

.github/workflows/dogfood.yml

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ jobs:
           show_full_output: true
           claude_args: >-
             --max-turns 200
-            --allowedTools Read,Glob,Grep,Bash(agent-browser:*),Bash(mkdir:*),Bash(cp:*),Bash(sleep:*),Bash(cat:*),Bash(gh:*),Bash(date:*),Bash(pnpm:*),Bash(curl:*),Bash(git:*),Bash(sed:*),Bash(rm:*),Bash(find:*),Bash(ls:*)
+            --allowedTools Read,Write,Glob,Grep,Bash(agent-browser:*),Bash(mkdir:*),Bash(cp:*),Bash(sleep:*),Bash(cat:*),Bash(gh:*),Bash(date:*),Bash(pnpm:*),Bash(curl:*),Bash(git:*),Bash(sed:*),Bash(rm:*),Bash(find:*),Bash(ls:*)
         env:
           # Required by gh CLI to create issues for dogfood findings.
           GH_TOKEN: ${{ github.token }}

.github/workflows/smoke-test.yml

Lines changed: 2 additions & 15 deletions
@@ -56,26 +56,13 @@ jobs:
           agent-browser open about:blank
           agent-browser close

-      - name: Start endpoints dev server
-        run: pnpm dev-endpoints > /tmp/endpoints-dev.log 2>&1 &
-
-      - name: Wait for dev servers
-        run: |
-          curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080
-          curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081
-
       - name: Run smoke test
         uses: anthropics/claude-code-action@v1
         with:
           prompt: |
             Read and follow the instructions in .github/prompts/smoke-test.md
           anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          show_full_output: true
           claude_args: >-
             --max-turns 50
-            --allowedTools Read,Glob,Grep,Bash(agent-browser:*),Bash(kill:*),Bash(lsof:*),Bash(curl:*),Bash(fuser:*)
-
-      - name: Cleanup
-        if: always()
-        run: |
-          kill $(lsof -t -i:8080) 2>/dev/null || true
-          kill $(lsof -t -i:8081) 2>/dev/null || true
+            --allowedTools Read,Glob,Grep,Bash(agent-browser:*),Bash(pnpm:*),Bash(curl:*),Bash(cat:*),Bash(sleep:*)

agent-docs/adr/0015-lean-yml-markdown-prompt-pattern.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ Seven agent workflows follow this pattern, each with a 1:1 matching prompt file:
 | `sync-agent-skill.yml` | `sync-agent-skill.md` (134 lines) | Multi-step with subagent validation and version bumping |
 | `update-agent-docs.yml` | `update-agent-docs.md` (156 lines) | Multi-step with subagent coherence validation, creates PRs |
 | `update-dependencies.yml` | `update-dependencies.md` (164 lines) | Most complex: validation loop with browser testing, changeset creation |
-| `smoke-test.yml` | `smoke-test.md` (42 lines) | PR-triggered: starts dev server, runs browser smoke test on endpoints app |
+| `smoke-test.yml` | `smoke-test.md` | PR-triggered: builds and serves endpoints, runs browser smoke test |
 | `dogfood.yml` | `dogfood.md` | Scheduled: runs exploratory QA via agent-browser dogfood skill, uploads evidence, files issues |

 The remaining workflows (`ci.yml`, `pr-pkg.yml`, `changeset.yml`, `retype-action.yml`) are traditional CI pipelines without agents. `claude.yml` is a generic claude-code-action step without a dedicated prompt file (used for ad-hoc PR interactions).

agent-docs/adr/0031-ai-driven-browser-qa-in-ci.md

Lines changed: 1 addition & 1 deletion
@@ -33,6 +33,6 @@ Evidence: `.github/workflows/smoke-test.yml`, `.github/workflows/dogfood.yml`, `
 - No test code to maintain — the smoke test is defined as a page list in `smoke-test.md`. Adding a new page means adding one line to the prompt file.
 - The dogfood session can discover issues outside the fixed page list, providing broader coverage.
 - AI-driven tests are less deterministic than scripted tests. False positives (AI misreads the UI) are possible but expected to be rare for binary PASS/FAIL outcomes.
-- Both workflows use `agent-browser install --with-deps`. Smoke test starts the dev server (`pnpm dev-endpoints`); dogfood builds and serves a production-like build (`pnpm serve-endpoints`), adding extra build time (~5–10 min for dogfood including build and QA time).
+- Both workflows use `agent-browser install --with-deps` and `pnpm serve-endpoints` (production-like build). The dogfood session takes longer (~5–10 min including build and QA time) due to its exploratory nature vs the smoke test's fixed page list.
 - Dogfood evidence is stored on the `dogfood-evidence` orphan branch so GitHub issue links remain stable across runs. See [ci-cd.md](../docs/references/ci-cd.md#dogfood-workflow) for operational details.
 - Dogfood findings are filed as GitHub issues for human triage — the workflow does not block PRs or deployments.

agent-docs/docs/quality/testing.md

Lines changed: 3 additions & 3 deletions
@@ -20,10 +20,10 @@ pnpm turbo run test --filter=@squide/core

 Use **agent-browser** (see `.agents/skills/agent-browser/`) to validate sample apps. It is installed as a workspace devDependency. A build alone is not sufficient — you must start the dev server and verify pages in a real browser.

-### Endpoints sample (local dev: `pnpm dev-endpoints`, CI dogfood: `pnpm serve-endpoints`)
+### Endpoints sample (local dev: `pnpm dev-endpoints`, CI: `pnpm serve-endpoints`)

-1. Start the dev server in the background: `pnpm dev-endpoints` (for local validation) or `pnpm serve-endpoints` (for production-like validation, used by the CI dogfood workflow)
-2. The app listens on port **8080**. Wait for it to be ready: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080`
+1. Start the server in the background: `pnpm dev-endpoints` (for local validation) or `pnpm serve-endpoints` (for production-like validation, used by CI workflows)
+2. The app listens on port **8080** (host) and **8081** (remote module). Wait for both to be ready: `curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8080 && curl --retry 30 --retry-delay 5 --retry-connrefused --silent --output /dev/null http://localhost:8081`
 3. The app has a mock login page. Use username `temp` and password `temp` to authenticate.
 4. Navigate to each page and verify it renders without errors:
 - `/` (Home page)

agent-docs/docs/references/ci-cd.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ The dogfood workflow (`dogfood.yml` + `dogfood.md`) runs monthly exploratory QA
 1. Fetches or creates the `dogfood-evidence` orphan branch
 2. Prunes evidence directories older than 60 days
 3. Copies only report-referenced screenshots/videos into the date directory
-4. Force-pushes the branch (requires `contents: write` permission)
+4. Pushes the branch (requires `contents: write` permission)
 5. Rewrites relative asset paths in the report to `https://raw.githubusercontent.com/workleap/wl-squide/dogfood-evidence/YYYY-MM-DD/...` so images render in GitHub issues

 **Issue creation**: If issues are found, files a GitHub issue with the rewritten report via `gh issue create` (requires `issues: write` permission). Stops silently if no issues.
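Reviewer note on step 5 of the list above: the actual `sed` commands are not visible in this diff, so the following is only a sketch of what such a rewrite could look like. It assumes the report references assets with Markdown links whose targets start with `screenshots/` or `videos/` (optionally prefixed by `./`); that link shape and the function name `rewrite_evidence_paths` are assumptions. The base URL is the one quoted in `ci-cd.md`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads a report on stdin, writes the rewritten report to stdout.
# $1 is the evidence date directory (YYYY-MM-DD).
rewrite_evidence_paths() {
  local today="$1" base
  base="https://raw.githubusercontent.com/workleap/wl-squide/dogfood-evidence/${today}"
  # Match "](screenshots/" or "](./videos/" and prefix the link target with the base URL.
  sed -E "s#]\((\./)?(screenshots|videos)/#](${base}/\2/#g"
}

# Capture today's date once, then reuse it for every replacement.
TODAY=$(date +%Y-%m-%d)
printf '%s\n' '![home](screenshots/home.png)' | rewrite_evidence_paths "$TODAY"
```

Using `#` as the `sed` delimiter avoids having to escape the slashes in the URL.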
