Commit a77d01d

myieye and claude authored
Make agent test-running guidance consistent and precise (#2329)
* Make agent test guidance consistent and scoped

The root Testing section said "use IDE testing tools over the cli" (agents cannot drive an IDE, so in practice this read as "do not run tests") while backend/FwLite/AGENTS.md demanded the full FwLiteOnly.slnf suite before every commit. Replace both with one policy: run filtered CLI tests for what you changed, verify tests you wrote actually pass, save targeted integration runs for finished critical sync work, never run LexBox integration tests or local Playwright.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Disambiguate which test suites agents may run locally

"backend/Testing" is not all integration tests -- only the Integration/FlakyIntegration/RequiresDb categories and Testing.Browser need infrastructure; its unit tests are runnable via task test:unit. Likewise "do not run Playwright" only ever applied to suites needing the local lexbox stack (frontend/tests, Testing.Browser); the viewer standalone suite auto-starts a vite dev server against the demo project and is cheap to run filtered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Document known CI flakes in the CI/CD agent guide

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 7c3a324 commit a77d01d

4 files changed

Lines changed: 23 additions & 15 deletions
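
Among the additions in this commit is a list of known CI flake signatures in `.github/AGENTS.md`. As a rough sketch only, those signatures could be matched against a saved job log before deciding whether to re-run or debug. The function name `classify_ci_log`, the output labels, and the one-line sample log are assumptions made for illustration; only the quoted log strings come from the guide.

```shell
#!/bin/sh
# Classify a saved CI log against the two flake signatures quoted in the guide.
# classify_ci_log is a hypothetical helper, not part of the repository.
classify_ci_log() {
    log_file=$1
    if grep -qF 'No resources found in languagedepot namespace' "$log_file"; then
        echo 'known flake: cert-manager readiness timeout'
    elif grep -qF 'UploadReplacementFile_TooLarge_ThrowsError' "$log_file" &&
         grep -qF 'Error while copying content to a stream' "$log_file"; then
        echo 'known flake: MediaFileTests large-upload stream error'
    else
        echo 'unrecognized: treat as a possible regression'
    fi
}

# Demo on a synthetic one-line log (made up for illustration, not real CI output).
sample_log=$(mktemp)
echo 'No resources found in languagedepot namespace.' > "$sample_log"
classify_ci_log "$sample_log"
rm -f "$sample_log"
```

When the result is a known flake, the guide's own advice applies: re-run with `gh run rerun <runId> --failed` before debugging.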

.github/AGENTS.md

Lines changed: 13 additions & 0 deletions
@@ -50,6 +50,19 @@ The CI/CD setup is:
 
 ---
 
+## Known Flaky CI Failures (re-run before debugging)
+
+The `GHA integration tests / dotnet` check (`integration-test-gha.yaml`) fails in two known ways that are NOT regressions. Re-run first (`gh run rerun <runId> --failed`) — especially on frontend-only or dependency-only PRs, which can't affect the lexbox-api / hg / fw-headless containers it exercises:
+
+1. **cert-manager readiness timeout** `setup-k8s` waits `--timeout=90s` for cert-manager pods; on a cold kind cluster they don't always make it → deploy aborts fast (~3 min) and the status step logs "No resources found in languagedepot namespace". Environmental — tends to hit all branches in the same window.
+2. **MediaFileTests large-upload stream error** `Testing.FwHeadless.MediaFileTests.UploadReplacementFile_TooLarge_ThrowsError` intermittently throws `HttpRequestException: Error while copying content to a stream` (transient connection drop streaming the large file) instead of the expected validation error. Shows as Failed: 1 / Passed: ~146 after the full ~14 min run.
+
+Also expected, not a failure: on frontend-only PRs the backend image-publish workflows (`lexbox-fw-headless`, `lexbox-hgweb`) don't trigger (path filters), so `setup-k8s` gets `manifest unknown` pulling those images at the PR version and falls back to the `develop` tag via `continue-on-error`. Those log lines are noise.
+
+Separately: a PR whose merge state is CONFLICTING silently *skips* the build/test checks rather than failing them — if expected checks are missing, reconcile with develop first.
+
+---
+
 ## Workflow Dependencies
 
 ```mermaid

AGENTS.md

Lines changed: 5 additions & 4 deletions
@@ -52,9 +52,11 @@ Key documentation for this project:
 
 ### Testing
 
-- **Do NOT run LexBox dotnet INTEGRATION tests** unless the user explicitly asks. They require full test infrastructure (database, services) which usually isn't available.
-- **FwLite integration tests CAN be run** — e.g. `FwLiteProjectSync.Tests` They're just a bit slow, but run them freely when making critical changes to relevant code.
-- **DO run unit tests locally** and filter to the tests that are relevant to the changes you are making. Use IDE testing tools over the cli.
+- **DO run unit tests via the CLI**, filtered to the tests relevant to your changes (e.g. `dotnet test backend/FwLite/FwLiteShared.Tests --filter "FullyQualifiedName~MyTestClass"`). Verify tests you wrote or changed actually pass before handing work back. Never run whole suites just to "see if anything broke".
+- **FwLite integration tests** (e.g. `FwLiteProjectSync.Tests`) need no infrastructure but are slow. Run a **targeted selection** (specific tests, not necessarily whole classes) when you touched critical sync code **and believe the work is finished** — not on every iteration. Waiting on tests burns time; be deliberate about which runs buy real signal.
+- **`backend/Testing` contains unit tests too** — only tests marked `Category=Integration|FlakyIntegration|RequiresDb` (and the `Testing.Browser` namespace) need infrastructure. Its unit tests are fine to run: `task test:unit -- <filter>` excludes those categories for you.
+- **FwLite viewer Playwright tests MAY be run** — they're cheap: `task playwright-test-standalone -- <test-name-filter>` (from `frontend/viewer/`) auto-starts the vite dev server with the in-browser demo project; no lexbox stack, chromium only. Always filter to relevant tests; details in `frontend/viewer/AGENTS.md`.
+- **Do NOT run tests that need the lexbox stack** unless the user explicitly asks: LexBox integration tests (`Category=Integration`/`FlakyIntegration`, `Testing.Browser`) and the lexbox frontend Playwright suite (`frontend/tests`). The local stack is usually down or torn down between sessions and results aren't trustworthy — rely on CI for these.
 
 ### Questions?
 

@@ -78,7 +80,6 @@ Before implementing any change that will touch many files or is in a 🔴 **Crit
 - ✅ If the user asks about "the" PR, but does not explicitly name a PR or branch, assume they mean the PR associated with the current branch.
 - ✅ Use **Mermaid diagrams** for flowcharts and architecture (not ASCII art)
 - ✅ Prefer IDE diagnostics (compiler/lint errors) over CLI tools for identifying issues. Fixing these diagnostics is part of completing any instruction.
-- ✅ Do NOT run integration tests unless user explicitly requests
 - ✅ When handling a user prompt ALWAYS ask for clarification if there are details to clarify, important decisions that must be made first or the plan sounds unwise
 - ❌ Do NOT git commit or git push without explicit user approval
 
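
The root Testing section says that only tests marked `Category=Integration|FlakyIntegration|RequiresDb` need infrastructure and that `task test:unit` excludes them. How that task is implemented is not shown in this diff, so the following is only a sketch of what an equivalent exclusion could look like with the `dotnet test --filter` syntax. The helper name `build_exclusion_filter` is hypothetical, and pointing `dotnet test` at `backend/Testing` directly is an assumption. The command is printed, not executed.

```shell
#!/bin/sh
# Build a `dotnet test --filter` expression that excludes the given categories.
# Terms are joined with '&', which is logical AND in the filter syntax.
# build_exclusion_filter is a hypothetical helper, not part of the repository.
build_exclusion_filter() {
    expression=""
    for category in "$@"; do
        term="Category!=$category"
        if [ -z "$expression" ]; then
            expression="$term"
        else
            expression="$expression&$term"
        fi
    done
    printf '%s\n' "$expression"
}

# Dry run: print the command instead of running it.
# Assumption: backend/Testing can be passed to `dotnet test` as-is.
unit_only=$(build_exclusion_filter Integration FlakyIntegration RequiresDb)
printf 'dotnet test backend/Testing --filter "%s"\n' "$unit_only"
```

In practice the documented `task test:unit -- <filter>` remains the preferred entry point; the sketch only makes the category logic explicit.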

backend/FwLite/AGENTS.md

Lines changed: 4 additions & 10 deletions
@@ -12,7 +12,7 @@ Lightweight FieldWorks application for dictionary editing with CRDT-based sync.
 **Before making changes:**
 1. Read the relevant section below thoroughly
 2. Understand the sync flow end-to-end
-3. Run the full test suite: `dotnet test FwLiteOnly.slnf`
+3. Identify which tests cover the affected area (run a targeted selection when the work is done — see the root `AGENTS.md` Testing section)
 4. Test with real FwData projects, not just unit tests
 
 ---
@@ -23,7 +23,7 @@ Lightweight FieldWorks application for dictionary editing with CRDT-based sync.
 # Run FwLite Web (typical workflow)
 task fw-lite-web # from repo root
 
-# Run tests (ALWAYS run before committing)
+# Run all FwLite tests (slow — prefer targeted runs, see root AGENTS.md Testing section)
 dotnet test FwLiteOnly.slnf
 
 # Build MAUI app (Windows)
@@ -269,15 +269,9 @@ if (entity?.DeletedAt is not null) return;
 
 ## Testing Strategy
 
-### Before ANY commit:
+### When the work is finished:
 
-```bash
-# Run all FwLite tests
-dotnet test FwLiteOnly.slnf
-
-# If touching sync code, also run:
-dotnet test FwLiteProjectSync.Tests
-```
+Run a targeted selection of the tests covering what you changed (root `AGENTS.md` → Testing). For 🔴 critical sync changes that usually includes the relevant `FwLiteProjectSync.Tests` scenarios. `dotnet test FwLiteOnly.slnf` runs everything but is slow — reserve it for when broad signal is genuinely needed.
 
 ### Test Categories
 
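
The revised FwLite guidance asks for a targeted selection of `FwLiteProjectSync.Tests` scenarios in place of the full suite. A minimal sketch of how such a selection might be assembled follows. The helper `build_name_filter` and the scenario names `SyncScenarioA` and `SyncScenarioB` are hypothetical placeholders. The helper refuses to produce an unfiltered run, mirroring the "never run whole suites" rule, and the resulting command is only printed.

```shell
#!/bin/sh
# Join test-name fragments into a `dotnet test --filter` expression.
# 'FullyQualifiedName~x' matches tests whose full name contains x; '|' is OR.
# build_name_filter is a hypothetical helper, not part of the repository.
build_name_filter() {
    if [ "$#" -eq 0 ]; then
        echo 'refusing to build an unfiltered run: name at least one test' >&2
        return 1
    fi
    expression=""
    for fragment in "$@"; do
        term="FullyQualifiedName~$fragment"
        if [ -z "$expression" ]; then
            expression="$term"
        else
            expression="$expression|$term"
        fi
    done
    printf '%s\n' "$expression"
}

# Dry run with two HYPOTHETICAL scenario names; substitute real test names.
if selection=$(build_name_filter SyncScenarioA SyncScenarioB); then
    printf 'dotnet test FwLiteProjectSync.Tests --filter "%s"\n' "$selection"
fi
```

Failing on an empty selection is a deliberate choice here: it turns an accidental whole-suite run into an explicit decision.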

frontend/AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ pnpm run -r lint
 
 - Playwright for E2E tests
 - Test files in `tests/`
-- Run with `pnpm test`
+- Run with `pnpm test` — requires the full local lexbox stack (`task up`). Agents: don't run these locally, rely on CI (root `AGENTS.md` → Testing). The cheap, agent-runnable Playwright suite is the *viewer's* (`viewer/AGENTS.md`), not this one.
 
 ## Important Files
 
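
The distinction between the two Playwright suites can be sketched as a small guard. This is an illustration under stated assumptions, not a script from the repository: the function `playwright_command`, the suite labels `viewer` and `lexbox`, and the filter value `entry-editor` are all hypothetical. Only the `task playwright-test-standalone -- <test-name-filter>` command and the `frontend/viewer/` directory come from the guidance. The command is printed, not executed.

```shell
#!/bin/sh
# Print the Playwright command an agent may run, or refuse for suites that
# need the lexbox stack. playwright_command is a hypothetical helper.
playwright_command() {
    suite=$1
    test_filter=$2
    case "$suite" in
        viewer)
            if [ -z "$test_filter" ]; then
                echo 'refusing: always filter viewer tests to the relevant ones' >&2
                return 1
            fi
            printf 'cd frontend/viewer && task playwright-test-standalone -- %s\n' "$test_filter"
            ;;
        lexbox)
            echo 'skipping: frontend/tests needs the local lexbox stack; rely on CI' >&2
            return 1
            ;;
        *)
            echo "unknown suite: $suite" >&2
            return 2
            ;;
    esac
}

# Dry run with a HYPOTHETICAL filter value.
playwright_command viewer entry-editor
```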
