Make agent test-running guidance consistent and precise (#2329)

myieye · claude · web-flow · commit a77d01dba6cc · 2026-06-03T14:10:23.000+02:00
* Make agent test guidance consistent and scoped

The root Testing section said "use IDE testing tools over the cli" (agents
cannot drive an IDE, so in practice this read as "do not run tests") while
backend/FwLite/AGENTS.md demanded the full FwLiteOnly.slnf suite before
every commit. Replace both with one policy: run filtered CLI tests for
what you changed, verify tests you wrote actually pass, save targeted
integration runs for finished critical sync work, never run LexBox
integration tests or local Playwright.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Disambiguate which test suites agents may run locally

"backend/Testing" is not all integration tests -- only the
Integration/FlakyIntegration/RequiresDb categories and Testing.Browser
need infrastructure; its unit tests are runnable via task test:unit.
Likewise "do not run Playwright" only ever applied to suites needing the
local lexbox stack (frontend/tests, Testing.Browser); the viewer
standalone suite auto-starts a vite dev server against the demo project
and is cheap to run filtered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
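
As a rough illustration of the policy described in the two notes above, the sketch below maps a suite to the filtered command an agent would run. The function name `agent_test_cmd` and the suite labels are hypothetical and not part of the repository; the commands themselves are the ones quoted in the diff. It only prints the command, so nothing slow starts by accident.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): print the filtered test command
# for a suite, or refuse if the suite needs the local lexbox stack.
agent_test_cmd() {
  local suite="$1" filter="$2"
  case "$suite" in
    fwlite-unit)
      # Project path is the example used in the root AGENTS.md; substitute your own.
      printf 'dotnet test backend/FwLite/FwLiteShared.Tests --filter "FullyQualifiedName~%s"\n' "$filter" ;;
    backend-unit)
      # task test:unit excludes the infrastructure-bound categories.
      printf 'task test:unit -- %s\n' "$filter" ;;
    viewer-playwright)
      # The viewer suite is run from frontend/viewer/.
      printf '(cd frontend/viewer && task playwright-test-standalone -- %s)\n' "$filter" ;;
    lexbox-integration|lexbox-playwright)
      echo "refusing: needs the lexbox stack; rely on CI" >&2
      return 1 ;;
    *)
      echo "unknown suite: $suite" >&2
      return 2 ;;
  esac
}
```

For example, `agent_test_cmd backend-unit MyFilter` prints `task test:unit -- MyFilter`, while `agent_test_cmd lexbox-integration anything` returns a non-zero status.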

* Document known CI flakes in the CI/CD agent guide

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
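
A minimal sketch of the "re-run before debugging" triage, assuming the failed run's log has already been saved to a local file (for instance with `gh run view <runId> --log`). The function name `classify_ci_log` and its output labels are hypothetical; the two signature strings are the ones quoted in the CI/CD guide in this change.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): match a saved CI log against the
# two known flake signatures documented in .github/AGENTS.md.
classify_ci_log() {
  local log="$1"
  if grep -qF 'No resources found in languagedepot namespace' "$log"; then
    echo 'known-flake: cert-manager readiness timeout'
  elif grep -qF 'Error while copying content to a stream' "$log"; then
    echo 'known-flake: MediaFileTests large-upload stream error'
  else
    # No known signature matched, so treat it as a real failure.
    echo 'unknown: investigate'
  fi
}
```

If the result starts with `known-flake`, re-running with `gh run rerun <runId> --failed` is the suggested first step. Anything else deserves an actual look at the log.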

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/.github/AGENTS.md b/.github/AGENTS.md
@@ -50,6 +50,19 @@ The CI/CD setup is:
 
 ---
 
+## Known Flaky CI Failures (re-run before debugging)
+
+The `GHA integration tests / dotnet` check (`integration-test-gha.yaml`) fails in two known ways that are NOT regressions. Re-run first (`gh run rerun <runId> --failed`) — especially on frontend-only or dependency-only PRs, which can't affect the lexbox-api / hg / fw-headless containers it exercises:
+
+1. **cert-manager readiness timeout**: `setup-k8s` waits `--timeout=90s` for the cert-manager pods. On a cold kind cluster they don't always become ready in time, so the deploy aborts fast (~3 min) and the status step logs "No resources found in languagedepot namespace". This is environmental and tends to hit all branches in the same window.
+2. **MediaFileTests large-upload stream error** — `Testing.FwHeadless.MediaFileTests.UploadReplacementFile_TooLarge_ThrowsError` intermittently throws `HttpRequestException: Error while copying content to a stream` (transient connection drop streaming the large file) instead of the expected validation error. Shows as Failed: 1 / Passed: ~146 after the full ~14 min run.
+
+Also expected, not a failure: on frontend-only PRs the backend image-publish workflows (`lexbox-fw-headless`, `lexbox-hgweb`) don't trigger (path filters), so `setup-k8s` gets `manifest unknown` pulling those images at the PR version and falls back to the `develop` tag via `continue-on-error`. Those log lines are noise.
+
+Separately: a PR whose merge state is CONFLICTING silently *skips* the build/test checks rather than failing them — if expected checks are missing, reconcile with develop first.
+
+---
+
 ## Workflow Dependencies
 
 ```mermaid
diff --git a/AGENTS.md b/AGENTS.md
@@ -52,9 +52,11 @@ Key documentation for this project:
 
 ### Testing
 
-- ❌ **Do NOT run LexBox dotnet INTEGRATION tests** unless the user explicitly asks. They require full test infrastructure (database, services) which usually isn't available.
-- ✅ **FwLite integration tests CAN be run** — e.g. `FwLiteProjectSync.Tests` They're just a bit slow, but run them freely when making critical changes to relevant code.
-- ✅ **DO run unit tests locally** and filter to the tests that are relevant to the changes you are making. Use IDE testing tools over the cli.
+- ✅ **DO run unit tests via the CLI**, filtered to the tests relevant to your changes (e.g. `dotnet test backend/FwLite/FwLiteShared.Tests --filter "FullyQualifiedName~MyTestClass"`). Verify tests you wrote or changed actually pass before handing work back. Never run whole suites just to "see if anything broke".
+- ✅ **FwLite integration tests** (e.g. `FwLiteProjectSync.Tests`) need no infrastructure but are slow. Run a **targeted selection** (specific tests, not necessarily whole classes) when you touched critical sync code **and believe the work is finished** — not on every iteration. Waiting on tests burns time; be deliberate about which runs buy real signal.
+- ✅ **`backend/Testing` contains unit tests too** — only tests marked `Category=Integration|FlakyIntegration|RequiresDb` (and the `Testing.Browser` namespace) need infrastructure. Its unit tests are fine to run: `task test:unit -- <filter>` excludes those categories for you.
+- ✅ **FwLite viewer Playwright tests MAY be run** — they're cheap: `task playwright-test-standalone -- <test-name-filter>` (from `frontend/viewer/`) auto-starts the vite dev server with the in-browser demo project; no lexbox stack, chromium only. Always filter to relevant tests; details in `frontend/viewer/AGENTS.md`.
+- ❌ **Do NOT run tests that need the lexbox stack** unless the user explicitly asks: LexBox integration tests (`Category=Integration`/`FlakyIntegration`, `Testing.Browser`) and the lexbox frontend Playwright suite (`frontend/tests`). The local stack is usually not running, or gets torn down between sessions, so local results aren't trustworthy. Rely on CI for these.
 
 ### Questions?
 
@@ -78,7 +80,6 @@ Before implementing any change that will touch many files or is in a 🔴 **Crit
 - ✅ If the user asks about "the" PR, but does not explicitly name a PR or branch, assume they mean the PR associated with the current branch.
 - ✅ Use **Mermaid diagrams** for flowcharts and architecture (not ASCII art)
 - ✅ Prefer IDE diagnostics (compiler/lint errors) over CLI tools for identifying issues. Fixing these diagnostics is part of completing any instruction.
-- ✅ Do NOT run integration tests unless user explicitly requests
 - ✅ When handling a user prompt ALWAYS ask for clarification if there are details to clarify, important decisions that must be made first or the plan sounds unwise
 - ❌ Do NOT git commit or git push without explicit user approval
 
diff --git a/backend/FwLite/AGENTS.md b/backend/FwLite/AGENTS.md
@@ -12,7 +12,7 @@ Lightweight FieldWorks application for dictionary editing with CRDT-based sync.
 **Before making changes:**
 1. Read the relevant section below thoroughly
 2. Understand the sync flow end-to-end
-3. Run the full test suite: `dotnet test FwLiteOnly.slnf`
+3. Identify which tests cover the affected area (run a targeted selection when the work is done — see the root `AGENTS.md` Testing section)
 4. Test with real FwData projects, not just unit tests
 
 ---
@@ -23,7 +23,7 @@ Lightweight FieldWorks application for dictionary editing with CRDT-based sync.
 # Run FwLite Web (typical workflow)
 task fw-lite-web   # from repo root
 
-# Run tests (ALWAYS run before committing)
+# Run all FwLite tests (slow — prefer targeted runs, see root AGENTS.md Testing section)
 dotnet test FwLiteOnly.slnf
 
 # Build MAUI app (Windows)
@@ -269,15 +269,9 @@ if (entity?.DeletedAt is not null) return;
 
 ## Testing Strategy
 
-### Before ANY commit:
+### When the work is finished:
 
-```bash
-# Run all FwLite tests
-dotnet test FwLiteOnly.slnf
-
-# If touching sync code, also run:
-dotnet test FwLiteProjectSync.Tests
-```
+Run a targeted selection of the tests covering what you changed (root `AGENTS.md` → Testing). For 🔴 critical sync changes that usually includes the relevant `FwLiteProjectSync.Tests` scenarios. `dotnet test FwLiteOnly.slnf` runs everything but is slow — reserve it for when broad signal is genuinely needed.
 
 ### Test Categories
 
diff --git a/frontend/AGENTS.md b/frontend/AGENTS.md
@@ -60,7 +60,7 @@ pnpm run -r lint
 
 - Playwright for E2E tests
 - Test files in `tests/`
-- Run with `pnpm test`
+- Run with `pnpm test` — requires the full local lexbox stack (`task up`). Agents: don't run these locally, rely on CI (root `AGENTS.md` → Testing). The cheap, agent-runnable Playwright suite is the *viewer's* (`viewer/AGENTS.md`), not this one.
 
 ## Important Files