---
name: fix-flaky-go-test
description: >-
  Fix flaky Go tests in Chainlink: stress runs, Postgres, -shuffle, the race
  detector (tools/bin), and build tags. Use for intermittent or CI-only
  failures, -count/-shuffle failures, data races, and noisy test output.
---

# Fix flaky Go tests (Chainlink)

<scope>
Reproduce the failure before refactoring. Fix the root cause: determinism, isolation, time handling, or concurrency.
Do not widen assertions or add blind retries.
Core tests need Postgres and usually CL_DATABASE_URL. CI runs tests through the tools/bin scripts (gotestsum, race, integration), not only go test ./...
For CI parity, read the "Running tests" section of README.md, .github/workflows/ci-core.yml, and the scripts in tools/bin.
</scope>

<setup>
Follow the README prep: pnpm, make mockery, make generate, a running Postgres, make setup-testdb, then source .dbenv. Re-run make testdb after each pull, and use make testdb-force if the test database is stuck.
If tests behave unexpectedly, unset other environment variables that could affect them and keep only CL_DATABASE_URL.
CL_DATABASE_URL must point at a database whose name ends in _test (required by preparetest).
Modules: repo root, integration-tests/, core/scripts/. Run go test from the root of the module that contains the package.
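A hypothetical pre-flight guard for the _test rule, sketched in Go. checkTestDBURL is an illustrative name, not a Chainlink helper, and preparetest remains the authoritative check; the sketch only parses the URL with net/url and inspects the last path segment.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// checkTestDBURL is a hypothetical helper (not part of Chainlink): it rejects
// a database URL whose database name does not end in "_test".
func checkTestDBURL(raw string) error {
	if raw == "" {
		return fmt.Errorf("CL_DATABASE_URL is not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse CL_DATABASE_URL: %w", err)
	}
	// For postgres://user:pass@host:5432/chainlink_test?sslmode=disable,
	// u.Path is "/chainlink_test" and path.Base yields "chainlink_test".
	name := path.Base(u.Path)
	if !strings.HasSuffix(name, "_test") {
		return fmt.Errorf("database %q does not end in _test", name)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"postgres://user:pass@localhost:5432/chainlink_test?sslmode=disable",
		"postgres://user:pass@localhost:5432/chainlink?sslmode=disable",
	} {
		fmt.Println(raw, "->", checkTestDBURL(raw))
	}
}
```

In a real script you would pass os.Getenv("CL_DATABASE_URL") instead of the example literals.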
</setup>

<requirements>
If any of these are unknown, ask: the package path, the test name, the module root, whether the file has //go:build integration, and whether the test uses pgtest/cltest/SqlxDB or is safe under -short.
State your assumptions when you start.
</requirements>

<principles>
Stress with plain go test -count/-failfast/-shuffle. gotestsum --rerun-fails, as used by tools/bin/go_core_tests, can hide flakes on PRs.
Treat flakes as production bugs until proven otherwise.
Prefer injected time, IO, and randomness; per-test resources; and narrowly scoped state.
Do not loosen timeouts or assertions without a named cause.
</principles>

<classify>
If the file has //go:build integration, append --tags integration to every go test command below.
For CCIP tests under deployment/, follow the tools/bin/go_core_ccip_deployment_tests pattern: cd deployment and set CL_RESERVE_PORTS=128.
Optional CI parity: set GODEBUG=goindex=0 on go test (see ci-core.yml).
If the file uses //go:build dev or trace, add the matching --tags when reproducing.
</classify>

<workflow>
<reproduce>
Work through the steps in order and stop once you have a stable repro. Add -v when you need per-test output.
Record the package, the -run regex, and the failure mode.

1. Quick path without a DB (only for tests that are safe under -short):
```sh
go test -short ./path/to/pkg -run '^TestName$' -count 100 -failfast
```

2. With the DB, from the repo root:
```sh
source .dbenv && make testdb
go test ./path/to/pkg -run '^TestName$' -count 100 -failfast
```

3. Whole package: do the same DB prep, then run go test ./path/to/pkg -count 100 -failfast.

4. Shuffle: add -shuffle on. A shuffled run prints its seed as -test.shuffle N; replay that order with -shuffle N and narrow -run to bisect which tests interact.

5. Race detector (treat any race.* file as a failure):
```sh
GORACE="log_path=$PWD/race" go test -race -shuffle on -timeout 10s -count 100 ./path/to/pkg -run '^TestName$' -failfast
```

6. Parallelism probe: combine -cpu 1,2,4 and -parallel 4 with -shuffle on -count 50 -failfast.

7. Optional, after a local repro: run the full unit job with GODEBUG=goindex=0 ./tools/bin/go_core_tests ./... and read the script first, since some of its flags depend on GITHUB_EVENT_NAME.
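Steps 3 and 4 mostly surface hidden order dependence. The sketch below uses plain Go functions as stand-ins for tests (all names are hypothetical) to show the shape: a package-level default that one test overrides passes in declaration order and fails when a shuffle runs the overriding test first, while a per-test value passes in any order.

```go
package main

import "fmt"

// defaultLimit is package-level state shared by every test in the package.
var defaultLimit = 10

// testA stands in for a test that overrides the global and never restores it.
func testA() { defaultLimit = 1 }

// testB stands in for a test that silently assumes the original default.
func testB() bool { return defaultLimit == 10 }

// runGlobal resets the global (as a fresh test binary would), runs the two
// stand-ins in the given order, and reports whether testB passed.
func runGlobal(aFirst bool) bool {
	defaultLimit = 10
	if aFirst {
		testA()
		return testB()
	}
	passed := testB()
	testA()
	return passed
}

// Config is the scoped alternative: each test builds its own value.
type Config struct{ Limit int }

func newConfig() *Config { return &Config{Limit: 10} }

func scopedA(c *Config)      { c.Limit = 1 }
func scopedB(c *Config) bool { return c.Limit == 10 }

// runScoped gives each stand-in a fresh Config, so order cannot matter.
func runScoped(aFirst bool) bool {
	if aFirst {
		scopedA(newConfig())
		return scopedB(newConfig())
	}
	passed := scopedB(newConfig())
	scopedA(newConfig())
	return passed
}

func main() {
	fmt.Println("global, B then A:", runGlobal(false)) // true
	fmt.Println("global, A then B:", runGlobal(true))  // false
	fmt.Println("scoped:", runScoped(false), runScoped(true)) // true true
}
```

In a real test the quickest repair is often restoring the global with t.Cleanup; removing the global entirely is the more durable fix.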
</reproduce>

<fix>
Apply the matching pattern from fix_patterns. Do not make a permanent time.Sleep the main fix.
Re-run the exact same repro command. If the failure was order-dependent, record the shuffle seed in the commit message or a code comment.
</fix>
</workflow>

<root_causes>
General: package init and globals, t.Parallel with shared fixtures, wall-clock time without fakes, port or path collisions, map iteration order assumptions, leaked env vars or working directory, goroutines still running after the test ends.

Chainlink: shared Postgres or stale schema; missing pgtest.NewSqlxDB(t); incomplete cltest.TestApplication teardown or leaked HTTP servers; ports bound without :0 or CL_RESERVE_PORTS; stress runs without --tags integration on integration files; wrong module root.
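Of the general causes, map order assumptions are easy to miss: the Go spec leaves map iteration order unspecified and the runtime varies it, so any slice built by ranging over a map needs sorting before it is compared. A minimal sketch with hypothetical names:

```go
package main

import (
	"fmt"
	"sort"
)

// names ranges over a map. Its order can change between runs, so asserting
// on this slice's order directly is a flake waiting to happen.
func names(m map[string]int) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	return out
}

// sortedNames fixes the order before anything compares it.
func sortedNames(m map[string]int) []string {
	out := names(m)
	sort.Strings(out)
	return out
}

func main() {
	m := map[string]int{"eth": 1, "arb": 2, "base": 3}
	fmt.Println(sortedNames(m)) // [arb base eth]
}
```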
</root_causes>

<fix_patterns>
Scope state per test. Use t.Cleanup where teardown is actually needed, and keep it obvious. Inject time, randomness, network, and filesystem access. Use t.TempDir and :0 listeners. Serialize tests that share a resource, or drop t.Parallel from them. Prefer channels, sync.WaitGroup, and other explicit synchronization over sleep-and-poll loops.
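A minimal sketch of the :0 listener pattern: bind to port 0, read back the port the OS assigned, and hand the listener itself (not just the number) to whatever serves on it, so nothing else can grab the port in between. listenEphemeral is a hypothetical helper; inside tests, net/http/httptest.NewServer already listens on a loopback :0 port.

```go
package main

import (
	"fmt"
	"net"
)

// listenEphemeral binds to port 0 on loopback, letting the OS pick a free
// port, and returns the open listener together with the port it received.
func listenEphemeral() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	port := ln.Addr().(*net.TCPAddr).Port
	return ln, port, nil
}

func main() {
	ln, port, err := listenEphemeral()
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	// Keep serving on ln (for example http.Serve(ln, handler)); closing it and
	// re-binding the same port number later reopens the collision window.
	defer ln.Close()
	fmt.Println("OS-assigned port:", port)
}
```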

Chainlink: pgtest.NewSqlxDB(t) and the helpers in core/internal/testutils/pgtest; testutils.Context(t); core/internal/cltest TestApplication with its matching cleanup; configtest and evmtest under core/internal/testutils; core/utils/testutils/heavyweight for ORM-heavy tests.
</fix_patterns>

<verify>
Write down the exact repro go test command, including -run and --tags integration when relevant.
Race: run go test -race -shuffle on with GORACE log_path set, then confirm no race.* files exist, or document why the race run is skipped.
Optional: run ./tools/bin/go_core_race_tests with TIMEOUT and COUNT set.
Do not merge unexplained timeout or assertion loosening.
</verify>