Commit 29b3d15

Adds a fix-flaky-go-test agent skill
1 parent 8351146 commit 29b3d15

1 file changed

Lines changed: 97 additions & 0 deletions

  • .agents/skills/fix-flaky-go-test
@@ -0,0 +1,97 @@
1+
---
2+
name: fix-flaky-go-test
3+
description: >-
4+
Fix flaky Go tests in Chainlink: stress, Postgres, -shuffle, race (tools/bin),
5+
build tags. Use for intermittent failures, CI-only, -count/-shuffle issues,
6+
races, noisy output.
7+
---
8+
9+
# Fix flaky Go tests (Chainlink)
10+
11+
<scope>
12+
Reproduce before refactors. Fix determinism, isolation, time, concurrency.
13+
Do not widen assertions or add blind retries.
14+
Core tests need Postgres and usually CL_DATABASE_URL. CI runs tests through the tools/bin scripts (gotestsum, race, integration), not plain go test ./...
15+
Read README.md Running tests, .github/workflows/ci-core.yml, tools/bin for parity.
16+
</scope>
17+
18+
<setup>
19+
Run README prep: pnpm, make mockery, make generate, Postgres, make setup-testdb, source .dbenv, make testdb after pulls. Use make testdb-force if DB stuck.
20+
If tests behave unexpectedly, unset stray env vars and keep only CL_DATABASE_URL.
21+
CL_DATABASE_URL must target a *_test database (preparetest).
22+
Modules: repo root, integration-tests/, core/scripts/. Run go test from the correct module root.
23+
</setup>
24+
25+
<requirements>
26+
If unknown, ask: package path, test name, module root, whether file is //go:build integration, whether test uses pgtest/cltest/SqlxDB or is -short safe.
27+
State your assumptions when you start.
28+
</requirements>
29+
30+
<principles>
31+
Stress with plain go test -count/-failfast/-shuffle; gotestsum --rerun-fails in tools/bin/go_core_tests can hide flakes on PRs.
32+
Treat flakes as production bugs until disproved.
33+
Prefer injected time, IO, randomness; per-test resources; scoped state.
34+
Do not loosen timeouts or assertions without a named cause.
35+
</principles>
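
The "injected time" principle above can be sketched as follows. This is a minimal illustration, not code from the Chainlink repo: `Clock`, `realClock`, `fakeClock`, and `Expired` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is a hypothetical seam for time: production code would pass
// realClock, tests pass fakeClock.
type Clock interface {
	Now() time.Time
}

type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// fakeClock returns a fixed instant that the test advances explicitly.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

// Expired reports whether deadline has passed according to clk.
func Expired(clk Clock, deadline time.Time) bool {
	return clk.Now().After(deadline)
}

func main() {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	clk := &fakeClock{t: start}
	deadline := start.Add(time.Minute)

	fmt.Println(Expired(clk, deadline)) // false: the fake clock has not moved
	clk.Advance(2 * time.Minute)
	fmt.Println(Expired(clk, deadline)) // true: no sleeping, no wall clock
}
```

Because the test controls the clock, the result does not depend on scheduler load or CI machine speed.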
36+
37+
<classify>
38+
Append --tags integration to every go test below if the file has //go:build integration.
39+
deployment/ CCIP: use tools/bin/go_core_ccip_deployment_tests pattern (cd deployment, CL_RESERVE_PORTS=128).
40+
Optional CI parity: GODEBUG=goindex=0 on go test (see ci-core.yml).
41+
If the file uses //go:build dev or trace, add matching --tags when reproducing.
42+
</classify>
43+
44+
<workflow>
45+
<reproduce>
46+
Stop when you have a stable repro. Add -v when needed.
47+
Record package, -run regex, failure mode.
48+
49+
1. No DB quick path:
50+
```sh
go test -short ./path/to/pkg -run '^TestName$' -count 100 -failfast
```
53+
54+
2. With DB from repo root:
55+
```sh
source .dbenv && make testdb
go test ./path/to/pkg -run '^TestName$' -count 100 -failfast
```
59+
60+
3. Whole package: same DB prep then go test ./path/to/pkg -count 100 -failfast
61+
62+
4. Shuffle: add -shuffle on; replay a failing order with -shuffle N, where N is the seed go test prints for the run.
63+
64+
5. Race (treat any race.* log file written under $PWD as a failure):
65+
```sh
GORACE="log_path=$PWD/race" go test -race -shuffle on -timeout 10s -count 100 ./path/to/pkg -run '^TestName$' -failfast
```
68+
69+
6. Parallelism probe: -cpu 1,2,4 and -parallel 4 with -shuffle on -count 50 -failfast
70+
71+
7. Optional full unit job after local repro: GODEBUG=goindex=0 ./tools/bin/go_core_tests ./... (see script for GITHUB_EVENT_NAME flags)
72+
</reproduce>
73+
74+
<fix>
75+
Apply fix_patterns. Avoid permanent time.Sleep as the main fix.
76+
Re-run the same repro command. Record shuffle seed in commit or comment if order-dependent.
77+
</fix>
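
As a sketch of replacing a `time.Sleep` wait with explicit synchronization: the worker signals completion by closing a channel, and the timeout serves only as a safety bound for a hung worker. `startWorker` and `waitDone` are hypothetical helpers written for this example, not Chainlink APIs.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitDone blocks until done is closed or the timeout elapses.
// The timeout is a safety bound, not the synchronization mechanism.
func waitDone(done <-chan struct{}, timeout time.Duration) error {
	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return errors.New("timed out waiting for worker")
	}
}

// startWorker is a hypothetical unit under test: it stores a result
// and closes the returned channel when it has finished.
func startWorker(result *int) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		*result = 42
	}()
	return done
}

func main() {
	var result int
	done := startWorker(&result)
	if err := waitDone(done, 5*time.Second); err != nil {
		fmt.Println("error:", err)
		return
	}
	// Reading result is race-free: the close happens before the receive returns.
	fmt.Println(result)
}
```

The test proceeds as soon as the worker finishes, so it is both faster and more stable than a fixed sleep that guesses at the duration.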
78+
</workflow>
79+
80+
<root_causes>
81+
General: package init and globals, t.Parallel plus shared fixtures, wall clock without fakes, port or path collisions, map order assumptions, leaked env or cwd, goroutines after test end.
82+
83+
Chainlink: shared Postgres or stale schema; missing pgtest.NewSqlxDB(t); cltest.TestApplication teardown or leaked HTTP; ports without :0 or CL_RESERVE_PORTS; stress without --tags integration on integration files; wrong module root.
84+
</root_causes>
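
Map order assumptions are one of the easier root causes to demonstrate. Go randomizes map iteration order, so a test that compares output built by ranging over a map can pass or fail between runs. A minimal sketch of the usual remedy, with `sortedKeys` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order so callers never
// depend on Go's randomized map iteration order.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]int{"c": 3, "a": 1, "b": 2}
	fmt.Println(sortedKeys(m)) // [a b c]
}
```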
85+
86+
<fix_patterns>
87+
Scope state per test. Use t.Cleanup only when needed and obvious. Inject time, randomness, net, fs. Use t.TempDir and :0 listeners. Serialize or drop t.Parallel on shared resources. Prefer channels, WaitGroup, explicit sync over sleep polls.
88+
89+
Chainlink: pgtest.NewSqlxDB(t) and core/internal/testutils/pgtest helpers; testutils.Context(t); core/internal/cltest TestApplication and matching cleanup; configtest and evmtest under core/internal/testutils; core/utils/testutils/heavyweight for ORM-heavy tests.
90+
</fix_patterns>
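
The ":0 listeners" pattern above can be sketched as follows, assuming a plain TCP listener on loopback. `listenEphemeral` is a hypothetical helper; in a real test the listener would typically be closed via t.Cleanup.

```go
package main

import (
	"fmt"
	"net"
)

// listenEphemeral binds to port 0 so the OS picks a free port, which
// avoids collisions between parallel tests or concurrent CI jobs.
func listenEphemeral() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	port := ln.Addr().(*net.TCPAddr).Port
	return ln, port, nil
}

func main() {
	a, portA, err := listenEphemeral()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer a.Close()

	b, portB, err := listenEphemeral()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer b.Close()

	// Both listeners are open at once, so the OS must have assigned distinct ports.
	fmt.Println(portA != portB)
}
```

Code under test should read the assigned port from the listener instead of hard-coding one.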
91+
92+
<verify>
93+
Write the exact repro go test line including -run and --tags integration when relevant.
94+
Race: set GORACE log_path, run go test -race -shuffle on, and confirm no race.* file was written, or document why the race run was skipped.
95+
Optional: TIMEOUT and COUNT with ./tools/bin/go_core_race_tests.
96+
Do not merge unexplained timeout or assertion loosening.
97+
</verify>
