Skip to content

[DX-3769] CLI and AI Skill to Discover and Diagnose Flaky Tests (and easier test runs in general)#22125

Open
kalverra wants to merge 41 commits intodevelopfrom
testRunner
Open

[DX-3769] CLI and AI Skill to Discover and Diagnose Flaky Tests (and easier test runs in general)#22125
kalverra wants to merge 41 commits intodevelopfrom
testRunner

Conversation

@kalverra
Copy link
Copy Markdown
Collaborator

@kalverra kalverra commented Apr 22, 2026

Includes a new tool, tools/test that provides a simpler way for devs to run /chainlink unit tests, and includes features to help hunt down flakes and races.

Simpler Test Runs

Running tests in /chainlink requires chaining together 2-3 make commands, and you can miss steps. Not anymore!

# Help menu for all available commands
go tool test -h

# Spin up a Postgres container, run go tests, and tear it down all with one command!
go tool test run -v -count=1 -p 4 ./core/...
# Or use make
make new_test ARGS="-v -p 4 ./core/..."

# Use gotestsum if you prefer!
go tool test gotestsum --format=dots -- -count=1 ./core/...
# Even with make!
make new_gotestsum ARGS="--format=dots -- -count=1 ./core/..."

Why not just go test ./...?

I'd love to, but the problem is overhead.

go test doesn't have any sort of "before all test runs" control. So, to make go test ./... work, we'd need to launch a new PostgreSQL test container for every package that needs a real DB. This can cause serious memory, processing, and runtime overhead.

It is possible this isn't a big deal, and we could reduce/eliminate this concern with some refactoring, but that's outside the scope of this approach.

diagnose

You can now easily diagnose test issues with a single command, re-running tests, packages, and suites, and generating summarized results. With one command, you can exhaustively search for:

  • Flakes
  • Races
  • Slow tests
  • Deadlocks
# Run a package 25 times to look for slow tests (30s or more)
go tool test diagnose --iterations 25 --slow-threshold 30s -- ./core/services/ocr2/plugins/ocr2keeper/...
# Run all of core 10 times with race (will take a while!)
go tool test diagnose --iterations 10 -- --timeout=10m -race ./core/...

/diagnose-tests AI Skill

It contains steps for the agent to run various survey and test commands, review logs and code, and provide fixes for issues you encounter.

Note: Current sandbox and security restrictions mean that most agents will have to ask you to run the commands for them. Working with the security team on a workaround.

Note: Most agents will discover this skill automatically. But Claude Code won't, you'll need to point it to the file.

Future Plans/Improvements

  • Refine /diagnose-tests as I use it more, include more advice and restrictions
  • Maybe pull most of the test run logic into a separate repo and just go install it here?

Copilot AI review requested due to automatic review settings April 22, 2026 01:41
@kalverra kalverra requested review from a team as code owners April 22, 2026 01:41
@github-actions
Copy link
Copy Markdown
Contributor

👋 kalverra, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 22, 2026

✅ No conflicts with other open PRs targeting develop

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Risk Rating: MEDIUM

Adds a new nested Go module (tools/test) that provides a developer-friendly harness for running Chainlink unit tests (with optional ephemeral Postgres) plus a diagnose mode for surfacing flakes/timeouts/slow tests, and ships a corresponding agent skill/playbook.

Changes:

  • Introduce tools/test CLI with test, gotestsum, and multi-iteration diagnose workflows (including progress UI + report/log generation).
  • Add Postgres lifecycle management via testcontainers-go, plus runner/analyzer logic and unit tests.
  • Wire repository docs + make targets + agent skill docs to make the new workflows discoverable.

Scrupulous human review recommended (high-impact areas):

  • tools/test/internal/db/db.go: container lifecycle, Ryuk disablement, and failure/interrupt cleanup paths.
  • tools/test/internal/runner/runner.go: correctness of stdout/stderr capture and JSONL integrity under concurrent writes/cancellation.
  • tools/test/go.mod: toolchain version selection and compatibility with the root module.

Reviewed changes

Copilot reviewed 28 out of 31 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tools/test/main.go Entry point for the nested tools/test module CLI.
tools/test/internal/termstyle/termstyle.go Shared lipgloss palette for CLI progress/summary output.
tools/test/internal/runner/runner.go Implements go test, gotestsum, and the diagnose loop orchestration.
tools/test/internal/runner/runner_test.go Tests diagnose arg building, results dir naming, canceled-ctx behavior, etc.
tools/test/internal/runner/diagnose_results_dir.go Generates bounded-length results directory basenames.
tools/test/internal/runner/diagnose_progress.go Parses go test -json to track progress and render status lines.
tools/test/internal/runner/diagnose_progress_test.go Tests progress parsing/rendering helpers.
tools/test/internal/runner/analyze.go Parses iteration logs into report.json/csv + per-test log files + summary rendering.
tools/test/internal/runner/analyze_test.go Unit tests for flake/failure/timeout/slow classification and log capture.
tools/test/internal/runner/analyze_files_test.go Unit tests for writing log files and CSV output.
tools/test/internal/repo/repo.go Locates the chainlink repo root by walking up to the root go.mod.
tools/test/internal/repo/repo_test.go Tests repo root discovery behavior.
tools/test/internal/db/db.go Manages ephemeral Postgres via testcontainers + reset/dump/cleanup helpers.
tools/test/internal/db/db_test.go Tests dump-state nil/no-container behavior.
tools/test/internal/config/config.go Viper/cobra config binding and defaults for the harness.
tools/test/internal/config/config_test.go Tests persistent + local flag binding into config.
tools/test/internal/cmd/root.go Cobra root command, signal-aware execution, and DB setup pre-run.
tools/test/internal/cmd/root_test.go Tests displayed command path formatting.
tools/test/internal/cmd/test.go test subcommand (passthrough to go test).
tools/test/internal/cmd/gotestsum.go gotestsum subcommand wiring.
tools/test/internal/cmd/diagnose.go diagnose subcommand wiring and flags.
tools/test/go.mod Declares the nested module and dependencies.
tools/test/go.sum Adds dependency checksums for the nested module.
tools/test/README.md Basic usage docs for the new harness.
tools/test/AGENTS.md Agent-facing constraints/goals/commands for working on this tool.
tools/test/fixing-flaky-tests.md General playbook for diagnosing/fixing flakes.
tools/README.md Adds a pointer to the new tools/test harness.
GNUmakefile Adds new_test, new_gotestsum, and new_test_diagnose targets.
.gitignore Ignores diagnose-* output dirs while explicitly keeping the agent skill directory.
.agents/skills/diagnose-tests/SKILL.md Adds an AI skill for running diagnose and using its report/logs to drive fixes.

Comment thread tools/test/internal/runner/analyze.go Outdated
Comment thread tools/test/internal/runner/analyze_files_test.go Outdated
Comment thread tools/test/internal/db/db.go Outdated
Comment thread tools/test/internal/cmd/diagnose.go Outdated
Comment thread tools/test/internal/config/config.go Outdated
Comment thread tools/test/internal/runner/runner.go
Comment thread tools/test/internal/runner/runner.go
@trunk-io
Copy link
Copy Markdown

trunk-io Bot commented Apr 22, 2026

Static BadgeStatic BadgeStatic BadgeStatic Badge

Failed Test Failure Summary Logs
TestNewHTTPClient_PortRanges The test 'TestNewHTTPClient_PortRanges' failed during execution. Logs ↗︎
TestConfigDocs The test failed because the actual documentation content did not match the expected documentation string. Logs ↗︎
TestConfigDocs The test failed because the actual documentation content did not match the expected documentation string. Logs ↗︎
TestIntegration_KeeperPluginLogUpkeep_ErrHandler The test failed without a specific error message, indicating an unspecified failure during execution. Logs ↗︎

... and 1 more

View Full Report ↗︎Docs

Comment thread GNUmakefile Outdated
# Note: do not use "make target -p 4 ..." — -p is a make flag; use ARGS= instead.
.PHONY: new_test
new_test: ## tools/test: passthrough go test. Usage: make new_test ARGS="-v -p 4 ./core/..."
go -C ./tools/test run . test $(ARGS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to use -C, rather than e.g.:

Suggested change
go -C ./tools/test run . test $(ARGS)
go run ./tools/test test $(ARGS)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tools/test is a submodule, so trying to run it from /chainlink root that way makes go confused.

❯ go run ./tools/test test ./core/bridges/...
main module (github.com/smartcontractkit/chainlink/v2) does not contain package github.com/smartcontractkit/chainlink/v2/tools/test

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah so I bet we could use go tool

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, in combo with a relative replace should work:

tool github.com/smartcontractkit/chainlink-common/pkg/loop/cmd/loopinstall

Then you can do:

Suggested change
go -C ./tools/test run . test $(ARGS)
go tool test test $(ARGS)

And regardless, maybe consider differentiating the command and sub-command names?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I've now implemented this approach, so you can call with

# Use vanilla go test commands                                                        
go tool test run -v -count=1 -p 4 ./core/...                                          
# Use gotestsum as the runner                                                         
go tool test gotestsum --format=dots -- -count=1 ./core/...                           
# Run the full core test suite 10 times and collect statistics, debug logs, and more  
go tool test diagnose --iterations 10 -- --timeout=15m ./core/...  

Pros

  • Smoother approach

Cons

  • This does mean we pull in dependencies from tools/test/go.mod whenever we run go mod tidy from the root /chainlink repo.

@kalverra kalverra requested a review from Fletch153 April 23, 2026 14:43
Copy link
Copy Markdown
Contributor

@sebawo sebawo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together. I found two issues that look worth fixing before merge: one can leave orphaned Postgres containers when gotestsum is missing, and one can hide package-level failures from diagnose reports.

Comment thread tools/test/internal/runner/analyze.go
Comment thread tools/test/internal/cmd/gotestsum.go Outdated
}()

if conf.Iterations < 1 {
return errors.New("--iterations must be >= 1")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: or default to 1?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's already set. Do you mean if someone passes in --iterations -2 we should just default to 1?

Comment thread tools/test/internal/config/config.go
Comment thread tools/test/internal/db/db.go
Comment thread tools/test/internal/db/db.go Outdated
Comment thread tools/test/internal/db/db.go Outdated
Comment thread tools/test/internal/repo/repo.go
Comment thread tools/test/internal/runner/analyze.go Outdated
// Gotestsum runs `gotestsum` with the given args (repo root as working directory).
func Gotestsum(ctx context.Context, conf *config.App, args []string) error {
if _, err := exec.LookPath("gotestsum"); err != nil {
return fmt.Errorf("gotestsum not on PATH: install with go install gotest.tools/gotestsum@latest: %w", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we install it instead of failing? if so, should we support asdf and run reshim?

}

interrupted := ctx.Err() != nil
if interrupted && !conf.AIOutput {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not worth to have a separate package for logging that would know how to handle AIOutput and we wouldn't have to have these ifs all over?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering it.

@cl-sonarqube-production
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants