Problem
When a --test-cmd is provided, thinktank runs it on each agent's worktree to score results. But if the test suite is already failing on the main branch (broken environment, missing env var, DB not up, flaky tests), every agent scores "tests failed" — even the best one. The scoring is meaningless and the recommendation is unreliable.
There is currently no way to distinguish:
- "Agent broke the tests" (agent is bad)
- "Tests were already failing before any agent ran" (environment issue)
Proposed Solution
Before spinning up agents, run the test command once on the current branch (without any agent changes). If it fails:
    ⚠ Pre-flight test check failed on main branch:
      Command: npm test
      Exit code: 1

      Your test suite is failing before any agent runs. This will make
      test-based scoring unreliable.

      Options:
        --skip-preflight    Skip this check and run anyway
        --no-test-cmd       Omit test scoring for this run

      Fix your test suite first, or use --skip-preflight to proceed anyway.
If pre-flight passes, add a line to the run output: "✓ Pre-flight: tests pass on main branch".
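The check described above could look roughly like the following. This is a minimal sketch, not thinktank's actual code: the function name `preflight`, the `PreflightOutcome` type, and the choice to return a message rather than print it are all assumptions made for illustration. It uses Node's `spawnSync` with `shell: true` so the test command can be an arbitrary shell string such as `npm test`.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result type for illustration; not thinktank's real API.
interface PreflightOutcome {
  proceed: boolean; // whether agents should be started
  message: string; // text to print ("" when nothing was checked)
}

// Runs `testCmd` once in `cwd` (the working tree without agent changes)
// and decides whether the run should continue.
function preflight(
  testCmd: string | undefined,
  skipPreflight: boolean,
  cwd: string,
): PreflightOutcome {
  // Nothing to check without a test command, or when the user opted out.
  if (!testCmd || skipPreflight) {
    return { proceed: true, message: "" };
  }

  // shell: true lets the command be a full shell string such as "npm test".
  const result = spawnSync(testCmd, { cwd, shell: true, stdio: "ignore" });

  if (result.status === 0) {
    return { proceed: true, message: "✓ Pre-flight: tests pass on main branch" };
  }

  // status is null when the process was killed by a signal or never started.
  const exitCode = result.status === null ? "unknown" : String(result.status);
  const message = [
    "⚠ Pre-flight test check failed on main branch:",
    `  Command: ${testCmd}`,
    `  Exit code: ${exitCode}`,
    "  Your test suite is failing before any agent runs. This will make",
    "  test-based scoring unreliable.",
    "  Fix your test suite first, or use --skip-preflight to proceed anyway.",
  ].join("\n");
  return { proceed: false, message };
}
```

A caller would invoke something like `preflight("npm test", false, process.cwd())` before creating any worktrees, print the returned message, and stop when `proceed` is false. Returning the decision instead of exiting inside the function keeps the check easy to test.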
Acceptance Criteria
- When --test-cmd is provided, a pre-flight test run executes on the current working tree before agents start
- If the pre-flight run fails, the warning is shown and the run does not proceed unless --skip-preflight is passed
- The --skip-preflight flag skips the check and proceeds (useful when tests are flaky or intentionally broken before the fix)