Background
The roachtest selective-test logic in pkg/cmd/roachtest/main.go is currently opaque to anyone investigating a nightly run. The only signals it emits today are two fmt.Printf lines, e.g. from Azure nightly build #6871 (master, 2026-04-22):
[05:56:21] 112 selected out of 318 successful tests.
[05:56:21] 587 out of 777 tests selected for the run!
These end up in the TeamCity build log but not in any file artifact under /artifacts/_runner-logs/, so the Datadog log uploader does not pick them up. Concrete consequences:
To explain why 587/777 (~75%) tests were selected, I had to read pkg/cmd/roachtest/main.go (updateSpecForSelectiveTests) and pkg/cmd/roachtest/testselector/snowflake_query.sql line by line. There is no per-criterion breakdown for the five OR'd selected=yes rules in the Snowflake query (failure_count > 0, first_run > now-N, last_run < now-M, last failure was preempt, last_status='UNKNOWN'), so we can't tell which rule dominates.
Goal
1. Have updateSpecForSelectiveTests (and the surrounding selection code in main.go) write structured records to a file under /artifacts/_runner-logs/ so that the Datadog log uploader picks them up. At a minimum the file should record:
   - total specs in the suite, the size of the successful pool, and the count selected
   - per-criterion attribution counts (how many tests fired each of the five Snowflake selected=yes rules)
   - whether the Snowflake fallback path was triggered, and the underlying error if so
   - the resolved --cloud, --suite, --successful-test-select-pct, and TC_BUILD_BRANCH values
   - (nice to have) a per-test or sampled per-test breakdown, so we can answer "why was test X selected" without re-running
2. Once (1) is in place, add a Datadog monitor that fires when the fallback path is exercised, so we know when test selection is operating without Snowflake input.
Describe alternatives you've considered
- Parsing the TeamCity build log out of band: fragile and not searchable.
- Plumbing a *logger.Logger from main.go only for the current two summary lines: this punts on the per-criterion attribution, which is the more useful signal.
Additional context
The Snowflake fallback path is similarly invisible. A Snowflake failure no longer crashes roachtest; instead, the fallback emits a single "error selecting tests:" line to stdout, but without a Datadog-indexed log we can't alert when the fallback fires. This was explicitly called out in a comment on #168462 and deferred to a follow-up.
21322322.
This issue was drafted by Claude (Claude Code) during a triage session with @williamchoe3.
Epic: none
Jira issue: CRDB-63176