test(rabbitmq): bootstrap global CouchDB views in TestMain #4753
Open
Crash-- wants to merge 1 commit into
Tests in this package call lifecycle.Create / instance.Get directly without going through testutils.NewSetup → GetTestInstance → stack.Start, so they never trigger couchdb.InitGlobalDB themselves. They've historically relied on the side effect of earlier model/* test packages bootstrapping the global DB and on the design doc persisting in the shared CouchDB service across test binaries. Go's test result cache (persisted via actions/setup-go cache) can let those packages be skipped, breaking the implicit dependency.
The CI flake on PR #4716 manifested as TestSyncCreatedOrgContact failing with "CouchDB(not_found): missing" because instance.Service.Get queried _design/domain-and-aliases on a global instances DB where that design doc had never been created.
A more robust fix would be to make instance.Service.Get treat any CouchDB "not_found" as ErrNotFound (it currently only handles no_db_file / "Database does not exist."). That would remove the implicit dependency for every package, not just this one. The repercussions on other Get callers haven't been fully audited yet, so this localized bootstrap stays in place until the broader change is vetted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Background: how we ran into this
While debugging CI failures on #4716 (S3 VFS backend), the test (1.25.x, 3.2.3) job kept failing deterministically across reruns with the same signature: TestSyncCreatedOrgContact reporting "CouchDB(not_found): missing".
The same source built green on test (1.26.x, 3.3.3) in the same workflow run, and crucially had been green on test (1.25.x, 3.2.3) on the immediately preceding commit (only diff: a single trailing newline removal in an unrelated file). Reruns on fresh GitHub-hosted runners reproduced the failure identically. After purging the PR's actions/setup-go caches and re-running, both jobs went green.
That ruled out a normal flake and pointed at cache-shaped state.
Root cause
pkg/rabbitmq tests call lifecycle.Create / instance.Get directly. They never go through testutils.NewSetup → GetTestInstance → stack.Start, so they never trigger couchdb.InitGlobalDB, which is the only function in the codebase that creates the design doc backing the domain-and-aliases view on the global instances DB.
So the implicit assumption these tests have been making is:
1. go test ./... runs packages in lexicographic order, so model/move, model/sharing, etc. run before pkg/rabbitmq.
2. Those packages go through NewSetup / GetTestInstance, which calls stack.Start and therefore InitGlobalDB.
3. The design doc is still present in the shared CouchDB service when pkg/rabbitmq queries it.
Caching breaks that chain. actions/setup-go@v5 with cache: true caches ~/.cache/go-build, which contains go test's result cache. When the cache is warm and a package's inputs haven't changed, Go reports (cached) ok for it without re-running the test binary, so the side effect of creating the design doc never happens. The CouchDB service in the new run is fresh (its container is per-job), so the design doc is genuinely absent.
Then, in instance.Service.Get, IsNoDatabaseError only matches Reason == "no_db_file" or "Database does not exist.". A missing design doc comes back as 404 not_found "missing", which falls through to the raw-error branch. lifecycle.Create does not map that error to ErrNotFound and propagates it, so the second lifecycle.Create call in the test (the first bob := createInstanceInOrg(...) after target := ...) fails.
This explains every detail of the observed symptom:
- The failure is on the second createInstanceInOrg call: the first creates the global DB lazily via CreateDoc (which IsNoDatabaseError does handle), the second hits the now-existing DB but missing-design-doc path.
- Only one matrix job failed: the cache keys differ per Go version (setup-go-...-go-1.25.9-... vs ...-go-1.26.3-...), and the two caches were in different states.
- Purging the caches fixed it: that forced model/* to actually re-run, recreating the design doc.
Fix in this PR
Add a TestMain in pkg/rabbitmq that bootstraps the global views the same way production does, removing the cross-package implicit dependency for this package.
loadTestConfigForMain is a small in-file replica of config.UseTestFile's setup, since UseTestFile requires a *testing.T we don't have inside TestMain. CouchDB unreachability is tolerated (logged) so this TestMain doesn't hard-fail in environments where CouchDB is intentionally absent; individual tests that need it will still fail through testutils.NeedCouchdb(t) as before.
Why not the broader fix
A more robust change would live in model/instance/service.go, with Get checking IsNotFoundError where it currently checks IsNoDatabaseError. IsNotFoundError is a strict superset of IsNoDatabaseError and would treat a missing design doc as "no instance with that domain", which is what every caller of instance.Get already wants. That removes the implicit dependency for every test package, not just this one.
We deliberately don't ship that change here: the repercussions on other Get callers haven't been fully audited (e.g. it would silently hide a deployment-time view rename), and we wanted the immediate CI fix decoupled from a behavior shift in core instance lookup. The TestMain carries a TODO pointing at this follow-up.
Test plan
- pkg/rabbitmq tests pass on both test (1.25.x, 3.2.3) and test (1.26.x, 3.3.3) from a clean runner.
- pkg/rabbitmq tests still pass when model/* is cache-skipped (which is what cached reruns simulate).
- connection_test.go / publisher_test.go (which don't need CouchDB) still run unaffected when CouchDB is absent locally.
🤖 Generated with Claude Code