fix: TestShell_BeforeNode db leakage by erikburt · Pull Request #21504 · smartcontractkit/chainlink

erikburt · 2026-03-11T22:25:04Z

Changes

- Parallelize more tests (core/cmd: more parallel tests #21466)
- Fix the csa_test.go issue seen in that PR

Notes

In TestShell_BeforeNode we are trying to use the txdb driver so that changes are not actually persisted or leaked to other tests.
- However, calling app.Before(c) ignores any pre-existing config set on the Shell object and defaults the driver to pgx. That causes DB leakage, which made csa_test.go fail because it expected a clean keystore.
- Adding newAppWithOpts and allowing the opts object to actually set the driver does enable the txdb driver, but that comes with other issues.
The current solution in this PR (newAppWithOpts) does work locally when running only the core/cmd package.
- Something else is at play when we run the full suite in CI, which causes these tests to time out.
- It is also failing in the "go core integration tests" pipeline.
Running locally with -race also breaks some more tests in the same package.
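
To make the trap described in these notes concrete, here is a minimal sketch in plain Go. Everything in it is hypothetical and only for illustration: `Config`, `Shell`, `AppOpts`, `before`, and `beforeWithOpts` are stand-ins, not the actual chainlink types or the real newAppWithOpts signature. It shows a hook that always rebuilds the config from defaults (dropping the test's txdb setting) next to a variant where an options value can carry the driver.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the node's general config.
type Config struct {
	DriverName string
}

// Shell is a hypothetical stand-in for the CLI shell under test.
type Shell struct {
	Config *Config
}

// AppOpts is a hypothetical options struct, loosely modelled on the
// newAppWithOpts idea: the caller may choose the driver.
type AppOpts struct {
	DriverName string // empty means "use the default"
}

const defaultDriver = "pgx"

// before mimics a hook that always rebuilds the config from defaults,
// discarding whatever the test put on the Shell beforehand.
func before(s *Shell) {
	s.Config = &Config{DriverName: defaultDriver}
}

// beforeWithOpts also rebuilds the config, but lets opts override the driver.
func beforeWithOpts(s *Shell, opts AppOpts) {
	driver := defaultDriver
	if opts.DriverName != "" {
		driver = opts.DriverName
	}
	s.Config = &Config{DriverName: driver}
}

func main() {
	s := &Shell{Config: &Config{DriverName: "txdb"}}

	before(s)
	fmt.Println(s.Config.DriverName) // pgx: the test's txdb setting was lost

	beforeWithOpts(s, AppOpts{DriverName: "txdb"})
	fmt.Println(s.Config.DriverName) // txdb
}
```

The sketch only models the driver selection. It says nothing about the CI timeouts or the -race failures mentioned above.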

Testing

https://github.com/smartcontractkit/chainlink/actions/runs/22979409950/job/66715662324?pr=21504

github-actions · 2026-03-11T22:26:13Z

✅ No conflicts with other open PRs targeting develop

github-actions · 2026-03-11T22:26:31Z

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

#added For any new functionality added.
#breaking_change For any functionality that requires manual action for the node to boot.
#bugfix For bug fixes.
#changed For any change to the existing functionality.
#db_update For any feature that introduces updates to database schema.
#deprecation_notice For any upcoming deprecation functionality.
#internal For changesets that need to be excluded from the final changelog.
#nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
#removed For any functionality/config that is removed.
#updated For any functionality that is updated.
#wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

trunk-io · 2026-03-12T00:33:26Z


cl-sonarqube-production · 2026-03-12T18:17:51Z

Quality Gate failed

Failed conditions
15.11% Technical Debt Ratio on New Code (required ≤ 4%)
C Maintainability Rating on New Code (required ≥ A)


Expand the comment on TestShell_BeforeNode and TestShell_RunNode_WithBeforeNode to document *why* heavyweight.FullTestDBV2 is the right tool here, not txdb.

The subtests need to share a seeded encrypted key ring so the incorrect_password case has something to fail decryption against (keystore.Unlock on an empty keystore silently creates a new ring with whatever password is supplied).

Naively you would reach for txdb, but chainlink-common/pkg/sqlutil/pg/pg.go:69 passes uuid.New().String() as the DSN on every pg.NewConnection call (deliberately, so each ORM gets its own transaction). As a result, the seed connection and BeforeNode's LockedDB land on different keys in the txdb driver's DSN-keyed conns map and get independent transactions, so the seed is invisible to BeforeNode. This is the same reason the newAppWithOpts + txdb override in Erik's PR #21504 couldn't make incorrect_password pass.

A real per-test physical DB via FullTestDBV2 gives cross-connection visibility with its own t.Cleanup and without polluting the shared chainlink_test DB.

No functional change: this just documents the trap so the next person doesn't repeat the investigation.

Refs: CORE-2388
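
The isolation effect described in this commit message can be illustrated with a toy model. The sketch below is not the txdb driver or chainlink-common code. `driver`, `tx`, `open`, and `randomDSN` are hypothetical names, and a random hex string stands in for uuid.New().String(). It only shows why a map keyed by DSN hands out unrelated transactions when every caller supplies a fresh DSN.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// tx is a toy transaction: rows written here are private to it.
type tx struct {
	rows map[string]string
}

// driver is a toy model of a transactional driver that keeps one
// transaction per DSN in a map.
type driver struct {
	conns map[string]*tx
}

func newDriver() *driver {
	return &driver{conns: map[string]*tx{}}
}

// open returns the transaction registered for dsn, creating it on first use.
func (d *driver) open(dsn string) *tx {
	if t, ok := d.conns[dsn]; ok {
		return t
	}
	t := &tx{rows: map[string]string{}}
	d.conns[dsn] = t
	return t
}

// randomDSN stands in for a fresh UUID per connection (an assumption of
// this sketch; the real code uses a UUID library).
func randomDSN() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func main() {
	d := newDriver()

	// Seed through one connection, read through another: different DSNs.
	seed := d.open(randomDSN())
	seed.rows["encrypted_key_ring"] = "ciphertext"
	node := d.open(randomDSN())
	_, visible := node.rows["encrypted_key_ring"]
	fmt.Println(visible) // false: the seed lives in a different transaction

	// With a shared DSN both callers would see the same transaction.
	d.open("shared").rows["encrypted_key_ring"] = "ciphertext"
	_, visible = d.open("shared").rows["encrypted_key_ring"]
	fmt.Println(visible) // true
}
```

A separate physical database per test sidesteps the question entirely, since visibility no longer depends on which transaction a connection happens to land in.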

TestShell_BeforeNode and TestShell_RunNode_WithBeforeNode were calling app.Before(c) after pre-setting shell.Config with a txdb-wrapped config via configtest.NewGeneralConfig. app.Before unconditionally ran opts.New() → CoreDefaults() which set Database.DriverName back to pgx and assigned the fresh config to s.Config, silently overwriting the test's txdb config.

BeforeNode → pg.NewLockedDB then opened real pgx connections against the shared chainlink_test database, persisting keystore state and leaking it to other tests (flaking Test_CSAKeyStore_E2E) and eating slots from the server-wide max_connections budget (causing the mass 25-minute chan-receive timeouts in CORE-2388).

Three changes:

1. core/cmd/app.go: guard the config creation in app.Before so a pre-set s.Config is preserved. Defense-in-depth: any future test that sets its own config and goes through app.Before is protected from the same silent swap.
2. core/cmd/shell_local.go: nil-guard CloseLogger in afterNode. Tests that call BeforeNode directly (without app.Before) never set it. app.After already has this nil check; afterNode was missing it.
3. core/cmd/shell_local_test.go: remove the unnecessary app.Before(c) calls from the two affected tests. BeforeNode only needs Config and Logger, both already set directly; other tests in the same file already build the shell this way.

Also drop the incorrect_password subtests. They were load-bearing on the leak (the correct_password subtest would persist a CSA key to the shared DB, which the incorrect_password subtest would then fail to decrypt, producing the expected error). The wrong-password-against-populated-keyring invariant is already covered at the correct layer by TestMasterKeystore_Unlock_Save/won't_load_a_saved_keyRing_if_the_password_is_incorrect in core/services/keystore.

Testing it through the CLI would require cross-connection state sharing that chainlink's txdb path does not support: chainlink-common/pkg/sqlutil/pg/pg.go:69 generates a fresh UUID DSN per pg.NewConnection call, isolating every pool into its own transaction.

Refs: CORE-2388, CORE-2370
Supersedes: #21504
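
The two guards named in this commit message (preserving a pre-set config, and tolerating an unset CloseLogger) follow a common Go pattern. The sketch below is a rough illustration under assumed names: `Config`, `Shell`, `newDefaultConfig`, `before`, and `afterNode` here are simplified stand-ins, not the code in core/cmd.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the node's general config.
type Config struct {
	DriverName string
}

// Shell is a hypothetical stand-in for the CLI shell.
type Shell struct {
	Config      *Config
	CloseLogger func() error
}

// newDefaultConfig stands in for building a config from defaults.
func newDefaultConfig() *Config {
	return &Config{DriverName: "pgx"}
}

// before builds a default config only when none has been set, so a
// config injected by a test survives the hook.
func before(s *Shell) {
	if s.Config == nil {
		s.Config = newDefaultConfig()
	}
}

// afterNode tolerates a Shell whose CloseLogger was never assigned,
// which is the case when a test skips the hook that would set it.
func afterNode(s *Shell) error {
	if s.CloseLogger == nil {
		return nil
	}
	return s.CloseLogger()
}

func main() {
	// A test pre-sets its own config; the guarded hook leaves it alone.
	s := &Shell{Config: &Config{DriverName: "txdb"}}
	before(s)
	fmt.Println(s.Config.DriverName) // txdb

	// Without a pre-set config the default is still applied.
	fresh := &Shell{}
	before(fresh)
	fmt.Println(fresh.Config.DriverName) // pgx

	// CloseLogger was never set, so afterNode is a no-op.
	fmt.Println(afterNode(fresh)) // <nil>
}
```

The trade-off of a nil check like this is that it keeps the production path unchanged (the config is nil there) while making the test path respect what the test configured.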

TestShell_BeforeNode and TestShell_RunNode_WithBeforeNode were calling app.Before(c) after pre-setting shell.Config with a txdb-wrapped config via configtest.NewGeneralConfig. app.Before unconditionally ran opts.New() → CoreDefaults() which set Database.DriverName back to pgx and assigned the fresh config to s.Config, silently overwriting the test's txdb config.

BeforeNode → pg.NewLockedDB then opened real pgx connections against the shared chainlink_test database, persisting keystore state and leaking it to other tests (flaking Test_CSAKeyStore_E2E) and eating slots from the server-wide max_connections budget (causing the mass 25-minute chan-receive timeouts in CORE-2388).

- core/cmd/app.go: guard opts.New() so a pre-set s.Config is preserved. Defense-in-depth against any future test hitting the same trap.
- core/cmd/shell_local.go: nil-guard CloseLogger in afterNode (parity with app.After; tests calling BeforeNode directly never set it).
- core/cmd/shell_local_test.go: remove the unnecessary app.Before(c) calls. Drop the incorrect_password subtests, which were load-bearing on the leak; the underlying keystore invariant is covered by TestMasterKeystore_Unlock_Save in core/services/keystore.

Refs: CORE-2388, CORE-2370
Supersedes: #21504

…RE-2388] (#21975)

* fix(cmd): prevent test DB connection leak via app.Before config swap

TestShell_BeforeNode and TestShell_RunNode_WithBeforeNode were calling app.Before(c) after pre-setting shell.Config with a txdb-wrapped config via configtest.NewGeneralConfig. app.Before unconditionally ran opts.New() → CoreDefaults() which set Database.DriverName back to pgx and assigned the fresh config to s.Config, silently overwriting the test's txdb config.

BeforeNode → pg.NewLockedDB then opened real pgx connections against the shared chainlink_test database, persisting keystore state and leaking it to other tests (flaking Test_CSAKeyStore_E2E) and eating slots from the server-wide max_connections budget (causing the mass 25-minute chan-receive timeouts in CORE-2388).

- core/cmd/app.go: guard opts.New() so a pre-set s.Config is preserved. Defense-in-depth against any future test hitting the same trap.
- core/cmd/shell_local.go: nil-guard CloseLogger in afterNode (parity with app.After; tests calling BeforeNode directly never set it).
- core/cmd/shell_local_test.go: remove the unnecessary app.Before(c) calls. Drop the incorrect_password subtests, which were load-bearing on the leak; the underlying keystore invariant is covered by TestMasterKeystore_Unlock_Save in core/services/keystore.

Refs: CORE-2388, CORE-2370
Supersedes: #21504

* test(cmd): restore app.Before call in BeforeNode tests

Restores the app.Before(c) calls that were removed when fixing the txdb config overwrite bug. With the nil-check guard in app.Before, pre-set test configs are now preserved, making it safe to call app.Before in tests again. This keeps the test flow consistent with the real production CLI execution order.

* test(cmd): restore incorrect password test using heavyweight DB

Switch TestShell_BeforeNode to heavyweight.FullTestDBV2 so beforeNode's own DB connection can see pre-populated key material. This restores the incorrect password test case that was removed when the app.Before config fix made txdb connections isolated.

* fix(cmd): use DB-backed keystore to seed encrypted key material

cltest.NewKeyStore uses keystore.NewInMemory which stores keys in a struct field, never writing to encrypted_key_rings. BeforeNode opens its own DB connection and creates a separate keystore.New instance, so the seeded keys were invisible and any password was accepted. Switch to keystore.New (DB-backed) so encrypted keys persist to the table and the incorrect password test actually fails decryption.
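
The last commit's point, that seeding through an in-memory keystore leaves the table empty so any password is accepted later, can be modelled with a toy example. This is a sketch under stated assumptions, not the chainlink keystore: `keyRingTable`, `inMemoryKeystore`, and `dbKeystore` are hypothetical, and a plain password comparison stands in for real encryption and decryption.

```go
package main

import (
	"errors"
	"fmt"
)

// keyRingTable is a toy stand-in for the encrypted_key_rings table.
type keyRingTable struct {
	hasRing  bool
	password string // toy substitute for "what the ring was encrypted with"
}

// inMemoryKeystore keeps its ring in a struct field and never touches
// the table.
type inMemoryKeystore struct {
	password string
}

// Unlock records the password in memory only.
func (k *inMemoryKeystore) Unlock(password string) error {
	k.password = password
	return nil
}

// dbKeystore reads and writes the shared table.
type dbKeystore struct {
	tbl *keyRingTable
}

// Unlock creates a ring on an empty table with whatever password is
// supplied; on a populated table it fails when the password differs.
func (k *dbKeystore) Unlock(password string) error {
	if !k.tbl.hasRing {
		k.tbl.hasRing = true
		k.tbl.password = password
		return nil
	}
	if k.tbl.password != password {
		return errors.New("unable to decrypt key ring")
	}
	return nil
}

func main() {
	// Seeding in memory leaves the table empty, so a second keystore
	// accepts the wrong password by creating a new ring.
	tbl := &keyRingTable{}
	_ = (&inMemoryKeystore{}).Unlock("correct")
	fmt.Println((&dbKeystore{tbl: tbl}).Unlock("wrong")) // <nil>

	// Seeding through the table-backed keystore makes the wrong
	// password fail for a second instance.
	tbl = &keyRingTable{}
	_ = (&dbKeystore{tbl: tbl}).Unlock("correct")
	fmt.Println((&dbKeystore{tbl: tbl}).Unlock("wrong")) // unable to decrypt key ring
}
```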

erikburt added 2 commits March 11, 2026 15:20

parallelize tests

76321ac

fix: TestShell_BeforeNode db leakage

e34d38e

erikburt self-assigned this Mar 11, 2026

fix: different solution

393d41c

erikburt added 5 commits March 11, 2026 17:34

test

47284bf

fix: linting, add key id log for csa test

69adace

fix: TestShell_RunNode_WithBeforeNode

108f101

test: reduce parallelization on BeforeNode sub tests

d0a0bf0

test: less parallelization

4b00783

This was referenced Apr 8, 2026

fix: prevent test DB leakage from TestShell_BeforeNode #21916

Closed

fix: guard config initialization to prevent txdb config overwrite [CORE-2388] #21975

Merged

erikburt closed this Apr 14, 2026

erikburt wants to merge 8 commits into develop from fix/db-leakage-shell-local
