fix: TestShell_BeforeNode db leakage #21504

Closed
erikburt wants to merge 8 commits into develop from fix/db-leakage-shell-local

@erikburt
Collaborator

@erikburt erikburt commented Mar 11, 2026

Changes

Notes

  • In TestShell_BeforeNode we try to use the txdb driver so that changes are not persisted or leaked to other tests.
    • However, app.Before(c) ignores any preexisting config set on the Shell object and defaults the driver to pgx. This causes db leakage, which made csa_test.go fail because it expected a clean keystore.
    • Adding newAppWithOpts and allowing the opts object to actually set the driver does enable the txdb driver, but that comes with other issues.
  • The current solution in this PR (newAppWithOpts) works locally when running only the core/cmd package.
    • Something else is at play when we run the full suite in CI, which causes these tests to time out.
    • It also fails in the "go core integration tests" pipeline.
  • Running locally with -race also breaks some more tests in the same package.
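
To make the first note concrete, here is a minimal sketch of the overwrite and of an opts-based override. It is not the PR's code: `Config`, `Shell`, `Opts`, `before`, and `beforeWithOpts` are hypothetical stand-ins, and the real `newAppWithOpts` may look quite different.

```go
package main

import "fmt"

// Config and Shell are hypothetical stand-ins for the node config and CLI shell.
type Config struct{ DriverName string }
type Shell struct{ Config *Config }

// Opts is a hypothetical options object; an empty DriverName means "use the default".
type Opts struct{ DriverName string }

// before mimics the behaviour described in the notes: it always installs a
// fresh default config, discarding whatever the test pre-set on the shell.
func before(s *Shell) { s.Config = &Config{DriverName: "pgx"} }

// beforeWithOpts builds the same defaults but lets the caller's opts pick the driver.
func beforeWithOpts(s *Shell, o Opts) {
	cfg := &Config{DriverName: "pgx"}
	if o.DriverName != "" {
		cfg.DriverName = o.DriverName
	}
	s.Config = cfg
}

func main() {
	s := &Shell{Config: &Config{DriverName: "txdb"}}
	before(s)
	fmt.Println(s.Config.DriverName) // pgx: the test's txdb config is gone

	beforeWithOpts(s, Opts{DriverName: "txdb"})
	fmt.Println(s.Config.DriverName) // txdb
}
```

The sketch only covers driver selection; it does not model the CI timeouts or the -race failures mentioned above.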

Testing

https://github.com/smartcontractkit/chainlink/actions/runs/22979409950/job/66715662324?pr=21504

@erikburt erikburt self-assigned this Mar 11, 2026
@github-actions
Contributor

github-actions Bot commented Mar 11, 2026

✅ No conflicts with other open PRs targeting develop

@github-actions
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

@trunk-io

trunk-io Bot commented Mar 12, 2026


@cl-sonarqube-production

Quality Gate failed

Failed conditions
15.11% Technical Debt Ratio on New Code (required ≤ 4%)
C Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube


Fletch153 added a commit that referenced this pull request Apr 10, 2026
Expand the comment on TestShell_BeforeNode and TestShell_RunNode_WithBeforeNode
to document *why* heavyweight.FullTestDBV2 is the right tool here, not txdb.

The subtests need to share a seeded encrypted key ring so the
incorrect_password case has something to fail decryption against
(keystore.Unlock on an empty keystore silently creates a new ring
with whatever password is supplied). Naively you would reach for
txdb, but chainlink-common/pkg/sqlutil/pg/pg.go:69 passes
uuid.New().String() as the DSN on every pg.NewConnection call —
deliberately, so each ORM gets its own transaction — which means
the seed connection and BeforeNode's LockedDB land on different
keys in the txdb driver's DSN-keyed conns map and get independent
transactions. Seed is invisible to BeforeNode. Same reason Erik's
PR #21504 newAppWithOpts + txdb override couldn't make incorrect_password
pass.
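
The isolation described in the paragraph above can be modelled with a toy driver that keeps one transaction per DSN. This is a sketch under assumptions, not the txdb or chainlink-common implementation: `driver`, `tx`, `open`, and `freshDSN` are hypothetical names, and a counter stands in for the per-connection UUID.

```go
package main

import (
	"fmt"
	"strconv"
)

// tx is a toy stand-in for one open transaction: writes stay private to it.
type tx struct{ rows map[string]string }

// driver mimics a txdb-style driver that keeps one transaction per DSN.
type driver struct {
	conns map[string]*tx
	next  int
}

func newDriver() *driver { return &driver{conns: map[string]*tx{}} }

// open returns the transaction registered for dsn, creating it on first use.
func (d *driver) open(dsn string) *tx {
	if t, ok := d.conns[dsn]; ok {
		return t
	}
	t := &tx{rows: map[string]string{}}
	d.conns[dsn] = t
	return t
}

// freshDSN stands in for a unique DSN generated on every new connection.
func (d *driver) freshDSN() string {
	d.next++
	return "dsn-" + strconv.Itoa(d.next)
}

func main() {
	d := newDriver()

	// Seed and node each connect with their own fresh DSN.
	seed := d.open(d.freshDSN())
	seed.rows["key_ring"] = "encrypted"
	node := d.open(d.freshDSN())
	_, visible := node.rows["key_ring"]
	fmt.Println(visible) // false: different DSN, different transaction

	// Reusing one DSN would share the transaction and the seeded row.
	d.open("shared").rows["key_ring"] = "encrypted"
	_, visible = d.open("shared").rows["key_ring"]
	fmt.Println(visible) // true
}
```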

A real per-test physical DB via FullTestDBV2 gives cross-connection
visibility with its own t.Cleanup and without polluting the shared
chainlink_test DB. No functional change — just documents the trap
so the next person doesn't repeat the investigation.

Refs: CORE-2388
Fletch153 added a commit that referenced this pull request Apr 11, 2026
TestShell_BeforeNode and TestShell_RunNode_WithBeforeNode were calling
app.Before(c) after pre-setting shell.Config with a txdb-wrapped config
via configtest.NewGeneralConfig. app.Before unconditionally ran
opts.New() → CoreDefaults() which set Database.DriverName back to pgx
and assigned the fresh config to s.Config, silently overwriting the
test's txdb config. BeforeNode → pg.NewLockedDB then opened real pgx
connections against the shared chainlink_test database, persisting
keystore state and leaking it to other tests (flaking Test_CSAKeyStore_E2E)
and eating slots from the server-wide max_connections budget (causing
the mass 25-minute chan-receive timeouts in CORE-2388).

Three changes:

1. core/cmd/app.go — guard the config creation in app.Before so a
   pre-set s.Config is preserved. Defense-in-depth: any future test
   that sets its own config and goes through app.Before is protected
   from the same silent swap.

2. core/cmd/shell_local.go — nil-guard CloseLogger in afterNode. Tests
   that call BeforeNode directly (without app.Before) never set it.
   app.After already has this nil check; afterNode was missing it.

3. core/cmd/shell_local_test.go — remove the unnecessary app.Before(c)
   calls from the two affected tests. BeforeNode only needs Config and
   Logger, both already set directly; other tests in the same file
   already build the shell this way. Also drop the incorrect_password
   subtests — they were load-bearing on the leak (the correct_password
   subtest would persist a CSA key to the shared DB, which the
   incorrect_password subtest would then fail to decrypt, producing
   the expected error). The wrong-password-against-populated-keyring
   invariant is already covered at the correct layer by
   TestMasterKeystore_Unlock_Save/won't_load_a_saved_keyRing_if_the_password_is_incorrect
   in core/services/keystore. Testing it through the CLI would require
   cross-connection state sharing that chainlink's txdb path does not
   support: chainlink-common/pkg/sqlutil/pg/pg.go:69 generates a fresh
   UUID DSN per pg.NewConnection call, isolating every pool into its
   own transaction.
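
Changes 1 and 2 both amount to a nil check. A minimal sketch, assuming hypothetical stand-in types rather than the actual code in core/cmd (`Shell`, `Config`, `before`, and `afterNode` here are illustrative only):

```go
package main

import "fmt"

// Config and Shell are hypothetical stand-ins for the real types in core/cmd.
type Config struct{ DriverName string }

type Shell struct {
	Config      *Config
	CloseLogger func() error
}

// before builds a default config only when the caller has not pre-set one.
func before(s *Shell) {
	if s.Config == nil {
		s.Config = &Config{DriverName: "pgx"}
	}
}

// afterNode skips the logger shutdown when CloseLogger was never assigned,
// which is the case for tests that bypass before entirely.
func afterNode(s *Shell) error {
	if s.CloseLogger == nil {
		return nil
	}
	return s.CloseLogger()
}

func main() {
	s := &Shell{Config: &Config{DriverName: "txdb"}}
	before(s)
	fmt.Println(s.Config.DriverName) // txdb: the pre-set config survives
	fmt.Println(afterNode(s))        // <nil>: no panic on a nil CloseLogger
}
```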

Refs: CORE-2388, CORE-2370
Supersedes: #21504
Fletch153 added a commit that referenced this pull request Apr 11, 2026
TestShell_BeforeNode and TestShell_RunNode_WithBeforeNode were calling
app.Before(c) after pre-setting shell.Config with a txdb-wrapped config
via configtest.NewGeneralConfig. app.Before unconditionally ran
opts.New() → CoreDefaults() which set Database.DriverName back to pgx
and assigned the fresh config to s.Config, silently overwriting the
test's txdb config. BeforeNode → pg.NewLockedDB then opened real pgx
connections against the shared chainlink_test database, persisting
keystore state and leaking it to other tests (flaking Test_CSAKeyStore_E2E)
and eating slots from the server-wide max_connections budget (causing
the mass 25-minute chan-receive timeouts in CORE-2388).

- core/cmd/app.go: guard opts.New() so a pre-set s.Config is preserved.
  Defense-in-depth against any future test hitting the same trap.
- core/cmd/shell_local.go: nil-guard CloseLogger in afterNode (parity
  with app.After; tests calling BeforeNode directly never set it).
- core/cmd/shell_local_test.go: remove the unnecessary app.Before(c)
  calls. Drop the incorrect_password subtests — they were load-bearing
  on the leak; the underlying keystore invariant is covered by
  TestMasterKeystore_Unlock_Save in core/services/keystore.

Refs: CORE-2388, CORE-2370
Supersedes: #21504
github-merge-queue Bot pushed a commit that referenced this pull request Apr 14, 2026
…RE-2388] (#21975)

* fix(cmd): prevent test DB connection leak via app.Before config swap

TestShell_BeforeNode and TestShell_RunNode_WithBeforeNode were calling
app.Before(c) after pre-setting shell.Config with a txdb-wrapped config
via configtest.NewGeneralConfig. app.Before unconditionally ran
opts.New() → CoreDefaults() which set Database.DriverName back to pgx
and assigned the fresh config to s.Config, silently overwriting the
test's txdb config. BeforeNode → pg.NewLockedDB then opened real pgx
connections against the shared chainlink_test database, persisting
keystore state and leaking it to other tests (flaking Test_CSAKeyStore_E2E)
and eating slots from the server-wide max_connections budget (causing
the mass 25-minute chan-receive timeouts in CORE-2388).

- core/cmd/app.go: guard opts.New() so a pre-set s.Config is preserved.
  Defense-in-depth against any future test hitting the same trap.
- core/cmd/shell_local.go: nil-guard CloseLogger in afterNode (parity
  with app.After; tests calling BeforeNode directly never set it).
- core/cmd/shell_local_test.go: remove the unnecessary app.Before(c)
  calls. Drop the incorrect_password subtests — they were load-bearing
  on the leak; the underlying keystore invariant is covered by
  TestMasterKeystore_Unlock_Save in core/services/keystore.

Refs: CORE-2388, CORE-2370
Supersedes: #21504

* test(cmd): restore app.Before call in BeforeNode tests

Restores the app.Before(c) calls that were removed when fixing the
txdb config overwrite bug. With the nil-check guard in app.Before,
pre-set test configs are now preserved, making it safe to call
app.Before in tests again. This keeps the test flow consistent with
the real production CLI execution order.

* test(cmd): restore incorrect password test using heavyweight DB

Switch TestShell_BeforeNode to heavyweight.FullTestDBV2 so beforeNode's
own DB connection can see pre-populated key material. This restores the
incorrect password test case that was removed when the app.Before config
fix made txdb connections isolated.

* fix(cmd): use DB-backed keystore to seed encrypted key material

cltest.NewKeyStore uses keystore.NewInMemory which stores keys in a
struct field, never writing to encrypted_key_rings. BeforeNode opens
its own DB connection and creates a separate keystore.New instance,
so the seeded keys were invisible and any password was accepted.

Switch to keystore.New (DB-backed) so encrypted keys persist to the
table and the incorrect password test actually fails decryption.
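
The difference between the two seeding strategies can be sketched with a toy model. Everything here is an assumption made for illustration: `keyRingTable`, `unlock`, and `inMemoryKeystore` are hypothetical, the error text is invented, and a plain password comparison stands in for real decryption.

```go
package main

import (
	"errors"
	"fmt"
)

// keyRingTable is a toy stand-in for the encrypted_key_rings table.
// A nil password means the table is empty.
type keyRingTable struct{ password *string }

// unlock mimics the described behaviour: on an empty table it silently creates
// a ring with the supplied password; otherwise the password must match.
func unlock(t *keyRingTable, password string) error {
	if t.password == nil {
		t.password = &password
		return nil
	}
	if *t.password != password {
		return errors.New("incorrect password")
	}
	return nil
}

// inMemoryKeystore keeps its ring in a struct field and never touches a table.
type inMemoryKeystore struct{ password string }

func main() {
	// Seeding in memory leaves the table empty, so any password is accepted.
	table := &keyRingTable{}
	_ = inMemoryKeystore{password: "correct"}
	fmt.Println(unlock(table, "wrong")) // <nil>

	// Seeding through the table makes a later wrong password fail.
	seeded := &keyRingTable{}
	_ = unlock(seeded, "correct")
	fmt.Println(unlock(seeded, "wrong")) // incorrect password
}
```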
@erikburt erikburt closed this Apr 14, 2026