feat(secrets): configurable credential chain and writer by chet · Pull Request #2873 · NVIDIA/infra-controller

chet · 2026-06-25T02:10:00Z

Turning on the Postgres secrets backend used to be all-or-nothing: setting
[secrets] dropped vault from the read chain and ran a mandatory import. Now
the backend stores are operator config, so a site can roll Postgres in
gradually -- Postgres in front of vault, vault as a safety net, writes
flipped over when ready, vault dropped once fully migrated.

Two new [secrets] fields drive it:

stores -- the backend-store read order (postgres, vault), first match
wins. The env/file local overrides are always tried first (when their
[credentials.*] section is enabled); this list just orders the stores
behind them. Defaults to ["vault"], today's behavior. The rollout is
editing this list: ["vault"] -> ["postgres", "vault"] -> ["postgres"].
writer -- vault (default) or postgres. Flip it to send new writes to the
journal.

import_from stays fully independent -- importing from vault is orthogonal to
where reads and writes flow, and it is now gated visibly at the call site. A
small pure assemble_chain prepends the local overrides, orders the stores
from config, and picks the writer, so the wiring is unit-testable without a
database; run.rs just hands it the constructed backends. An empty or
duplicate stores list fails the boot, as does putting vault ahead of
postgres; a postgres writer with no postgres store is allowed as a
shadow-write and only warns.

Tests added!

This supports #2811

Signed-off-by: Chet Nichols III chetn@nvidia.com

chet · 2026-06-25T02:10:21Z

@coderabbitai PTAL, thanks!

coderabbitai · 2026-06-25T02:10:25Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fdc378de-2afc-472e-baec-6ffd3125f209

📥 Commits

Reviewing files that changed from the base of the PR and between adfae5e and a39d153.

📒 Files selected for processing (3)

crates/api-core/src/cfg/file.rs
crates/api-core/src/run.rs
crates/api-core/src/secrets/mod.rs

Summary by CodeRabbit

New Features
- Added configurable secret routing, letting credentials be read from ordered backends and written to a separately chosen destination.
- Expanded configuration options with clearer defaults for local and vault-based credential handling.
Bug Fixes
- Improved startup validation to catch invalid or duplicate backend settings earlier.
- Kept local credential sources prioritized ahead of configured secret backends for consistent access.
- Added clearer warnings when read and write destinations are configured in a risky order.

Walkthrough

Secrets configuration now supports ordered credential reads and a separate write backend. Runtime wiring validates backend lists, builds local override readers, selects the configured writer, and gates vault import on import_from.

Changes

Secrets routing, validation, and runtime wiring

Layer / File(s)	Summary
Secrets config schema `crates/api-core/src/cfg/file.rs`	`CarbideConfig.secrets` and `SecretsConfig` now describe ordered read backends, a separate write backend, `CredentialBackend`, and TOML defaults and parsing cases for those fields.
Backend validation `crates/api-core/src/secrets/mod.rs`	`validate_backends` rejects empty or duplicated backend lists, with tests covering distinct, empty, and duplicated inputs.
Runtime wiring `crates/api-core/src/run.rs`	`run()` builds local override readers, assembles the read chain from configured backends or `CARBIDE_CREDENTIAL_STORE`, selects the writer backend, and gates vault import on `import_from`; `import_vault_secrets_once` docs match the caller-controlled gate.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: configurable secrets credential chaining and write routing.
Description check	✅ Passed	The description is directly about the secrets backend rollout and matches the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai · 2026-06-25T02:10:36Z

@chet I’ll review the changes now. Thanks.

ᕱᕱ ᕱ(•ᴗ•)ᕤ

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/api-core/src/run.rs (1)

274-298: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate stores before running the vault import.

assemble_chain is where empty, duplicate, or misordered stores fail. Running import_vault_secrets_once first can persist imported secrets and mark the import complete before startup rejects the invalid chain.

Proposed fix

-        // A one-time bulk import from vault, only when the operator asks for
-        // one. Independent of stores/writer.
-        if secrets_config.import_from == Some(ImportSource::Vault) {
-            import_vault_secrets_once(
-                &db_pool,
-                secrets_config,
-                &routing,
-                kms.as_ref(),
-                &vault_client,
-            )
-            .await?;
-        }
-
         let backends = crate::secrets::CredentialBackends {
             local_overrides,
             postgres_reader: pg_mgr.clone(),
             vault_reader: vault_client.clone(),
             postgres_writer: pg_mgr.clone(),
             vault_writer: vault_client.clone(),
         };
         let (chain, writer) = crate::secrets::assemble_chain(
             &secrets_config.stores,
             secrets_config.writer,
             backends,
         )?;
+
+        // A one-time bulk import from vault, only when the operator asks for
+        // one. Independent of stores/writer.
+        if secrets_config.import_from == Some(ImportSource::Vault) {
+            import_vault_secrets_once(
+                &db_pool,
+                secrets_config,
+                &routing,
+                kms.as_ref(),
+                &vault_client,
+            )
+            .await?;
+        }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/run.rs` around lines 274 - 298, Validate the secrets
chain before any vault import in run::...: move the assemble_chain call for
secrets_config.stores and secrets_config.writer ahead of
import_vault_secrets_once so empty, duplicate, or misordered stores fail fast
before any side effects. Keep the existing CredentialBackends setup, but use
assemble_chain as the gate before running the one-time Vault import and only
proceed to import_vault_secrets_once after the chain is successfully built.

🧹 Nitpick comments (2)

crates/api-core/src/secrets/assembly.rs (2)

299-320: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Validate through the public assembly entry point.

These tests call private validate_chain directly. If assemble_chain ever stops invoking it, invalid store lists would still pass this suite while startup accepts a broken config. Drive the cases through assemble_chain and keep them table-driven.

Suggested replacement

-    #[test]
-    fn validate_accepts_valid_store_lists() {
-        assert!(validate_chain(&[StoreKind::Vault]).is_ok());
-        assert!(validate_chain(&[StoreKind::Postgres]).is_ok());
-        assert!(validate_chain(&[StoreKind::Postgres, StoreKind::Vault]).is_ok());
-    }
-
-    #[test]
-    fn validate_rejects_empty_stores() {
-        assert!(validate_chain(&[]).is_err());
-    }
-
-    #[test]
-    fn validate_rejects_duplicate_stores() {
-        assert!(validate_chain(&[StoreKind::Vault, StoreKind::Vault]).is_err());
-    }
-
-    // vault ahead of postgres would serve stale vault over fresh postgres.
-    #[test]
-    fn validate_rejects_vault_before_postgres() {
-        assert!(validate_chain(&[StoreKind::Vault, StoreKind::Postgres]).is_err());
-    }
+    #[tokio::test]
+    async fn assemble_validates_store_lists() {
+        for (name, stores, expect_ok) in [
+            ("vault only", vec![StoreKind::Vault], true),
+            ("postgres only", vec![StoreKind::Postgres], true),
+            (
+                "postgres then vault",
+                vec![StoreKind::Postgres, StoreKind::Vault],
+                true,
+            ),
+            ("empty", vec![], false),
+            ("duplicate", vec![StoreKind::Vault, StoreKind::Vault], false),
+            (
+                "vault before postgres",
+                vec![StoreKind::Vault, StoreKind::Postgres],
+                false,
+            ),
+        ] {
+            let result = assemble_chain(
+                &stores,
+                WriterKind::Vault,
+                backends(
+                    vec![],
+                    store(&[]).await,
+                    store(&[]).await,
+                    writer_handle(),
+                    writer_handle(),
+                ),
+            );
+
+            if expect_ok {
+                assert!(result.is_ok(), "{name}");
+            } else {
+                assert!(result.is_err(), "{name}");
+            }
+        }
+    }

As per coding guidelines, “Verification should exercise the behavior that changed” and Rust tests should use table-driven style. As per path instructions, prefer findings about behavior and missing tests over style-only comments.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/secrets/assembly.rs` around lines 299 - 320, The tests
currently exercise the private validate_chain helper directly, so they won’t
catch regressions if assemble_chain stops calling it. Update these cases to go
through the public assemble_chain entry point instead, using table-driven
subtests for the valid and invalid store sequences while keeping the same
coverage for empty, duplicate, and vault-before-postgres inputs.

Sources: Coding guidelines, Path instructions

266-297: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Add the shadow-write writer case.

WriterKind::Postgres is intentionally allowed when stores == [Vault], but this test only covers Postgres writer when Postgres is also in the read chain. Add that rollout case so future validation changes cannot accidentally reject the migration path.

Suggested test addition

         let (_, writer) = assemble_chain(
             &[StoreKind::Postgres, StoreKind::Vault],
             WriterKind::Vault,
             backends(
@@
         )
         .expect("assemble");
         assert!(Arc::ptr_eq(&writer, &vault_w), "vault writer expected");
+
+        let (_, writer) = assemble_chain(
+            &[StoreKind::Vault],
+            WriterKind::Postgres,
+            backends(
+                vec![],
+                store(&[]).await,
+                store(&[]).await,
+                pg_w.clone(),
+                vault_w.clone(),
+            ),
+        )
+        .expect("assemble shadow write");
+        assert!(
+            Arc::ptr_eq(&writer, &pg_w),
+            "postgres shadow writer expected"
+        );
     }

As per coding guidelines, “Verification should exercise the behavior that changed.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/secrets/assembly.rs` around lines 266 - 297, Extend
writer_routes_to_the_selected_backend to cover the shadow-write rollout path by
adding a case where assemble_chain is called with stores containing only Vault
and WriterKind::Postgres, then assert it still selects the Postgres writer. Use
the existing writer_handle, backends, and assemble_chain setup so the test
exercises the migration behavior that is intentionally allowed and guards
against future validation changes.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/api-core/src/cfg/file.rs`:
- Around line 895-911: The secret-store config in the `stores`/`writer` fields
allows a Postgres-only reader setup to silently keep the default Vault writer,
which is invalid for this migration path. Update the validation/defaulting logic
around `default_secret_stores`, `StoreKind`, and `WriterKind` so that when
`stores` excludes Vault, `writer` must be `postgres` (or the config is
rejected); likewise reject `writer = Vault` whenever Vault is not present in
`stores`. Keep the runtime assembly consistent with the schema rules by
enforcing this check where the config is parsed or validated.

In `@crates/api-core/src/run.rs`:
- Around line 263-271: The warning in run() only handles the Postgres writer
case, so it misses the same read-after-write hazard when any writer is not
present in the read chain, including the default writer path. Update the check
in secrets handling to compare secrets_config.writer against
secrets_config.stores generically, and emit the warning whenever the selected
writer (for example via WriterKind in run.rs) is absent from stores, not just
when it equals Postgres.

---

Outside diff comments:
In `@crates/api-core/src/run.rs`:
- Around line 274-298: Validate the secrets chain before any vault import in
run::...: move the assemble_chain call for secrets_config.stores and
secrets_config.writer ahead of import_vault_secrets_once so empty, duplicate, or
misordered stores fail fast before any side effects. Keep the existing
CredentialBackends setup, but use assemble_chain as the gate before running the
one-time Vault import and only proceed to import_vault_secrets_once after the
chain is successfully built.

---

Nitpick comments:
In `@crates/api-core/src/secrets/assembly.rs`:
- Around line 299-320: The tests currently exercise the private validate_chain
helper directly, so they won’t catch regressions if assemble_chain stops calling
it. Update these cases to go through the public assemble_chain entry point
instead, using table-driven subtests for the valid and invalid store sequences
while keeping the same coverage for empty, duplicate, and vault-before-postgres
inputs.
- Around line 266-297: Extend writer_routes_to_the_selected_backend to cover the
shadow-write rollout path by adding a case where assemble_chain is called with
stores containing only Vault and WriterKind::Postgres, then assert it still
selects the Postgres writer. Use the existing writer_handle, backends, and
assemble_chain setup so the test exercises the migration behavior that is
intentionally allowed and guards against future validation changes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e7fa5e4-c641-490f-8bba-a2b70e4e3743

📥 Commits

Reviewing files that changed from the base of the PR and between e7781e0 and 97972dd.

📒 Files selected for processing (4)

crates/api-core/src/cfg/file.rs
crates/api-core/src/run.rs
crates/api-core/src/secrets/assembly.rs
crates/api-core/src/secrets/mod.rs

chet · 2026-06-25T02:17:03Z

@coderabbitai review

coderabbitai · 2026-06-25T02:17:08Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

chet · 2026-06-25T02:26:42Z

@coderabbitai full_review please, thanks!

coderabbitai · 2026-06-25T02:26:55Z

@chet Re-running a full review of PR #2873 now. Thank you.

ᕱ(ᵔᴥᵔ)ᕤ

✅ Action performed

Full review finished.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/api-core/src/run.rs (1)

284-308: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate the credential chain before running the one-time import.

import_vault_secrets_once can persist imported secrets and mark the import complete before assemble_chain rejects an invalid stores list. Move chain assembly above the import gate so bad routing config fails before irreversible import side effects.

Proposed fix

-        // A one-time bulk import from vault, only when the operator asks for
-        // one. Independent of stores/writer.
-        if secrets_config.import_from == Some(ImportSource::Vault) {
-            import_vault_secrets_once(
-                &db_pool,
-                secrets_config,
-                &routing,
-                kms.as_ref(),
-                &vault_client,
-            )
-            .await?;
-        }
-
         let backends = crate::secrets::CredentialBackends {
             local_overrides,
             postgres_reader: pg_mgr.clone(),
             vault_reader: vault_client.clone(),
             postgres_writer: pg_mgr.clone(),
             vault_writer: vault_client.clone(),
         };
         let (chain, writer) = crate::secrets::assemble_chain(
             &secrets_config.stores,
             secrets_config.writer,
             backends,
         )?;
+
+        // A one-time bulk import from vault, only when the operator asks for
+        // one. Independent of stores/writer.
+        if secrets_config.import_from == Some(ImportSource::Vault) {
+            import_vault_secrets_once(
+                &db_pool,
+                secrets_config,
+                &routing,
+                kms.as_ref(),
+                &vault_client,
+            )
+            .await?;
+        }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/run.rs` around lines 284 - 308, Move the credential chain
validation in run.rs so crate::secrets::assemble_chain is called before the
Vault import gate, using secrets_config.stores, secrets_config.writer, and the
CredentialBackends setup to fail fast on invalid routing. Only call
import_vault_secrets_once after assemble_chain succeeds, so the one-time import
in the vault path cannot commit side effects before invalid configuration is
rejected.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/api-core/src/run.rs`:
- Around line 284-308: Move the credential chain validation in run.rs so
crate::secrets::assemble_chain is called before the Vault import gate, using
secrets_config.stores, secrets_config.writer, and the CredentialBackends setup
to fail fast on invalid routing. Only call import_vault_secrets_once after
assemble_chain succeeds, so the one-time import in the vault path cannot commit
side effects before invalid configuration is rejected.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7b20f56b-f263-4c7d-98a5-dc9686785961

📥 Commits

Reviewing files that changed from the base of the PR and between 97972dd and 28e831b.

📒 Files selected for processing (4)

crates/api-core/src/cfg/file.rs
crates/api-core/src/run.rs
crates/api-core/src/secrets/assembly.rs
crates/api-core/src/secrets/mod.rs

🚧 Files skipped from review as they are similar to previous changes (3)

crates/api-core/src/secrets/mod.rs
crates/api-core/src/cfg/file.rs
crates/api-core/src/secrets/assembly.rs

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

crates/api-core/src/cfg/file.rs (1)

4304-4333: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use a small case table for repeated parse failures.

These three cases run the same parse-and-expect-error operation. A labeled table keeps failures precise and makes new invalid values one-line additions.

Refactor sketch

-        let bad_reader = format!("[secrets]\nstores = [\"postgrez\"]\n{base_kms}");
-        assert!(toml::from_str::<Wrapper>(&bad_reader).is_err());
-
-        // env/file are local overrides, not store readers -- they belong in
-        // [credentials.*], not [secrets].stores, so they're rejected here.
-        let env_as_reader = format!("[secrets]\nstores = [\"env\"]\n{base_kms}");
-        assert!(toml::from_str::<Wrapper>(&env_as_reader).is_err());
-
-        let bad_writer = format!("[secrets]\nwriter = \"valt\"\n{base_kms}");
-        assert!(toml::from_str::<Wrapper>(&bad_writer).is_err());
+        for (name, toml_str) in [
+            (
+                "unknown reader",
+                format!("[secrets]\nstores = [\"postgrez\"]\n{base_kms}"),
+            ),
+            (
+                "local override as store reader",
+                format!("[secrets]\nstores = [\"env\"]\n{base_kms}"),
+            ),
+            (
+                "unknown writer",
+                format!("[secrets]\nwriter = \"valt\"\n{base_kms}"),
+            ),
+        ] {
+            assert!(
+                toml::from_str::<Wrapper>(&toml_str).is_err(),
+                "{name} should fail parsing",
+            );
+        }

As per coding guidelines, **/*.rs: “Use table-driven test style when writing tests in Rust.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/cfg/file.rs` around lines 4304 - 4333, The test
`secrets_config_rejects_unknown_reader_or_writer_kind` repeats the same
parse-and-assert-error flow for multiple invalid inputs; refactor it into a
small table-driven test so each case is labeled and easy to extend. Keep the
existing `Wrapper`, `SecretsConfig`, and `base_kms` setup, then iterate over the
bad reader/writer values with a named case table and assert
`toml::from_str::<Wrapper>(...)` fails for each entry.

Source: Coding guidelines

crates/api-core/src/secrets/assembly.rs (1)

299-320: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Convert the validation cases to a table-driven test.

These four tests all exercise validate_chain with different inputs/expected outcomes, so a single case table would better match the repository’s Rust test style and make future store-order cases cheaper to add. As per coding guidelines, “Use table-driven test style when writing tests in Rust.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/secrets/assembly.rs` around lines 299 - 320, The
`validate_chain` tests are written as four separate cases even though they only
vary by input and expected result; convert them into a single table-driven test
in the `validate_chain` test block so it matches the Rust test style used here.
Keep the existing coverage for valid, empty, duplicate, and
vault-before-postgres inputs, but move the assertions into one loop over cases
with clear names so future store-order scenarios are easy to add.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/api-core/src/cfg/file.rs`:
- Around line 4251-4253: The migrated-test description in the comment is stale
because it says the writer is defaulted, but the fixture uses an explicit
postgres writer. Update the explanatory comment near the reader/writer enum
parsing logic in the file configuration test setup to match the actual migrated
fixture behavior, using the existing reader chain and writer parse context so
the wording accurately reflects that the writer is set to postgres rather than
defaulted.
- Around line 905-909: Clarify the shadow-write configuration in the `file.rs`
credential write-mode docs: the current wording around `stores` implies a
`postgres` writer plus `vault` in `stores` always means “keep reading vault,”
but that also matches a `["postgres", "vault"]` read preference. Update the
comment near the credential write settings to explicitly name the intended
shadow-write shape and distinguish it from the read-ordering behavior in
`stores`, using the relevant config symbols in this section.

---

Nitpick comments:
In `@crates/api-core/src/cfg/file.rs`:
- Around line 4304-4333: The test
`secrets_config_rejects_unknown_reader_or_writer_kind` repeats the same
parse-and-assert-error flow for multiple invalid inputs; refactor it into a
small table-driven test so each case is labeled and easy to extend. Keep the
existing `Wrapper`, `SecretsConfig`, and `base_kms` setup, then iterate over the
bad reader/writer values with a named case table and assert
`toml::from_str::<Wrapper>(...)` fails for each entry.

In `@crates/api-core/src/secrets/assembly.rs`:
- Around line 299-320: The `validate_chain` tests are written as four separate
cases even though they only vary by input and expected result; convert them into
a single table-driven test in the `validate_chain` test block so it matches the
Rust test style used here. Keep the existing coverage for valid, empty,
duplicate, and vault-before-postgres inputs, but move the assertions into one
loop over cases with clear names so future store-order scenarios are easy to
add.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b43000e-7d76-4f19-8710-db442ddcda02

📥 Commits

Reviewing files that changed from the base of the PR and between e7781e0 and 28e831b.

📒 Files selected for processing (4)

crates/api-core/src/cfg/file.rs
crates/api-core/src/run.rs
crates/api-core/src/secrets/assembly.rs
crates/api-core/src/secrets/mod.rs

chet · 2026-06-25T02:42:07Z

@coderabbitai full_review please, thanks!

github-actions · 2026-06-25T03:05:53Z

🌿 Preview your docs: https://nvidia-preview-pull-request-2873.docs.buildwithfern.com/infra-controller

github-actions · 2026-06-25T03:36:54Z

🔍 Container Scan Summary

Service	Total	Critical	High	Medium	Low	Other
boot-artifacts-aarch64	3	0	0	3	0	0
boot-artifacts-x86_64	3	0	0	3	0	0
forge-admin-cli-x86_64	283	6	24	98	7	148
machine-validation-runner	744	32	188	267	36	221
machine_validation	744	32	188	267	36	221
machine_validation-aarch64	744	32	188	267	36	221
nvmetal-carbide	744	32	188	267	36	221
TOTAL	3265	134	776	1172	151	1032

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

chet · 2026-06-25T07:33:29Z

@coderabbitai full_review please, thanks!

coderabbitai · 2026-06-25T07:33:43Z

@chet Re-running a full review of PR #2873 now. Thank you.

ᕱ(ᵔᴥᵔ)ᕤ

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/api-core/src/run.rs (1)
239-295: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate stores before any import side effects.

validate_stores currently runs after import_vault_secrets_once. With an invalid stores list plus import_from = "vault", startup can bulk-copy secrets and mark the import complete before failing config validation.
Proposed fix
     ) = if let Some(ref secrets_config) = carbide_config.secrets {
+        crate::secrets::validate_stores(&secrets_config.stores)?;
+
         let routing = SecretRouting::from_config(&secrets_config.routing)
             .map_err(eyre::Report::new)
             .wrap_err("secrets routing configuration")?;
@@
-        crate::secrets::validate_stores(&secrets_config.stores)?;
         let store_readers =
As per path instructions, crates/api*/** changes should be reviewed for “transaction safety,” “schema/API compatibility,” and safe runtime integration.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/run.rs` around lines 239 - 295, Move the
`crate::secrets::validate_stores(&secrets_config.stores)?` check to run before
any import side effects in `run`, so invalid `stores` abort startup before
`import_vault_secrets_once` can copy data. Keep the existing
`SecretRouting::from_config`, `build_kms_backend`, and
`PostgresCredentialManager::new` setup, but validate the secrets store
configuration immediately after `secrets_config` is available and before the
`import_from == Some(ImportSource::Vault)` branch.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/api-core/src/secrets/mod.rs`:
- Around line 77-80: The `stores` validation in
`crates/api-core/src/secrets/mod.rs` only checks for duplicates, but it must
also enforce the rollout order contract. Update the validation logic that builds
`unique` from `stores` so it rejects any configuration where `Vault` appears
before `Postgres` (for example `[Vault, Postgres]`), while still allowing
`["vault"]`, `["postgres", "vault"]`, and `["postgres"]`. Keep the
duplicate-name check in place and add an explicit ordering check in the same
validation path so the first-match-wins chain cannot shadow Postgres.

---

Outside diff comments:
In `@crates/api-core/src/run.rs`:
- Around line 239-295: Move the
`crate::secrets::validate_stores(&secrets_config.stores)?` check to run before
any import side effects in `run`, so invalid `stores` abort startup before
`import_vault_secrets_once` can copy data. Keep the existing
`SecretRouting::from_config`, `build_kms_backend`, and
`PostgresCredentialManager::new` setup, but validate the secrets store
configuration immediately after `secrets_config` is available and before the
`import_from == Some(ImportSource::Vault)` branch.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 08f30044-905a-4520-adbb-7cf28954e6d8

📥 Commits

Reviewing files that changed from the base of the PR and between 89122bf and 92ee3ea.

📒 Files selected for processing (3)

crates/api-core/src/cfg/file.rs
crates/api-core/src/run.rs
crates/api-core/src/secrets/mod.rs

coderabbitai

♻️ Duplicate comments (1)

crates/api-core/src/secrets/mod.rs (1)

67-80: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Enforce the rollout order in validate_stores.

["vault", "postgres"] is not just another distinct ordering. The read chain is first-match-wins, so Vault shadows Postgres, and the generic writer-presence warning in run.rs stays silent because Postgres is still present in stores. That reintroduces stale reads during the migration path the PR summary says should reject Vault-before-Postgres. Please fail this ordering here and flip the test to expect an error; the SecretsConfig docs/tests in crates/api-core/src/cfg/file.rs should then be aligned to the same contract.

Suggested fix

 pub fn validate_stores(stores: &[crate::cfg::file::CredentialBackend]) -> eyre::Result<()> {
     if stores.is_empty() {
         return Err(eyre::eyre!(
             "[secrets].stores is empty; at least one store (postgres or vault) is required"
         ));
     }
     let unique: std::collections::HashSet<_> = stores.iter().collect();
     if unique.len() != stores.len() {
         return Err(eyre::eyre!("[secrets].stores names a store more than once"));
     }
+    let vault_pos = stores
+        .iter()
+        .position(|store| *store == crate::cfg::file::CredentialBackend::Vault);
+    let postgres_pos = stores
+        .iter()
+        .position(|store| *store == crate::cfg::file::CredentialBackend::Postgres);
+    if matches!((vault_pos, postgres_pos), (Some(vault_pos), Some(postgres_pos)) if vault_pos < postgres_pos)
+    {
+        return Err(eyre::eyre!(
+            "[secrets].stores cannot place vault ahead of postgres; use [\"postgres\", \"vault\"] during migration"
+        ));
+    }
     Ok(())
 }

-    fn accepts_any_order_of_distinct_stores() {
+    fn accepts_only_supported_distinct_store_orders() {
         assert!(validate_stores(&[CredentialBackend::Vault]).is_ok());
         assert!(validate_stores(&[CredentialBackend::Postgres]).is_ok());
         assert!(validate_stores(&[CredentialBackend::Postgres, CredentialBackend::Vault]).is_ok());
-        // Order is the operator's choice -- vault ahead of postgres is fine.
-        assert!(validate_stores(&[CredentialBackend::Vault, CredentialBackend::Postgres]).is_ok());
+        assert!(validate_stores(&[CredentialBackend::Vault, CredentialBackend::Postgres]).is_err());
     }

Also applies to: 478-485

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/secrets/mod.rs` around lines 67 - 80, `validate_stores`
currently only rejects empty and duplicate entries, but it must also reject the
unsupported rollout ordering where `Vault` appears before `Postgres`. Update
`validate_stores` in `crates/api-core/src/secrets/mod.rs` to detect that
specific `CredentialBackend` order and return an error instead of allowing
first-match-wins shadowing; then adjust the related `validate_stores` test
expectation to assert an error for `["vault", "postgres"]`. Also align the
`SecretsConfig` docs/tests in `crates/api-core/src/cfg/file.rs` with the same
ordering contract so the behavior and documentation match.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@crates/api-core/src/secrets/mod.rs`:
- Around line 67-80: `validate_stores` currently only rejects empty and
duplicate entries, but it must also reject the unsupported rollout ordering
where `Vault` appears before `Postgres`. Update `validate_stores` in
`crates/api-core/src/secrets/mod.rs` to detect that specific `CredentialBackend`
order and return an error instead of allowing first-match-wins shadowing; then
adjust the related `validate_stores` test expectation to assert an error for
`["vault", "postgres"]`. Also align the `SecretsConfig` docs/tests in
`crates/api-core/src/cfg/file.rs` with the same ordering contract so the
behavior and documentation match.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7a838ee8-a905-430f-a063-fce2d2dde25a

📥 Commits

Reviewing files that changed from the base of the PR and between 852c517 and 92ee3ea.

📒 Files selected for processing (3)

crates/api-core/src/cfg/file.rs
crates/api-core/src/run.rs
crates/api-core/src/secrets/mod.rs

chet · 2026-06-25T08:03:28Z

@coderabbitai full_review please, thanks!

coderabbitai · 2026-06-25T08:03:40Z

@chet Re-running a full review of PR #2873 now. Thank you.

ᕱ(ᵔᴥᵔ)ᕤ

✅ Action performed

Full review finished.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/api-core/src/run.rs (1)

287-303: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate stores before importing or initializing side effects.

validate_stores() runs after import_vault_secrets_once(). A config like stores = [] or ["vault", "vault"] will still import secrets and can mark the import complete before startup aborts, so fixing the config later skips the import even though the original boot was invalid. Move the validation to the start of the [secrets] branch, before KMS setup and the import path.

Proposed fix

     ) = if let Some(ref secrets_config) = carbide_config.secrets {
+        crate::secrets::validate_stores(&secrets_config.stores)?;
+
         let routing = SecretRouting::from_config(&secrets_config.routing)
             .map_err(eyre::Report::new)
             .wrap_err("secrets routing configuration")?;
         let kms = build_kms_backend(
             secrets_config,
@@
-        crate::secrets::validate_stores(&secrets_config.stores)?;
         let store_readers =
             secrets_config
                 .stores
                 .iter()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/run.rs` around lines 287 - 303, The `[secrets]` startup
path in `run.rs` validates `secrets_config.stores` too late, after
`import_vault_secrets_once()` and other side effects have already run. Move
`crate::secrets::validate_stores(&secrets_config.stores)?` to the beginning of
the secrets branch, before any KMS setup, vault client use, or import logic, so
invalid configurations fail fast without completing the vault import. Use the
existing `import_vault_secrets_once`, `validate_stores`, and
`ImportSource::Vault` flow as the main reference points when rearranging the
control flow.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/api-core/src/run.rs`:
- Around line 287-303: The `[secrets]` startup path in `run.rs` validates
`secrets_config.stores` too late, after `import_vault_secrets_once()` and other
side effects have already run. Move
`crate::secrets::validate_stores(&secrets_config.stores)?` to the beginning of
the secrets branch, before any KMS setup, vault client use, or import logic, so
invalid configurations fail fast without completing the vault import. Use the
existing `import_vault_secrets_once`, `validate_stores`, and
`ImportSource::Vault` flow as the main reference points when rearranging the
control flow.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0e8f2280-b053-44e9-82f9-7010f1e2ac36

📥 Commits

Reviewing files that changed from the base of the PR and between 852c517 and bc45948.

📒 Files selected for processing (3)

crates/api-core/src/cfg/file.rs
crates/api-core/src/run.rs
crates/api-core/src/secrets/mod.rs

chet · 2026-06-25T18:41:52Z

@coderabbitai full_review please, thanks!

coderabbitai · 2026-06-25T18:42:08Z

@chet Re-running a full review of PR #2873 now. Thank you.

ᕱ(ᵔᴥᵔ)ᕤ

✅ Action performed

Full review finished.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/api-core/src/run.rs (1)
287-303: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate backends before the one-time Vault import, not after.

crate::secrets::validate_backends(&secrets_config.backends)? is a pure, inexpensive config check, yet it is invoked at Line 303 — after import_vault_secrets_once (Lines 289–298) has already enumerated Vault and written the permanent import-complete marker. An empty or duplicated backends list therefore aborts the boot only after a side-effectful, irreversible import has run; the operator fixes the config, reboots, and the import is silently skipped on the strength of a marker written during a boot that was destined to fail.

Configuration validation should gate all side effects. Hoisting the check to the top of the [secrets] branch (alongside routing/KMS validation) restores fail-fast behavior and keeps the marker honest.
♻️ Suggested reordering
     ) = if let Some(ref secrets_config) = carbide_config.secrets {
+        // Reject a nonsensical backends list before any side effects (KMS
+        // setup, the one-time vault import) run.
+        crate::secrets::validate_backends(&secrets_config.backends)?;
+
         let routing = SecretRouting::from_config(&secrets_config.routing)
             .map_err(eyre::Report::new)
             .wrap_err("secrets routing configuration")?;
         // Read order: the always-first local overrides, then the configured
         // backends in the operator's chosen order (first match wins). The write
         // target is the single backend `writer` names.
-        crate::secrets::validate_backends(&secrets_config.backends)?;
         let backend_readers =
As per path instructions, crates/api*/** changes should be reviewed for "request validation" and "transaction safety."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/run.rs` around lines 287 - 303, Hoist the pure config
validation for secrets backends so it runs before any side effects in `run.rs`:
call `crate::secrets::validate_backends(&secrets_config.backends)?` at the start
of the `[secrets]` handling, alongside the existing routing/KMS checks, before
`import_vault_secrets_once` is invoked. Keep the one-time Vault import guarded
by the now-validated config so `import_vault_secrets_once` only runs after
`validate_backends` succeeds and the permanent import marker is never written
during a boot that should have failed fast.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/api-core/src/run.rs`:
- Around line 287-303: Hoist the pure config validation for secrets backends so
it runs before any side effects in `run.rs`: call
`crate::secrets::validate_backends(&secrets_config.backends)?` at the start of
the `[secrets]` handling, alongside the existing routing/KMS checks, before
`import_vault_secrets_once` is invoked. Keep the one-time Vault import guarded
by the now-validated config so `import_vault_secrets_once` only runs after
`validate_backends` succeeds and the permanent import marker is never written
during a boot that should have failed fast.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9d1b6c43-8536-407e-b045-58b13112964d

📥 Commits

Reviewing files that changed from the base of the PR and between 917c94d and f764516.

📒 Files selected for processing (3)

crates/api-core/src/cfg/file.rs
crates/api-core/src/run.rs
crates/api-core/src/secrets/mod.rs

Turning on the Postgres secrets backend used to be all-or-nothing: setting [secrets] dropped vault from the read chain and ran a mandatory import. Now the backends and the write target are operator config, so a site can read from postgres, vault, or both -- in whatever order it wants -- and choose where new writes land. Two new [secrets] fields drive it, both naming a CredentialBackend (postgres or vault): - backends -- the read order, first match wins. The env/file local overrides are always tried first (when their [credentials.*] section is enabled); this list just orders the backends behind them. Defaults to ["vault"], today's behavior. Order is the operator's choice; for example, to roll Postgres in gradually: ["vault"] -> ["postgres", "vault"] -> ["postgres"]. - writer -- vault (default) or postgres. Flip it to send new writes to the journal. import_from stays fully independent -- importing from vault is orthogonal to where reads and writes flow, and it is now gated visibly at the call site. An empty or duplicate backends list fails the boot. Backend order is the operator's choice; when the writer's backend isn't the highest-priority one, a write can be shadowed on read for any path a higher-priority backend also holds, so that warns -- a deliberate shadow-write stays valid, it is not rejected. Tests added! This supports NVIDIA#2811 Signed-off-by: Chet Nichols III <chetn@nvidia.com>

chet · 2026-06-25T19:39:55Z

@coderabbitai full_review please, thanks!

Also addressed the outside-diff-range note: validate_backends now runs at the very top of the [secrets] branch, before KMS setup and the one-time vault import, so an invalid backends list fails the boot cleanly instead of after a partial, marker-writing import. Good catch.

coderabbitai · 2026-06-25T19:40:18Z

@chet Good improvement — moving validate_backends to the very start of the [secrets] path is the right failure mode, since it prevents boot-time side effects before configuration validity is established.

I’m kicking off a full review now.

ᕱ(ᵔᴥᵔ)ᕤ

✏️ Learnings added

Learnt from: chet
URL: https://github.com/NVIDIA/infra-controller/pull/2873

Timestamp: 2026-06-25T19:40:17.212Z
Learning: In `crates/api-core/src/run.rs`, configuration validation for `[secrets]` via `crates/api-core/src/secrets/mod.rs::validate_backends` should run at the very start of the `[secrets]` branch, before KMS setup and before the one-time Vault import, so invalid backend lists fail cleanly without partial import side effects such as marker writes.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

✅ Action performed

Full review finished.

chet requested a review from a team as a code owner June 25, 2026 02:10

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread crates/api-core/src/cfg/file.rs Outdated

Comment thread crates/api-core/src/run.rs Outdated

chet force-pushed the gh-issue-2811 branch from 97972dd to 28e831b Compare June 25, 2026 02:25

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread crates/api-core/src/cfg/file.rs Outdated

Comment thread crates/api-core/src/cfg/file.rs Outdated

chet force-pushed the gh-issue-2811 branch from 28e831b to 535da5c Compare June 25, 2026 02:41

chet force-pushed the gh-issue-2811 branch from 535da5c to 89122bf Compare June 25, 2026 03:04

rwthompsonii approved these changes Jun 25, 2026

View reviewed changes

chet force-pushed the gh-issue-2811 branch from 89122bf to 92ee3ea Compare June 25, 2026 07:33

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread crates/api-core/src/secrets/mod.rs Outdated

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

chet force-pushed the gh-issue-2811 branch from 92ee3ea to bc45948 Compare June 25, 2026 08:03

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

chet force-pushed the gh-issue-2811 branch from bc45948 to f764516 Compare June 25, 2026 18:41

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

chet force-pushed the gh-issue-2811 branch from f764516 to a39d153 Compare June 25, 2026 19:39

nick-vetro approved these changes Jun 25, 2026

View reviewed changes

chet merged commit 8e15e49 into NVIDIA:main Jun 25, 2026
54 checks passed

chet deleted the gh-issue-2811 branch June 25, 2026 20:35

Uh oh!

Conversation

chet commented Jun 25, 2026

Uh oh!

chet commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

chet commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chet commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

chet commented Jun 25, 2026

Uh oh!

github-actions Bot commented Jun 25, 2026

Uh oh!

github-actions Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Container Scan Summary

Uh oh!

chet commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chet commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chet commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chet commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

github-actions Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading