Skip to content

feat(zero-dpu): Allow flat VPC's to not belong to a network segment#2666

Merged
kensimon merged 13 commits into
NVIDIA:mainfrom
kensimon:unlink-vpc-and-segment
Jun 24, 2026
Merged

feat(zero-dpu): Allow flat VPC's to not belong to a network segment#2666
kensimon merged 13 commits into
NVIDIA:mainfrom
kensimon:unlink-vpc-and-segment

Conversation

@kensimon

@kensimon kensimon commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

This supports #2647.

In an environment without DPU's, we don't get dynamic network isolation, and the expectation is that there is a single, flat network address space, modeled as a single network segment. In this case, we still want to model VPC's for things like NVLink partitions, where tenants will have the understanding that the network is not isolated between VPC's.

To make this work, a network segment can no longer be the "link" that links an instance to a VPC. Before, an instance interface maps to a machine interface, which maps to a network segment, which maps to a VPC. But if multiple VPC's are all sharing a network segment, this link doesn't exist (namely, network_segments.vpc_id will be null), and so we need the instance address itself to store the VPC ID on it.

This change adds a vpc_id column to instance_addresses, so that we know the VPC of any given address without having to consult the network_segments table. It also changes the AllocateInstance API so that you can pass a auto_config object which allows specifying a VPC ID as part of the allocation request. (This replaces the auto: bool field, such that auto is implied if auto_config is set.)

Related issues

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

The --zero-dpu flag on nico-admin-cli instance allocate is now --flat-vpc-id=<vpc_id>. The old flag was just merged last week so it's doubtful anyone's relying on it yet (especially because the functionality isn't done yet.)

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

In an environment without DPU's, we don't get dynamic network isolation,
and the expectation is that there is a single, flat network address
space, modeled as a single network segment. In this case, we still want
to model VPC's for things like NVLink partitions, where tenants will
have the understanding that the network is not isolated between VPC's.

To make this work, a network segment can no longer be the "link" that links an
instance to a VPC. Before, an instance interface maps to a machine
interface, which maps to a network segment, which maps to a VPC. But if
multiple VPC's are all sharing a network segment, this link doesn't
exist (namely, network_segments.vpc_id will be null), and so we need the
instance address itself to store the VPC ID on it.

This change adds a vpc_id column to instance_addresses, so that we know
the VPC of any given address without having to consult the
network_segments table. It also changes the AllocateInstance API so that
you can pass a `auto_config` object which allows specifying a VPC ID as
part of the allocation request. (This replaces the `auto: bool` field,
such that auto is implied if `auto_config` is set.)
@kensimon kensimon requested a review from a team as a code owner June 16, 2026 20:57
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR replaces the zero_dpu: bool / auto: bool mechanism with a structured auto_config: Option<InstanceNetworkAutoConfig> carrying a vpc_id field. The change propagates from the protobuf wire contract through RPC conversions, internal model types, a new database migration adding instance_addresses.vpc_id, core allocation validation, and the admin CLI's --flat-vpc-id argument. The modernization adds explicit VPC ownership tracking for flat-VPC (zero-DPU) allocations, enabling stricter validation and more precise multi-VPC conflict detection.

Changes

Flat VPC / auto_config for zero-DPU instance allocation

Layer / File(s) Summary
Proto contract and RPC model conversions
crates/rpc/proto/forge.proto, crates/rpc/build.rs, crates/rpc/src/model/instance/config/network.rs, crates/rpc/src/model/instance/status/network.rs
Deprecates InstanceNetworkConfig.auto (field 2) and adds auto_config (field 3) carrying InstanceNetworkAutoConfig { vpc_id }. Adds InstanceInterfaceStatus.vpc_id (field 8). Derives serde::Serialize for the new proto type. Implements TryFrom<forge::InstanceNetworkAutoConfig> validating vpc_id presence. Rewrites RPC↔model conversions: inbound path derives auto from auto_config.is_some() and sets interface vpc_id per resolved segment; outbound path emits auto_config and derives the deprecated auto boolean. Updates test inputs and assertions throughout.
Internal model types
crates/api-model/src/instance/config/network.rs, crates/api-model/src/instance/status/network.rs, crates/api-model/src/instance_address.rs, crates/network/src/virtualization.rs
Replaces InstanceNetworkConfig.auto: bool with auto_config: Option<InstanceNetworkAutoConfig>. Adds InstanceInterfaceConfig.vpc_id: Option<VpcId> (derived from resolved/auto-allocated interfaces, cleared during update-diff normalization, preserved by copy_existing_resources). Extends InstanceInterfaceStatus and InstanceAddress with vpc_id. Updates into_external_view to strip interfaces when auto_config.is_some(). Updates for_segment_ids and for_vpc_prefix_id to accept and assign vpc_id values. Propagates vpc_id through all InstanceInterfaceStatus construction paths and test fixtures. Simplifies VpcVirtualizationType::Display to delegate to as_str().
Database schema, queries, and address allocation
crates/api-db/migrations/20260615120000_instance_addresses_vpc_id.sql, crates/api-db/src/vpc.rs, crates/api-db/src/instance_address.rs, crates/api-db/src/instance.rs, crates/api-db/src/instance_network_config.rs, crates/api-db/src/network_security_group.rs, crates/api-db/src/dpa_interface.rs
Adds vpc_id FK column to instance_addresses with backfill from network_segments, creates indexes on vpc_id and (instance_id, vpc_id). Changes find_by_segment return type to Result<Option<Vpc>> for explicit None handling. Adds instance_addresses-based deletion guard in try_delete. Rewrites find_ids to filter on instance_addresses.vpc_id directly. Refactors NSG propagation query to join via instance_addresses instead of network_segments. Introduces interface_vpc_id helper; updates validate and allocate to derive vpc_id per interface from either explicit iface.vpc_id or resolved segment. Updates add_inband_interfaces_to_config to gate on auto_config.vpc_id presence. Fixes optional-chain handling in get_dpa_vni.
Core allocation and validation logic
crates/api-core/src/instance/mod.rs, crates/api-core/src/handlers/instance.rs, crates/api-core/src/handlers/dpu.rs, crates/api-core/src/ethernet_virtualization.rs
Adds validate_zero_dpu_auto_vpc helper enforcing tenant ownership, Flat virtualization type, and fabric_interface_type == Nic. Switches zero-DPU allocation path from conditional network.auto to mandatory auto_config presence, calling the validator before proceeding. Updates HostInband segment binding validation to allow unbound segments while requiring bound segments to match the requested auto VPC and provide NIC-capable fabric. Switches DPU-managed host rejection condition from network.auto to auto_config.is_some(). Refactors DPU segment VPC validation to distinguish three outcomes: no VPC (FailedPrecondition), VPC with non-Dpu fabric (FailedPrecondition), or valid DPU VPC. Updates update_instance_network_config immutability checks to gate on auto_config instead of auto, and prevents changing vpc_id on existing auto-networked instances. Adds explicit None-to-error conversion in DPU handler and ethernet virtualization for find_by_segment results. Introduces hydrate_instance_allocation_request_from_deprecated_fields helper to derive VPC ID from machine HostInband segments when legacy auto=true requests lack auto_config.vpc_id.
Admin CLI: --flat-vpc-id, machine selection, and request building
crates/admin-cli/src/instance/allocate/args.rs, crates/admin-cli/src/instance/allocate/cmd.rs, crates/admin-cli/src/machine/show/cmd.rs, crates/admin-cli/src/rpc.rs, crates/admin-cli/src/instance/show/cmd.rs
Replaces --zero-dpu: bool flag with --flat-vpc-id: Option<VpcId> CLI argument. Updates get_next_free_machine signature to accept flat_vpc_id instead of zero_dpu. Replaces unconditional early return with a guarded branch: returns the machine only when flat_vpc_id.is_some() and instance_network_restrictions exist with Static membership; otherwise logs and skips. Adds flat_vpc_id allocation branch in build_instance_request that rejects combining flat_vpc_id with explicit interface/IP/VF selectors, fetches the VPC, enforces Flat virtualization, requires VPC organization ID, returns empty interface list, and propagates Some(vpc_id) for auto_config wiring. Removes zero_dpu from Mellanox vendor NIC filter predicate. Updates final InstanceConfig network wiring to set auto_config (with vpc_id) only when present, and derives auto from flat_vpc_id.is_some(). Refactors instance show interface rendering to batch VPC resolution via join_all from status vpc_id (when auto-networked) or segment network_segment_id (when explicit). Adds get_vpc_by_id local helper for single-VPC lookups.
Test infrastructure and fixture updates
crates/api-core/src/test_support/network_segment.rs, crates/api-core/src/tests/common/api_fixtures/test_managed_host.rs, crates/api-core/src/tests/common/api_fixtures/mod.rs, crates/test-harness/src/network/controller.rs, crates/api-core/src/cfg/file.rs, crates/agent/src/tests/full.rs
Introduces FIXTURE_TENANT_ORG_ID public constant in test_support::network_segment, replacing all hardcoded tenant organization UUIDs across test files. Updates create_host_inband_network_segment to remove auto-VPC creation, passing through the caller's vpc_id option directly. Adds TestManagedHost::from_rpc_machine constructor extracting host ID and DPU IDs from RPC machine. Adds release_instances_from_vpcs helper for concurrent instance cleanup before VPC deletion. Refactors NSG test to use TestManagedHost-based instance lifecycle instead of direct allocation/release calls. Updates fixture helpers and test setup throughout to use the shared constant.
Comprehensive test updates
crates/api-core/src/tests/instance*.rs, crates/api-core/src/tests/instance_allocate.rs, crates/api-core/src/tests/machine*.rs, crates/api-core/src/tests/network*.rs, crates/api-core/src/tests/vpc*.rs, crates/machine-a-tron/src/api_client.rs
Populates all InstanceInterfaceConfig literals with #[allow(deprecated)] auto: false and auto_config: None for protobuf contract alignment. Adds vpc_id: None to all interface status and config fixtures. Updates network config construction calls to pass vpc_ids and vpc_id arguments. Uses FIXTURE_TENANT_ORG_ID in all VPC creation requests. Adds new negative-path tests in instance_allocate rejecting zero-DPU allocations when auto_config.vpc_id is missing or non-Flat. Strengthens allocation tests to assert VPC ID propagation through RPC wire (auto_config.vpc_id), interface status (vpc_id field), persisted internal model (auto_config.vpc_id), and database instance_address rows (vpc_id column). Adapts machine_network tests to handle Option<Vpc> from find_by_segment with .unwrap() adjustments.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as admin-cli
  participant RPC as build_instance_request
  participant CoreAlloc as allocate_instance
  participant ValidateVpc as validate_zero_dpu_auto_vpc
  participant DB as instance_address::allocate
  participant Resp as RPC response

  CLI->>RPC: --flat-vpc-id=<VpcId>
  RPC->>RPC: fetch VPC, validate Flat virtualization, build auto_config { vpc_id }
  RPC->>CoreAlloc: InstanceConfig { auto_config: Some { vpc_id }, interfaces: [] }
  CoreAlloc->>ValidateVpc: validate_zero_dpu_auto_vpc(vpc_id, tenant_org)
  ValidateVpc-->>CoreAlloc: Ok (Flat, Nic, tenant-owned)
  CoreAlloc->>CoreAlloc: validate HostInband segment binding vs vpc_id
  CoreAlloc->>DB: allocate addresses with interface_vpc_id per interface
  DB->>DB: INSERT instance_addresses(vpc_id)
  DB-->>CoreAlloc: allocated addresses
  CoreAlloc->>Resp: Instance { status.network.interfaces[*].vpc_id, network.auto_config.vpc_id }
  Resp->>CLI: RPC response with vpc_id propagated
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the primary change: enabling flat VPCs in zero-DPU environments by decoupling VPC association from network segments.
Description check ✅ Passed The description comprehensively explains the architectural motivation, implementation approach, breaking changes, and testing coverage, clearly relating to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
crates/api-db/src/instance_network_config.rs (1)

75-80: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the invalid-argument message to reference auto_config instead of deprecated auto.

Line 76 still reports InstanceNetworkConfig.auto, which no longer matches the request contract and can mislead operators during troubleshooting.

Suggested wording update
-            "InstanceNetworkConfig.auto reached the resolver with {} \
+            "InstanceNetworkConfig.auto_config reached the resolver with {} \
              pre-existing interfaces; auto requests must arrive with an \
              empty interfaces list",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-db/src/instance_network_config.rs` around lines 75 - 80, The error
message in the DatabaseError::InvalidArgument call references the deprecated
field name `auto`, but the field has been renamed to `auto_config`. Update the
error message string on line 76 to replace "InstanceNetworkConfig.auto" with
"InstanceNetworkConfig.auto_config" so the error message accurately reflects the
current field name and prevents misleading operators during troubleshooting.
crates/api-core/src/instance/mod.rs (1)

935-944: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale auto error text to auto_config.

These user-facing errors still tell operators to set InstanceNetworkConfig.auto, but this path now requires/rejects auto_config.

Proposed fix
-        // Zero-DPU hosts (no DPU, or DPU in NIC mode) MUST use `auto`, because
+        // Zero-DPU hosts (no DPU, or DPU in NIC mode) MUST use `auto_config`, because
         // their only valid attachments are HostInband segments, and NICo knows
         // which one(s) the host is on. Conversely, hosts with DPUs cannot use
-        // `auto`, and are expected to enumerate their interfaces explicitly.
+        // `auto_config`, and are expected to enumerate their interfaces explicitly.
         if !mh_snapshot.has_managed_dpus() {
             let Some(requested_auto_config) = request.config.network.auto_config else {
                 return Err(CarbideError::InvalidArgument(format!(
-                    "zero-DPU host {} requires `InstanceNetworkConfig.auto = true`; cannot allocate an instance with explicitly-listed interfaces or with `auto = false`",
+                    "zero-DPU host {} requires `InstanceNetworkConfig.auto_config.vpc_id`; cannot allocate an instance without auto_config",
                     mh_snapshot.host_snapshot.id,
                 )));
             };
-            // `auto` is only valid on zero-DPU hosts; DPU-managed hosts must
+            // `auto_config` is only valid on zero-DPU hosts; DPU-managed hosts must
             // list their interfaces explicitly.
             if request.config.network.auto_config.is_some() {
                 return Err(CarbideError::InvalidArgument(format!(
-                    "host {} has DPUs; `InstanceNetworkConfig.auto` is only valid on zero-DPU hosts",
+                    "host {} has DPUs; `InstanceNetworkConfig.auto_config` is only valid on zero-DPU hosts",
                     mh_snapshot.host_snapshot.id,
                 )));
             }

Also applies to: 1022-1028

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/instance/mod.rs` around lines 935 - 944, The error
message in the zero-DPU host validation block references the outdated field name
`InstanceNetworkConfig.auto` but the code is now checking for `auto_config`.
Update the error message text to correctly reference
`InstanceNetworkConfig.auto_config` instead of `InstanceNetworkConfig.auto` to
match the actual field being validated. Additionally, apply the same correction
to any other similar error messages at lines 1022-1028 that still reference the
old field name.
crates/api-db/src/vpc.rs (1)

240-255: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not return soft-deleted VPCs from find_by_segment.

fetch_optional now models “segment is unbound”, but a segment whose vpc_id points at a soft-deleted VPC still returns Some(vpc). That lets allocation/validation treat deleted VPC membership as active.

Proposed fix
     .filter_relation(
         &ObjectColumnFilter::One(network_segment::IdColumn, &segment_id),
         Some("s"),
     );
+    query.push(" AND v.deleted IS NULL");
     query.push(" LIMIT 1");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-db/src/vpc.rs` around lines 240 - 255, The find_by_segment
function in the VPC query builder returns soft-deleted VPCs because it lacks a
filter to exclude them, allowing allocation/validation logic to treat deleted
VPC membership as active. Add a filter to the FilterableQueryBuilder in
find_by_segment to exclude soft-deleted VPCs by filtering on the appropriate
soft-delete column (such as deleted_at or similar field) in the vpcs table,
similar to how the function already filters by segment_id using filter_relation.
crates/rpc/src/model/instance/config/network.rs (1)

1002-1029: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Test fixture does not exercise the intended rejection path.

The test test_auto_rejects_explicit_interfaces sets auto_config: None, which means auto = config.auto_config.is_some() evaluates to false. Consequently, the check at line 142 (if auto && !config.interfaces.is_empty()) never triggers. The test currently passes because the conversion succeeds (no rejection occurs), and then the expect_err call fails with the wrong reason.

To correctly test that auto_config combined with non-empty interfaces is rejected, supply auto_config: Some(...):

Proposed fix
     #[test]
     fn test_auto_rejects_explicit_interfaces() {
         // `auto: true` and populated interfaces are mutually exclusive,
         // so this should error.
+        let vpc_id = VpcId::new();
         let rpc_config = rpc::InstanceNetworkConfig {
             interfaces: vec![rpc::InstanceInterfaceConfig {
                 function_type: rpc::InterfaceFunctionType::Physical as i32,
                 network_segment_id: Some(NetworkSegmentId::new()),
                 network_details: None,
                 device: None,
                 device_instance: 0,
                 virtual_function_id: None,
                 ip_address: None,
                 ipv6_interface_config: None,
                 routing_profile: None,
             }],
             #[allow(deprecated)]
-            auto: false,
-            auto_config: None,
+            auto: true,
+            auto_config: Some(forge::InstanceNetworkAutoConfig {
+                vpc_id: Some(vpc_id),
+            }),
         };
         let result: Result<InstanceNetworkConfig, _> = rpc_config.try_into();
         let err = result.expect_err("auto + non-empty interfaces should be rejected");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/rpc/src/model/instance/config/network.rs` around lines 1002 - 1029, In
the test function `test_auto_rejects_explicit_interfaces`, the `auto_config`
field is currently set to `None`, which causes `auto` to be derived as `false`
and prevents the intended rejection check from triggering. Change `auto_config:
None` to `auto_config: Some(...)` (with any valid auto configuration structure)
so that the condition `if auto && !config.interfaces.is_empty()` at line 142 is
properly evaluated and the rejection behavior is actually tested.
crates/admin-cli/src/instance/allocate/args.rs (1)

25-54: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a flat-VPC allocation example.

The command help now exposes --flat-vpc-id, but the examples only cover subnet and VPC-prefix allocation, leaving the zero-DPU replacement path undiscoverable.

Proposed help addition
 Allocate one instance onto a subnet:
     $ nico-admin-cli instance allocate --prefix-name eth0 --subnet 192.0.2.0/24
 
+Allocate one instance into a flat VPC for a zero-DPU host:
+    $ nico-admin-cli instance allocate --prefix-name eth0 \
+    --flat-vpc-id 12345678-1234-5678-90ab-cdef01234567
+
 Allocate several instances at once:
     $ nico-admin-cli instance allocate --number 4 --prefix-name eth0 \

As per coding guidelines, admin CLI examples should cover each meaningful axis of variation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/instance/allocate/args.rs` around lines 25 - 54, The
help documentation for the instance allocate command is missing an example
demonstrating the --flat-vpc-id flag, which is a key allocation option. Add a
new example in the after_long_help string that shows how to allocate an instance
using --flat-vpc-id instead of subnet or vpc-prefix-id. Follow the existing
example format and include it alongside the other allocation examples to make
this allocation path discoverable for users. This example should demonstrate the
zero-DPU replacement path as mentioned in the comment.

Source: Coding guidelines

🧹 Nitpick comments (4)
crates/api-core/src/tests/instance_config_update.rs (1)

901-903: ⚡ Quick win

Consolidate deprecated-field suppression into a single test helper.

These repeated #[allow(deprecated)] auto: false initializers are functionally correct but create high churn and spread lint suppression across the file. Prefer one helper that sets auto_config: None and contains the deprecation handling in one place.

Proposed refactor
+#[allow(deprecated)]
+fn test_network_config(
+    interfaces: Vec<rpc::InstanceInterfaceConfig>,
+) -> rpc::InstanceNetworkConfig {
+    rpc::InstanceNetworkConfig {
+        interfaces,
+        auto: false,
+        auto_config: None,
+    }
+}
...
-    let network = rpc::InstanceNetworkConfig {
-        interfaces: vec![...],
-        #[allow(deprecated)]
-        auto: false,
-        auto_config: None,
-    };
+    let network = test_network_config(vec![...]);

Also applies to: 969-971, 1028-1030, 1115-1117, 1188-1190, 1276-1278, 1344-1346, 1475-1477, 1543-1545, 1660-1662, 1702-1704, 1746-1748, 1884-1886, 1944-1946, 2004-2006

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/tests/instance_config_update.rs` around lines 901 - 903,
Create a single test helper function that encapsulates the initialization of the
deprecated `auto` field and `auto_config` field. Instead of repeating the
`#[allow(deprecated)]` attribute and the pattern `auto: false, auto_config:
None` at multiple test locations (lines 901-903, 969-971, 1028-1030, 1115-1117,
1188-1190, 1276-1278, 1344-1346, 1475-1477, 1543-1545, 1660-1662, 1702-1704,
1746-1748, 1884-1886, 1944-1946, 2004-2006), define a helper function that
builds the configuration struct with the deprecated field handling contained in
one place, then replace all occurrences of the scattered `#[allow(deprecated)]
auto: false, auto_config: None` pattern with calls to this helper function.

Source: Coding guidelines

crates/api-core/src/tests/instance.rs (2)

1033-1035: ⚡ Quick win

Centralize the deprecated auto field handling.

The repeated local #[allow(deprecated)] blocks make future removal noisy. Prefer ..Default::default() for these explicit-interface configs if available, or move the deprecated-field initialization behind one small test helper with a single documented allow. As per coding guidelines, “Avoid using #[allow(...)] unless you have a strong reason to do so.”

Representative cleanup
     rpc::InstanceNetworkConfig {
         interfaces,
-        #[allow(deprecated)]
-        auto: false,
-        auto_config: None,
+        ..Default::default()
     }

If the generated type cannot use Default cleanly, centralize the suppression instead:

#[allow(deprecated)]
fn explicit_network_config(
    interfaces: Vec<rpc::InstanceInterfaceConfig>,
) -> rpc::InstanceNetworkConfig {
    rpc::InstanceNetworkConfig {
        interfaces,
        auto: false,
        auto_config: None,
    }
}

Also applies to: 1849-1851, 2442-2444, 3030-3032, 3416-3418, 3506-3508, 3593-3595, 3749-3751, 3827-3829, 3965-3967, 4143-4145, 4242-4244, 4384-4386, 4522-4524, 4589-4591, 4745-4747, 5333-5335

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/tests/instance.rs` around lines 1033 - 1035, Centralize
the deprecated field handling across all instances in the test file. Create a
single test helper function (e.g., explicit_network_config) that contains the
`#[allow(deprecated)]` annotation and constructs the rpc::InstanceNetworkConfig
with the deprecated auto and auto_config fields set to false and None
respectively. Then replace all instances of the scattered #[allow(deprecated)]
blocks at lines 1033-1035, 1849-1851, 2442-2444, 3030-3032, 3416-3418,
3506-3508, 3593-3595, 3749-3751, 3827-3829, 3965-3967, 4143-4145, 4242-4244,
4384-4386, 4522-4524, 4589-4591, 4745-4747, and 5333-5335 with calls to this
centralized helper function, removing the redundant local allow annotations
entirely.

Source: Coding guidelines


167-178: ⚡ Quick win

Assert persisted instance_addresses.vpc_id in the record checks.

These tests now compute expected VPC IDs, but the rows read from instance_addresses still only validate address/prefix. Add direct record.vpc_id assertions so a regression that leaves the new DB association unset/wrong is caught even if config/status propagation still works. Based on PR objectives, instance_addresses.vpc_id is now the direct VPC association used by lookup paths.

Proposed assertions
     // This should the first IP. Algo does not look into machine_interface_addresses
     // table for used addresses for instance.
     assert_eq!(record.address.to_string(), "192.0.4.3");
+    assert_eq!(record.vpc_id, Some(vpc_ids[0]));
     assert_eq!(
         &record.address,
         network_config.interfaces[0]
     // This should the first IP. Algo does not look into machine_interface_addresses
     // table for used addresses for instance.
     assert_eq!(record.address.to_string(), "192.0.4.3");
+    assert_eq!(record.vpc_id, Some(vpc_id));
     assert_eq!(
         &record.address,
         network_config.interfaces[0]
     // This should the first IP. Algo does not look into machine_interface_addresses
     // table for used addresses for instance.
     assert_eq!(record.address.to_string(), "10.217.5.225");
+    assert_eq!(record.vpc_id, Some(vpc.id));
     assert_eq!(
         &record.address,
         network_config.interfaces[0]

Also applies to: 273-286, 355-359, 513-524, 2641-2641, 2671-2682

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/tests/instance.rs` around lines 167 - 178, The test code
now computes expected VPC IDs from segments using the join_all and
db::vpc::find_by_segment pattern, but the subsequent record assertions for
instance_addresses rows only validate address and prefix fields. Add direct
assertions that check record.vpc_id against the expected VPC IDs that were
computed in this block to ensure the database associations are correctly set.
Apply this fix at all locations where instance_addresses records are being
validated after computing VPC IDs, including the primary location (167-178) and
the other affected test locations mentioned in the comment (273-286, 355-359,
513-524, 2641-2641, 2671-2682).
crates/rpc/proto/forge.proto (1)

2749-2755: ⚡ Quick win

Update the contract comment to make auto_config the sole authoritative path.

Line 2749-Line 2751 still describes mutual exclusivity using deprecated auto. This can mislead API consumers and generated documentation; the rule should be documented against auto_config instead.

Suggested doc fix
-  // Mutually exclusive with `auto`: when `auto` is true, this
-  // MUST be empty, When `auto` is false, this lists the explicit
-  // interface configuration the caller wants applied.
+  // Mutually exclusive with `auto_config`: when `auto_config` is set, this
+  // MUST be empty. When `auto_config` is unset, this lists the explicit
+  // interface configuration the caller wants applied.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/rpc/proto/forge.proto` around lines 2749 - 2755, Update the contract
comment for the `interfaces` field (lines 2749-2751) to remove references to the
deprecated `auto` field and instead describe the mutual exclusivity relationship
with `auto_config`. The comment should make clear that `auto_config` is the
authoritative mechanism for controlling whether interfaces are automatically
configured or explicitly specified, ensuring API consumers and generated
documentation reflect the current contract rather than referencing deprecated
fields.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/admin-cli/src/instance/show/cmd.rs`:
- Around line 201-230: The current implementation uses filter_map and flatten
operations that remove missing VPC lookup results, causing the remaining VPCs to
shift to earlier indices and misalign with their corresponding interfaces.
Replace the filter_map(s.vpc_id).map(...) and
filter_map(c.network_segment_id).map(...) patterns with approaches that preserve
an Option<Vpc> for each interface instead of discarding missing results.
Specifically, for the auto_network branch, map each interface status to an
Option containing either the looked-up VPC or None if vpc_id is missing; for the
else branch, map each interface config to an Option containing either the
looked-up VPC or None if segment_id is missing or the lookup fails. Remove the
flatten() call since you now have Vec<Option<Vpc>> with proper 1:1 alignment,
and adjust the vpcs.get(idx) lookups to handle the Option wrapper appropriately
when displaying results.

In `@crates/admin-cli/src/machine/show/cmd.rs`:
- Around line 441-443: The current condition checking if flat_vpc_id.is_some()
returns the first eligible machine without verifying it is suitable for zero-DPU
flat-VPC auto allocation. Add an additional eligibility check in the if
statement to ensure the machine is a static-membership machine before returning
Some(machine), since the server rejects auto_config on DPU-backed machines and
this prevents batch allocation from trying valid hosts.

In `@crates/admin-cli/src/rpc.rs`:
- Around line 1267-1274: The error checking in the flat-vpc-id validation block
(in the condition with vpc_prefix_id, vf_vpc_prefix_id, and vf_subnet) currently
only rejects VPC prefixes and VF subnets, but it must also reject subnet,
explicit IP addresses, and IPv6 selector fields to prevent them from being
silently ignored. Expand the if condition to include checks for the subnet
field, any explicit IP selector fields, and IPv6 selector fields in the
allocate_instance struct, adding each to the logical OR chain with the existing
checks.

In `@crates/api-core/src/handlers/dpu.rs`:
- Around line 258-263: The error handling in the vpc::find_by_segment lookup
block incorrectly uses CarbideError::InvalidArgument when the network segment is
not found to be a member of a VPC. This is a system-side database state issue,
not a validation failure on caller input. Change the error variant from
CarbideError::InvalidArgument to CarbideError::FailedPrecondition to properly
classify this as a system failure and ensure correct error ownership semantics
and retry/alert behavior according to the project's error handling guidelines.

In `@crates/api-core/src/instance/mod.rs`:
- Line 31: The VPC-attached address creation in instance_addresses inserts for
requested_auto_config.vpc_id lacks the necessary VpcRowLock::Mutation lock,
allowing a race condition where concurrent VPC deletions can soft-delete a VPC
while addresses are still being committed to it. Acquire the
VpcRowLock::Mutation lock before performing the instance_addresses inserts for
the requested VPC to ensure mutual exclusion with concurrent VPC deletion
operations and prevent dangling addresses from being created for deleted VPCs.

In `@crates/api-db/migrations/20260615120000_instance_addresses_vpc_id.sql`:
- Around line 1-8: The migration adds a nullable vpc_id column to
instance_addresses and only partially backfills it via an UPDATE statement that
joins with network_segments. This leaves room for NULL vpc_id values in existing
or newly-inserted rows during a mixed-version rollout, violating the intended
VPC-address association contract. After the UPDATE backfill statement completes,
add an ALTER TABLE statement to the migration to enforce the vpc_id column as
NOT NULL, ensuring data consistency and preventing future null values from being
inserted into this critical field.

In `@crates/api-db/src/vpc.rs`:
- Around line 286-293: The instance_address_count_query contains a redundant
EXISTS subquery that checks if the VPC still exists (is not deleted). Since
instance_addresses are hard-deleted during instance release, this defensive
check is unnecessary. Simplify the query string to remove the EXISTS clause and
only check WHERE vpc_id=$1, reducing the query to a single simple condition.
Update the instance_address_count_query variable to contain only the simplified
SQL without the subquery that validates VPC existence.

In `@crates/machine-a-tron/src/api_client.rs`:
- Around line 291-293: Remove the #[allow(deprecated)] annotation and refactor
the InstanceNetworkConfig construction to use Default::default() instead of
manually setting the deprecated fields auto and auto_config. Keep only the
non-deprecated fields that need to be explicitly set, allowing the deprecated
fields to use their default values without suppressing the deprecation warning.
This maintains clippy cleanliness and follows coding guidelines that avoid
unnecessary lint suppressions.

---

Outside diff comments:
In `@crates/admin-cli/src/instance/allocate/args.rs`:
- Around line 25-54: The help documentation for the instance allocate command is
missing an example demonstrating the --flat-vpc-id flag, which is a key
allocation option. Add a new example in the after_long_help string that shows
how to allocate an instance using --flat-vpc-id instead of subnet or
vpc-prefix-id. Follow the existing example format and include it alongside the
other allocation examples to make this allocation path discoverable for users.
This example should demonstrate the zero-DPU replacement path as mentioned in
the comment.

In `@crates/api-core/src/instance/mod.rs`:
- Around line 935-944: The error message in the zero-DPU host validation block
references the outdated field name `InstanceNetworkConfig.auto` but the code is
now checking for `auto_config`. Update the error message text to correctly
reference `InstanceNetworkConfig.auto_config` instead of
`InstanceNetworkConfig.auto` to match the actual field being validated.
Additionally, apply the same correction to any other similar error messages at
lines 1022-1028 that still reference the old field name.

In `@crates/api-db/src/instance_network_config.rs`:
- Around line 75-80: The error message in the DatabaseError::InvalidArgument
call references the deprecated field name `auto`, but the field has been renamed
to `auto_config`. Update the error message string on line 76 to replace
"InstanceNetworkConfig.auto" with "InstanceNetworkConfig.auto_config" so the
error message accurately reflects the current field name and prevents misleading
operators during troubleshooting.

In `@crates/api-db/src/vpc.rs`:
- Around line 240-255: The find_by_segment function in the VPC query builder
returns soft-deleted VPCs because it lacks a filter to exclude them, allowing
allocation/validation logic to treat deleted VPC membership as active. Add a
filter to the FilterableQueryBuilder in find_by_segment to exclude soft-deleted
VPCs by filtering on the appropriate soft-delete column (such as deleted_at or
similar field) in the vpcs table, similar to how the function already filters by
segment_id using filter_relation.

In `@crates/rpc/src/model/instance/config/network.rs`:
- Around line 1002-1029: In the test function
`test_auto_rejects_explicit_interfaces`, the `auto_config` field is currently
set to `None`, which causes `auto` to be derived as `false` and prevents the
intended rejection check from triggering. Change `auto_config: None` to
`auto_config: Some(...)` (with any valid auto configuration structure) so that
the condition `if auto && !config.interfaces.is_empty()` at line 142 is properly
evaluated and the rejection behavior is actually tested.

---

Nitpick comments:
In `@crates/api-core/src/tests/instance_config_update.rs`:
- Around line 901-903: Create a single test helper function that encapsulates
the initialization of the deprecated `auto` field and `auto_config` field.
Instead of repeating the `#[allow(deprecated)]` attribute and the pattern `auto:
false, auto_config: None` at multiple test locations (lines 901-903, 969-971,
1028-1030, 1115-1117, 1188-1190, 1276-1278, 1344-1346, 1475-1477, 1543-1545,
1660-1662, 1702-1704, 1746-1748, 1884-1886, 1944-1946, 2004-2006), define a
helper function that builds the configuration struct with the deprecated field
handling contained in one place, then replace all occurrences of the scattered
`#[allow(deprecated)] auto: false, auto_config: None` pattern with calls to this
helper function.

In `@crates/api-core/src/tests/instance.rs`:
- Around line 1033-1035: Centralize the deprecated field handling across all
instances in the test file. Create a single test helper function (e.g.,
explicit_network_config) that contains the `#[allow(deprecated)]` annotation and
constructs the rpc::InstanceNetworkConfig with the deprecated auto and
auto_config fields set to false and None respectively. Then replace all
instances of the scattered #[allow(deprecated)] blocks at lines 1033-1035,
1849-1851, 2442-2444, 3030-3032, 3416-3418, 3506-3508, 3593-3595, 3749-3751,
3827-3829, 3965-3967, 4143-4145, 4242-4244, 4384-4386, 4522-4524, 4589-4591,
4745-4747, and 5333-5335 with calls to this centralized helper function,
removing the redundant local allow annotations entirely.
- Around line 167-178: The test code now computes expected VPC IDs from segments
using the join_all and db::vpc::find_by_segment pattern, but the subsequent
record assertions for instance_addresses rows only validate address and prefix
fields. Add direct assertions that check record.vpc_id against the expected VPC
IDs that were computed in this block to ensure the database associations are
correctly set. Apply this fix at all locations where instance_addresses records
are being validated after computing VPC IDs, including the primary location
(167-178) and the other affected test locations mentioned in the comment
(273-286, 355-359, 513-524, 2641-2641, 2671-2682).

In `@crates/rpc/proto/forge.proto`:
- Around line 2749-2755: Update the contract comment for the `interfaces` field
(lines 2749-2751) to remove references to the deprecated `auto` field and
instead describe the mutual exclusivity relationship with `auto_config`. The
comment should make clear that `auto_config` is the authoritative mechanism for
controlling whether interfaces are automatically configured or explicitly
specified, ensuring API consumers and generated documentation reflect the
current contract rather than referencing deprecated fields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ae5d57a1-9f74-4c1a-ba99-6b896f9b9dea

📥 Commits

Reviewing files that changed from the base of the PR and between 7eb1969 and 208de58.

📒 Files selected for processing (46)
  • crates/admin-cli/src/instance/allocate/args.rs
  • crates/admin-cli/src/instance/allocate/cmd.rs
  • crates/admin-cli/src/instance/show/cmd.rs
  • crates/admin-cli/src/machine/show/cmd.rs
  • crates/admin-cli/src/rpc.rs
  • crates/agent/src/tests/full.rs
  • crates/api-core/src/cfg/file.rs
  • crates/api-core/src/ethernet_virtualization.rs
  • crates/api-core/src/handlers/dpu.rs
  • crates/api-core/src/handlers/instance.rs
  • crates/api-core/src/instance/mod.rs
  • crates/api-core/src/setup.rs
  • crates/api-core/src/test_support/network_segment.rs
  • crates/api-core/src/tests/client_resolution.rs
  • crates/api-core/src/tests/common/api_fixtures/instance.rs
  • crates/api-core/src/tests/common/api_fixtures/mod.rs
  • crates/api-core/src/tests/common/api_fixtures/test_managed_host.rs
  • crates/api-core/src/tests/common/network_segment.rs
  • crates/api-core/src/tests/instance.rs
  • crates/api-core/src/tests/instance_allocate.rs
  • crates/api-core/src/tests/instance_config_update.rs
  • crates/api-core/src/tests/machine_dhcp.rs
  • crates/api-core/src/tests/machine_network.rs
  • crates/api-core/src/tests/network_security_group.rs
  • crates/api-core/src/tests/network_segment.rs
  • crates/api-core/src/tests/vpc.rs
  • crates/api-core/src/tests/vpc_find.rs
  • crates/api-core/src/tests/vpc_peering.rs
  • crates/api-core/src/tests/vpc_prefix.rs
  • crates/api-db/migrations/20260615120000_instance_addresses_vpc_id.sql
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/src/instance.rs
  • crates/api-db/src/instance_address.rs
  • crates/api-db/src/instance_network_config.rs
  • crates/api-db/src/network_security_group.rs
  • crates/api-db/src/vpc.rs
  • crates/api-model/src/instance/config/network.rs
  • crates/api-model/src/instance/status/network.rs
  • crates/api-model/src/instance_address.rs
  • crates/machine-a-tron/src/api_client.rs
  • crates/network/src/virtualization.rs
  • crates/rpc/build.rs
  • crates/rpc/proto/forge.proto
  • crates/rpc/src/model/instance/config/network.rs
  • crates/rpc/src/model/instance/status/network.rs
  • crates/test-harness/src/network/controller.rs

Comment thread crates/admin-cli/src/instance/show/cmd.rs Outdated
Comment thread crates/admin-cli/src/machine/show/cmd.rs
Comment thread crates/admin-cli/src/rpc.rs Outdated
Comment thread crates/api-core/src/handlers/dpu.rs
Comment thread crates/api-core/src/instance/mod.rs Outdated
Comment thread crates/api-db/src/vpc.rs Outdated
Comment thread crates/machine-a-tron/src/api_client.rs
@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
boot-artifacts-aarch64 3 0 0 3 0 0
boot-artifacts-x86_64 3 0 0 3 0 0
forge-admin-cli-x86_64 264 6 24 98 6 130
machine-validation-runner 712 32 184 267 35 194
machine_validation 712 32 184 267 35 194
nvmetal-carbide 712 32 184 267 35 194
TOTAL 2406 102 576 905 111 712

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@copy-pr-bot

copy-pr-bot Bot commented Jun 22, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Matthias247

Copy link
Copy Markdown
Contributor

This change adds a vpc_id column to instance_addresses

Quick question: Why does instance_addresses matter at all for no-dpu? I assumed since the instance just uses any available underlay IP, there might not be an entry needed in that table. All the instance logic could just return the machines IPs directly.

Is this route just because using instance_addresses simplifies something else and keeps it more in common with the DPU path?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/api-core/src/handlers/instance.rs (1)

1397-1401: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Update deprecated field wording in the zero-DPU auto-path error.

Line 1399 still says auto: true; the public contract now uses auto_config, so this message is misleading during troubleshooting.

Suggested fix
-                "instance was allocated with `auto: true` but host {} is no longer zero-DPU; cannot update via the auto path",
+                "instance was allocated with `auto_config` but host {} is no longer zero-DPU; cannot update via the auto path",
                 instance.machine_id,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/handlers/instance.rs` around lines 1397 - 1401, The error
message in the InvalidArgument error within the mh_snapshot.has_managed_dpus()
conditional block uses the deprecated field name `auto: true` which no longer
matches the public contract terminology. Update the error message string to
replace `auto: true` with `auto_config` to provide accurate guidance to users
during troubleshooting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/api-core/src/handlers/instance.rs`:
- Around line 1397-1401: The error message in the InvalidArgument error within
the mh_snapshot.has_managed_dpus() conditional block uses the deprecated field
name `auto: true` which no longer matches the public contract terminology.
Update the error message string to replace `auto: true` with `auto_config` to
provide accurate guidance to users during troubleshooting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: dc5a1a29-6389-457b-a81c-43fadc3b382d

📥 Commits

Reviewing files that changed from the base of the PR and between 208de58 and 114e944.

📒 Files selected for processing (26)
  • crates/admin-cli/src/instance/show/cmd.rs
  • crates/admin-cli/src/machine/show/cmd.rs
  • crates/admin-cli/src/rpc.rs
  • crates/agent/src/tests/full.rs
  • crates/api-core/src/cfg/file.rs
  • crates/api-core/src/ethernet_virtualization.rs
  • crates/api-core/src/handlers/dpu.rs
  • crates/api-core/src/handlers/instance.rs
  • crates/api-core/src/instance/mod.rs
  • crates/api-core/src/setup.rs
  • crates/api-core/src/tests/common/api_fixtures/mod.rs
  • crates/api-core/src/tests/instance_allocate.rs
  • crates/api-core/src/tests/machine_dhcp.rs
  • crates/api-core/src/tests/machine_network.rs
  • crates/api-core/src/tests/network_segment.rs
  • crates/api-core/src/tests/vpc.rs
  • crates/api-core/src/tests/vpc_find.rs
  • crates/api-core/src/tests/vpc_peering.rs
  • crates/api-core/src/tests/vpc_prefix.rs
  • crates/api-db/migrations/20260615120000_instance_addresses_vpc_id.sql
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/src/instance_address.rs
  • crates/api-db/src/vpc.rs
  • crates/api-model/src/instance_address.rs
  • crates/rpc/build.rs
  • crates/rpc/proto/forge.proto
💤 Files with no reviewable changes (2)
  • crates/rpc/build.rs
  • crates/rpc/proto/forge.proto
✅ Files skipped from review due to trivial changes (2)
  • crates/api-core/src/tests/machine_dhcp.rs
  • crates/api-core/src/tests/vpc.rs
🚧 Files skipped from review as they are similar to previous changes (20)
  • crates/api-db/migrations/20260615120000_instance_addresses_vpc_id.sql
  • crates/api-core/src/handlers/dpu.rs
  • crates/api-core/src/tests/vpc_find.rs
  • crates/api-core/src/setup.rs
  • crates/admin-cli/src/machine/show/cmd.rs
  • crates/api-core/src/tests/common/api_fixtures/mod.rs
  • crates/api-db/src/vpc.rs
  • crates/api-core/src/cfg/file.rs
  • crates/api-db/src/dpa_interface.rs
  • crates/api-core/src/tests/vpc_prefix.rs
  • crates/api-core/src/ethernet_virtualization.rs
  • crates/api-core/src/tests/machine_network.rs
  • crates/api-core/src/tests/network_segment.rs
  • crates/api-core/src/tests/vpc_peering.rs
  • crates/agent/src/tests/full.rs
  • crates/admin-cli/src/rpc.rs
  • crates/admin-cli/src/instance/show/cmd.rs
  • crates/api-db/src/instance_address.rs
  • crates/api-core/src/instance/mod.rs
  • crates/api-core/src/tests/instance_allocate.rs

@kensimon

Copy link
Copy Markdown
Contributor Author

Is this route just because using instance_addresses simplifies something else and keeps it more in common with the DPU path?

Basically, yes. It would be more complex to say "in these cases we query instance_addresses, and in these cases we query machine_interface_addresses", so for zero-dpu cases, we copy the machine addresses over to instance_addresses: see crates/api-db/src/instance_address.rs:337.

As for why the VPC ID is being attached to instance_addresses at all, we need to store it somewhere on the instance, because it's the only way at all that we know what VPC an instance belongs to, if the network segment has no vpc_id. So if we want to know "what VPC(s) is this instance supposed to be in", we're using the instance address as the jumping off point.

@github-actions

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/api-test-helper/src/instance.rs (1)

39-83: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Thread the flat VPC ID through this helper.

The segment_id == None path still emits only deprecated auto: true, but flat-VPC segments may have no segment-level VPC for the handler to hydrate from. Add a flat_vpc_id/auto_config.vpc_id path here, and keep deprecated auto coverage in a dedicated compatibility test if needed. As per path instructions, review test-support crates for “deterministic behavior, clear fixtures, reusable helpers.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-test-helper/src/instance.rs` around lines 39 - 83, The None branch
of the segment_id match statement (zero-DPU path) currently only uses the
deprecated auto: true in the network configuration, but flat-VPC segments
require a vpc_id to be specified. Add a flat_vpc_id parameter to the function,
then in the None branch where the instance_config is constructed, replace or
supplement the "auto": true with an "auto_config" object that includes "vpc_id"
set to the flat_vpc_id value. Keep the deprecated auto: true path available in a
separate compatibility test if needed.

Source: Path instructions

🧹 Nitpick comments (2)
crates/api-core/src/handlers/instance.rs (2)

187-192: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Hydrate the batch inside one transaction.

This loop opens and commits a transaction per instance request before the actual batch allocation. Since hydration is a read-only pass over the batch, wrap the loop in one with_txn call to avoid avoidable DB round trips and keep the hydration pass together.

Proposed refactor
-    for request in &mut batch_request.instance_requests {
-        api.with_txn(|txn| {
-            hydrate_instance_allocation_request_from_deprecated_fields(txn, request).boxed()
-        })
-        .await??;
-    }
+    api.with_txn(|txn| {
+        async {
+            for request in &mut batch_request.instance_requests {
+                hydrate_instance_allocation_request_from_deprecated_fields(txn, request).await?;
+            }
+
+            Ok(())
+        }
+        .boxed()
+    })
+    .await??;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/handlers/instance.rs` around lines 187 - 192, The loop
iterating over batch_request.instance_requests currently opens a separate
transaction for each request when calling api.with_txn, causing unnecessary
database round trips. Move the api.with_txn call outside the loop so that it
wraps the entire for loop, allowing all calls to
hydrate_instance_allocation_request_from_deprecated_fields to execute within a
single read-only transaction. This keeps the hydration pass together and
eliminates avoidable DB round trips.

1939-1966: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Keep deprecated-hydration warnings attributable.

These warning paths can run while hydrating a batch, but the emitted records do not include the machine_id, and the multi-segment case drops the segment count. Add those as structured fields so operators can identify the failing request without correlating nearby log lines. As per coding guidelines, “When writing log messages, prefer placing common fields as attributes passed to tracing functions,” and as per path instructions, review crates/**/*.rs for “structured tracing fields.”

Proposed logging update
         [] => {
             tracing::warn!(
+                %machine_id,
                 "Assigned machine is not in a HostInband network, cannot auto-assign VPC ID"
             );
             return Ok(());
         }
-        _ => {
+        segments => {
             tracing::warn!(
-                "Assigned machine is in multiple a HostInband networks, cannot auto-assign VPC ID"
+                %machine_id,
+                count = segments.len(),
+                "Assigned machine is in multiple HostInband networks, cannot auto-assign VPC ID"
             );
             return Ok(());
         }
@@
     } else {
-        tracing::warn!(network_segment_id = %ns_id, "Network segment does not have a VPC ID, cannot auto-assign VPC ID");
+        tracing::warn!(%machine_id, network_segment_id = %ns_id, "Network segment does not have a VPC ID, cannot auto-assign VPC ID");
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/handlers/instance.rs` around lines 1939 - 1966, The
warning messages in the network segment handling code are missing structured
tracing fields needed for operators to identify failing requests. Add the
machine_id as a structured field to all three warning calls: the single
tracing::warn for machines not in a HostInband network, the multi-segment case
which should also include the count of networks found, and the existing
network_segment warning. Include these as attributes to the tracing::warn! macro
calls following the pattern already shown in the info log for vpc_id assignment,
ensuring machine_id is consistently available across all warning paths to avoid
requiring log line correlation for debugging.

Sources: Coding guidelines, Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/api-test-helper/src/instance.rs`:
- Around line 39-83: The None branch of the segment_id match statement (zero-DPU
path) currently only uses the deprecated auto: true in the network
configuration, but flat-VPC segments require a vpc_id to be specified. Add a
flat_vpc_id parameter to the function, then in the None branch where the
instance_config is constructed, replace or supplement the "auto": true with an
"auto_config" object that includes "vpc_id" set to the flat_vpc_id value. Keep
the deprecated auto: true path available in a separate compatibility test if
needed.

---

Nitpick comments:
In `@crates/api-core/src/handlers/instance.rs`:
- Around line 187-192: The loop iterating over batch_request.instance_requests
currently opens a separate transaction for each request when calling
api.with_txn, causing unnecessary database round trips. Move the api.with_txn
call outside the loop so that it wraps the entire for loop, allowing all calls
to hydrate_instance_allocation_request_from_deprecated_fields to execute within
a single read-only transaction. This keeps the hydration pass together and
eliminates avoidable DB round trips.
- Around line 1939-1966: The warning messages in the network segment handling
code are missing structured tracing fields needed for operators to identify
failing requests. Add the machine_id as a structured field to all three warning
calls: the single tracing::warn for machines not in a HostInband network, the
multi-segment case which should also include the count of networks found, and
the existing network_segment warning. Include these as attributes to the
tracing::warn! macro calls following the pattern already shown in the info log
for vpc_id assignment, ensuring machine_id is consistently available across all
warning paths to avoid requiring log line correlation for debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 522e8444-254f-4c46-92ef-f4a2b259e077

📥 Commits

Reviewing files that changed from the base of the PR and between 06b9d20 and 08ac837.

📒 Files selected for processing (2)
  • crates/api-core/src/handlers/instance.rs
  • crates/api-test-helper/src/instance.rs

@Matthias247

Copy link
Copy Markdown
Contributor

Is this route just because using instance_addresses simplifies something else and keeps it more in common with the DPU path?

Basically, yes. It would be more complex to say "in these cases we query instance_addresses, and in these cases we query machine_interface_addresses", so for zero-dpu cases, we copy the machine addresses over to instance_addresses: see crates/api-db/src/instance_address.rs:337.

Thanks for the explanation. I am fine either way: With this extension to instance_addresses or by changing other code not to use it. We won't get around special handling this in some way, so whatever you, @chet and @bcavnvidia and decide is fine 👍🏻

@kensimon kensimon enabled auto-merge (squash) June 23, 2026 23:50
@kensimon kensimon merged commit b45196f into NVIDIA:main Jun 24, 2026
55 checks passed
ajf pushed a commit that referenced this pull request Jun 25, 2026
Building on #2666:

Passing `auto=true` as part of InstanceNetworkConfig when allocating an
Instance on NICo is now deprecated in favor of passing a VPC ID as part
of a InstanceNetworkConfig.auto_config message, so that NICo knows what
VPC ID to assign to the network interface if the network segment doesn't
belong to any VPC. (This is so multiple "flat" VPC's can share a single
network segment for single-dpu hosts: An instance's network interface
now carries the VPC ID instead of the network segment.)

Since the REST API's APIInstanceCreateRequest already requires callers
to pass a VPC ID, we can pass it directly as part of the call to
AllocateInstance, which makes change this pretty easy, and not require
any changes for callers of the REST API.

## Related issues
#2647

## Type of Change
- [ ] **Add** - New feature or capability
- [ ] **Change** - Changes in existing functionality
- [ ] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [ ] **Internal** - Internal changes (refactoring, tests, docs, etc.)

## Breaking Changes
- [ ] **This PR contains breaking changes**

## Testing
- [X] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed
- [ ] No testing required (docs, internal refactor, etc.)

## Additional Notes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants