Commit 92d912b

[multicast] connect MGD and DDM to Omicron
Wires MGD (MRIB programming) and DDM (live peer topology for sled-to-switch-port resolution) into the multicast reconciler RPW. The reconciler resolves sled-to-port mapping via DDM peers (primary, live source) and falls back to inventory + DPD backplane when DDM is unavailable. MRIB routes are advertised through MGD and withdrawn when no "Joined" members remain.

Multicast is *instance networking* under the planned migration of system-level networking from Nexus RPWs to sled-agent reconcilers ([omicron#10167](#10167)).

### Sled-side underlay NIC filter programming

- `set_mcast_m2p` / `clear_mcast_m2p` in the OPTE port manager hold UDP sockets joined to the underlay multicast group on each underlay NIC. Joining the group on a held socket triggers `mac_multicast_add` in the kernel, which programs the per-NIC multicast MAC filter so cxgbe delivers frames to xde. Workaround for opte#908.
- Eager rehydration at sled-agent startup reopens those filter sockets for M2P entries that survive in xde across a restart. Rehydration failures clear the surviving M2P entry so convergence retries on the next pass instead of black-holing the group.

### Switch-zone integration

- New `MulticastSwitchZoneClient` fans out per-switch MGD and DDM clients, discovered via internal DNS SRV records. The reconciler uses it for MRIB writes and live peer queries (consuming the ddm-admin-client `GET /peers` endpoint that returns `if_name` / port info per peer).
- `ServiceName::Ddm` registered in internal DNS via `host_zone_switch` (now takes a `ddm_port`) so cross-sled consumers can discover `ddmd` in switch zones. RSS, the test starter, and `overridables_for_test` thread the new port through. The multicast reconciler is the first cross-sled consumer; previously, all `DdmAdminClient` callers were sled-local via `DdmAdminClient::localhost`.
- Resolver helper preserves SRV target names alongside resolved sockets, enabling per-target correlation when multiple switch zones share an address but differ by port.

*Note*: the first reconciler pass after upgrade publishes one new `_ddm._tcp` SRV record per switch zone, causing a one-time DNS generation bump.

### Instance-scoped multicast subscriptions

- v36 (`VERSION_MCAST_M2P_FORWARDING`) introduces `PUT/DELETE /instances/{instance_id}/multicast-group`, replacing the earlier VMM-keyed `/vmms/{propolis_id}/multicast-group` shape. Sled-agent resolves the active VMM under its instance-state lock and dispatches to OPTE atomically, eliminating a Nexus-side lookup-vs-call race where a migration commit could land subscriptions on a stale propolis.
- v7 endpoints remain on the trait as deprecated shims that perform the propolis-to-instance lookup and delegate to the new handler.
- Nexus drops `cached_propolis_id` and `lookup_propolis_id` plumbing through the reconciler entirely. `subscribe_vmm` / `unsubscribe_vmm` become `subscribe_instance` / `unsubscribe_instance`.

### Per-pass sled-to-port resolution

Delivers the design captured in the prior TODO: prefer DDM's authoritative view of sled-to-port reachability over inventory, with inventory as cross-validation rather than the primary input.

- Replaces the previous TTL'd sled-mapping cache with a single-pass amortization built once at the top of the member reconciler pass and threaded through the per-pass reconciler context.
- DDM peer topology is the primary source. Inventory + DPD backplane is the fallback and supplements partial DDM coverage (per-sled gap-fill) rather than being all-or-nothing.
- Parsed peer port IDs are cross-validated against the DPD backplane map.
- Sequential per-switch fallback for shared-state DPD reads (backplane map, underlay group fetch), so a single unhealthy switch can't fail the whole read.

### Saga and RPW interaction

- Saga state guard widened: the DPD-ensure saga accepts "Active" as well as "Creating" so crash-recovery re-execution doesn't roll back already-applied DPD state.
- `instance_stop` detaches multicast members and activates the reconciler only after sled-agent acknowledges the Stop request, avoiding M2P / forwarding teardown for a still-running guest if Stop fails.

### Test updates

- Integration coverage for MRIB programming, DDM-vs-inventory drift, saga idempotent crash-recovery, per-switch invariant checks, and underlay MAC filter lifecycle.
- New `populate_ddm_peers` test helper synthesizes DDM peer topology from datastore + inventory so tests exercise the production primary path instead of the inventory fallback that an empty `DdmInstance` would otherwise force. Cache keyed on the in-service sled-set so multi-sled fixtures rebuild on sled transitions.
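
The per-pass resolution described above (DDM first, inventory as a per-sled gap-fill, peer ports cross-validated against the backplane map) can be sketched roughly as follows. This is an illustrative sketch only, not the reconciler's code: `SledId`, `PortId`, `Source`, and `resolve_sled_ports` are hypothetical names, and treating a DDM port that fails backplane validation as a gap to be filled from inventory is an assumption of this sketch rather than something the commit message states.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifiers for illustration; the real Omicron types differ.
type SledId = u32;
type PortId = String;

/// Records which source supplied a sled's port mapping.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    Ddm,
    Inventory,
}

/// Builds the sled-to-port map once for a reconciler pass.
///
/// `ddm` is `None` when DDM is unavailable; in that case every sled is
/// resolved from inventory. When DDM is only partially populated, inventory
/// fills the missing sleds one by one instead of replacing the whole map.
fn resolve_sled_ports(
    sleds: &[SledId],
    ddm: Option<&BTreeMap<SledId, PortId>>,
    inventory: &BTreeMap<SledId, PortId>,
    backplane: &BTreeSet<PortId>,
) -> BTreeMap<SledId, (PortId, Source)> {
    let mut resolved = BTreeMap::new();
    for sled in sleds {
        // Primary source: a DDM peer whose port is known to the backplane map.
        // Assumption: a port that fails this check is treated as a gap.
        let from_ddm = ddm
            .and_then(|peers| peers.get(sled))
            .filter(|port| backplane.contains(*port));
        if let Some(port) = from_ddm {
            resolved.insert(*sled, (port.clone(), Source::Ddm));
            continue;
        }
        // Fallback: per-sled gap-fill from inventory.
        if let Some(port) = inventory.get(sled) {
            resolved.insert(*sled, (port.clone(), Source::Inventory));
        }
        // Sleds known to neither source stay unresolved for this pass.
    }
    resolved
}
```

Keeping the `Source` alongside each port makes it possible to log or test which path produced a mapping, which is the kind of DDM-vs-inventory drift the test updates mention.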
1 parent 346debd commit 92d912b

72 files changed

Lines changed: 4897 additions & 2288 deletions

Cargo.lock

Lines changed: 33 additions & 31 deletions

Cargo.toml

Lines changed: 4 additions & 4 deletions
@@ -493,7 +493,7 @@ digest = "0.10.7"
 dns-server = { path = "dns-server" }
 dns-server-api = { path = "dns-server-api" }
 dns-service-client = { path = "clients/dns-service-client" }
-dpd-client = { git = "https://github.com/oxidecomputer/dendrite", rev = "1ddaa5d6b101fbaa2c29eca847111cbef1a272ad" }
+dpd-client = { git = "https://github.com/oxidecomputer/dendrite", rev = "e10e4f5a993fe950ab1b478abb5dcbfa7aa92091" }
 dropshot = { version = "0.16.6", features = [ "usdt-probes" ] }
 dropshot-api-manager = "0.6.0"
 dropshot-api-manager-types = "0.6.0"
@@ -599,8 +599,8 @@ ntp-admin-api = { path = "ntp-admin/api" }
 ntp-admin-client = { path = "clients/ntp-admin-client" }
 ntp-admin-types = { path = "ntp-admin/types" }
 ntp-admin-types-versions = { path = "ntp-admin/types/versions" }
-mg-admin-client = { git = "https://github.com/oxidecomputer/maghemite", rev = "4d1f20f793da102b29b914569725ebc9fdf746dd" }
-ddm-admin-client = { git = "https://github.com/oxidecomputer/maghemite", rev = "4d1f20f793da102b29b914569725ebc9fdf746dd" }
+mg-admin-client = { git = "https://github.com/oxidecomputer/maghemite", rev = "c3c3032f8bdc91d6faf2b36e05b8375a0980765c" }
+ddm-admin-client = { git = "https://github.com/oxidecomputer/maghemite", rev = "c3c3032f8bdc91d6faf2b36e05b8375a0980765c" }
 multimap = "0.10.1"
 nexus-auth = { path = "nexus/auth" }
 nexus-background-task-interface = { path = "nexus/background-task-interface" }
@@ -737,7 +737,7 @@ rats-corim = { git = "https://github.com/oxidecomputer/rats-corim.git", rev = "f
 raw-cpuid = { git = "https://github.com/oxidecomputer/rust-cpuid.git", rev = "a4cf01df76f35430ff5d39dc2fe470bcb953503b" }
 rayon = "1.10"
 rcgen = "0.12.1"
-rdb-types = { git = "https://github.com/oxidecomputer/maghemite", rev = "4d1f20f793da102b29b914569725ebc9fdf746dd" }
+rdb-types = { git = "https://github.com/oxidecomputer/maghemite", rev = "c3c3032f8bdc91d6faf2b36e05b8375a0980765c" }
 reconfigurator-cli = { path = "dev-tools/reconfigurator-cli" }
 reedline = "0.40.0"
 ref-cast = "1.0"

clients/ddm-admin-client/src/lib.rs

Lines changed: 35 additions & 1 deletion
@@ -2,7 +2,7 @@
 // License, v. 2.0. If a copy of the MPL was not distributed with this
 // file, You can obtain one at https://mozilla.org/MPL/2.0/.
 
-// Copyright 2023 Oxide Computer Company
+// Copyright 2026 Oxide Computer Company
 
 #![allow(clippy::redundant_closure_call)]
 #![allow(clippy::needless_lifetimes)]
@@ -107,6 +107,40 @@ impl Client {
         self.inner.enable_stats(request).await.map(|resp| resp.into_inner())
     }
 
+    /// Returns DDM peer information including interface names.
+    ///
+    /// The `if_name` field on each peer provides a live sled-to-port
+    /// mapping, identifying which switch port a peer sled is connected
+    /// through (e.g., `"tfportrear0_0"`).
+    pub async fn get_peers(
+        &self,
+    ) -> Result<
+        std::collections::HashMap<String, types::PeerInfo>,
+        Error<types::Error>,
+    > {
+        self.inner.get_peers().await.map(|resp| resp.into_inner())
+    }
+
+    /// Returns multicast routes learned from DDM peers.
+    ///
+    /// Each route includes the origin (overlay/underlay mapping),
+    /// the nexthop peer that advertised it, and the path vector.
+    pub async fn get_multicast_groups(
+        &self,
+    ) -> Result<Vec<types::MulticastRoute>, Error<types::Error>> {
+        self.inner.get_multicast_groups().await.map(|resp| resp.into_inner())
+    }
+
+    /// Returns multicast origins that this DDM instance is advertising.
+    pub async fn get_originated_multicast_groups(
+        &self,
+    ) -> Result<Vec<types::MulticastOrigin>, Error<types::Error>> {
+        self.inner
+            .get_originated_multicast_groups()
+            .await
+            .map(|resp| resp.into_inner())
+    }
+
     /// Returns the addresses of connected sleds.
     ///
     /// Note: These sleds have not yet been verified.
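
As a rough illustration of how a caller might turn the `if_name` values returned by `get_peers` into port identifiers, consider the sketch below. Everything here is hypothetical: the `PeerInfo` struct is a stand-in for the generated `types::PeerInfo` (only the `if_name` field is named in the diff, and its real type is not shown), and the naming scheme assumed by `port_from_if_name` (`tfport` + port ID + `_` + link index) is inferred solely from the `"tfportrear0_0"` example in the doc comment.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for the generated `types::PeerInfo`.
/// Only `if_name` is named in the diff; its real type may differ.
#[derive(Debug, Clone)]
struct PeerInfo {
    if_name: Option<String>,
}

/// Extracts a port ID from an interface name.
///
/// Assumed format (inferred from the single example "tfportrear0_0"):
/// "tfport" + port ID + "_" + numeric link index.
fn port_from_if_name(if_name: &str) -> Option<&str> {
    let rest = if_name.strip_prefix("tfport")?;
    let (port, link) = rest.rsplit_once('_')?;
    let link_is_numeric =
        !link.is_empty() && link.bytes().all(|b| b.is_ascii_digit());
    if port.is_empty() || !link_is_numeric {
        return None;
    }
    Some(port)
}

/// Maps each peer key to its port, skipping peers whose interface name is
/// missing or does not match the assumed format.
fn ports_by_peer(
    peers: &HashMap<String, PeerInfo>,
) -> BTreeMap<String, String> {
    peers
        .iter()
        .filter_map(|(key, info)| {
            let port = port_from_if_name(info.if_name.as_deref()?)?;
            Some((key.clone(), port.to_string()))
        })
        .collect()
}
```

Skipping unparseable peers rather than failing the whole map is a choice made for this sketch; it fits a design in which a separate source can later fill in the sleds that are left unresolved.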

common/src/api/external/mod.rs

Lines changed: 2 additions & 0 deletions
@@ -2543,6 +2543,8 @@ impl Vni {
     ///
     /// This is a low-numbered VNI to avoid colliding with user VNIs.
     /// However, it is not in the Oxide-reserved range yet.
+    ///
+    /// Should match `oxide_vpc::api::DEFAULT_MULTICAST_VNI`.
     pub const DEFAULT_MULTICAST_VNI: Self = Self(77);
 
     /// Oxide reserves a slice of initial VNIs for its own use.

dev-tools/ls-apis/tests/api_dependencies.out

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ Crucible Pantry (client: crucible-pantry-client)
 Maghemite DDM Admin (client: ddm-admin-client)
 consumed by: installinator (omicron/installinator) via 1 path
 consumed by: mgd (maghemite/mgd) via 1 path
+consumed by: omicron-nexus (omicron/nexus) via 1 path
 consumed by: omicron-sled-agent (omicron/sled-agent) via 1 path
 consumed by: wicketd (omicron/wicketd) via 1 path
 

dev-tools/omdb/tests/successes.out

Lines changed: 2 additions & 2 deletions
@@ -656,7 +656,7 @@ task: "bfd_manager"
 configured period: every <REDACTED_DURATION>s
 last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
 started at <REDACTED_TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
-last completion reported error: failed to resolve addresses for Dendrite services: proto error: no records found for Query { name: Name("_dendrite._tcp.control-plane.oxide.internal."), query_type: SRV, query_class: IN }
+last completion reported error: failed to resolve addresses for Dendrite services: proto error: no records found for Query { name: Name("_mgs._tcp.control-plane.oxide.internal."), query_type: SRV, query_class: IN }
 
 task: "blueprint_planner"
 configured period: every <REDACTED_DURATION>m
@@ -1342,7 +1342,7 @@ task: "bfd_manager"
 configured period: every <REDACTED_DURATION>s
 last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
 started at <REDACTED_TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
-last completion reported error: failed to resolve addresses for Dendrite services: proto error: no records found for Query { name: Name("_dendrite._tcp.control-plane.oxide.internal."), query_type: SRV, query_class: IN }
+last completion reported error: failed to resolve addresses for Dendrite services: proto error: no records found for Query { name: Name("_mgs._tcp.control-plane.oxide.internal."), query_type: SRV, query_class: IN }
 
 task: "blueprint_planner"
 configured period: every <REDACTED_DURATION>m

illumos-utils/src/opte/mod.rs

Lines changed: 11 additions & 0 deletions
@@ -41,6 +41,17 @@ use std::net::IpAddr;
 use std::net::Ipv4Addr;
 use std::net::Ipv6Addr;
 
+// `oxide_vpc::api::DEFAULT_MULTICAST_VNI` and
+// `omicron_common::api::external::Vni::DEFAULT_MULTICAST_VNI` live in sibling
+// crates that cannot reference each other's constant. They must stay
+// numerically equal: the MRIB, M2P mappings, and OPTE all route on this
+// value, so any divergence would black-hole multicast traffic.
+const _: () = assert!(
+    oxide_vpc::api::DEFAULT_MULTICAST_VNI
+        == omicron_common::api::external::Vni::DEFAULT_MULTICAST_VNI.as_u32(),
+    "oxide_vpc::api::DEFAULT_MULTICAST_VNI must equal omicron_common Vni::DEFAULT_MULTICAST_VNI",
+);
+
 /// Information about the gateway for an OPTE port
 #[derive(Debug, Clone, Copy)]
 #[allow(dead_code)]
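
The compile-time check added in this hunk follows a general Rust pattern: an anonymous `const` item whose initializer is an `assert!`, so a mismatch fails the build instead of surfacing at runtime. A self-contained sketch of the same pattern is shown below. The modules `vpc_api` and `external` are hypothetical stand-ins for the two sibling crates, and the value 77 in `vpc_api` is assumed here only because the diff says the two constants should match.

```rust
// Hypothetical stand-in for the crate that defines the raw constant.
mod vpc_api {
    pub const DEFAULT_MULTICAST_VNI: u32 = 77;
}

// Hypothetical stand-in for the crate that wraps the value in a newtype.
mod external {
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct Vni(u32);

    impl Vni {
        pub const DEFAULT_MULTICAST_VNI: Self = Self(77);

        /// Must be a `const fn` so it can be called in the assertion below.
        pub const fn as_u32(self) -> u32 {
            self.0
        }
    }
}

// Evaluated at compile time: if the two values ever diverge, the build fails
// with this message rather than the mismatch being discovered in production.
const _: () = assert!(
    vpc_api::DEFAULT_MULTICAST_VNI
        == external::Vni::DEFAULT_MULTICAST_VNI.as_u32(),
    "multicast VNI constants must match",
);
```

Placing the assertion in a crate that depends on both definitions, as the diff does in `illumos-utils`, is what lets it compare constants from crates that cannot reference each other.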
