Commit 3b54e16

[ddmd] add --no-state-machine flag for test fixtures and Linux build
Omicron's oxidecomputer/omicron#10381 introduces a stubbed `ddmd` admin endpoint because spawning a real `ddmd` in a generic test toolchain is not viable: the routing state machine (discovery, exchange, route synchronization) depends on illumos networking facilities the toolchain does not provide. Consumers of the stub, e.g., Nexus RPW (multicast members), sled-agent's DDM reconciler, and anything that resolves the DDM internal-DNS service name, cannot exercise the real admin surface from Omicron's test harness.

This work adds an opt-in `--no-state-machine` flag to `ddmd` that runs only the admin API server and skips the state machine entirely, allowing the fixture to spawn the real binary. This is analogous to `mgd --no-bgp-dispatcher`, which Omicron's `MgdInstance` already uses for the same purpose.

To make the fixture path usable on Linux, `ddmd` itself must build on Linux. The previous code pulled the illumos-only crates `libnet`, `dpd-client`, `opte-ioctl`, and `oxide-vpc` unconditionally through `ddm`, which failed to link on Linux (`-lzfs`, `-ldlpi`). This change introduces an `illumos` feature in both `ddm` and `ddmd` (default-on, mirroring `mgd`'s `mg-lower` pattern) that marks those four crates optional. The buildomat `linux.sh` job now builds `ddmd` and `ddmadm`, with `ddmd` invoked as `cargo build --bin ddmd --no-default-features`.

The illumos-only halves of `ddm` are isolated by the feature gate:

- The routing state machine implementation moves from `sm.rs` into `sm/state.rs`.
- The exchange runtime (HTTP push/pull and route programming) moves from `exchange.rs` into `exchange/runtime.rs`.
- The discovery runtime (UDPv6 solicitation/advertisement loops) moves from `discovery.rs` into `discovery/runtime.rs`.

Each parent `mod.rs` keeps the platform-agnostic types and re-exports the runtime surface so existing call sites resolve unchanged on illumos. The runtime submodules are gated as a unit by `#[cfg(all(feature = "illumos", target_os = "illumos"))]`.

We also remove the single-function `ddm/src/util.rs`, inlining the function into `discovery/runtime.rs`, where its sole caller lives.

The SIGTERM cleanup handler is installed regardless of the flag, so Ctrl-C still exits cleanly in `--no-state-machine` mode. The imported route sets are empty in that mode, so the cleanup itself is a noop. Passing `--addr` alongside `--no-state-machine` is harmless but ignored, with a warning logged.
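The gating pattern described in the commit message can be sketched in a single file. This is a minimal illustration, not the code from the commit: the inline `discovery` module stands in for `discovery/mod.rs`, and `start`, `start_state_machine`, and the returned strings are hypothetical names invented for the example. Only `Version`, `runtime`, `handler`, and the `cfg` predicate come from the diff.

```rust
// Stand-in for a parent module such as `discovery/mod.rs`.
mod discovery {
    // Platform-agnostic types stay in the parent so every target can name them.
    #[derive(Debug, Copy, Clone, PartialEq, Eq)]
    #[repr(u8)]
    pub enum Version {
        V2 = 2,
        V3 = 3,
    }

    // Runtime half: compiled only when the feature is enabled AND the
    // target is illumos, so other platforms never see its dependencies.
    #[cfg(all(feature = "illumos", target_os = "illumos"))]
    mod runtime {
        // Hypothetical body; the real handler drives UDPv6 sockets.
        pub fn handler() -> &'static str {
            "discovery runtime started"
        }
    }

    // Re-export so call sites keep using `discovery::handler` on illumos.
    #[cfg(all(feature = "illumos", target_os = "illumos"))]
    pub use self::runtime::handler;
}

// Hypothetical helper: the illumos build reaches into the gated runtime.
#[cfg(all(feature = "illumos", target_os = "illumos"))]
fn start_state_machine() -> &'static str {
    discovery::handler()
}

// Hypothetical helper: every other build has no runtime to start.
#[cfg(not(all(feature = "illumos", target_os = "illumos")))]
fn start_state_machine() -> &'static str {
    "state machine not compiled in"
}

/// Hypothetical entry point mirroring the `--no-state-machine` switch:
/// when set, only the admin API would be served.
pub fn start(no_state_machine: bool) -> &'static str {
    if no_state_machine {
        return "admin API only";
    }
    start_state_machine()
}
```

Because both gated items share one predicate, the module and its re-export appear or disappear together, which is what lets the shared types compile on Linux without the illumos-only crates.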
1 parent 81f662f commit 3b54e16

13 files changed

Lines changed: 914 additions & 680 deletions

.github/buildomat/jobs/linux.sh

Lines changed: 38 additions & 0 deletions
@@ -28,6 +28,26 @@
 #: series = "linux"
 #: name = "mgadm.sha256.txt"
 #: from_output = "/work/release/mgadm.sha256.txt"
+#:
+#: [[publish]]
+#: series = "linux"
+#: name = "ddmd"
+#: from_output = "/work/release/ddmd"
+#:
+#: [[publish]]
+#: series = "linux"
+#: name = "ddmd.sha256.txt"
+#: from_output = "/work/release/ddmd.sha256.txt"
+#:
+#: [[publish]]
+#: series = "linux"
+#: name = "ddmadm"
+#: from_output = "/work/release/ddmadm"
+#:
+#: [[publish]]
+#: series = "linux"
+#: name = "ddmadm.sha256.txt"
+#: from_output = "/work/release/ddmadm.sha256.txt"
 
 set -o errexit
 set -o pipefail
@@ -64,3 +84,21 @@ popd
 cp target/debug/mgadm /work/debug
 cp target/release/mgadm /work/release
 digest /work/release/mgadm > /work/release/mgadm.sha256.txt
+
+banner "ddmd"
+pushd ddmd
+cargo build --bin ddmd --no-default-features
+cargo build --bin ddmd --no-default-features --release
+popd
+cp target/debug/ddmd /work/debug
+cp target/release/ddmd /work/release
+digest /work/release/ddmd > /work/release/ddmd.sha256.txt
+
+banner "ddmadm"
+pushd ddmadm
+cargo build --bin ddmadm
+cargo build --bin ddmadm --release
+popd
+cp target/debug/ddmadm /work/debug
+cp target/release/ddmadm /work/release
+digest /work/release/ddmadm > /work/release/ddmadm.sha256.txt

ddm/Cargo.toml

Lines changed: 12 additions & 4 deletions
@@ -21,10 +21,6 @@ hyper.workspace = true
 hyper-util.workspace = true
 http-body-util.workspace = true
 serde_json.workspace = true
-libnet.workspace = true
-dpd-client.workspace = true
-opte-ioctl.workspace = true
-oxide-vpc.workspace = true
 sled.workspace = true
 mg-common.workspace = true
 chrono.workspace = true
@@ -35,3 +31,15 @@ oxnet.workspace = true
 uuid.workspace = true
 ddm-api.workspace = true
 ddm-types.workspace = true
+
+# illumos-only deps used by the routing state machine and platform sys layer.
+# Gated by the `illumos` feature so non-illumos builds (e.g. Linux test
+# fixtures running ddmd with `--no-state-machine`) link cleanly.
+libnet = { workspace = true, optional = true }
+dpd-client = { workspace = true, optional = true }
+opte-ioctl = { workspace = true, optional = true }
+oxide-vpc = { workspace = true, optional = true }
+
+[features]
+default = ["illumos"]
+illumos = ["dep:libnet", "dep:dpd-client", "dep:opte-ioctl", "dep:oxide-vpc"]

ddm/src/admin.rs

Lines changed: 8 additions & 8 deletions
@@ -12,8 +12,6 @@ use ddm_types::exchange::PathVector;
 use dropshot::ApiDescription;
 use dropshot::ApiDescriptionBuildErrors;
 use dropshot::ConfigDropshot;
-use dropshot::ConfigLogging;
-use dropshot::ConfigLoggingLevel;
 use dropshot::HttpError;
 use dropshot::HttpResponseOk;
 use dropshot::HttpResponseUpdatedNoContent;
@@ -23,7 +21,7 @@ use dropshot::TypedBody;
 use mg_common::lock;
 use mg_common::net::TunnelOrigin;
 use oxnet::Ipv6Net;
-use slog::{Logger, error, info};
+use slog::{Logger, error, info, o};
 use std::collections::{HashMap, HashSet};
 use std::net::{IpAddr, SocketAddr, SocketAddrV4, SocketAddrV6};
 use std::sync::Arc;
@@ -35,6 +33,8 @@ use tokio::task::JoinHandle;
 
 pub const DDM_STATS_PORT: u16 = 8001;
 
+const UNIT_API_SERVER: &str = "api_server";
+
 #[derive(Default)]
 pub struct RouterStats {
     pub originated_underlay_prefixes: AtomicU64,
@@ -68,11 +68,11 @@ pub fn handler(
         ..Default::default()
     };
 
-    let ds_log = ConfigLogging::StderrTerminal {
-        level: ConfigLoggingLevel::Error,
-    }
-    .to_logger("admin")
-    .map_err(|e| e.to_string())?;
+    let ds_log = log.new(o!(
+        "component" => crate::COMPONENT_DDM,
+        "module" => crate::MOD_ADMIN,
+        "unit" => UNIT_API_SERVER,
+    ));
 
     let api = api_description().map_err(|e| e.to_string())?;
 
ddm/src/discovery/mod.rs

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+// This Source Code Form is subject to the terms of the Mozilla Public
+// License, v. 2.0. If a copy of the MPL was not distributed with this
+// file, You can obtain one at https://mozilla.org/MPL/2.0/.
+
+//! This module implements the ddm router discovery mechanisms. These
+//! mechanisms are responsible for three primary things
+//!
+//! 1. Soliciting other routers through UDP/IPv6 link local multicast.
+//! 2. Sending out router advertisements in response to solicitations.
+//! 3. Continuously soliciting link-local at a configurable rate to keep
+//!    sessions alive and sending out notifications when peering arrangements
+//!    expire due to not getting a solicitation response within a configurable
+//!    time threshold.
+//!
+//! [`Version`] and [`DiscoveryError`] are platform-agnostic and stay in this
+//! module so the state machine type definitions in [`crate::sm`] continue to
+//! compile when the routing runtime is gated out (e.g. Linux test fixtures
+//! running ddmd with `--no-state-machine`). The runtime helpers that drive
+//! the protocol over UDPv6 sockets live in the [`runtime`] submodule and
+//! are illumos-only.
+//!
+//! ## Protocol
+//!
+//! The general sequence of events is depicted in the following diagram.
+//!
+//!             *==========*                *==========*
+//!             |  violin  |                |  piano   |
+//!             *==========*                *==========*
+//!                  |                           |
+//!                  |     solicit(ff02::dd)     |
+//!                  |-------------------------->|
+//!                  |    advertise(fe80::47)    |
+//!                  |<--------------------------|
+//!                  |                           |
+//!                  |            ...            |
+//!                  |                           |
+//!                  |                           |
+//!                  |     solicit(ff02::dd)     |
+//!                  |-------------------------->|
+//!                  |    advertise(fe80::47)    |
+//!                  |<--------------------------|
+//!                  |                           |
+//!                  |     solicit(ff02::dd)     |
+//!                  |-------------------------->|
+//!                  |     solicit(ff02::dd)     |
+//!                  |-------------------------->|
+//!                  |     solicit(ff02::dd)     |
+//!                  |-------------------------->|
+//!                  |                           |
+//!             +----|                           |
+//!      expire |    |                           |
+//!      piano  |    |                           |
+//!             +--->|                           |
+//!
+//! This shows violin sending a link-local multicast solicitation over the wire.
+//! That solicitation is received by piano and piano responds with an
+//! advertisement to violin's link-local unicast address. From this point
+//! forward solicitations and responses continue. Each time violin gets a
+//! response from piano, it updates the last seen timestamp for piano. If at
+//! some point piano stops responding to solicitations and the last seen
+//! timestamp is older than the expiration threshold, violin will expire the
+//! session and send out a notification to the ddm state machine that started
+//! it. Violin will continue to send out solicitations in case piano comes back.
+//!
+//! In the event that piano undergoes renumbering e.g. its link-local unicast
+//! address changes, this will be detected by violin and an advertisement update
+//! will be sent to the ddm state machine through the notification channel
+//! provided to the discovery subsystem.
+//!
+//! The DDM discovery multicast address is ff02::dd. Discovery packets are sent
+//! over UDP using port number 0xddd.
+//!
+//! ## Packets
+//!
+//! Discovery packets follow a very simple format
+//!
+//!                      1                   2                   3
+//!  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+//! +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+//! |    version    |S A r r r r r r|  router kind  | hostname len  |
+//! +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+//! |                           hostname                            :
+//! +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+//! :                             ....                              :
+//! +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+//!
+//! The first byte indicates the version. The only valid version at present is
+//! version 1. The second byte is a flags bitfield. The first position `S`
+//! indicates a solicitation. The second position `A` indicates an
+//! advertisement. All other positions are reserved for future use. The third
+//! byte indicates the kind of router. Current values are 0 for a server router
+//! and 1 for a transit router. The fourth byte is a hostname length followed
+//! directly by a hostname of up to 255 bytes in length.
+
+use thiserror::Error;
+
+#[cfg(all(feature = "illumos", target_os = "illumos"))]
+mod runtime;
+
+#[cfg(all(feature = "illumos", target_os = "illumos"))]
+pub(crate) use runtime::handler;
+
+#[derive(Debug, Copy, Clone)]
+#[repr(u8)]
+pub enum Version {
+    V2 = 2,
+    V3 = 3,
+}
+
+#[derive(Error, Debug)]
+pub enum DiscoveryError {
+    #[error("io error: {0}")]
+    Io(#[from] std::io::Error),
+
+    #[error("serialization error: {0}")]
+    Serialization(#[from] ispf::Error),
+}
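The packet layout in the `discovery/mod.rs` module documentation can be illustrated with a small hand-rolled encoder and decoder. This is a sketch under stated assumptions, not the crate's implementation (the `DiscoveryError::Serialization` variant suggests the real code serializes through `ispf`). The type and function names are hypothetical, the flag bit positions are a guess from the diagram's left-to-right bit order, and treating the hostname as UTF-8 is also an assumption.

```rust
// ASSUMPTION: `S` and `A` are read as the two most significant bits of the
// flags byte, following the diagram's left-to-right order. The real bit
// assignments may differ.
pub const FLAG_SOLICIT: u8 = 0b1000_0000;
pub const FLAG_ADVERTISE: u8 = 0b0100_0000;

/// Hypothetical in-memory form of a discovery packet.
#[derive(Debug, PartialEq, Eq)]
pub struct DiscoveryPacket {
    pub version: u8,
    pub flags: u8,
    /// 0 for a server router, 1 for a transit router.
    pub router_kind: u8,
    pub hostname: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// Fewer bytes than the header or the declared hostname length require.
    TooShort,
    /// Hostname bytes are not valid UTF-8 (an assumption of this sketch).
    BadHostname,
}

impl DiscoveryPacket {
    /// Encode as version, flags, router kind, hostname length, hostname.
    /// Returns `None` if the hostname does not fit the one-byte length.
    pub fn encode(&self) -> Option<Vec<u8>> {
        let name = self.hostname.as_bytes();
        if name.len() > 255 {
            return None;
        }
        let mut out =
            vec![self.version, self.flags, self.router_kind, name.len() as u8];
        out.extend_from_slice(name);
        Some(out)
    }

    /// Decode the four header bytes, then the length-prefixed hostname.
    pub fn decode(buf: &[u8]) -> Result<Self, ParseError> {
        if buf.len() < 4 {
            return Err(ParseError::TooShort);
        }
        let len = buf[3] as usize;
        let name = buf.get(4..4 + len).ok_or(ParseError::TooShort)?;
        let hostname = std::str::from_utf8(name)
            .map_err(|_| ParseError::BadHostname)?
            .to_string();
        Ok(DiscoveryPacket {
            version: buf[0],
            flags: buf[1],
            router_kind: buf[2],
            hostname,
        })
    }

    pub fn is_solicit(&self) -> bool {
        self.flags & FLAG_SOLICIT != 0
    }

    pub fn is_advertise(&self) -> bool {
        self.flags & FLAG_ADVERTISE != 0
    }
}
```

The one-byte length prefix is what bounds the hostname at 255 bytes, so `encode` rejects longer names instead of truncating them.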
