[cfgmgrd] Add Monitor Link Group feature by srodd-nexthop · Pull Request #4523 · sonic-net/sonic-swss

srodd-nexthop · 2026-04-27T09:38:50Z

What I did

Implement the Monitor Link Group feature in cfgmgr. A monitor link group tracks a set of uplink interfaces and automatically forces downlink interfaces admin-down when the operational uplink count drops below the configured min-uplinks threshold, preventing traffic black-holing when upstream connectivity is lost.

Files changed:

File	Change
`cfgmgr/monitorlinkgroupmgr.cpp`	New — core state machine
`cfgmgr/monitorlinkgroupmgr.h`	New — class definition
`cfgmgr/intfmgrd.cpp`	Register MonitorLinkGroupMgr in the Select loop
`cfgmgr/portmgr.cpp` + `.h`	applyEffectiveAdminStatus, doMonitorLinkGroupMemberTask
`cfgmgr/portmgrd.cpp`	Subscribe to STATE_MONITOR_LINK_GROUP_MEMBER_TABLE
`cfgmgr/teammgr.cpp` + `.h`	Same pattern for PortChannel interfaces
`cfgmgr/teammgrd.cpp`	Subscribe to STATE_MONITOR_LINK_GROUP_MEMBER_TABLE
`tests/mock_tests/portmgr_ut.cpp`	Unit test update for new PortMgr constructor parameter
`tests/test_monitorlinkgroupmgr.py`	New VS integration tests

Why I did it

Prevents traffic black-holing in multi-homed server, aggregation, and border leaf deployments. When uplinks fail, servers connected to downlinks continue forwarding into a black hole. Monitor Link Group takes the downlinks admin-down to signal the failure and trigger failover.

Design

MonitorLinkGroupMgr runs as a sibling Orch inside intfmgrd. It subscribes to CONFIG_DB:MONITOR_LINK_GROUP and STATE_DB:PORT_TABLE/LAG_TABLE. The group state machine has three states:

DOWN: uplink count < min-uplinks; downlinks receive force_down
PENDING: uplink count >= min-uplinks AND link-up-delay > 0; timer running; downlinks remain force_down
UP: threshold met and timer elapsed (or link-up-delay = 0); downlinks receive allow_up

On initial group creation the link-up-delay is skipped if uplinks already meet the threshold.

Enforcement split: MonitorLinkGroupMgr writes allow_up / force_down per-interface to STATE_DB:MONITOR_LINK_GROUP_MEMBER. PortMgr and TeamMgr subscribe and merge this with the configured admin_status before applying the effective state to the kernel netdev. MonitorLinkGroupMgr never calls ip link set directly.

Multi-group membership: an interface can be uplink in one group and downlink in another simultaneously. force_down wins if any group forces it down, enabling transitive cascade topologies.

LAG identity is confirmed via CONFIG_DB lookup rather than name-prefix matching.

How I verified it

tests/test_monitorlinkgroupmgr.py VS integration tests: group lifecycle, uplink-down force-down and recovery, link-up-delay PENDING state, config admin_status override, cross-group cascade topology
intfmgrd, portmgrd, and teammgrd build cleanly

Companion PRs

sonic-swss-common: STATE_DB table name macros
sonic-yang-models: YANG model
sonic-utilities: show monitor-link CLI
SONiC/SONiC: HLD

HLD: sonic-net/SONiC#2308
Yang Changes: sonic-net/sonic-buildimage#27004
Show: sonic-net/sonic-utilities#4497
swss-common: sonic-net/sonic-swss-common#1181

mssonicbld · 2026-04-27T09:38:58Z

/azp run

azure-pipelines · 2026-04-27T09:39:08Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-04-27T15:19:32Z

/azp run

azure-pipelines · 2026-04-27T15:19:43Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-04-28T08:00:36Z

/azp run

azure-pipelines · 2026-04-28T08:00:51Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-05-14T04:00:42Z

/azp run

azure-pipelines · 2026-05-14T04:00:53Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-05-14T05:04:07Z

/azp run

azure-pipelines · 2026-05-14T05:04:18Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-05-14T06:31:45Z

/azp run

azure-pipelines · 2026-05-14T06:31:55Z

Azure Pipelines successfully started running 1 pipeline(s).

Adds the Monitor Link Group feature to cfgmgr. A monitor link group tracks a set of uplink interfaces and automatically forces downlink interfaces admin-down when operational uplink count drops below the configured min-uplinks threshold, preventing traffic black-holing when upstream connectivity is lost. Design: - monitorlinkgroupmgr (new, runs inside intfmgrd): Subscribes to CONFIG_DB:MONITOR_LINK_GROUP and STATE_DB PORT/LAG tables. Maintains per-group state (up/down/pending) derived from the min-uplinks threshold. Writes STATE_DB:MONITOR_LINK_GROUP_STATE (group visibility) and STATE_DB:MONITOR_LINK_GROUP_MEMBER (per-port force-down signal consumed by portmgrd/teammgrd). Uses a SelectableTimer per group for the link-up-delay, which defers recovery to suppress churn from short uplink flaps. On initial group creation the delay is skipped if uplinks already meet the threshold. A port may serve as uplink in one group and downlink in another, enabling chain topologies where a force-down cascades transitively through a stack of groups. - portmgr: subscribes to STATE_DB:MONITOR_LINK_GROUP_MEMBER. applyEffectiveAdminStatus() combines CONFIG_DB admin_status with the monitor-link member state; the result is what actually reaches the kernel netdev. Both CONFIG_DB and STATE_DB change paths call the same helper. - teammgr: same pattern for PortChannel interfaces. LAG identity is confirmed via CONFIG_DB lookup rather than name-prefix matching. - portmgrd / teammgrd: pass STATE_MONITOR_LINK_GROUP_MEMBER_TABLE as a subscribed TableConnector so the consumer loop wakes on changes. - tests/test_monitorlinkgroupmgr.py: VS integration tests covering group lifecycle, uplink-down force-down and recovery, link-up-delay pending state, config admin_status interaction, and cross-group cascade topology. Companion PRs: sonic-yang-models (YANG model), sonic-utilities (show/config commands), SONiC/SONiC (HLD). Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

Prioritize netdev_oper_status over oper_status when both fields are present in a STATE_DB notification. netdev_oper_status is authoritative for Ethernet interfaces; oper_status is used as fallback for PortChannels which do not have a netdev_oper_status field. Previously both fields triggered independent updateMonitorLinkInterfaceState calls within the same notification, which could cause redundant state transitions. Now only one call is made per notification using the correct field for the interface type. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

Bug fixes: - handleLinkupDelayExpired else-branch: STATE_DB stays "pending" after timer fires with insufficient uplinks; add writeMonitorLinkGroupStateToDb - doPortTableTask: DEL on STATE_PORT/LAG_TABLE ignored, causing ghost uplink counts; treat port removal as link-down - removeInterfaceFromGroup: called updateMonitorLinkGroupState on each removal causing intermediate state evaluations during interface list swap; deferred to end of updateGroupInterfaceLists - removeInterfaceFromGroup: role membership check tested any-role instead of the specified link_type; now role-specific - getInterfaceOperState: used PORTCHANNEL_PREFIX name check; replaced with try-port-then-lag table lookup Type / cleanup fixes: - down_group_count: declared as int, changed to uint32_t - doMonitorLinkGroupTask: retry return path was dead code; converted to void and removed the conditional wrapper in doTask - startLinkupDelayTimer: SelectableTimer constructed with 1s placeholder; use actual delay_seconds at construction time - PortMgr m_cfgLagMemberTable: initialized but never used; removed - doTask(SelectableTimer&): O(N) scan to find owning group; replaced with m_timerToGroup reverse map for O(1) lookup Refactor: - TeamMgr doLagTask and doMonitorLinkGroupMemberTask inlined the same monitor-link + config admin_status merge logic twice; extracted into applyEffectiveLagAdminStatus() helper, matching PortMgr's pattern Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

…ycle detection Two HLD review action items wrapped together to land before opening the swss PR: R-4 (terminology rename) — reviewers (Judson Wilson @ Nvidia, seconded by Sudheer and Eswaran) flagged that the "uplink/downlink" wording is confusing because the same port can be uplink in one group and downlink in another (cross-role). Rename: MonitorLinkGroupInfo:: uplink_interfaces -> monitored_interfaces downlink_interfaces -> managed_interfaces uplink_up_count -> monitored_up_count min_uplinks -> min_monitored_links MonitorLinkInterfaceInfo:: uplink_groups -> monitored_groups downlink_groups -> managed_groups Method: updateDownlinkInterfacesForGroupStateChange -> updateManagedInterfacesForGroupStateChange CONFIG_DB field names: "uplinks" -> "monitored-links" "downlinks" -> "managed-links" "min-uplinks" -> "min-monitored-links" STATE_DB field names (MONITOR_LINK_GROUP_STATE): "uplinks" -> "monitored-links" "downlinks" -> "managed-links" Internal link_type literal: "uplink"->"monitored", "downlink"->"managed". R-6 (cycle detection) — reviewers (Anders Linn, seconded by Sudheer) flagged that chained groups with cross-referenced monitored/managed sets can deadlock. New MonitorLinkGroupMgr::wouldCreateDependencyCycle runs a DFS from the proposed group: edge G -> H exists iff some port appears in G.monitored AND H.managed (G's recovery waits on H). A cycle implies a no-recovery deadlock; the daemon rejects the SET (no STATE_DB write, SWSS_LOG_ERROR). The existing m_monitorLinkGroups is acyclic by induction (this check runs on every SET), so any newly introduced cycle must involve the proposed group — DFS starts there and back-edges signal a cycle. Helper parseInterfaceList factored out so the check and updateGroupInterfaceLists share the same parsing. VS test file updated for the rename (5 existing test cases). The R-6 VS tests have been moved to sonic-mgmt-mlg (cycle detection is end-to-end-testable; VS is sufficient for R-4 coverage). Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

R-A from the HLD review action items. Operators today can read MLG state (UP/DOWN/PENDING) but not "when did this last change?", "how far into the link-up-delay are we?", or "is this flapping?" without grepping syslog. This patch surfaces that in STATE_DB so `show monitor-link` (companion PR) can render it directly. MonitorLinkGroupInfo gets seven new fields: bool has_state_change // false until first transition observed std::string last_state_change_from/to // "up" | "down" | "pending" std::string last_state_change_reason // human-readable system_clock::time_point last_state_change_time system_clock::time_point pending_start_time uint64_t up_to_down_count // direct UP -> DOWN only uint64_t down_to_up_count // any -> UP Two new methods: recordStateTransition(group, from, to, reason) Sets last_state_change_*, sets pending_start_time when entering PENDING, bumps counters per the rules above, emits SWSS_LOG_NOTICE. No-op when from == to. restoreGroupCountersFromStateDb(group) Reload up/down counters from STATE_DB on first sighting of a group in this process, so the counters survive an intfmgrd restart without going non-monotonic from a telemetry/SNMP consumer's perspective. Call sites wired up: updateMonitorLinkGroupState: DOWN -> PENDING, DOWN -> UP, PENDING -> DOWN (cancel), UP -> DOWN. The !should_be_up branch is restructured to make the (is_pending && !current_state) vs current_state cases mutually exclusive; adds a defensive SWSS_LOG_ERROR self-heal for the impossible state where both is_up and pending_up are true. handleLinkupDelayExpired: PENDING -> UP (timer expired) and PENDING -> DOWN (threshold lost late). handleGroupDelayChange: PENDING -> UP via delay -> 0 and via reduced delay already elapsed. handleMonitorLinkGroupSet: calls restoreGroupCountersFromStateDb on is_new_group right after updateGroupConfiguration. writeMonitorLinkGroupStateToDb emits new STATE_DB fields on MONITOR_LINK_GROUP_STATE: last_state_change_time (epoch seconds; only when has_state_change) last_state_change_from / _to / _reason (same condition) pending_start_time (epoch seconds; only when state == "pending") up_to_down_count, down_to_up_count (always) An explicit hdel of pending_start_time happens on every non-pending write so external consumers never see stale data after the timer fires. Reason text uses the renamed terminology, e.g.: "monitored N/M >= min-monitored-links K, starting link-up-delay" "link-up-delay expired (monitored N/M >= min-monitored-links K)" All new STATE_DB fields are optional; groups that never transitioned render exactly as before, preserving backward compat with the existing show CLI parser. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

…, drop reason field Two design simplifications on top of PR-A's transition-tracking work. A. Collapse up_to_down_count + down_to_up_count into a single total_transitions. Given the rules of the original asymmetric scheme: d2u: bumped on DOWN->UP and PENDING->UP u2d: bumped on UP->DOWN only the invariant d2u - u2d in {0, 1} held in all reachable states (d2u == u2d when currently DOWN or PENDING; d2u == u2d + 1 when currently UP). The two fields carried one bit of information beyond last_state_change_to + state. The replacement single counter is incremented on every recorded transition (DOWN->PENDING, PENDING->UP, PENDING->DOWN, DOWN->UP, UP->DOWN). A full UP->DOWN->UP cycle bumps it by 2. PENDING->DOWN now counts as a transition too -- previously it was a "near-miss-that-failed" silently excluded from u2d, but distinguishing it from a regular UP->DOWN is already done by last_state_change_from so the counter doesn't need to encode it. B. Drop last_state_change_reason. With last_state_change_{from,to} present, reason is derivable: to=='down' always means threshold lost, to=='up' or 'pending' always means threshold met. The only diagnostic detail the free-text reason carried was the snapshot count "monitored N/M" at transition time and the path (timer-expired vs delay-reduced for PENDING->UP). Neither answers the HLD review's core asks ("when did this last change?" / "is it flapping?") and both can be inferred from the pre-existing SWSS_LOG_NOTICE lines if needed. This drops the std::ostringstream reason; reason << ...; block at all 8 recordStateTransition call sites (5 in updateMonitorLinkGroupState, 2 in handleLinkupDelayExpired, 1 in handleGroupDelayChange wait... actually 2 in handleGroupDelayChange after counting). Net ~50 lines removed from the cpp. STATE_DB schema change on MONITOR_LINK_GROUP_STATE: removed: up_to_down_count, down_to_up_count, last_state_change_reason added: total_transitions All three removed fields were introduced in the prior PR-A commit (239e99c9), so no operator workflow depended on them yet. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

…nsitional shim STATE_MONITOR_LINK_GROUP_STATE_TABLE_NAME and STATE_MONITOR_LINK_GROUP_MEMBER_TABLE_NAME are added to sonic-swss-common/common/schema.h by sonic-net/sonic-swss-common#1181. Until that PR merges and the swss-common artifact pulled by sonic-swss CI carries those macros, this PR's build fails with 'STATE_MONITOR_LINK_GROUP_MEMBER_TABLE_NAME not declared in scope' in cfgmgr/teammgr*, portmgr*, and intfmgrd. Add #ifndef-guarded fallback definitions in cfgmgr/monitorlinkgroupmgr.h and include that header from the four cfgmgr files that reference the macros (portmgr.cpp, portmgrd.cpp, teammgr.cpp, teammgrd.cpp; the MLG manager already includes its own header). Once sonic-swss-common#1181 merges and schema.h carries these macros, the #ifndef guards skip the fallback and the upstream definitions take over. The shim can then be removed as a cleanup, but the #include lines stay since the files legitimately consume MLG-MEMBER state. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

The standalone sonic-swss CI uses a published sonic-swss-common artifact that does not yet contain the CFG_ macro (added by buildimage#27004 via YANG schema generation) or the STATE_ macros (added by swss-common#1181). Add CFG_MONITOR_LINK_GROUP_TABLE_NAME to the existing #ifndef-guarded shim block so the build links cleanly until both dependencies merge. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

mssonicbld · 2026-05-14T09:19:15Z

/azp run

azure-pipelines · 2026-05-14T09:19:26Z

Azure Pipelines successfully started running 1 pipeline(s).

The original doTask paths initialise admin_status to DEFAULT_ADMIN_STATUS_STR ("down") when the port/LAG first appears in CONFIG_DB without an explicit admin_status. The new applyEffective[Lag]AdminStatus helpers re-read CONFIG_DB and previously defaulted to up when the field was absent, which silently flipped a fresh port from down to up. The portmgr_ut DoTask test catches exactly this case (sets only speed/index, expects 'ip link set Ethernet0 down'). Switch both helpers to default to false to match DEFAULT_ADMIN_STATUS_STR. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

mssonicbld · 2026-05-18T11:52:41Z

/azp run

azure-pipelines · 2026-05-18T11:52:51Z

Azure Pipelines successfully started running 1 pipeline(s).

chrono::duration<int64_t>::count() returns long long on 32-bit ARM where long is 32-bit. Switch the two SWSS_LOG_INFO sites that print elapsed seconds to %lld with an explicit cast so -Werror=format doesn't trip the BuildArm armhf job. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

mssonicbld · 2026-05-19T03:42:18Z

/azp run

azure-pipelines · 2026-05-19T03:42:29Z

Azure Pipelines successfully started running 1 pipeline(s).

…linkgroupmgr The monitorlinkgroupmgr daemon prefers netdev_oper_status over oper_status when reading port state from STATE_DB:PORT_TABLE — netdev_oper_status is the authoritative kernel-netdev signal for Ethernet ports (see comment in doPortTableTask). Writing only oper_status is silently ignored if portsyncd in the dvs has already populated netdev_oper_status. This caused the 5 test_monitorlinkgroupmgr cases to fail in CI: with the real portsyncd writing netdev_oper_status=down at dvs boot, our test writes of oper_status=up were dropped, so the cfgmgr never saw the monitored interface come up and the group state machine never transitioned. Fix: write both netdev_oper_status and oper_status in set_port_oper. The netdev_oper_status write is what the cfgmgr will actually consume; the oper_status write keeps the entry self-consistent. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

mssonicbld · 2026-05-19T03:52:09Z

/azp run

azure-pipelines · 2026-05-19T03:52:19Z

Azure Pipelines successfully started running 1 pipeline(s).

Retrigger Test vstest + TestAsan vstest stages: previous run hit test_acl_mark.py flake plus Azure agent OOM (exit 137, 70 Killed pytest processes). All Build stages (amd64, BuildAsan, BuildArm arm64/armhf, BuildTrixie) pass on this head; only vstest stages need a healthier Azure agent. Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>

mssonicbld · 2026-05-19T15:35:09Z

/azp run

azure-pipelines · 2026-05-19T15:35:21Z

Azure Pipelines successfully started running 1 pipeline(s).

srodd-nexthop force-pushed the srodd.monitor-link-group.v2 branch from 049c3d8 to badc6b2 Compare April 28, 2026 08:00

srodd-nexthop mentioned this pull request May 12, 2026

[monitor-link-group] Add sonic-mgmt tests for Monitor Link Group feature sonic-net/sonic-mgmt#24555

Open

6 tasks

srodd-nexthop marked this pull request as ready for review May 12, 2026 12:18

srodd-nexthop requested review from judyjoseph and prsunny as code owners May 12, 2026 12:18

srodd-nexthop force-pushed the srodd.monitor-link-group.v2 branch from e311841 to c6f660a Compare May 14, 2026 04:00

srodd-nexthop added 8 commits May 14, 2026 09:18

srodd-nexthop force-pushed the srodd.monitor-link-group.v2 branch from d284807 to 5cb0148 Compare May 14, 2026 09:19

Conversation

srodd-nexthop commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What I did

Why I did it

Design

How I verified it

Companion PRs

Uh oh!

mssonicbld commented Apr 27, 2026

Uh oh!

azure-pipelines Bot commented Apr 27, 2026

Uh oh!

mssonicbld commented Apr 27, 2026

Uh oh!

azure-pipelines Bot commented Apr 27, 2026

Uh oh!

mssonicbld commented Apr 28, 2026

Uh oh!

azure-pipelines Bot commented Apr 28, 2026

Uh oh!

mssonicbld commented May 14, 2026

Uh oh!

azure-pipelines Bot commented May 14, 2026

Uh oh!

mssonicbld commented May 14, 2026

Uh oh!

azure-pipelines Bot commented May 14, 2026

Uh oh!

mssonicbld commented May 14, 2026

Uh oh!

azure-pipelines Bot commented May 14, 2026

Uh oh!

mssonicbld commented May 14, 2026

Uh oh!

azure-pipelines Bot commented May 14, 2026

Uh oh!

mssonicbld commented May 18, 2026

Uh oh!

azure-pipelines Bot commented May 18, 2026

Uh oh!

mssonicbld commented May 19, 2026

Uh oh!

azure-pipelines Bot commented May 19, 2026

Uh oh!

mssonicbld commented May 19, 2026

Uh oh!

azure-pipelines Bot commented May 19, 2026

Uh oh!

mssonicbld commented May 19, 2026

Uh oh!

azure-pipelines Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

srodd-nexthop commented Apr 27, 2026 •

edited

Loading