Skip to content

Config/Show commands (newly added) for BMC, leak detection, policy#4417

Merged
judyjoseph merged 19 commits into
sonic-net:masterfrom
judyjoseph:bmc_impl
May 12, 2026
Merged

Config/Show commands (newly added) for BMC, leak detection, policy#4417
judyjoseph merged 19 commits into
sonic-net:masterfrom
judyjoseph:bmc_impl

Conversation

@judyjoseph
Copy link
Copy Markdown
Contributor

@judyjoseph judyjoseph commented Apr 2, 2026

What I did

Introduce config/show CLI commands for BMC chassis module management and liquid cooling leak detection, based on sonic-net/SONiC#2215, sonic-net/SONiC#2294


CLI Commands Added / Changed

NEW — config chassis modules power-on-delay SWITCH-HOST
config chassis modules power-on-delay <module_name> <seconds>
Argument Type Constraint Description
module_name string Must start with SWITCH-HOST Target chassis module
seconds integer >= 0 (default: 0) Delay before BMC powers on Switch-Host after boot. 0 = power on immediately.

Writes power_on_delay to CHASSIS_MODULE|<module_name> in CONFIG_DB.


NEW — config chassis modules shutdown-timeout SWITCH-HOST
config chassis modules shutdown-timeout <module_name> <seconds>
Argument Type Constraint Description
module_name string Must start with SWITCH-HOST Target chassis module
seconds integer >= 0 (default: 120) Graceful-shutdown timeout. 0 = immediate power-off, no graceful shutdown.

Writes graceful_shutdown_timeout to CHASSIS_MODULE|<module_name> in CONFIG_DB.


CHANGED — config chassis modules startup SWITCH-HOST

On BMC, uses mod_entry (instead of set_entry) to set admin_status = up while preserving power_on_delay and graceful_shutdown_timeout fields in the same CHASSIS_MODULE entry.

CHANGED — config chassis modules shutdown SWITCH-HOST

On BMC, uses mod_entry to set admin_status = down while preserving other fields. admin_status = down is the default for SWITCH-HOST (keeps it powered off at boot until explicitly started).


NEW — config liquid-cool leak-control
config liquid-cool leak-control [system|rack_mgr] [enabled|disabled]
Argument Options Description
policy_type system, rack_mgr Which policy scope to configure
state enabled, disabled Enable or disable leak detection for that scope

Writes system_leak_policy or rack_mgr_leak_policy to LEAK_CONTROL_POLICY|policy in CONFIG_DB.


NEW — config liquid-cool leak-action
config liquid-cool leak-action [system|rack_mgr] [critical|minor] [syslog_only|graceful_shutdown|power_off]
Argument Options Description
policy_type system, rack_mgr Which policy scope
severity critical, minor Leak event severity level
action syslog_only, graceful_shutdown, power_off Action to take on that event

Writes the corresponding action field to LEAK_CONTROL_POLICY|policy in CONFIG_DB.


NEW — show platform leak status
show platform leak status

Displays leak sensor state from LIQUID_COOLING_INFO in STATE_DB.

Name             Leak  Leak-sensor-status  leak-sensor-type  leak-severity
-----------  --------  ------------------  ----------------  -------------
leak_sensor1      YES                  OK              rope          MINOR
leak_sensor2       NO               FAULTY              spot             NA

NEW — show platform leak control-policy
show platform leak control-policy

Displays leak control policy from LEAK_CONTROL_POLICY|policy in CONFIG_DB.

 system_leak_policy              : enabled
 system_critical_leak_action     : power_off
 system_minor_leak_action        : syslog_only
 rack_mgr_leak_policy            : enabled
 rack_mgr_critical_alert_action  : syslog_only
 rack_mgr_minor_alert_action     : syslog_only

NEW — show platform leak rack-manager alerts
show platform leak rack-manager alerts

Displays rack-manager leak alerts from STATE_DB.

Alert                      Severity    Timestamp
-------------------------  ----------  -------------------------
Inlet_liquid_temperature   NORMAL      2026-03-25 22:10:00
Rack_level_leak            CRITICAL    2026-03-25 22:10:00

NEW — show platform leak profiles
show platform leak profiles

Displays leak sensor profiles (sensor type → max minor duration) from CONFIG_DB.

Sensor-Type    Max-Minor-Duration-Sec
-----------    ----------------------
rope                              300
spot                              600
flex_pcb                          180

CHANGED — show chassis modules status [module_name]

On BMC, Oper-Status is now sourced from the platform API (ModuleHelper.get_module_oper_status()) with fallback to STATE_DB. On non-BMC platforms, behaviour is unchanged.


How I did it

config/chassis_modules.py

  • Replaced is_switch_bmc with is_bmc() from utilities_common.chassis at all call sites
  • Removed unused datetime and timezone imports
  • Added power-on-delay subcommand (BMC-only, IntRange(min=0), default 0)
  • Added shutdown-timeout subcommand (BMC-only, IntRange(min=0), 0 = immediate power-off, default 120)
  • BMC path for startup/shutdown uses mod_entry to preserve sibling fields

config/liquid_cool.py (new file)

  • New CLI group liquid-cool with leak-control and leak-action subcommands
  • Writes to LEAK_CONTROL_POLICY|policy in CONFIG_DB

show/chassis_modules.py

  • On BMC: calls ModuleHelper.get_module_oper_status() (platform API) per module; falls back to STATE_DB value if N/A is returned

show/platform.py

  • Added leak command group under show platform with subcommands: status, control-policy, rack-manager alerts, profiles
  • LEAK_CONTROL_POLICY_KEY = 'policy'

utilities_common/chassis.py

  • Added is_bmc() helper wrapping device_info.is_switch_bmc() with hasattr guard

utilities_common/module.py

  • Added get_module_oper_status(module_name) to ModuleHelper; returns N/A on any failure

How to verify it

Tested the new CLIs

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

~

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

- show/platform.py: extract local variables to shorten lines >120 chars
- tests/liquidcool_test.py: add noqa: E402 for post-sys.path import
- tests/show_platform_leak_test.py: remove unused pytest import (F401),
  add noqa: E402 for post-sys.path import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Judy Joseph <jujoseph@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

judyjoseph and others added 2 commits April 2, 2026 18:46
…NG_INFO

Replace LIQUID_COOLING_DEVICE table with LIQUID_COOLING_INFO everywhere
to use a single unified table for liquid cooling/leak sensor state.

- show/platform.py: rename LIQUID_COOLING_DEVICE_TABLE constant to
  LIQUID_COOLING_INFO_TABLE and update leak_status() query
- tests/show_platform_leak_test.py: update mock table keys
- tests/leakage_status_test.py: update mock table keys

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Judy Joseph <jujoseph@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@judyjoseph judyjoseph marked this pull request as ready for review April 3, 2026 20:11
Copilot AI review requested due to automatic review settings April 3, 2026 20:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new SONiC CLI surfaces for liquid-cooling leak detection/policy controls and extends chassis module configuration for BMC-managed switch-host timing, aligning config/ and show/ command sets with the referenced SONiC changes.

Changes:

  • Introduces config liquidcool {leak-control, leak-action} to write leak policy/action settings to LEAK_CONTROL_POLICY.
  • Adds show platform leak {control-policy, profiles, status, rack-manager alerts} to display leak-related config/state.
  • Extends config chassis modules with BMC-only power-on-delay and shutdown-timeout for SWITCH-HOST, plus unit tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
show/platform.py Adds new show platform leak ... command group and DB access helpers.
config/liquidcool.py New config liquidcool group for leak policy/action configuration.
config/main.py Registers the new liquidcool config command group.
config/chassis_modules.py Allows SWITCH-HOST module name and adds BMC-only timing commands.
tests/show_platform_leak_test.py New unit tests for the show platform leak ... commands.
tests/liquidcool_test.py New unit tests for config liquidcool ... commands.
tests/leakage_status_test.py Updates leak status test to target the new show platform leak status.
tests/chassis_modules_test.py Adds unit tests for power-on-delay / shutdown-timeout commands.

Comment thread show/platform.py Outdated
Comment thread show/platform.py Outdated
Comment thread show/platform.py
Comment thread tests/chassis_modules_test.py Outdated
Comment thread config/chassis_modules.py Outdated
Comment thread config/chassis_modules.py Outdated
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

judyjoseph and others added 3 commits April 9, 2026 01:33
…imeout field, power-on-delay default -1

- Rename DB field 'shutdown_timeout' to 'graceful_shutdown_timeout' to match
  CHASSIS_MODULE schema in SONiC#2215 HLD
- Update power-on-delay default to -1 (Switch-Host remains powered off);
  allow -1 as a valid sentinel value while rejecting values < -1
- Update tests to match new field name and -1 sentinel behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Judy Joseph <jujoseph@microsoft.com>
…unction

- Change Click argument type from float to int for power-on-delay and shutdown-timeout
- Rename set_shutdown_timeout -> set_graceful_shutdown_timeout
- Update test assertions to use integer string values (not float)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Judy Joseph <jujoseph@microsoft.com>
… decide Switch-Host powering on behavior on first boot
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

@yxieca yxieca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that these issues need to be addressed:

  • power_on_delay default set to -1 but CLI enforces non‑negative; make sure defaults and -- rxpected semantics match (doc says default 0 in help string).,
  • config/chassis_modules.py still imports datetime, timezone but not used (minor).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

config/chassis_modules.py:70

  • get_config_module_state() returns fvs['admin_status'] when a CHASSIS_MODULE entry exists. The new BMC timing commands (power-on-delay / shutdown-timeout) can create an entry that contains only power_on_delay / graceful_shutdown_timeout and no admin_status, which will raise KeyError and break startup/shutdown flows. Please make get_config_module_state() robust to missing admin_status (e.g., use .get() with the same platform-specific defaults), and/or ensure the timing commands populate admin_status when creating a new entry.
def get_config_module_state(db, chassis_module_name):
    config_db = db.cfgdb
    fvs = config_db.get_entry('CHASSIS_MODULE', chassis_module_name)
    if not fvs:
        if is_smartswitch():
            return 'down'
        elif is_bmc() and chassis_module_name.startswith("SWITCH-HOST"):
            # On BMC, SWITCH-HOST default is 'down' to keep it powered off on boot
            return 'down'
        else:
            return 'up'
    else:
        return fvs['admin_status']

Comment thread show/chassis_modules.py Outdated
Comment thread config/chassis_modules.py Outdated
Comment thread config/chassis_modules.py Outdated
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

config/chassis_modules.py:69

  • get_config_module_state() assumes that any existing CHASSIS_MODULE|<name> entry always contains admin_status (return fvs['admin_status']). With the newly added BMC timing commands (power-on-delay / shutdown-timeout) it’s possible to create/update an entry that only contains power_on_delay or graceful_shutdown_timeout, which will make this line raise KeyError and break startup/shutdown and any other caller. Consider using fvs.get('admin_status', <platform-appropriate default>) here and/or ensuring the timing commands always set a default admin_status when missing (e.g., down for SWITCH-HOST on BMC).
            return 'down'
        else:
            return 'up'
    else:
        return fvs['admin_status']

Comment thread show/chassis_modules.py
@judyjoseph
Copy link
Copy Markdown
Contributor Author

@benle7 @oleksandrivantsiv Could you reveiew this PR, just wanted to make sure it doesn't break any existing CLI for leak/liquid cooling

@judyjoseph
Copy link
Copy Markdown
Contributor Author

@benle7 @oleksandrivantsiv Could you reveiew this PR, just wanted to make sure it doesn't break any existing CLI for leak/liquid cooling

taking this change in, will raise a follow up PR to do adjustments

@judyjoseph judyjoseph merged commit 8711953 into sonic-net:master May 12, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants