From 8665761a8770c2cf271139cfeebf7e9b166d57b8 Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Thu, 16 Apr 2026 06:22:46 +0000 Subject: [PATCH 01/11] Revise BMC platform configuration details --- doc/bmc/sonicBMC/pmon-bmc-design.md | 56 ++++++++++++++--------------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 3b715c67939..af62f81691a 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -84,19 +84,17 @@ The SONiC in BMC interoperates with the SONiC in Switch-Host as in below diagram ## 2. Detailed Architecture and workflows ### 2.1 BMC platform -Update the //platform_env.conf with the following flags, +Update the [vendor]/[platform]/platform_env.conf with the following flags, ``` switch_host=1 -liquid_cooled=true ``` ``` switch_bmc=1 -liquid_cooled=true ``` -* "liquid_cooled" flag is set to true on a liquid cooled switch OR hybrid cooled switch. * "switch_host" flag is set to 1 on the switch host, "switch_bmc" flag is set to 1 on the switch BMC. - +* "liquid_cooled" flag is not set in this file as the same BMC platform/sku could be used for both air cooled and liquid cooled devices. + So instead we will introduce a platform API to get whether this is a liquid cooled or air cooled device. #### 2.1.1 BMC platform power up When device is powered ON, the BMC powers first, boots up the sonic BMC which starts the various containers @@ -313,7 +311,7 @@ The bmc controller daemon "bmcctld" is started first in BMC pmon container. It a Detailed workflow below ``` -Sleep for SWITCH_HOST_POWER_ON_DELAY (this is configurable value in config_db) +Sleep for power_on_delay configured in CHASSIS_MODULE|SWITCH-HOST (this is configurable value in config_db) This is to make sure the Rack Manager is up and Liquid flow rate is good. 
Check for any CRITICAL alert/leak in RACK_MANAGER_ALERT* tables or system SYSTEM_LEAK_STATUS table (device_leak_status == CRITICAL_SYSTEM_LEAK) in STATE_DB @@ -377,7 +375,7 @@ On an Event ``` - use GNOI framework to issue remote SOFT shutdown. The gnmi and sysmgr docker needs to be running on Switch-Host REF: https://github.com/sonic-net/SONiC/blob/master/doc/mgmt/gnmi/gnoi_system_hld.md, https://github.com/sonic-net/SONiC/pull/1489 - - start a timer based on graceful_shutdown_timeout configured in SWITCH_HOST_SHUTDOWN_TIMEOUT|default table. + - start a timer based on graceful_shutdown_timeout configured in CHASSIS_MODULE|SWITCH-HOST. - if GNOI request came back SUCCESS or No response for GNOI request + Timer expired - call platform API module->set_admin_state(DOWN) to power down the Switch-Host - update the HOST_STATE|switch-host with the device_power_state. @@ -396,10 +394,13 @@ On an Event This section covers the various tables which this daemon creates/uses in Redis DB on BMC ``` -key = SWITCH_HOST_POWER_ON_DELAY |default ; Config DB on BMC +key = CHASSIS_MODULE|SWITCH-HOST ; Config DB on BMC ; field = value -power_on_delay = integer ; Time in secs after power on the device, switch BMC can power on the Switch-Host. ( default = -1, Switch-Host remain powered off ). - ; If non-zero and BMC receives POWER ON from Rack manager before this timeout + there are no critical events, BMC will power on Switch-Host. +admin_status = up | down ; default is down, keeps SWITCH-HOST powered down when device powers up. +power_on_delay = integer ; Time in secs BMC waits before powering on Switch-Host when device is powered ON. + ; If BMC receives POWER ON from Rack manager before this timeout + there are no critical events, BMC will power on Switch-Host. +graceful_shutdown_timeout = integer ; Time in secs BMC waits for graceful shutdown before forcing power-off (default = 120sec). + ; if set to 0, BMC will NOT do a graceful shutdown, instead will do POWER_OFF with platform API. 
key = HOST_STATE|switch-host ; STATE_DB on BMC to store state of Switch-Host ; field = value @@ -407,13 +408,6 @@ device_power_state = POWER_ON | POWER_OFF| GRACEFUL_SHUT | POWER_CYCLE ; device_status = ONLINE | OFFLINE ; current oper status of device, can use the platform API module->get_oper_state() last_change_timestamp = STR - -key = SWITCH_HOST_SHUTDOWN_TIMEOUT|default ; Config DB on BMC -; field = value -graceful_shutdown_timeout = integer ; Time in secs the BMC will wait after SHUTDOWN command sent to Switch-Host. ( default = 120 sec ). - ; if this timer expires, BMC will go ahead and direct POWER OFF switch-host with platform API - ; if shutdown_timeout is 0, BMC will NOT do a graceful shutdown, instead will do POWER_OFF with platform API - ``` #### 2.2.2 thermalctld @@ -611,7 +605,8 @@ This base class is already defined in sonic-platform-common. | Method | Present | Action | |---------|---------|----------| | get_all_modules() | Y | Fetch managed modules here, Switch-Host Module object | - +| is_bmc() | New | Retrieves whether the sonic chassis instance is/has a BMC module | +| is_liquid_cooled_chassis() | New | Is this chassis liquid/hybrid cooled ? | ### 2.3 BMC CLI Commands @@ -626,19 +621,22 @@ CLI to enable user to graceful power on/off the Switch-Host, and to configure po Applicable to (LC, AC) ``` -config chassis modules startup +config chassis modules startup SWITCH-HOST - This command is to POWER ON the Switch Host from BMC + - Sets the "admin_status to up -config chassis modules shutdown +config chassis modules shutdown SWITCH-HOST - This command is to graceful POWER OFF the Switch Host from BMC + - Sets the "admin_status to down + - Default admin_status of SWITCH-HOST is down which keeps SWITCH-HOST powered down when device powers up. -config chassis modules power-on-delay +config chassis modules power-on-delay SWITCH-HOST - Configure the delay (in seconds) BMC waits after power-on before powering on the Switch-Host. 
- - Default = -1, Switch-Host remain powered off. This default value is selected as -1 so that in SI phase Switch-Host needs to be powered on manually. + - Default = 0, default is 0 secs which tells Switch-Host to power on immediately if admin_status is up - If non-zero BMC receives a POWER ON from Rack Manager before this timeout elapses (and no critical events exist), Switch-Host will be powered on immediately. -config chassis modules shutdown-timeout +config chassis modules shutdown-timeout SWITCH-HOST - Configure the graceful-shutdown timeout (in seconds) BMC waits after sending a shutdown command to the Switch-Host before forcing a hard power-off via the platform API. - Default = 120sec. @@ -652,11 +650,13 @@ config chassis modules shutdown-timeout ``` "CHASSIS_MODULE": { "SWITCH-HOST": { - "admin_status": "up", - "power_on_delay": "300", ; Time in secs BMC waits before powering on Switch-Host (default = -1, Switch-Host remain powered off) - "graceful_shutdown_timeout" : "120" ; Time in secs BMC waits for graceful shutdown before forcing power-off (default = 120sec) + "admin_status": "up", ; admin_status up/down; default is down which keeps SWITCH-HOST powered down when device powers up. + "power_on_delay": "300", ; Time in secs BMC waits before powering on Switch-Host when device is powered ON. + ; If BMC receives POWER ON from Rack manager before this timeout + there are no critical events, BMC will power on Switch-Host. + "graceful_shutdown_timeout" : "120" ; Time in secs BMC waits for graceful shutdown before forcing power-off (default = 120sec). + ; if set to 0, BMC will NOT do a graceful shutdown, instead will do POWER_OFF with platform API. } - } + } ``` * **config liquid-cool leak-control** @@ -680,7 +680,7 @@ Applicable to (LC) config liquid-cool leak-action [system|rack_mgr] [critical|minor] [syslog_only|graceful_shutdown|power_off] - syslog_only : Log the event; no Switch-Host power action taken. 
- - graceful_shutdown: Issue a graceful GNOI shutdown to Switch-Host; force power-off after SWITCH_HOST_SHUTDOWN_TIMEOUT/graceful_shutdown_timeout if unresponsive. + - graceful_shutdown: Issue a graceful GNOI shutdown to Switch-Host; force power-off after graceful_shutdown_timeout (CHASSIS_MODULE|SWITCH-HOST) if unresponsive. - power_off : Immediately power off Switch-Host via platform API module->set_admin_state(DOWN). ``` From 060fc3237884c319791d5f007636298f01a7330a Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Sat, 18 Apr 2026 23:07:21 +0000 Subject: [PATCH 02/11] Updated device_power_state in HOST_STATE|switch-host STATE_DB - to make it clear --- doc/bmc/sonicBMC/pmon-bmc-design.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index af62f81691a..e0893a399d2 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -404,8 +404,10 @@ graceful_shutdown_timeout = integer ; key = HOST_STATE|switch-host ; STATE_DB on BMC to store state of Switch-Host ; field = value -device_power_state = POWER_ON | POWER_OFF| GRACEFUL_SHUT | POWER_CYCLE ; What was the last action done on Switch-Host -device_status = ONLINE | OFFLINE ; current oper status of device, can use the platform API module->get_oper_state() +device_power_state = POWER_ON | POWER_OFF | GRACEFUL_SHUT | POWER_CYCLE ; Stable: last completed power action on Switch-Host + | POWERING_ON | POWERING_OFF | POWER_CYCLING ; Transitional: written immediately before the platform call; + ; replaced by the stable value once the action completes +device_status = ONLINE | OFFLINE ; current oper status of device, from platform API module->get_oper_status() last_change_timestamp = STR ``` From 0c354818a2c4239a28bc75b4cad9b131c26de6cc Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Sat, 18 Apr 2026 23:20:46 +0000 Subject: [PATCH 03/11] Updated device_power_state in HOST_STATE|switch-host 
STATE_DB - to make it clear --- doc/bmc/sonicBMC/pmon-bmc-design.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index e0893a399d2..7db1dee66af 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -404,10 +404,10 @@ graceful_shutdown_timeout = integer ; key = HOST_STATE|switch-host ; STATE_DB on BMC to store state of Switch-Host ; field = value -device_power_state = POWER_ON | POWER_OFF | GRACEFUL_SHUT | POWER_CYCLE ; Stable: last completed power action on Switch-Host - | POWERING_ON | POWERING_OFF | POWER_CYCLING ; Transitional: written immediately before the platform call; - ; replaced by the stable value once the action completes -device_status = ONLINE | OFFLINE ; current oper status of device, from platform API module->get_oper_status() +device_power_state = POWERED_ON | POWERED_OFF | GRACEFUL_SHUTDOWN | ; Represents the final and transitional power state of Switch-host + POWER_CYCLE | POWERING_ON | POWERING_OFF | + GRACEFUL_SHUTTING_DOWN | POWER_CYCLING +device_status = ONLINE | OFFLINE ; current oper status of device, from platform API module->get_oper_status() last_change_timestamp = STR ``` From cb98cb3dd4bfb25e03be74465cc2a4c29031e908 Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Sun, 19 Apr 2026 00:50:24 +0000 Subject: [PATCH 04/11] Update is_liquid_cooled API --- doc/bmc/sonicBMC/pmon-bmc-design.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 7db1dee66af..18da72dad78 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -608,7 +608,7 @@ This base class is already defined in sonic-platform-common. 
|---------|---------|----------| | get_all_modules() | Y | Fetch managed modules here, Switch-Host Module object | | is_bmc() | New | Retrieves whether the sonic chassis instance is/has a BMC module | -| is_liquid_cooled_chassis() | New | Is this chassis liquid/hybrid cooled ? | +| is_liquid_cooled() | New | Is this chassis liquid/hybrid cooled ? | ### 2.3 BMC CLI Commands From 9524ae2d0e8008586f520d914a9bd89cf5aaef1e Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Tue, 28 Apr 2026 03:13:38 +0000 Subject: [PATCH 05/11] Update RTC Clock time management in BMC and usecase for get_reboot_cause() to check if it is cold boot --- doc/bmc/sonicBMC/pmon-bmc-design.md | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 18da72dad78..20580400a0e 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -290,6 +290,16 @@ The Leak detection is applicable only to Liquid cooling platform. The action is The general syslogs will be placed in /var/log/syslog where /var/log directory will be mounted on **tmpfs**. Syslogs will be sent to remote server as well. The Leak, Switch-Host state and interactions, Rack-manager interactions will be persistently stored on disk/eMMC in "/host/bmc/event.log" with log rotation enabled. +#### 2.1.7 RTC Clock in BMC + +On most vendor platforms, the BMC RTC does not have a battery backup. As a result, the clock does not retain time across power cycles. When the BMC powers on, the system time is initialized using the following priority order: + +1. Use the file `/host/image-${IMAGE_VER}/rw/usr/lib/clock-epoch` if available, as the initial time source for the chrony service. (Coming in SONiC release 202605; installed by the switch upgrade service during software upgrade.) +2. Use the platform API if supported, to retrieve the current time from the switch host to refine or override the initial time. +3. 
Finally, the chrony systemd service synchronizes with remote NTP servers to obtain and maintain accurate time on the BMC. + +This sequence ensures a reasonable initial timestamp at boot, followed by progressively more accurate time sources. + ### 2.2 BMC Platform Management @@ -311,8 +321,9 @@ The bmc controller daemon "bmcctld" is started first in BMC pmon container. It a Detailed workflow below ``` -Sleep for power_on_delay configured in CHASSIS_MODULE|SWITCH-HOST (this is configurable value in config_db) -This is to make sure the Rack Manager is up and Liquid flow rate is good. +if the previous reboot was a Cold Boot (Full Power Cycle) + - Sleep for power_on_delay configured in CHASSIS_MODULE|SWITCH-HOST (this is configurable value in config_db) + - This is to make sure the Rack Manager is up and Liquid flow rate is good. Check for any CRITICAL alert/leak in RACK_MANAGER_ALERT* tables or system SYSTEM_LEAK_STATUS table (device_leak_status == CRITICAL_SYSTEM_LEAK) in STATE_DB NO External/System LEAK present @@ -607,6 +618,7 @@ This base class is already defined in sonic-platform-common. | Method | Present | Action | |---------|---------|----------| | get_all_modules() | Y | Fetch managed modules here, Switch-Host Module object | +| get_reboot_cause()| Y | Fetch previous reboot cause, check if it is Cold Boot(Full Power Cycle) | | is_bmc() | New | Retrieves whether the sonic chassis instance is/has a BMC module | | is_liquid_cooled() | New | Is this chassis liquid/hybrid cooled ? | From d09720e28cbcce26512c49878cc5c463c037e45d Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Tue, 28 Apr 2026 21:01:05 +0000 Subject: [PATCH 06/11] Update the file which will have the time stored(/usr/lib/clock-epoch). 
Remove the platform API requirement to retrieve the time from Switch-Host, as it will be difficult to do it before systemd starts --- doc/bmc/sonicBMC/pmon-bmc-design.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 20580400a0e..2a00c5c0bce 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -17,6 +17,7 @@ * [2.1.4 BMC-Switch Host Interaction](#214-bmc-switch-host-interaction) * [2.1.5 BMC leak_detection_and_thermal policy](#215-bmc-leak-detection-and-thermal-policy) * [2.1.6 BMC event logging](#216-bmc-event-logging) + * [2.1.7 RTC clock in BMC](#217-rtc-clock-in-bmc) * [2.2 BMC Platform Management](#22-bmc-platform-management) * [2.2.1 BMC controller-bmcctld](#221-bmc-controller---bmcctld) * [2.2.1.1 bmcctld on bmc](#2211-bmcctld-on-bmc) @@ -291,14 +292,12 @@ The general syslogs will be placed in /var/log/syslog where /var/log directory w The Leak, Switch-Host state and interactions, Rack-manager interactions will be persistently stored on disk/eMMC in "/host/bmc/event.log" with log rotation enabled. #### 2.1.7 RTC Clock in BMC +On most vendor platforms, the BMC RTC does not have a battery backup. As a result, the clock does not retain time across power cycles. When the BMC powers on, the system time is initialized as follows: -On most vendor platforms, the BMC RTC does not have a battery backup. As a result, the clock does not retain time across power cycles. When the BMC powers on, the system time is initialized using the following priority order: +1. Use the clock epoch file at /usr/lib/clock-epoch as the initial system time, if available. This is read by systemd during startup. (Coming in SONiC release 202605) +2. The chrony sysyemd service when it starts later synchronizes with remote NTP servers to obtain and maintain accurate time. -1. 
Use the file `/host/image-${IMAGE_VER}/rw/usr/lib/clock-epoch` if available, as the initial time source for the chrony service. (Coming in SONiC release 202605; installed by the switch upgrade service during software upgrade.) -2. Use the platform API if supported, to retrieve the current time from the switch host to refine or override the initial time. -3. Finally, the chrony systemd service synchronizes with remote NTP servers to obtain and maintain accurate time on the BMC. - -This sequence ensures a reasonable initial timestamp at boot, followed by progressively more accurate time sources. +This sequence provides a reasonable initial timestamp at boot, followed by synchronization to an accurate time source via NTP. ### 2.2 BMC Platform Management From c2a3b71c089863beb5770930a3db672a95a1920a Mon Sep 17 00:00:00 2001 From: judyjoseph <53951155+judyjoseph@users.noreply.github.com> Date: Tue, 28 Apr 2026 15:07:05 -0700 Subject: [PATCH 07/11] Update RTC Clock initialization details in documentation Clarified initialization of system time in BMC RTC section and added details about the clock epoch file update. --- doc/bmc/sonicBMC/pmon-bmc-design.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 2a00c5c0bce..50a50c696a9 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -292,10 +292,12 @@ The general syslogs will be placed in /var/log/syslog where /var/log directory w The Leak, Switch-Host state and interactions, Rack-manager interactions will be persistently stored on disk/eMMC in "/host/bmc/event.log" with log rotation enabled. #### 2.1.7 RTC Clock in BMC -On most vendor platforms, the BMC RTC does not have a battery backup. As a result, the clock does not retain time across power cycles. 
When the BMC powers on, the system time is initialized as follows: +On most vendor platforms, the BMC RTC does not have a battery backup. As a result, the clock does not retain time across power cycles. -1. Use the clock epoch file at /usr/lib/clock-epoch as the initial system time, if available. This is read by systemd during startup. (Coming in SONiC release 202605) -2. The chrony sysyemd service when it starts later synchronizes with remote NTP servers to obtain and maintain accurate time. +When the BMC powers on, the system time is initialized as follows: + +1. Use the clock epoch file at "/usr/lib/clock-epoch" as the initial system time, if available. This is read by systemd during startup. (This file is updated regularly with a systemd timer service, this feature is coming in SONiC release 202605) +2. The chrony systemd service when it starts later synchronizes with remote NTP servers to obtain and maintain accurate time. This sequence provides a reasonable initial timestamp at boot, followed by synchronization to an accurate time source via NTP. From 2cc79b0bf4c8caa4bd00bee698552c28c5f44565 Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Sat, 2 May 2026 01:11:19 +0000 Subject: [PATCH 08/11] Update to make COLD_BOOT requirement clear and add faulty sensor handling as a future enhancement --- doc/bmc/sonicBMC/pmon-bmc-design.md | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 50a50c696a9..ec2423bc040 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -322,7 +322,7 @@ The bmc controller daemon "bmcctld" is started first in BMC pmon container. 
It a Detailed workflow below ``` -if the previous reboot was a Cold Boot (Full Power Cycle) +if the previous reboot was a Cold Boot (Full Power Cycle, reboot cause : [REBOOT_CAUSE_POWER_LOSS](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/chassis_base.py#L20)) - Sleep for power_on_delay configured in CHASSIS_MODULE|SWITCH-HOST (this is configurable value in config_db) - This is to make sure the Rack Manager is up and Liquid flow rate is good. @@ -663,7 +663,7 @@ config chassis modules shutdown-timeout SWITCH-HOST ##### DB schema ``` - "CHASSIS_MODULE": { + "CHASSIS_MODULE": { ; In CONFIG_DB "SWITCH-HOST": { "admin_status": "up", ; admin_status up/down; default is down which keeps SWITCH-HOST powered down when device powers up. "power_on_delay": "300", ; Time in secs BMC waits before powering on Switch-Host when device is powered ON. @@ -702,7 +702,7 @@ config liquid-cool leak-action [system|rack_mgr] [critical|minor] [syslog_only| ##### DB schema ``` - "LEAK_CONTROL_POLICY": { + "LEAK_CONTROL_POLICY": { ; In CONFIG_DB "system_leak_policy" : "enabled | disabled", ; enabled by default "system_critical_leak_action" : "power_off", ; default is power_off "system_minor_leak_action" : "syslog_only", ; default is syslog_only @@ -756,6 +756,21 @@ show chassis module status ``` +##### DB schema + +``` + "CHASSIS_MODULE_TABLE": { ; In STATE_DB + "SWITCH-HOST": { + "name": "SWITCH-HOST" + "desc": "Switch Host managed by BMC" + "slot": "N/A" + "serial": "[Serial-number]" + "oper_status": "ONLINE" + "admin_status": "up" + } + } +``` + * **show platform leak control-policy** Command to show leak control policy configuration @@ -898,4 +913,7 @@ In case of a firmware upgrade which needs reboot of both Switch-Host and BMC, wi 2. Add support for more Rack manager commands via Redfish for reset_type like ForceRestart, GracefulRestart 3. Add support for ipv6 address to Host-Bmc-Link 4. Introduced the Hybrid cooling skus in this design document. 
Add more details on requirements and actions of various platform daemons in Switch-Host. +5. Add more details on the actions (e.g. DC personnel checkup, RMA, etc.) in case a leak sensor is faulty. + - Can we run the switch with one or more faulty sensors? + - If there is a faulty sensor and its location is close to the CPU/ASIC, should the switch be powered down and RMAed? From df5c9c2ef104e3da7d35b3f4f9ed7ca54ab5a450 Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Sat, 2 May 2026 01:15:22 +0000 Subject: [PATCH 09/11] minor updates --- doc/bmc/sonicBMC/pmon-bmc-design.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index ec2423bc040..5236e1f5898 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -322,7 +322,7 @@ The bmc controller daemon "bmcctld" is started first in BMC pmon container. It a Detailed workflow below ``` -if the previous reboot was a Cold Boot (Full Power Cycle, reboot cause : [REBOOT_CAUSE_POWER_LOSS](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/chassis_base.py#L20)) +if the previous reboot was a Cold Boot (Full Power Cycle, reboot cause : REBOOT_CAUSE_POWER_LOSS) - Sleep for power_on_delay configured in CHASSIS_MODULE|SWITCH-HOST (this is configurable value in config_db) - This is to make sure the Rack Manager is up and Liquid flow rate is good. @@ -619,7 +619,7 @@ This base class is already defined in sonic-platform-common. 
| Method | Present | Action | |---------|---------|----------| | get_all_modules() | Y | Fetch managed modules here, Switch-Host Module object | -| get_reboot_cause()| Y | Fetch previous reboot cause, check if it is Cold Boot(Full Power Cycle) | +| get_reboot_cause()| Y | Fetch previous reboot cause, check if it is Cold Boot(Full Power Cycle, reboot cause : REBOOT_CAUSE_POWER_LOSS) | | is_bmc() | New | Retrieves whether the sonic chassis instance is/has a BMC module | | is_liquid_cooled() | New | Is this chassis liquid/hybrid cooled ? | From 0b8cb54b01f57383f6d58ee1291477e39027ad09 Mon Sep 17 00:00:00 2001 From: judyjoseph <53951155+judyjoseph@users.noreply.github.com> Date: Mon, 4 May 2026 18:40:22 -0700 Subject: [PATCH 10/11] Refine leak sensor function description and formatting --- doc/bmc/sonicBMC/pmon-bmc-design.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 5236e1f5898..5576916b923 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -508,7 +508,7 @@ This base class is already defined in sonic-platform-common, additional new plat | is_leak_sensor_ok() | New | Is the leak sensor OK or faulty ? | | get_leak_sensor_type() | New | What type of leak sensor is this rope, flex_pcb, spot etc | | get_leak_sensor_location() | New | Location of leak sensor | -| get_leak_severity() | New | Get the severity based on the criticality of the zone or how severe the leak is for a sensor for eg: more liquid presence | +| get_leak_severity() | New | Get the severity based on the criticality of the location/zone or how severe the leak is for a sensor for eg: more liquid presence | | get_leak_profile() | New | Returns the leak sensor profile associated with this leak sensor type. 
there will be a profile created per leak sensor type rope, flex_pcb, spot etc | **Note** @@ -638,11 +638,11 @@ Applicable to (LC, AC) ``` config chassis modules startup SWITCH-HOST - This command is to POWER ON the Switch Host from BMC - - Sets the "admin_status to up + - Sets the "admin_status" attribute to up config chassis modules shutdown SWITCH-HOST - This command is to graceful POWER OFF the Switch Host from BMC - - Sets the "admin_status to down + - Sets the "admin_status" attribute to down - Default admin_status of SWITCH-HOST is down which keeps SWITCH-HOST powered down when device powers up. config chassis modules power-on-delay SWITCH-HOST From cef555d059cf92898fb0202e8bc42ec72ff1406e Mon Sep 17 00:00:00 2001 From: Judy Joseph Date: Thu, 7 May 2026 16:56:01 +0000 Subject: [PATCH 11/11] Update the wording of is_leak platform API --- doc/bmc/sonicBMC/pmon-bmc-design.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 5576916b923..a8a581a68d8 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -504,7 +504,7 @@ This base class is already defined in sonic-platform-common, additional new plat | Method | Present | Action | |---------|---------|----------| | get_name() | Y | Get leak sensor name | -| is_leak() | Y | Is there a leak detected? **Applies debounce logic defined by platform before reporting or clearing a leak** | +| is_leak() | Y | Is there a leak detected? **Only stable leak conditions** are asserted or cleared. This could be done by debounce logic in platform/firmware | | is_leak_sensor_ok() | New | Is the leak sensor OK or faulty ? | | get_leak_sensor_type() | New | What type of leak sensor is this rope, flex_pcb, spot etc | | get_leak_sensor_location() | New | Location of leak sensor |