diff --git a/doc/bmc/sonicBMC/pmon-bmc-design.md b/doc/bmc/sonicBMC/pmon-bmc-design.md index 3b715c67939..a8a581a68d8 100644 --- a/doc/bmc/sonicBMC/pmon-bmc-design.md +++ b/doc/bmc/sonicBMC/pmon-bmc-design.md @@ -17,6 +17,7 @@ * [2.1.4 BMC-Switch Host Interaction](#214-bmc-switch-host-interaction) * [2.1.5 BMC leak_detection_and_thermal policy](#215-bmc-leak-detection-and-thermal-policy) * [2.1.6 BMC event logging](#216-bmc-event-logging) + * [2.1.7 RTC clock in BMC](#217-rtc-clock-in-bmc) * [2.2 BMC Platform Management](#22-bmc-platform-management) * [2.2.1 BMC controller-bmcctld](#221-bmc-controller---bmcctld) * [2.2.1.1 bmcctld on bmc](#2211-bmcctld-on-bmc) @@ -84,19 +85,17 @@ The SONiC in BMC interoperates with the SONiC in Switch-Host as in below diagram ## 2. Detailed Architecture and workflows ### 2.1 BMC platform -Update the //platform_env.conf with the following flags, +Update the [vendor]/[platform]/platform_env.conf with the following flags, ``` switch_host=1 -liquid_cooled=true ``` ``` switch_bmc=1 -liquid_cooled=true ``` -* "liquid_cooled" flag is set to true on a liquid cooled switch OR hybrid cooled switch. * "switch_host" flag is set to 1 on the switch host, "switch_bmc" flag is set to 1 on the switch BMC. - +* "liquid_cooled" flag is not set in this file as the same BMC platform/sku could be used for both air cooled and liquid cooled devices. + So instead we will introduce a platform API to get whether this is a liquid cooled or air cooled device. #### 2.1.1 BMC platform power up When device is powered ON, the BMC powers first, boots up the sonic BMC which starts the various containers @@ -292,6 +291,16 @@ The Leak detection is applicable only to Liquid cooling platform. The action is The general syslogs will be placed in /var/log/syslog where /var/log directory will be mounted on **tmpfs**. Syslogs will be sent to remote server as well. The Leak, Switch-Host state and interactions, Rack-manager interactions will be persistently stored on disk/eMMC in "/host/bmc/event.log" with log rotation enabled. +#### 2.1.7 RTC Clock in BMC +On most vendor platforms, the BMC RTC does not have a battery backup. As a result, the clock does not retain time across power cycles. + +When the BMC powers on, the system time is initialized as follows: + +1. Use the clock epoch file at "/usr/lib/clock-epoch" as the initial system time, if available. This is read by systemd during startup. (This file is updated regularly with a systemd timer service, this feature is coming in SONiC release 202605) +2. The chrony systemd service when it starts later synchronizes with remote NTP servers to obtain and maintain accurate time. + +This sequence provides a reasonable initial timestamp at boot, followed by synchronization to an accurate time source via NTP. + ### 2.2 BMC Platform Management @@ -313,8 +322,9 @@ The bmc controller daemon "bmcctld" is started first in BMC pmon container. It a Detailed workflow below ``` -Sleep for SWITCH_HOST_POWER_ON_DELAY (this is configurable value in config_db) -This is to make sure the Rack Manager is up and Liquid flow rate is good. +if the previous reboot was a Cold Boot (Full Power Cycle, reboot cause : REBOOT_CAUSE_POWER_LOSS) + - Sleep for power_on_delay configured in CHASSIS_MODULE|SWITCH-HOST (this is configurable value in config_db) + - This is to make sure the Rack Manager is up and Liquid flow rate is good. Check for any CRITICAL alert/leak in RACK_MANAGER_ALERT* tables or system SYSTEM_LEAK_STATUS table (device_leak_status == CRITICAL_SYSTEM_LEAK) in STATE_DB NO External/System LEAK present @@ -377,7 +387,7 @@ On an Event ``` - use GNOI framework to issue remote SOFT shutdown. The gnmi and sysmgr docker needs to be running on Switch-Host REF: https://github.com/sonic-net/SONiC/blob/master/doc/mgmt/gnmi/gnoi_system_hld.md, https://github.com/sonic-net/SONiC/pull/1489 - - start a timer based on graceful_shutdown_timeout configured in SWITCH_HOST_SHUTDOWN_TIMEOUT|default table. + - start a timer based on graceful_shutdown_timeout configured in CHASSIS_MODULE|SWITCH-HOST. - if GNOI request came back SUCCESS or No response for GNOI request + Timer expired - call platform API module->set_admin_state(DOWN) to power down the Switch-Host - update the HOST_STATE|switch-host with the device_power_state. @@ -396,24 +406,22 @@ On an Event This section covers the various tables which this daemon creates/uses in Redis DB on BMC ``` -key = SWITCH_HOST_POWER_ON_DELAY |default ; Config DB on BMC +key = CHASSIS_MODULE|SWITCH-HOST ; Config DB on BMC ; field = value -power_on_delay = integer ; Time in secs after power on the device, switch BMC can power on the Switch-Host. ( default = -1, Switch-Host remain powered off ). - ; If non-zero and BMC receives POWER ON from Rack manager before this timeout + there are no critical events, BMC will power on Switch-Host. +admin_status = up | down ; default is down, keeps SWITCH-HOST powered down when device powers up. +power_on_delay = integer ; Time in secs BMC waits before powering on Switch-Host when device is powered ON. + ; If BMC receives POWER ON from Rack manager before this timeout + there are no critical events, BMC will power on Switch-Host. +graceful_shutdown_timeout = integer ; Time in secs BMC waits for graceful shutdown before forcing power-off (default = 120sec). + ; if set to 0, BMC will NOT do a graceful shutdown, instead will do POWER_OFF with platform API. key = HOST_STATE|switch-host ; STATE_DB on BMC to store state of Switch-Host ; field = value -device_power_state = POWER_ON | POWER_OFF| GRACEFUL_SHUT | POWER_CYCLE ; What was the last action done on Switch-Host -device_status = ONLINE | OFFLINE ; current oper status of device, can use the platform API module->get_oper_state() +device_power_state = POWERED_ON | POWERED_OFF | GRACEFUL_SHUTDOWN | ; Represents the final and transitional power state of Switch-host + POWER_CYCLE | POWERING_ON | POWERING_OFF | + GRACEFUL_SHUTTING_DOWN | POWER_CYCLING +device_status = ONLINE | OFFLINE ; current oper status of device, from platform API module->get_oper_status() last_change_timestamp = STR - -key = SWITCH_HOST_SHUTDOWN_TIMEOUT|default ; Config DB on BMC -; field = value -graceful_shutdown_timeout = integer ; Time in secs the BMC will wait after SHUTDOWN command sent to Switch-Host. ( default = 120 sec ). - ; if this timer expires, BMC will go ahead and direct POWER OFF switch-host with platform API - ; if shutdown_timeout is 0, BMC will NOT do a graceful shutdown, instead will do POWER_OFF with platform API - ``` #### 2.2.2 thermalctld @@ -496,11 +504,11 @@ This base class is already defined in sonic-platform-common, additional new plat | Method | Present | Action | |---------|---------|----------| | get_name() | Y | Get leak sensor name | -| is_leak() | Y | Is there a leak detected? **Applies debounce logic defined by platform before reporting or clearing a leak** | +| is_leak() | Y | Is there a leak detected? **Only stable leak conditions** are asserted or cleared. This could be done by debounce logic in platform/firmware | | is_leak_sensor_ok() | New | Is the leak sensor OK or faulty ? | | get_leak_sensor_type() | New | What type of leak sensor is this rope, flex_pcb, spot etc | | get_leak_sensor_location() | New | Location of leak sensor | -| get_leak_severity() | New | Get the severity based on the criticality of the zone or how severe the leak is for a sensor for eg: more liquid presence | +| get_leak_severity() | New | Get the severity based on the criticality of the location/zone or how severe the leak is for a sensor for eg: more liquid presence | | get_leak_profile() | New | Returns the leak sensor profile associated with this leak sensor type. there will be a profile created per leak sensor type rope, flex_pcb, spot etc | **Note** @@ -611,7 +619,9 @@ This base class is already defined in sonic-platform-common. | Method | Present | Action | |---------|---------|----------| | get_all_modules() | Y | Fetch managed modules here, Switch-Host Module object | - +| get_reboot_cause()| Y | Fetch previous reboot cause, check if it is Cold Boot(Full Power Cycle, reboot cause : REBOOT_CAUSE_POWER_LOSS) | +| is_bmc() | New | Retrieves whether the sonic chassis instance is/has a BMC module | +| is_liquid_cooled() | New | Is this chassis liquid/hybrid cooled ? | ### 2.3 BMC CLI Commands @@ -626,19 +636,22 @@ CLI to enable user to graceful power on/off the Switch-Host, and to configure po Applicable to (LC, AC) ``` -config chassis modules startup +config chassis modules startup SWITCH-HOST - This command is to POWER ON the Switch Host from BMC + - Sets the "admin_status" attribute to up -config chassis modules shutdown +config chassis modules shutdown SWITCH-HOST - This command is to graceful POWER OFF the Switch Host from BMC + - Sets the "admin_status" attribute to down + - Default admin_status of SWITCH-HOST is down which keeps SWITCH-HOST powered down when device powers up. -config chassis modules power-on-delay +config chassis modules power-on-delay SWITCH-HOST - Configure the delay (in seconds) BMC waits after power-on before powering on the Switch-Host. - - Default = -1, Switch-Host remain powered off. This default value is selected as -1 so that in SI phase Switch-Host needs to be powered on manually. + - Default = 0, default is 0 secs which tells Switch-Host to power on immediately if admin_status is up - If non-zero BMC receives a POWER ON from Rack Manager before this timeout elapses (and no critical events exist), Switch-Host will be powered on immediately. -config chassis modules shutdown-timeout +config chassis modules shutdown-timeout SWITCH-HOST - Configure the graceful-shutdown timeout (in seconds) BMC waits after sending a shutdown command to the Switch-Host before forcing a hard power-off via the platform API. - Default = 120sec. @@ -650,13 +663,15 @@ config chassis modules shutdown-timeout ##### DB schema ``` - "CHASSIS_MODULE": { + "CHASSIS_MODULE": { ; In CONFIG_DB "SWITCH-HOST": { - "admin_status": "up", - "power_on_delay": "300", ; Time in secs BMC waits before powering on Switch-Host (default = -1, Switch-Host remain powered off) - "graceful_shutdown_timeout" : "120" ; Time in secs BMC waits for graceful shutdown before forcing power-off (default = 120sec) + "admin_status": "up", ; admin_status up/down; default is down which keeps SWITCH-HOST powered down when device powers up. + "power_on_delay": "300", ; Time in secs BMC waits before powering on Switch-Host when device is powered ON. + ; If BMC receives POWER ON from Rack manager before this timeout + there are no critical events, BMC will power on Switch-Host. + "graceful_shutdown_timeout" : "120" ; Time in secs BMC waits for graceful shutdown before forcing power-off (default = 120sec). + ; if set to 0, BMC will NOT do a graceful shutdown, instead will do POWER_OFF with platform API. } - } + } ``` * **config liquid-cool leak-control** @@ -680,14 +695,14 @@ Applicable to (LC) config liquid-cool leak-action [system|rack_mgr] [critical|minor] [syslog_only|graceful_shutdown|power_off] - syslog_only : Log the event; no Switch-Host power action taken. - - graceful_shutdown: Issue a graceful GNOI shutdown to Switch-Host; force power-off after SWITCH_HOST_SHUTDOWN_TIMEOUT/graceful_shutdown_timeout if unresponsive. + - graceful_shutdown: Issue a graceful GNOI shutdown to Switch-Host; force power-off after graceful_shutdown_timeout (CHASSIS_MODULE|SWITCH-HOST) if unresponsive. - power_off : Immediately power off Switch-Host via platform API module->set_admin_state(DOWN). ``` ##### DB schema ``` - "LEAK_CONTROL_POLICY": { + "LEAK_CONTROL_POLICY": { ; In CONFIG_DB "system_leak_policy" : "enabled | disabled", ; enabled by default "system_critical_leak_action" : "power_off", ; default is power_off "system_minor_leak_action" : "syslog_only", ; default is syslog_only @@ -741,6 +756,21 @@ show chassis module status ``` +##### DB schema + +``` + "CHASSIS_MODULE_TABLE": { ; In STATE_DB + "SWITCH-HOST": { + "name": "SWITCH-HOST" + "desc": "Switch Host managed by BMC" + "slot": "N/A" + "serial": "[Serial-number]" + "oper_status": "ONLINE" + "admin_status": "up" + } + } +``` + * **show platform leak control-policy** Command to show leak control policy configuration @@ -883,4 +913,7 @@ In case of a firmware upgrade which needs reboot of both Switch-Host and BMC, wi 2. Add support for more Rack manager commands via Redfish for reset_type like ForceRestart, GracefulRestart 3. Add support for ipv6 address to Host-Bmc-Link 4. Introduced the Hybrid cooling skus in this design document. Add more details on requirements and actions of various platform daemons in Switch-Host. +5. Add more details on the actions ( eg: DC personal checkup, RMA etc ) incase if there is a leak sensor faulty. + - Can we run the switch with one or more faulty sensor ? + - if there a faulty sensor and if location tells close to CPU/ASIC, should the switch be powered down and RAMed ?