* [2.1.4 BMC-Switch Host Interaction](#214-bmc-switch-host-interaction)
* [2.1.5 BMC leak_detection_and_thermal policy](#215-bmc-leak-detection-and-thermal-policy)
* [2.1.6 BMC event logging](#216-bmc-event-logging)
* [2.1.7 RTC clock in BMC](#217-rtc-clock-in-bmc)
* [2.2 BMC Platform Management](#22-bmc-platform-management)
* [2.2.1 BMC controller-bmcctld](#221-bmc-controller---bmcctld)
* [2.2.1.1 bmcctld on bmc](#2211-bmcctld-on-bmc)

## 2. Detailed Architecture and workflows
### 2.1 BMC platform
Update `[vendor]/[platform]/platform_env.conf` with the following flags on the Switch-Host and on the switch BMC respectively:
```
switch_host=1
```
```
switch_bmc=1
```

* "switch_host" flag is set to 1 on the Switch-Host; "switch_bmc" flag is set to 1 on the switch BMC.
* "liquid_cooled" flag is not set in this file, because the same BMC platform/SKU can be used in both air-cooled and liquid-cooled devices.
  Instead, we will introduce a platform API that reports whether the device is liquid cooled or air cooled.

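A minimal Python sketch of how a platform daemon could read these flags to tell the Switch-Host from the switch BMC. It assumes platform_env.conf is a plain `key=value` file with optional `#` comments; `parse_platform_env()`, `read_platform_env()` and `platform_role()` are hypothetical helpers, not existing SONiC functions.

```python
from pathlib import Path


def parse_platform_env(text):
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def read_platform_env(path):
    """Read and parse a platform_env.conf file (path supplied by the caller)."""
    return parse_platform_env(Path(path).read_text())


def platform_role(env):
    """Return 'switch_bmc', 'switch_host', or None when neither flag is set to 1."""
    if env.get("switch_bmc") == "1":
        return "switch_bmc"
    if env.get("switch_host") == "1":
        return "switch_host"
    return None
```

The cooling type is deliberately not read here, since it comes from the platform API rather than from this file.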
#### 2.1.1 BMC platform power up
When the device is powered ON, the BMC powers up first and boots SONiC BMC, which starts the various containers
General syslogs will be written to /var/log/syslog; the /var/log directory is mounted on **tmpfs**, so these logs do not persist across reboots. Syslogs will also be sent to a remote server.
Leak events, Switch-Host state changes and interactions, and Rack Manager interactions will be stored persistently on disk/eMMC in "/host/bmc/event.log", with log rotation enabled.

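A minimal sketch of a persistent, rotated event log in Python, assuming the events are written with the standard library's `RotatingFileHandler`. The path comes from this design; `make_event_logger()`, the logger name and the size limits are hypothetical, illustrative choices.

```python
import logging
import os
from logging.handlers import RotatingFileHandler

EVENT_LOG_PATH = "/host/bmc/event.log"  # path from this design


def make_event_logger(path=EVENT_LOG_PATH, max_bytes=1_000_000, backup_count=5, name="bmc.events"):
    """Return a logger that appends BMC events to a size-rotated file on persistent storage.

    max_bytes and backup_count are illustrative values, not settings defined by the design.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also pass events to the root logger's handlers
    target = os.path.abspath(path)
    # Avoid attaching a second handler for the same file if called twice.
    if not any(getattr(h, "baseFilename", None) == target for h in logger.handlers):
        handler = RotatingFileHandler(target, maxBytes=max_bytes, backupCount=backup_count)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

A daemon would then call, for example, `make_event_logger().info("critical system leak detected")`; the message text here is only an example.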
#### 2.1.7 RTC Clock in BMC
On most vendor platforms, the BMC RTC does not have a battery backup. As a result, the clock does not retain time across power cycles.

When the BMC powers on, the system time is initialized as follows:

1. If "/usr/lib/clock-epoch" is present and the system clock is earlier than the file's modification time, systemd advances the clock to that time early during startup. (This file will be updated regularly by a systemd timer service; this feature is planned for SONiC release 202605.)
2. When the chrony service starts later in boot, it synchronizes with remote NTP servers to obtain and maintain accurate time.

This sequence provides a reasonable initial timestamp at boot, followed by synchronization to an accurate time source via NTP.

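A simplified Python model of this sequence, assuming the documented systemd behaviour that only the modification time of `/usr/lib/clock-epoch` matters and that the clock is only ever moved forward. `refresh_clock_epoch()` is a hypothetical stand-in for whatever the planned timer service runs, and `boot_time_floor()` models the boot-time rule; neither is systemd or SONiC code.

```python
import os
import time

CLOCK_EPOCH = "/usr/lib/clock-epoch"  # file named in this design


def refresh_clock_epoch(path=CLOCK_EPOCH, now=None):
    """Periodic job: move the file's mtime up to the current time (content is unused)."""
    now = time.time() if now is None else now
    try:
        current = os.stat(path).st_mtime
    except FileNotFoundError:
        open(path, "a").close()  # create an empty file
        current = None
    # Never move the epoch backwards, e.g. before NTP has corrected a bad clock.
    if current is None or current < now:
        os.utime(path, (now, now))


def boot_time_floor(rtc_time, path=CLOCK_EPOCH):
    """Boot-time rule: a battery-less RTC that is behind the epoch is advanced to it."""
    try:
        epoch = os.stat(path).st_mtime
    except FileNotFoundError:
        return rtc_time
    return max(rtc_time, epoch)
```

The resulting boot time is at worst one timer period behind real time, and chrony then corrects the remaining drift.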

### 2.2 BMC Platform Management

The detailed workflow is below.

```
if the previous reboot was a Cold Boot (Full Power Cycle, reboot cause : REBOOT_CAUSE_POWER_LOSS)
- Sleep for power_on_delay configured in CHASSIS_MODULE|SWITCH-HOST (a configurable value in config_db)
- This is to make sure the Rack Manager is up and Liquid flow rate is good.

Check for any CRITICAL alert/leak in RACK_MANAGER_ALERT* tables or system SYSTEM_LEAK_STATUS table (device_leak_status == CRITICAL_SYSTEM_LEAK) in STATE_DB
NO External/System LEAK present
```
- use the GNOI framework to issue a remote SOFT shutdown. The gnmi and sysmgr dockers need to be running on the Switch-Host.
REF: https://github.com/sonic-net/SONiC/blob/master/doc/mgmt/gnmi/gnoi_system_hld.md, https://github.com/sonic-net/SONiC/pull/1489
- start a timer based on graceful_shutdown_timeout configured in CHASSIS_MODULE|SWITCH-HOST.
- if the GNOI request returned SUCCESS or got no response, and the timer has expired
- call platform API module->set_admin_state(DOWN) to power down the Switch-Host
- update the HOST_STATE|switch-host with the device_power_state.
This section covers the various tables which this daemon creates/uses in Redis DB on BMC

```
key = CHASSIS_MODULE|SWITCH-HOST ; Config DB on BMC
; field = value
admin_status = up | down                  ; Default is down, which keeps SWITCH-HOST powered down when the device powers up.
power_on_delay = integer                  ; Time in secs BMC waits before powering on Switch-Host when the device is powered ON (default = 0).
                                          ; If BMC receives POWER ON from Rack Manager before this timeout and there are no critical events, BMC powers on Switch-Host.
graceful_shutdown_timeout = integer       ; Time in secs BMC waits for graceful shutdown before forcing power-off (default = 120 sec).
                                          ; If set to 0, BMC does NOT do a graceful shutdown; it powers off Switch-Host with the platform API instead.

key = HOST_STATE|switch-host ; STATE_DB on BMC to store state of Switch-Host
; field = value
device_power_state = POWERED_ON | POWERED_OFF | GRACEFUL_SHUTDOWN | POWER_CYCLE |     ; Final and transitional power states of Switch-Host
                     POWERING_ON | POWERING_OFF | GRACEFUL_SHUTTING_DOWN | POWER_CYCLING
device_status = ONLINE | OFFLINE          ; Current oper status of the device, from platform API module->get_oper_status()
last_change_timestamp = STR


```
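The power-on and graceful-shutdown workflows above can be summarised in a hypothetical Python sketch. The platform and GNOI calls are injected as callables; the Rack Manager POWER ON that can cut `power_on_delay` short is modelled only as a `wait` callable that a real implementation could return from early; and the sketch waits for the full `graceful_shutdown_timeout` whether the GNOI request succeeded or got no response. Ending a graceful path in GRACEFUL_SHUTDOWN and a direct power-off in POWERED_OFF is an assumption about how the HOST_STATE values are used.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class SwitchHostConfig:
    """CONFIG_DB CHASSIS_MODULE|SWITCH-HOST fields, with the defaults stated in this document."""
    admin_status: str = "down"
    power_on_delay: int = 0
    graceful_shutdown_timeout: int = 120


def may_power_on(cfg: SwitchHostConfig, cold_boot: bool,
                 has_critical_event: Callable[[], bool],
                 wait: Callable[[float], None] = time.sleep) -> bool:
    """Decide at BMC startup whether bmcctld should power on the Switch-Host."""
    if cold_boot:
        # REBOOT_CAUSE_POWER_LOSS: give the Rack Manager and liquid flow time to settle.
        wait(cfg.power_on_delay)
    # RACK_MANAGER_ALERT* / SYSTEM_LEAK_STATUS are checked only after the delay.
    return cfg.admin_status == "up" and not has_critical_event()


def power_off_switch_host(cfg: SwitchHostConfig,
                          send_gnoi_shutdown: Callable[[], None],
                          set_admin_state_down: Callable[[], None],
                          set_power_state: Callable[[str], None],
                          wait: Callable[[float], None] = time.sleep) -> None:
    """Graceful shutdown with forced power-off; set_power_state writes HOST_STATE|switch-host."""
    graceful = cfg.graceful_shutdown_timeout > 0
    if graceful:
        set_power_state("GRACEFUL_SHUTTING_DOWN")
        send_gnoi_shutdown()  # SUCCESS or no response: either way, wait for the timer
        wait(cfg.graceful_shutdown_timeout)
    set_power_state("POWERING_OFF")
    set_admin_state_down()  # platform API module->set_admin_state(DOWN)
    set_power_state("GRACEFUL_SHUTDOWN" if graceful else "POWERED_OFF")
```

Injecting `wait` and the platform calls keeps the policy testable without hardware or real delays.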

#### 2.2.2 thermalctld
| Method | Present | Action |
|---------|---------|----------|
| get_name() | Y | Get leak sensor name |
| is_leak() | Y | Is there a leak detected? **Only stable leak conditions** are asserted or cleared, e.g. by debounce logic in the vendor platform code or firmware |
| is_leak_sensor_ok() | New | Is the leak sensor OK or faulty? |
| get_leak_sensor_type() | New | Type of leak sensor: rope, flex_pcb, spot, etc. |
| get_leak_sensor_location() | New | Location of leak sensor |
| get_leak_severity() | New | Get the severity based on the criticality of the location/zone, or on how severe the leak is for a sensor (e.g. more liquid present) |
| get_leak_profile() | New | Returns the leak sensor profile associated with this sensor's type; one profile is created per leak sensor type (rope, flex_pcb, spot, etc.) |

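As an illustration of "only stable leak conditions", the following hypothetical Python sketch debounces raw samples with consecutive-sample counters. The class name, `update()` method and counts are illustrative; the design leaves the debounce method to the vendor platform or firmware, and a time-based filter would be equally valid.

```python
class DebouncedLeakSensor:
    """Asserts a leak after `assert_count` consecutive wet samples and clears it
    after `clear_count` consecutive dry samples."""

    def __init__(self, name, assert_count=3, clear_count=3):
        self.name = name
        self.assert_count = assert_count
        self.clear_count = clear_count
        self._leak = False  # debounced (stable) state reported by is_leak()
        self._streak = 0    # consecutive raw samples disagreeing with the stable state

    def get_name(self):
        return self.name

    def update(self, raw_wet):
        """Feed one raw sensor sample (True = liquid detected)."""
        if raw_wet == self._leak:
            self._streak = 0  # sample agrees with the stable state; reset the counter
            return
        self._streak += 1
        needed = self.clear_count if self._leak else self.assert_count
        if self._streak >= needed:
            self._leak = raw_wet
            self._streak = 0

    def is_leak(self):
        return self._leak
```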
**Note**
| Method | Present | Action |
|---------|---------|----------|
| get_all_modules() | Y | Fetch the managed modules; here, the Switch-Host Module object |
| get_reboot_cause() | Y | Fetch the previous reboot cause, used to check for a Cold Boot (Full Power Cycle, reboot cause: REBOOT_CAUSE_POWER_LOSS) |
| is_bmc() | New | Retrieves whether this SONiC chassis instance is, or has, a BMC module |
| is_liquid_cooled() | New | Is this chassis liquid or hybrid cooled? |

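To show how these Chassis methods fit together, here is a hypothetical Python stub; in SONiC they would be methods of the vendor's `sonic_platform` Chassis class. `ExampleBmcChassis`, `was_cold_boot()` and `leak_detection_applies()` are illustrative names, and the `"Power Loss"` string is an assumption about the value of `ChassisBase.REBOOT_CAUSE_POWER_LOSS`.

```python
REBOOT_CAUSE_POWER_LOSS = "Power Loss"  # assumed to match ChassisBase.REBOOT_CAUSE_POWER_LOSS


class ExampleBmcChassis:
    """Hypothetical stand-in for a vendor Chassis running on the switch BMC."""

    def __init__(self, reboot_cause, liquid_cooled):
        self._reboot_cause = reboot_cause
        self._liquid_cooled = liquid_cooled

    def get_reboot_cause(self):
        # Returns (cause, description), following the ChassisBase convention.
        return (self._reboot_cause, "")

    def is_bmc(self):
        return True

    def is_liquid_cooled(self):
        # Liquid or hybrid cooled; decided by the platform, not platform_env.conf.
        return self._liquid_cooled


def was_cold_boot(chassis):
    """True when the previous reboot was a full power cycle."""
    cause, _ = chassis.get_reboot_cause()
    return cause == REBOOT_CAUSE_POWER_LOSS


def leak_detection_applies(chassis):
    """Leak detection is applicable only to liquid (including hybrid) cooled platforms."""
    return chassis.is_liquid_cooled()
```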
### 2.3 BMC CLI Commands

Applicable to (LC, AC)

```
config chassis modules startup SWITCH-HOST
- This command is to POWER ON the Switch-Host from BMC
- Sets the "admin_status" attribute to up

config chassis modules shutdown SWITCH-HOST
- This command is to gracefully POWER OFF the Switch-Host from BMC
- Sets the "admin_status" attribute to down
- The default admin_status of SWITCH-HOST is down, which keeps SWITCH-HOST powered down when the device powers up.

config chassis modules power-on-delay SWITCH-HOST <seconds>
- Configure the delay (in seconds) BMC waits after power-on before powering on the Switch-Host.
- Default = 0 secs, which tells BMC to power on the Switch-Host immediately if admin_status is up.
- If non-zero and BMC receives a POWER ON from Rack Manager before this timeout elapses (and no critical events exist),
  the Switch-Host is powered on immediately.

config chassis modules shutdown-timeout SWITCH-HOST <seconds>
- Configure the graceful-shutdown timeout (in seconds) BMC waits after sending a shutdown command
to the Switch-Host before forcing a hard power-off via the platform API.
- Default = 120sec.
##### DB schema

```
"CHASSIS_MODULE": {                          ; In CONFIG_DB
    "SWITCH-HOST": {
        "admin_status": "up",                ; up/down. Default is down, which keeps SWITCH-HOST powered down when the device powers up.
        "power_on_delay": "300",             ; Time in secs BMC waits before powering on Switch-Host when the device is powered ON (default = 0).
                                             ; If BMC receives POWER ON from Rack Manager before this timeout and there are no critical events, BMC powers on Switch-Host.
        "graceful_shutdown_timeout": "120"   ; Time in secs BMC waits for graceful shutdown before forcing power-off (default = 120 sec).
                                             ; If set to 0, BMC does NOT do a graceful shutdown; it powers off Switch-Host with the platform API instead.
    }
}
```
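A hypothetical Python sketch of how these commands could translate into CHASSIS_MODULE|SWITCH-HOST field updates. A plain dict stands in for CONFIG_DB; writing only explicitly configured fields, applying the documented defaults at read time, and rejecting negative seconds are assumptions of this sketch, not documented CLI behaviour.

```python
DEFAULTS = {"admin_status": "down", "power_on_delay": "0", "graceful_shutdown_timeout": "120"}

FIELD_FOR = {"power-on-delay": "power_on_delay", "shutdown-timeout": "graceful_shutdown_timeout"}


def apply_cli(config_db, command, module="SWITCH-HOST", seconds=None):
    """Update the module entry in a dict standing in for CONFIG_DB; values are stored as strings."""
    entry = config_db.setdefault("CHASSIS_MODULE", {}).setdefault(module, {})
    if command == "startup":
        entry["admin_status"] = "up"
    elif command == "shutdown":
        entry["admin_status"] = "down"
    elif command in FIELD_FOR:
        if seconds is None or int(seconds) < 0:
            raise ValueError(f"{command} needs a non-negative number of seconds")
        entry[FIELD_FOR[command]] = str(int(seconds))
    else:
        raise ValueError(f"unknown command: {command}")
    return entry


def effective_config(config_db, module="SWITCH-HOST"):
    """Fields as a consumer such as bmcctld would see them, with unset fields defaulted."""
    return {**DEFAULTS, **config_db.get("CHASSIS_MODULE", {}).get(module, {})}
```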

* **config liquid-cool leak-control**
config liquid-cool leak-action [system|rack_mgr] [critical|minor] [syslog_only|graceful_shutdown|power_off]

- syslog_only : Log the event; no Switch-Host power action taken.
- graceful_shutdown: Issue a graceful GNOI shutdown to Switch-Host; force power-off after graceful_shutdown_timeout (CHASSIS_MODULE|SWITCH-HOST) if unresponsive.
- power_off : Immediately power off Switch-Host via platform API module->set_admin_state(DOWN).
```
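A hypothetical Python sketch of how the configured action could be dispatched when a leak is detected. Only the system defaults visible in the LEAK_CONTROL_POLICY schema are filled in; the `rack_mgr_*` key names are assumed to follow the same pattern, falling back to syslog_only for a missing key and treating a disabled policy as log-only are assumptions, and the callables stand in for the GNOI shutdown and platform API paths.

```python
DEFAULT_POLICY = {
    "system_leak_policy": "enabled",
    "system_critical_leak_action": "power_off",
    "system_minor_leak_action": "syslog_only",
}


def leak_action(policy, source, severity):
    """source: 'system' or 'rack_mgr'; severity: 'critical' or 'minor'."""
    merged = {**DEFAULT_POLICY, **policy}
    if merged.get(f"{source}_leak_policy", "enabled") == "disabled":
        return "syslog_only"  # assumption: a disabled policy still logs, with no power action
    return merged.get(f"{source}_{severity}_leak_action", "syslog_only")


def handle_leak(policy, source, severity, log, graceful_shutdown, power_off):
    action = leak_action(policy, source, severity)
    log(f"{source} {severity} leak detected, action={action}")
    if action == "graceful_shutdown":
        graceful_shutdown()  # GNOI shutdown, forced power-off after graceful_shutdown_timeout
    elif action == "power_off":
        power_off()  # platform API module->set_admin_state(DOWN)
    return action
```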

##### DB schema

```
"LEAK_CONTROL_POLICY": { ; In CONFIG_DB
"system_leak_policy" : "enabled | disabled", ; enabled by default
"system_critical_leak_action" : "power_off", ; default is power_off
"system_minor_leak_action" : "syslog_only", ; default is syslog_only

```

##### DB schema

```
"CHASSIS_MODULE_TABLE": {                    ; In STATE_DB
    "SWITCH-HOST": {
        "name": "SWITCH-HOST",
        "desc": "Switch Host managed by BMC",
        "slot": "N/A",
        "serial": "[Serial-number]",
        "oper_status": "ONLINE",
        "admin_status": "up"
    }
}
```

* **show platform leak control-policy**

Command to show leak control policy configuration
2. Add support for more Rack Manager commands via Redfish, such as reset_type values ForceRestart and GracefulRestart
3. Add support for IPv6 addresses on the Host-Bmc-Link
4. Hybrid cooling SKUs were introduced in this design document; add more details on the requirements and actions of the various platform daemons on the Switch-Host.
5. Add more details on the actions (e.g. DC personnel checkup, RMA, etc.) in case a leak sensor is faulty.
   - Can the switch run with one or more faulty sensors?
   - If a faulty sensor is located close to the CPU/ASIC, should the switch be powered down and RMAed?