[tempershow]: Read SFP temperature from xcvrd-managed tables #4522
Merged
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
Pull request overview
Updates the tempershow utility (invoked by show platform temperature) so SFP/transceiver temperature rows are sourced directly from xcvrd-managed STATE_DB tables, avoiding dependence on thermalctld publishing per-SFP rows into TEMPERATURE_INFO.
Changes:
- Add collection path for platform sensors from `TEMPERATURE_INFO` and new SFP sensor collection from `TRANSCEIVER_DOM_TEMPERATURE` + `TRANSCEIVER_DOM_THRESHOLD`.
- Add SFP display-name mapping via `SfpUtilHelper` to preserve legacy `xSFP module <N> Temp` labels (with fallback behavior).
- Make missing DB fields render as `N/A` and tighten DB key matching to `<TABLE>|*`.
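A minimal sketch of how one SFP row could be assembled from the two xcvrd tables, with missing fields rendering as `N/A`. The helper name `build_sfp_row`, the plain-dict inputs, and the sample values are assumptions for illustration and are not the code in this PR; the threshold field names follow the mapping given in the commit message.

```python
NOT_AVAILABLE = "N/A"

# tempershow column -> TRANSCEIVER_DOM_THRESHOLD field (mapping from the PR text)
THRESHOLD_FIELDS = {
    "High TH": "temphighwarning",
    "Low TH": "templowwarning",
    "Crit High TH": "temphighalarm",
    "Crit Low TH": "templowalarm",
}


def build_sfp_row(sensor_name, dom_temperature, dom_threshold):
    """Assemble one display row from two hash dicts (hypothetical helper)."""
    row = {
        "Sensor": sensor_name,
        # dict.get() with a default avoids KeyError when a field is absent.
        "Temperature": dom_temperature.get("temperature", NOT_AVAILABLE),
    }
    for column, field in THRESHOLD_FIELDS.items():
        row[column] = dom_threshold.get(field, NOT_AVAILABLE)
    return row


# Sample data is made up; only two of the four thresholds are present.
row = build_sfp_row(
    "xSFP module 1 Temp",
    {"temperature": "41.5"},
    {"temphighwarning": "70.0", "temphighalarm": "75.0"},
)
print(row)
```

The two thresholds missing from the sample input come back as `N/A` rather than raising, which is the behavior the overview describes.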
a3ac7e8 to
0b2ded0
Compare
thermalctld no longer publishes per-SFP entries into TEMPERATURE_INFO. Update tempershow to additionally read SFP temperatures from the xcvrd-managed TRANSCEIVER_DOM_TEMPERATURE table, with thresholds from TRANSCEIVER_DOM_THRESHOLD:

  High TH      <- temphighwarning
  Low TH       <- templowwarning
  Crit High TH <- temphighalarm
  Crit Low TH  <- templowalarm

Existing TEMPERATURE_INFO consumers (chassis/PSU/fan sensors) continue to be displayed. Missing fields default to N/A instead of raising KeyError; the table key glob is also tightened to '<table>|*'.

Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
0b2ded0 to
a1e3d9c
Compare
This was referenced May 6, 2026
- _init_sfp_util_helper: also catch SystemExit raised by platform_sfputil_helper load functions so the CLI falls back to logical port names instead of terminating.
- _get_sfp_db_connections: keep successful per-namespace STATE_DB connections and skip only the failing namespace, instead of dropping all connections on any failure.
- _collect_sfp_sensors: prefetch TRANSCEIVER_DOM_THRESHOLD and TRANSCEIVER_DOM_FLAG once per namespace via new _prefetch_table() to avoid an N+1 hgetall pattern per port.
- _derive_sfp_warning: return 'True'/'False'/'N/A' strings to keep the JSON output's Warning field type consistent with platform sensor rows from TEMPERATURE_INFO.

Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
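The first two items in this commit follow two small error-handling patterns that can be sketched in isolation. The function names `load_helper_or_fallback` and `connect_namespaces`, and the injected `load` / `connect` callables, are hypothetical stand-ins; the actual `_init_sfp_util_helper` and `_get_sfp_db_connections` may be structured differently.

```python
import sys


def load_helper_or_fallback(load):
    """Return True if load() succeeds, False if it raises or calls sys.exit()."""
    try:
        load()
        return True
    except (Exception, SystemExit):
        # SystemExit derives from BaseException, so `except Exception` alone
        # would let it propagate and terminate the CLI.
        return False


def connect_namespaces(namespaces, connect):
    """Return [(namespace, connection)] for every namespace that connects."""
    connections = []
    for ns in namespaces:
        try:
            connections.append((ns, connect(ns)))
        except Exception:
            # Skip only this namespace; keep the connections already made.
            continue
    return connections


# Made-up callables to exercise both paths.
def failing_load():
    sys.exit(1)


def fake_connect(ns):
    if ns == "asic1":
        raise RuntimeError("namespace unreachable")
    return "conn-" + ns


helper_ok = load_helper_or_fallback(failing_load)
conns = connect_namespaces(["asic0", "asic1", "asic2"], fake_connect)
print(helper_ok, conns)
```

With these inputs the helper load reports failure instead of exiting, and only `asic1` is missing from the connection list.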
prgeor
previously approved these changes
May 11, 2026
Junchao-Mellanox
approved these changes
May 11, 2026
Junchao-Mellanox
previously approved these changes
May 12, 2026
…e resolution

On multi-ASIC platforms, SonicDBConfig.load_sonic_global_db_config() must be called before creating namespace-scoped SonicV2Connector instances. Without this, connect_to_all_dbs_for_ns() fails with 'validateNamespace: Initialize global DB config' and SFP temperatures from per-ASIC namespace STATE_DBs are not displayed.

Add the standard isGlobalInit/load_sonic_global_db_config guard in TemperShow.__init__(), matching the pattern used by fdbshow, fdbclear, and nbrshow.

Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
1c9ea92
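The guard described in this commit can be sketched as below. To keep the example runnable off-device, the config object is passed in and a `FakeDBConfig` class stands in for `SonicDBConfig`; both `ensure_global_db_config` and `FakeDBConfig` are hypothetical names, and only the two method names come from the commit message.

```python
def ensure_global_db_config(db_config):
    """Load the global DB config only if it has not been loaded yet.

    db_config is any object exposing isGlobalInit() and
    load_sonic_global_db_config().
    """
    if not db_config.isGlobalInit():
        db_config.load_sonic_global_db_config()


class FakeDBConfig:
    """In-memory stand-in used only for this example."""

    def __init__(self):
        self.loaded = False
        self.load_calls = 0

    def isGlobalInit(self):
        return self.loaded

    def load_sonic_global_db_config(self):
        self.loaded = True
        self.load_calls += 1


config = FakeDBConfig()
ensure_global_db_config(config)
ensure_global_db_config(config)  # second call is a no-op
print(config.load_calls)
```

On a device, the object passed in would presumably be `SonicDBConfig` itself, called before any namespace-scoped connection is created.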
judyjoseph
approved these changes
May 13, 2026
What I did
Update `tempershow` (`show platform temperature`) to read SFP/transceiver temperatures from xcvrd-managed STATE_DB tables (`TRANSCEIVER_DOM_TEMPERATURE` for the value, `TRANSCEIVER_DOM_THRESHOLD` for thresholds, `TRANSCEIVER_DOM_FLAG` for the warning state), instead of relying on `TEMPERATURE_INFO` entries published by `thermalctld`.

This is the CLI-side companion to the `thermalctld` change that stops polling and publishing per-SFP temperature data into `TEMPERATURE_INFO` (sonic-platform-daemons). After that change, SFP rows would silently disappear from `show platform temperature` unless the CLI sources them directly from xcvrd's tables.

Field mapping from xcvrd tables to tempershow columns:

| tempershow column | xcvrd table | Field |
|---|---|---|
| Temperature | `TRANSCEIVER_DOM_TEMPERATURE` | `temperature` |
| High TH | `TRANSCEIVER_DOM_THRESHOLD` | `temphighwarning` |
| Low TH | `TRANSCEIVER_DOM_THRESHOLD` | `templowwarning` |
| Crit High TH | `TRANSCEIVER_DOM_THRESHOLD` | `temphighalarm` |
| Crit Low TH | `TRANSCEIVER_DOM_THRESHOLD` | `templowalarm` |
| Warning | `TRANSCEIVER_DOM_FLAG` | `tempHWarn` / `tempLWarn` / `tempHAlarm` / `tempLAlarm` |

Note: xcvrd's `TRANSCEIVER_DOM_FLAG` table uses camelCase field names (per the SFP API `get_transceiver_dom_flags()`), while `TRANSCEIVER_DOM_THRESHOLD` uses all-lowercase field names such as `temphighwarning` (per `get_transceiver_threshold_info()`). Both naming conventions are handled correctly.

Additional behavior:
- SFP rows keep the legacy display name `xSFP module <N> Temp`, mapped via the shared `utilities_common.platform_sfputil_helper` (used by `sfpshow`/`sfputil`), so multi-ASIC port-config handling stays consistent across CLIs. Falls back to the logical port name if the helper is unavailable.
- The `Warning` column is `True` when any of the four temperature flags (`tempHWarn`, `tempLWarn`, `tempHAlarm`, `tempLAlarm`) in `TRANSCEIVER_DOM_FLAG` is asserted; `False` when all four are present and de-asserted; `N/A` when the flag table has no temperature flags for the port. This matches the legacy thermalctld semantics.
- The `Timestamp` column uses the current wall-clock time formatted as `YYYYMMDD HH:MM:SS`, matching the legacy thermalctld behavior of stamping every poll with the current time.
- On multi-ASIC platforms, calls `SonicDBConfig.load_sonic_global_db_config()` to initialize global DB config before creating namespace-scoped connections. SFP rows are collected by iterating every front-end ASIC namespace's `STATE_DB` (via `multi_asic.get_front_end_namespaces()` and `multi_asic.connect_to_all_dbs_for_ns()`) and merging the results, so transceiver rows from per-namespace tables are not missed on multi-ASIC platforms.
- Platform (chassis/PSU/fan) sensors continue to be read from `TEMPERATURE_INFO`.
- Missing fields render as `N/A` instead of raising `KeyError`.
- The DB key glob is tightened to `<table>|*` so unrelated keys are not accidentally matched.

How I did it
- Split collection into `_collect_platform_sensors()` (existing TEMPERATURE_INFO data path) and a new `_collect_sfp_sensors()` that reads `TRANSCEIVER_DOM_TEMPERATURE|*` plus the matching `TRANSCEIVER_DOM_THRESHOLD|<port>` and `TRANSCEIVER_DOM_FLAG|<port>` rows.
- Added `_init_sfp_util_helper()` / `_sfp_display_name()` that delegate to `utilities_common.platform_sfputil_helper` for logical -> physical port mapping, reusing the shared multi-ASIC-aware helper instead of re-implementing the porttab init logic.
- Added `_get_sfp_db_connections()` to return one `STATE_DB` connection per front-end ASIC namespace on multi-ASIC platforms (and a single host connection on single-ASIC), so transceiver tables are read from every namespace.
- Added `_derive_sfp_warning()` to translate the four temperature flags in `TRANSCEIVER_DOM_FLAG` into the `True`/`False`/`N/A` warning state.
- Added a `SonicDBConfig.load_sonic_global_db_config()` call (guarded by `isGlobalInit()`) in `__init__` to initialize global DB config for multi-ASIC namespace resolution. Without this, `connect_to_all_dbs_for_ns()` fails with `validateNamespace: Initialize global DB config`.
- Compute `Timestamp` once per `show` invocation using `datetime.now().strftime('%Y%m%d %H:%M:%S')` so the format matches the legacy `TEMPERATURE_INFO.timestamp` column written by thermalctld.
- `show()` merges platform + SFP rows; both tabular and JSON (`-j`) outputs are supported with the same column set as before.

How to verify it
- Install the updated `tempershow` and the companion `thermalctld` (which no longer publishes SFP entries to `TEMPERATURE_INFO`).
- Run `show platform temperature`.
- Confirm SFP rows appear as `xSFP module <N> Temp`.
- Confirm temperature values match `sonic-db-cli -n asic0 STATE_DB hget 'TRANSCEIVER_DOM_TEMPERATURE|<port>' temperature` (on multi-ASIC) or `redis-cli -n 6 hget 'TRANSCEIVER_DOM_TEMPERATURE|<port>' temperature` (on single-ASIC).
- Confirm thresholds match the `TRANSCEIVER_DOM_THRESHOLD|<port>` fields.
- Confirm `Warning` reflects the temperature flag state in `TRANSCEIVER_DOM_FLAG|<port>` (`False` when all four temp flags are de-asserted, `True` if any is asserted, `N/A` if the flags are not yet populated).
- Confirm `Timestamp` is in the same `YYYYMMDD HH:MM:SS` format as the platform sensor rows and reflects the time the command was run.
- Confirm platform sensors are still displayed from `TEMPERATURE_INFO`.
- Confirm `show platform temperature -j` still produces well-formed JSON with `Sensor`/`Temperature` keys.
- Confirm there are no stale `TEMPERATURE_INFO|*SFP*` keys remaining: `redis-cli -n 6 keys 'TEMPERATURE_INFO|*SFP*'`.

Tested on
- SFP rows from the `asic0` and `asic1` namespaces displayed correctly after the `SonicDBConfig.load_sonic_global_db_config()` fix

Previous command output (if the output of a command-line utility has changed)

(SFP rows sourced from `TEMPERATURE_INFO` populated by `thermalctld`.)

New command output (if the output of a command-line utility has changed)
Notes:
- SFP rows are now sourced from `TRANSCEIVER_DOM_TEMPERATURE` / `TRANSCEIVER_DOM_THRESHOLD` (values + thresholds) and `TRANSCEIVER_DOM_FLAG` (warning state) instead of `TEMPERATURE_INFO`.
- SFP display names are unchanged (`xSFP module <N> Temp`), so dashboards and parsers depending on the legacy naming continue to work.
- `Warning` and `Timestamp` columns are populated for SFP rows with the same semantics and format as the legacy thermalctld output. Modules 65 and 66 show `N/A` for `Warning` because their `TRANSCEIVER_DOM_FLAG` table is not populated (no transceiver inserted on this device for those ports).
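
For reference, the `Warning` and `Timestamp` semantics described in this PR can be sketched as follows. This is an illustration, not the merged `_derive_sfp_warning()`: the function name `derive_sfp_warning` is hypothetical, and two details are assumptions because the description does not spell them out, namely that an asserted flag is stored as the string `True` and that a partially populated flag set with nothing asserted reports `N/A`.

```python
from datetime import datetime

TEMP_FLAGS = ("tempHWarn", "tempLWarn", "tempHAlarm", "tempLAlarm")


def derive_sfp_warning(flag_fields):
    """Map a TRANSCEIVER_DOM_FLAG hash (as a dict) to 'True', 'False' or 'N/A'."""
    values = [flag_fields[name] for name in TEMP_FLAGS if name in flag_fields]
    # Assumption: asserted flags are stored as the string "True".
    if any(str(value).strip().lower() == "true" for value in values):
        return "True"
    if len(values) == len(TEMP_FLAGS):
        return "False"
    # No temperature flags, or (by assumption) only some of them, are present.
    return "N/A"


# Computed once per invocation and reused for every SFP row.
timestamp = datetime.now().strftime("%Y%m%d %H:%M:%S")

all_clear = dict.fromkeys(TEMP_FLAGS, "False")
print(derive_sfp_warning(all_clear), derive_sfp_warning({}), timestamp)
```

Returning strings rather than booleans keeps the JSON `Warning` field the same type as the platform sensor rows. An empty flag hash, as for the unpopulated ports mentioned above, yields `N/A`.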