[linux-6.6.y] efi/cper: Add Zhaoxin CPER error decode support#1701

Open
leoliu-oc wants to merge 6 commits into deepin-community:linux-6.6.y from leoliu-oc:linux-6.6.y-131-cper-error-decode

Conversation

@leoliu-oc
Contributor

@leoliu-oc leoliu-oc commented May 14, 2026

This patch series extends EFI CPER error reporting for Zhaoxin/Centaur x86 processors, with specific support for the KH-50000 family.

The series achieves the following:

  • Adds Zhaoxin-specific ZDI/ZPI error decoding and printing.
  • Adds KH-50000-specific ZDI/ZPI parsing behavior.
  • Adds support for Zhaoxin non-standard CPER sections: SVID and HIF.
  • Adds Zhaoxin microarchitectural CPU/shutdown error decoding.
  • Adds Zhaoxin cache error decoding.
  • Adds KH-50000 memory error decoding for non-standard Requestor ID usage.

The new logic is restricted to x86 builds and only activates for Zhaoxin/Centaur vendors.

Patch queue:

0001-efi-cper-Refactor-Zhaoxin-ZDI-ZPI-error-print-code.patch
0002-efi-cper-Add-Zhaoxin-ZDI-ZPI-error-decode-for-KH-50000.patch
0003-efi-cper-Add-Zhaoxin-non-standard-cper-error-decode.patch
0004-efi-cper-Add-Zhaoxin-micro-architectural-error-decod.patch
0005-efi-cper-Add-Zhaoxin-cache-error-decode.patch
0006-efi-cper-Add-Zhaoxin-mem-error-decode-for-KH-50000.patch

Summary by Sourcery

Extend EFI CPER error reporting with Zhaoxin/Centaur-specific decoding for processor, memory, and new vendor sections.

New Features:

  • Add Zhaoxin KH-40000 and KH-50000 specific ZDI/ZPI processor error classification and printing.
  • Introduce Zhaoxin-specific decoding for microarchitectural CPU/shutdown and cache error types in generic processor CPER sections.
  • Add Zhaoxin KH-50000 memory error decoding using non-standard Requestor ID semantics.
  • Support decoding and printing of Zhaoxin non-standard CPER sections SVID and HIF, including CXL and SNT-related errors.

Enhancements:

  • Restrict Zhaoxin/Centaur CPER decoding logic to x86 builds and vendors via boot CPU vendor/model checks.

leoliu-oc added 6 commits May 14, 2026 10:23
1. Wrap ZDI/ZPI related functions and strings with CONFIG_X86
   to restrict the code to X86 architecture only.

2. Remove __maybe_unused attribute from cper_print_proc_generic_zdi_zpi()
   since the function is now guarded by CONFIG_X86 and always used on X86.

3. Directly use zdi_zpi->responder_id as error type parameter
   instead of redundant local variable etype.

4. Clean up code formatting for better readability.

5. Replace IS_ENABLED(CONFIG_X86) with #ifdef CONFIG_X86
   for consistent preprocessor usage.

Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
zhaoxin inclusion
category: feature

--------------------

The ZDI and ZPI interfaces of the Zhaoxin KH-50000 processor report
additional error types with more detailed information. Add KH-50000-specific
ZDI and ZPI error parsing to decode them.

Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
zhaoxin inclusion
category: feature

--------------------

The Zhaoxin processor's HIF and SVID error types are not defined in the
UEFI spec and are reported via a non-standard CPER structure. This patch
adds support to decode these proprietary error records.

Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
zhaoxin inclusion
category: feature

--------------------

Zhaoxin processors report detailed micro-architectural error
classifications by re-purposing the Responder ID and Requestor
ID fields. This patch adds support to decode these proprietary
error types into human-readable messages.

Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
zhaoxin inclusion
category: feature

--------------------

Zhaoxin processors report detailed cache error types by re-purposing
the Responder ID field. This patch adds a decoder to translate these
numeric values into human-readable strings.

Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
zhaoxin inclusion
category: feature

--------------------

Some memory error types on the Zhaoxin KH-50000 do not fit the
standard CPER memory error definitions and are reported by repurposing
the Requestor ID field. Add parsing logic to decode this non-standard
usage and present human-readable error messages.

Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
@deepin-ci-robot deepin-ci-robot requested review from myml and winnscode May 14, 2026 02:40
@sourcery-ai

sourcery-ai Bot commented May 14, 2026

Reviewer's Guide

Extends EFI CPER decoding/printing for Zhaoxin/Centaur x86 CPUs (notably KH-40000 and KH-50000) by refactoring the ZDI/ZPI handler into Zhaoxin-specific helpers, adding microarchitectural/cache/memory error decoding, and wiring in support for new Zhaoxin CPER sections (SVID and HIF) with their associated GUIDs, validation bits, and payload structures.

Flow diagram for Zhaoxin CPER processor and memory decode paths

flowchart TD
    A[cper_estatus_print_section] -->|proc section| B[cper_print_proc_generic]
    A -->|mem section| C[cper_print_mem]

    B -->|vendor is ZHAOXIN/CENTAUR and CONFIG_X86| D[cper_print_proc_generic_zx]
    C -->|vendor is ZHAOXIN/CENTAUR and CONFIG_X86| E[cper_print_mem_zx]

    D -->|proc_error_type 0x1| F[cper_print_proc_generic_zx_cache]
    D -->|proc_error_type 0x4| G[cper_print_proc_generic_zdi_zpi]
    D -->|proc_error_type 0x8| H[cper_print_proc_generic_zx_micro_arch]

    G -->|x86_model 0x5b| I[cper_print_proc_generic_zdi_zpi_kh40000]
    G -->|x86_model 0x7b| J[cper_print_proc_generic_zdi_zpi_kh50000]
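
The routing in the flow diagram can be sketched in user-space C roughly as follows. This is an illustration under stated assumptions, not the code from the patch: `struct fake_cpu`, `enum zx_route`, and `zx_pick_route()` are hypothetical stand-ins for `boot_cpu_data` and the real printer functions. Only the error-type values (0x1, 0x4, 0x8) and the model IDs (0x5b, 0x7b) are taken from the diagram.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the boot_cpu_data fields the series checks. */
struct fake_cpu {
	bool is_zhaoxin_or_centaur;
	uint8_t model;
};

/* Hypothetical labels for the decoders named in the flow diagram. */
enum zx_route {
	ZX_ROUTE_NONE,
	ZX_ROUTE_CACHE,
	ZX_ROUTE_ZDI_ZPI_KH40000,
	ZX_ROUTE_ZDI_ZPI_KH50000,
	ZX_ROUTE_MICRO_ARCH,
};

/* Pick a decoder from the vendor, the model, and proc_error_type. */
static enum zx_route zx_pick_route(const struct fake_cpu *cpu,
				   uint8_t proc_error_type)
{
	if (!cpu->is_zhaoxin_or_centaur)
		return ZX_ROUTE_NONE;

	switch (proc_error_type) {
	case 0x1:
		return ZX_ROUTE_CACHE;
	case 0x4:
		/* ZDI/ZPI decoding is split per model; other models are ignored. */
		switch (cpu->model) {
		case 0x5b:
			return ZX_ROUTE_ZDI_ZPI_KH40000;
		case 0x7b:
			return ZX_ROUTE_ZDI_ZPI_KH50000;
		default:
			return ZX_ROUTE_NONE;
		}
	case 0x8:
		return ZX_ROUTE_MICRO_ARCH;
	default:
		return ZX_ROUTE_NONE;
	}
}
```

Returning a route value instead of printing keeps the sketch testable; the real helpers print directly through the kernel log.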

File-Level Changes

Change Details Files
Refactor and specialize ZDI/ZPI error decoding for Zhaoxin KH-40000 and KH-50000 processors, with stricter model/field validation.
  • Extend the ZDI/ZPI error type string table with additional PCIe link speed/width statuses under CONFIG_X86.
  • Constrain cper_zdi_zpi_err_type_str() to only return table values for supported Zhaoxin models (0x5b, 0x7b) and otherwise fall back to "unknown error".
  • Split the generic ZDI/ZPI printer into KH-40000- and KH-50000-specific helpers that interpret requestor_id/responder_id differently and add socket/bus/device context.
  • Introduce a new cper_print_proc_generic_zdi_zpi() dispatcher that selects the appropriate KH-40000/KH-50000 implementation based on boot_cpu_data and ignores other models.
drivers/firmware/efi/cper.c
Add Zhaoxin-specific CPU microarchitectural and cache error decoding and integrate it into generic processor CPER printing.
  • Define zx_micro_arch_cpu_err_type_strs and zx_micro_arch_shutdown_err_type_strs lookup tables for Zhaoxin CPU/shutdown errors.
  • Implement cper_print_proc_generic_zx_micro_arch() to decode CPU vs shutdown microarchitectural errors based on requestor_id and responder_id, gated by validation bits.
  • Define zx_cache_err_type_strs and cper_print_proc_generic_zx_cache() to decode L2/LLC cache errors using level and responder_id.
  • Introduce cper_print_proc_generic_zx() to route Zhaoxin CPER processor sections to cache, ZDI/ZPI, or micro-arch decoders depending on proc_error_type, and call this from cper_print_proc_generic() for Zhaoxin/Centaur vendors.
drivers/firmware/efi/cper.c
Add Zhaoxin KH-50000-specific memory error decoding based on non-standard Requestor ID usage.
  • Introduce zx_mem_err_type_strs lookup table for Zhaoxin memory error types.
  • Implement cper_print_mem_zx() to decode KH-50000 memory errors using requestor_id as an error type index, with CPU model checks and validation bit gating.
  • Invoke cper_print_mem_zx() from cper_print_mem() for Zhaoxin/Centaur x86 systems under CONFIG_X86.
drivers/firmware/efi/cper.c
Introduce support for Zhaoxin-specific SVID CPER sections, including GUID, validation bits, structure, and printer.
  • Define CPER_SEC_SVID GUID, CPER_SVID_VALID_* bitmasks, and a new cper_sec_svid payload struct in cper.h.
  • Add svid_error_type_strs table and cper_print_svid_err() helper to format SVID-related fields respecting validation_bits.
  • Extend cper_estatus_print_section() to recognize CPER_SEC_SVID and call cper_print_svid_err() when the payload length is sufficient.
drivers/firmware/efi/cper.c
include/linux/cper.h
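
As a rough illustration of printing fields "respecting validation_bits", the sketch below formats only the fields whose bit is set. The bit assignments and the names `SVID_VALID_*`, `struct svid_rec`, `svid_append()`, and `svid_format()` are assumptions made for this example; the real `CPER_SVID_VALID_*` masks and `cper_print_svid_err()` are defined in the patch and may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit assignments; the patch defines the real CPER_SVID_VALID_* masks. */
#define SVID_VALID_SOCKET_ID	(1u << 0)
#define SVID_VALID_SVID_ID	(1u << 1)
#define SVID_VALID_VRM_NUMBER	(1u << 2)
#define SVID_VALID_ERROR_TYPE	(1u << 3)

/* Mirrors the field list of the struct cper_sec_svid quoted in this PR. */
struct svid_rec {
	uint8_t  validation_bits;
	uint8_t  socket_id;
	uint8_t  svid_id;
	uint8_t  vrm_number;
	uint16_t error_type;
	uint16_t reserved;
};

/* Append "name: value\n" at offset n if room remains; return the new offset. */
static size_t svid_append(char *buf, size_t len, size_t n,
			  const char *name, unsigned int value)
{
	if (n < len) {
		int w = snprintf(buf + n, len - n, "%s: %u\n", name, value);

		if (w > 0)
			n += (size_t)w;
	}
	return n;
}

/* Format only the fields whose validation bit is set. */
static size_t svid_format(char *buf, size_t len, const struct svid_rec *r)
{
	size_t n = 0;

	if (len > 0)
		buf[0] = '\0';
	if (r->validation_bits & SVID_VALID_SOCKET_ID)
		n = svid_append(buf, len, n, "socket_id", r->socket_id);
	if (r->validation_bits & SVID_VALID_SVID_ID)
		n = svid_append(buf, len, n, "svid_id", r->svid_id);
	if (r->validation_bits & SVID_VALID_VRM_NUMBER)
		n = svid_append(buf, len, n, "vrm_number", r->vrm_number);
	if (r->validation_bits & SVID_VALID_ERROR_TYPE)
		n = svid_append(buf, len, n, "error_type", r->error_type);
	return n;
}
```

A record with only the socket and error-type bits set would produce two lines and skip the other fields, even if they hold non-zero data.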
Introduce support for Zhaoxin-specific HIF CPER sections, including GUID, validation bits, structure, and printer with nested SNT/CXL decoding.
  • Define CPER_SEC_HIF GUID, CPER_HIF_VALID_* bitmasks, and a new cper_sec_hif payload struct in cper.h capturing SNT and CXL error context.
  • Add cxl_error_type_strs and snt_error_type_strs lookup tables, plus helpers dump_cxl_error_type() and dump_hif_error_type() to decode CXL and SNT-related error data/addresses.
  • Implement cper_print_hif_err() to print socket/hnode information, per-channel DVAD error addresses, SNT error details, and CXL port errors based on validation_bits.
  • Extend cper_estatus_print_section() to recognize CPER_SEC_HIF and call cper_print_hif_err() when the payload length is sufficient.
drivers/firmware/efi/cper.c
include/linux/cper.h
Tighten x86-specific scoping of Zhaoxin CPER logic via CONFIG_X86 guards.
  • Wrap Zhaoxin-specific error tables and helpers (ZDI/ZPI, ZX micro-arch, cache, memory) in CONFIG_X86 preprocessor guards instead of the broader IS_ENABLED(CONFIG_X86) usage.
  • Limit Zhaoxin/ZDI integration in cper_print_proc_generic() and cper_print_mem() to x86 builds with Zhaoxin or Centaur vendors.
drivers/firmware/efi/cper.c


@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign avenger-285714 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@deepin-ci-robot

Hi @leoliu-oc. Thanks for your PR.

I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The new logic in cper_zdi_zpi_err_type_str() now depends on boot_cpu_data.x86_model for an exported symbol without checking the vendor; consider gating the model-specific behavior on Zhaoxin/Centaur vendors (falling back to the generic table otherwise) to avoid surprising behavior on other x86 CPUs that might call this helper.
  • The checks for specific Zhaoxin CPU families/models (e.g., boot_cpu_data.x86 == 0x7, x86_model == 0x5b/0x7b) are currently bare literals; introducing named constants or macros for these KH-40000/KH-50000 IDs would improve readability and make future maintenance of additional models clearer.
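
One way both points could be combined is sketched below in user-space C. The macro names `ZX_MODEL_KH40000` and `ZX_MODEL_KH50000` and the helper `zx_zdi_zpi_etype_known()` are hypothetical; the per-model ranges are copied from the `cper_zdi_zpi_err_type_str()` excerpt quoted in this review. For simplicity the sketch treats non-Zhaoxin vendors as "unknown error" rather than falling back to a generic table, which is only one of the possible choices.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the model IDs quoted in the review. */
#define ZX_MODEL_KH40000	0x5b
#define ZX_MODEL_KH50000	0x7b

/*
 * Return true when etype may be looked up in the string table and
 * false when the caller should report "unknown error".
 */
static bool zx_zdi_zpi_etype_known(bool vendor_is_zx, uint8_t model,
				   unsigned int etype, size_t table_len)
{
	/* Vendor gate: model numbers alone are not unique across x86 vendors. */
	if (!vendor_is_zx)
		return false;

	switch (model) {
	case ZX_MODEL_KH40000:
		if (etype >= 0x13)
			return false;
		break;
	case ZX_MODEL_KH50000:
		if (etype == 0x6 || (etype >= 0xb && etype <= 0x12))
			return false;
		break;
	default:
		return false;
	}

	/* Final bounds check against the table size. */
	return etype < table_len;
}
```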
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new logic in `cper_zdi_zpi_err_type_str()` now depends on `boot_cpu_data.x86_model` for an exported symbol without checking the vendor; consider gating the model-specific behavior on Zhaoxin/Centaur vendors (falling back to the generic table otherwise) to avoid surprising behavior on other x86 CPUs that might call this helper.
- The checks for specific Zhaoxin CPU families/models (e.g., `boot_cpu_data.x86 == 0x7`, `x86_model == 0x5b/0x7b`) are currently bare literals; introducing named constants or macros for these KH-40000/KH-50000 IDs would improve readability and make future maintenance of additional models clearer.

## Individual Comments

### Comment 1
<location path="drivers/firmware/efi/cper.c" line_range="286" />
<code_context>
+	if (etype == 0xf) {
+		pr_info("%s general processor error(zpi port 0x%llx error)\n",
+			pfx, zdi_zpi->requestor_id & 0xf);
+	} else if (etype >= 0x0 && etype <= 0xb) {
+		switch (zdi_zpi->requestor_id & 0xf) {
+		case 0x0:
</code_context>
<issue_to_address>
**issue:** The log message for KH50000 ZDI errors prints `etype` where it appears to intend the ZDI port nibble.

In `cper_print_proc_generic_zdi_zpi_kh50000()`, `etype` is `(zdi_zpi->requestor_id & 0xff) >> 4` (error type), while the ZDI port is `(zdi_zpi->requestor_id & 0xf)`. The log currently passes `etype` to the `%x` placeholder labeled as "zdi port", which is misleading. Please pass `(zdi_zpi->requestor_id & 0xf)` instead, or assign it to a local `port` variable and log that for clarity.
</issue_to_address>
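
The field split described in this comment can be sketched as a small helper that names both nibbles, so the log call cannot mix them up. `struct zx_zdi_zpi_fields` and `zx_split_requestor_id()` are hypothetical names for illustration; the masks and the shift are the ones quoted above.

```c
#include <stdint.h>

/* Hypothetical container for the two nibbles of the low byte. */
struct zx_zdi_zpi_fields {
	unsigned int etype;	/* bits 7:4 of requestor_id */
	unsigned int port;	/* bits 3:0 of requestor_id */
};

/* Split the low byte of requestor_id into error type and port. */
static struct zx_zdi_zpi_fields zx_split_requestor_id(uint64_t requestor_id)
{
	struct zx_zdi_zpi_fields f;

	f.etype = (unsigned int)((requestor_id & 0xff) >> 4);
	f.port = (unsigned int)(requestor_id & 0xf);
	return f;
}
```

For a `requestor_id` of 0x1a3, the helper yields an error type of 0xa and a port of 0x3; bits above the low byte are ignored.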


Copilot AI left a comment

Pull request overview

Extends the EFI CPER error decoder in drivers/firmware/efi/cper.c with Zhaoxin/Centaur-specific decoding for processor (ZDI/ZPI, micro-architectural CPU/shutdown, cache), memory (KH-50000), and two new non-standard CPER section types (SVID and HIF, including CXL and SNT sub-decoding). Decoding is gated to x86 builds and (in most paths) Zhaoxin/Centaur vendors, with KH-40000 (family 7, model 0x5b) and KH-50000 (family 7, model 0x7b) handled separately.

Changes:

  • Adds per-model ZDI/ZPI dispatch (KH-40000 vs KH-50000) and extra error-type strings for KH-50000.
  • Adds Zhaoxin micro-architectural CPU/shutdown, cache, and KH-50000 memory error decoding.
  • Adds new CPER section GUIDs and structures (CPER_SEC_SVID, CPER_SEC_HIF) plus decoding/printing routines (including CXL and SNT sub-fields).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| include/linux/cper.h | Adds SVID/HIF section GUIDs, validation-bit defines, and cper_sec_svid / cper_sec_hif structures. |
| drivers/firmware/efi/cper.c | Implements Zhaoxin processor/memory/SVID/HIF decoding, dispatches by family/model, and wires new GUIDs into cper_estatus_print_section. |

Comment on lines +342 to +348
"exit smm mode if the reserved bits in CR4 are writed to 1 in smm mode",
"exit smm mode if CR4.VMXE bit is writed to 1 in smm mode",
"exit smm mode if CR4.PCIDE bit is writed to 1 and EFER.LMA bit is writed to 0 in smm mode",
"software inject an UC error after MCE changed to SMI happened",
"reserved",
"MCE happened when CR4 MCE is 0",
"MCE happened again when didn't clear MCIP in the first MCE handler",
"correctable error",
"uncorrectable error",
"multi correctable error",
"multi Uncorrectable error",
	if (etype == 0xf) {
		pr_info("%s general processor error(zpi port 0x%llx error)\n",
			pfx, zdi_zpi->requestor_id & 0xf);
	} else if (etype >= 0x0 && etype <= 0xb) {
Comment on lines +317 to +324
		break;
	case 0x7b:
		cper_print_proc_generic_zdi_zpi_kh50000(pfx, zdi_zpi);
		break;
	default:
		return;
	}
}
Comment on lines 236 to 254
const char *cper_zdi_zpi_err_type_str(unsigned int etype)
{
	switch (boot_cpu_data.x86_model) {
	case 0x5b:
		if (etype >= 0x13)
			return "unknown error";
		break;
	case 0x7b:
		if (etype == 0x6 || (etype >= 0xb && etype <= 0x12))
			return "unknown error";
		break;
	default:
		return "unknown error";
	}
	return etype < ARRAY_SIZE(zdi_zpi_err_type_strs) ?
		zdi_zpi_err_type_strs[etype] :
		"unknown error";
}
EXPORT_SYMBOL_GPL(cper_zdi_zpi_err_type_str);
Comment thread include/linux/cper.h
Comment on lines +567 to +587
struct cper_sec_svid {
	u8 validation_bits;
	u8 socket_id;
	u8 svid_id;
	u8 vrm_number;
	u16 error_type;
	u16 reserved;
};

struct cper_sec_hif {
	u16 validation_bits;
	u8 socket_id;
	u8 hnod_id;
	u8 snt_location;
	u8 snt_error_type;
	u8 snt_error_data[5];
	u8 snt_error_addr[5];
	u64 dvad_error_addr[6];
	u64 cxl_decode_error_addr[2];
	u8 cxl_error_type[2];
};
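
Because the section printer compares `error_data_length` against `sizeof(*hif_err)`, the in-memory size of this struct matters. The user-space sketch below compares the layout as quoted with a packed variant. `struct hif_natural`, `struct hif_packed`, `HIF_FIELD_BYTES`, and `hif_tail_padding()` are hypothetical names for this illustration, and `__attribute__((packed))` assumes GCC or Clang. Whether firmware emits the 82-byte field sum or a padded record is not stated in this PR, so treat this as a way to check the assumption rather than a verdict.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as the quoted struct cper_sec_hif, natural alignment. */
struct hif_natural {
	uint16_t validation_bits;
	uint8_t  socket_id;
	uint8_t  hnod_id;
	uint8_t  snt_location;
	uint8_t  snt_error_type;
	uint8_t  snt_error_data[5];
	uint8_t  snt_error_addr[5];
	uint64_t dvad_error_addr[6];
	uint64_t cxl_decode_error_addr[2];
	uint8_t  cxl_error_type[2];
};

/* Hypothetical packed variant with identical fields. */
struct hif_packed {
	uint16_t validation_bits;
	uint8_t  socket_id;
	uint8_t  hnod_id;
	uint8_t  snt_location;
	uint8_t  snt_error_type;
	uint8_t  snt_error_data[5];
	uint8_t  snt_error_addr[5];
	uint64_t dvad_error_addr[6];
	uint64_t cxl_decode_error_addr[2];
	uint8_t  cxl_error_type[2];
} __attribute__((packed));

/* Sum of the field sizes: 2 + 4*1 + 5 + 5 + 6*8 + 2*8 + 2 = 82. */
enum { HIF_FIELD_BYTES = 2 + 4 + 5 + 5 + 48 + 16 + 2 };

_Static_assert(sizeof(struct hif_packed) == HIF_FIELD_BYTES,
	       "packed size equals the field sum");
_Static_assert(offsetof(struct hif_natural, dvad_error_addr) == 16,
	       "the u64 array starts at offset 16 even without packing");

/* Bytes of tail padding the compiler adds to the unpacked struct. */
static size_t hif_tail_padding(void)
{
	return sizeof(struct hif_natural) - sizeof(struct hif_packed);
}
```

The leading byte fields happen to add up to 16, so no interior padding appears; the difference is tail padding only. On a typical x86-64 ABI the unpacked struct is rounded up to 88 bytes, so a length check against `sizeof()` would reject an 82-byte record.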
Comment on lines +1066 to +1081
	} else if (guid_equal(sec_type, &CPER_SEC_SVID)) {
		struct cper_sec_svid *svid_err = acpi_hest_get_payload(gdata);

		printk("%ssection_type: SVID Error\n", newpfx);
		if (gdata->error_data_length >= sizeof(*svid_err))
			cper_print_svid_err(newpfx, svid_err);
		else
			goto err_section_too_small;
	} else if (guid_equal(sec_type, &CPER_SEC_HIF)) {
		struct cper_sec_hif *hif_err = acpi_hest_get_payload(gdata);

		printk("%ssection_type: HIF Error\n", newpfx);
		if (gdata->error_data_length >= sizeof(*hif_err))
			cper_print_hif_err(newpfx, hif_err);
		else
			goto err_section_too_small;