
[linux-nvidia-6.17-next] Add CXL Type-2 device support, RAS error handling, reset, state save/restore, and interleaving support #342

Closed
JiandiAnNVIDIA wants to merge 157 commits into NVIDIA:24.04_linux-nvidia-6.17-next from JiandiAnNVIDIA:cxl_2026-03-04


JiandiAnNVIDIA commented Mar 12, 2026

Description

This patch series adds comprehensive CXL (Compute Express Link) support to the
nvidia-6.17 kernel, including:

  1. CXL Type-2 device support - Enables accelerator devices (like GPUs and
    SmartNICs) to use CXL for coherent memory access via firmware-provisioned
    regions
  2. CXL RAS (Reliability, Availability, Serviceability) error handling -
    Implements PCIe Port Protocol error handling and logging for CXL Root Ports,
    Downstream Switch Ports, and Upstream Switch Ports
  3. CXL DVSEC and HDM state save/restore - Preserves CXL DVSEC control/range
    registers and HDM decoder programming across PCI resets and link transitions,
    enabling device re-initialization after reset for firmware-provisioned
    configurations
  4. CXL Reset support - Implements the CXL Reset method (CXL Spec v3.2,
    Sections 8.1.3, 9.6, 9.7) via a sysfs interface for Type-2 devices,
    including memory offlining, cache flushing, multi-function sibling
    coordination, and DVSEC reset sequencing
  5. Multi-level interleaving fix - Supports firmware-configured CXL
    interleaving where lower levels use smaller granularities than parent ports
    (reverse HPA bit ordering)
  6. Prerequisite CXL and PCI driver updates - Cherry-picked commits from
    upstream torvalds/master covering the range from v6.17.9 to the merge
    point of Terry Bowman's v14 series into v7.0
  7. CXL DAX support - Enables direct memory access to CXL RAM regions and
    mapping CXL DAX devices as System-RAM

Key Features Added:

  • CXL Type-2 accelerator device registration and memory management
  • CXL region creation by Type-2 drivers
  • DPA (Device Physical Address) allocation interface for accelerators
  • HPA (Host Physical Address) free space enumeration
  • Multi-level CXL address translation (SPA↔HPA↔DPA)
  • CXL protocol error detection, forwarding, and recovery
  • CXL RAS error handling for Endpoints, RCH, and Switch Ports
    (replacing the old PCIEAER_CXL symbol with the new CXL_RAS def_bool)
  • CXL extended linear cache region support
  • CXL DVSEC and HDM decoder state save/restore across PCI resets
  • CXL Reset sysfs interface (/sys/bus/pci/devices/.../cxl_reset) for
    Type-2 devices with Reset Capable bit set
  • Multi-function sibling coordination during CXL reset via Non-CXL
    Function Map DVSEC
  • CPU cache flush using cpu_cache_invalidate_memregion() during reset
  • Multi-level interleaving with smaller granularities for lower decoder
    levels (firmware-provisioned configurations)
  • CXL DAX device access (DEV_DAX_CXL) and System-RAM mapping
    (DEV_DAX_KMEM)
  • CXL protocol error injection via APEI EINJ (ACPI_APEI_EINJ_CXL)

Justification

CXL Type-2 device support is critical for next-generation NVIDIA accelerators
and data center workloads:

  • Enables coherent memory sharing between CPUs and accelerators
  • Supports firmware-provisioned CXL regions for accelerator memory
  • Provides proper error handling and reporting for CXL fabric errors
  • Enables device reset and state recovery for CXL Type-2 devices
  • Preserves firmware-programmed DVSEC and HDM decoder state across resets
  • Required for upcoming NVIDIA hardware with CXL capabilities

Source

Patch Breakdown (157 commits: 153 patches, 3 config annotation updates, and 1 revert):

| # | Category | Count | Source |
|---|----------|-------|--------|
| 1 | Revert old CXL reset (f198764) | 1 | OOT (cleanup) |
| 2 | Upstream CXL/PCI prerequisite cherry-picks | 103 | Upstream torvalds/master (v6.17.9 → merge of Terry Bowman v14 into v7.0) |
| 3 | Smita Koralahalli's CXL EINJ series v6 patch 3/9 | 1 | LKML (v6, not yet merged) |
| 4 | Alejandro Lucero's CXL Type-2 series v23 | 22 | LKML (v23, not yet merged) |
| 5 | Robert Richter's multi-level interleaving fix | 1 | LKML (v1, not yet merged) |
| 6 | Srirangan Madhavan's CXL state save/restore series | 5 | LKML (v1, not yet merged) |
| 7 | Srirangan Madhavan's CXL reset series | 7 | LKML (v5, not yet merged) |
| 8 | Upstream fixes for ported commits | 14 | Upstream torvalds/master (merged fixes + 1 prerequisite) |
| 9 | Config annotations update | 3 | OOT (build config) |
| | TOTAL | 157 | |

Notes on the upstream cherry-picks (item 2):

The 103 upstream commits span 1bfd0faa78d0 (v6.17.9) to
0da3050bdded (Merge of for-7.0/cxl-aer-prep into cxl-for-next).
This range includes 17 out of 34 patches from Terry Bowman's v14 series
that were reworked by the CXL maintainer and merged into v7.0 via the
for-7.0/cxl-aer-prep branch. The remaining 17 patches from Terry's v14
were refactored into v15 (9 patches, not yet merged) and are not included
in this port.

Notes on the save/restore and reset series (items 6–7):

Srirangan's patches were authored against upstream v7.0-rc1 (which does not
include Alejandro's v23 Type-2 series). For this port, the header
reorganization in patch 2/5 of the save/restore series was adapted to align
with Alejandro's v23 approach: HDM decoder and register map definitions were
moved to include/cxl/cxl.h (not include/cxl/pci.h as in the original
patch) to follow the convention established by Alejandro's series. Upstream
reviewers have indicated that Srirangan's series should be rebased on top of
Alejandro's once it merges.

Notes on the upstream fixes (item 8):

14 upstream commits were cherry-picked from torvalds/master to fix bugs
in the ported commits from items 2 and 6–7. These comprise 13 fixes
(identified via Fixes: tags upstream) plus 1 prerequisite helper
function (port_to_host()) required by one of the fixes:

| Fix (upstream SHA) | Date | Subject | Fixes |
|--------------------|------|---------|-------|
| 822655e6751d | 2026-02-23 | cxl/port: Introduce port_to_host() helper | (prerequisite for 0066688dbcdc) |
| 88c72bab77aa | 2025-12-04 | cxl/region: fix format string for resource_size_t | d6602e25819d (extended linear cache) |
| 8fdc61faa730 | 2025-12-10 | soc: renesas: Fix missing dependency on CACHEMAINT_FOR_DMA | 4d1608d0ab33 (cache Kconfig) |
| 521cadb4b69e | 2025-12-10 | riscv: ERRATA_STARFIVE_JH7100: Fix missing dependency | 4d1608d0ab33 (cache Kconfig) |
| 8441c7d3bd6c | 2026-01-07 | cxl: Check for invalid addresses from translation funcs | b78b9e7b7979 + c3dd67681c70 |
| 3e8aaacdad4f | 2026-01-08 | cxl/port: Fix target list for multiple decoders sharing dport | 4f06d81e7c6a (defer dport) |
| 49d106347913 | 2026-01-09 | cxl/acpi: Restore HBIW check before dereferencing platform_data | 4fe516d2ad1a (XOR calculations) |
| 77b310bb7b5f | 2026-02-02 | cxl/region: Fix leakage in __construct_region() | d6602e25819d (extended linear cache) |
| 0066688dbcdc | 2026-02-10 | cxl/port: Hold port host lock during dport adding | 4f06d81e7c6a (defer dport) |
| 318c58852e68 | 2026-02-11 | cxl/memdev: fix deadlock in cxl_memdev_autoremove() | 29317f8dc6ed (cxl_memdev_attach) |
| 0a70b7cd397e | 2026-02-23 | cxl: Test CXL_DECODER_F_LOCK as a bitmask | 2230c4bdc412 (locked decoder) |
| 9a6a2091324a | 2026-03-01 | cxl/mbox: Use proper endpoint validity check upon sanitize | 29317f8dc6ed (cxl_memdev_attach) |
| 3bfc213d4675 | 2026-03-09 | soc: microchip: mpfs-mss-top-sysreg: Fix resource leak | 4aac11c9a6e7 (microchip mfd) |
| 27459f86a437 | 2026-03-09 | soc: microchip: mpfs-control-scb: Fix resource leak | 4aac11c9a6e7 (microchip mfd) |

Lore Links:

Upstream Status:

| Series | Status |
|--------|--------|
| 103 upstream cherry-picks | ✅ Merged in torvalds/master (v7.0 range) |
| 14 upstream fixes + prerequisite | ✅ Merged in torvalds/master |
| Terry Bowman v14 (17 patches) | ✅ Merged into v7.0 via for-7.0/cxl-aer-prep |
| Terry Bowman v15 (9 patches) | ⏳ Under review, not needed for this port |
| Smita v6 patch 3/9 | ⏳ Under review, not yet merged |
| Alejandro v23 (22 patches) | ⏳ Under review, not yet merged |
| Robert Richter v1 (1 patch) | ⏳ Under review, not yet merged |
| Srirangan save/restore (5 patches) | ⏳ Under review, not yet merged |
| Srirangan cxl_reset v5 (7 patches) | ⏳ Under review, not yet merged |

Testing

Build Validation:

  • Built successfully for ARM64 4K page size kernel
  • Built successfully for ARM64 64K page size kernel

Config Verification:

CXL-related configs enabled as expected:

CONFIG_ACPI_APEI_EINJ_CXL=y
CONFIG_PCI_CXL=y
CONFIG_CXL_BUS=y
CONFIG_CXL_PCI=y
CONFIG_CXL_MEM_RAW_COMMANDS=y
CONFIG_CXL_ACPI=m
CONFIG_CXL_PMEM=m
CONFIG_CXL_MEM=y
CONFIG_CXL_FEATURES=y
# CONFIG_CXL_EDAC_MEM_FEATURES is not set
CONFIG_CXL_PORT=y
CONFIG_CXL_SUSPEND=y
CONFIG_CXL_REGION=y
# CONFIG_CXL_REGION_INVALIDATION_TEST is not set
CONFIG_CXL_RAS=y
# CONFIG_CACHEMAINT_FOR_HOTPLUG is not set
# CONFIG_SFC_CXL is not set
CONFIG_CXL_PMU=m
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=m
CONFIG_DEV_DAX_HMEM=m
CONFIG_DEV_DAX_CXL=y
CONFIG_DEV_DAX_HMEM_DEVICES=y
CONFIG_DEV_DAX_KMEM=y
CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION=y
CONFIG_GENERIC_CPU_CACHE_MAINTENANCE=y

Runtime Testing:

  • Boot test on ARM64 system
  • CXL device enumeration test (ls /sys/bus/cxl/devices/)
  • CXL interleaving testing
  • CXL reset test (echo 1 > /sys/bus/pci/devices/<dev>/cxl_reset)
  • DVSEC save/restore verified (CXLCtl, Range registers preserved)

Notes

  • CONFIG_PCIEAER_CXL has been removed from Kconfig by upstream commit
    d18f1b7beadf (PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS).
    The debian.master annotation for PCIEAER_CXL=y is overridden to '-'
    in debian.nvidia-6.17/config/annotations.
  • CONFIG_CXL_BUS, CONFIG_CXL_PCI, CONFIG_CXL_MEM, CONFIG_CXL_PORT
    remain tristate (not bool) — the v14 series kept them as tristate,
    unlike earlier draft versions.
  • CONFIG_DEV_DAX, CONFIG_DEV_DAX_CXL, and CONFIG_DEV_DAX_KMEM are
    overridden from m (debian.master default) to y to support built-in
    CXL RAM region DAX access and System-RAM mapping.
  • CONFIG_PCI_CXL is a new hidden bool introduced by the save/restore
    series. It is auto-enabled when CXL_BUS=y and gates compilation of
    drivers/pci/cxl.o for DVSEC and HDM state save/restore.
  • CONFIG_GENERIC_CPU_CACHE_MAINTENANCE and
    CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION are new configs
    introduced by the upstream cherry-picks; arm64 auto-selects both.
    cpu_cache_invalidate_memregion() is also used by the CXL reset
    series for cache flushing during reset.
  • Kernel config annotations updated in debian.nvidia-6.17/config/annotations
    to reflect all of the above changes.
  • Srirangan's save/restore series header reorganization was adapted to
    align with Alejandro's v23 approach (include/cxl/cxl.h instead of
    include/cxl/pci.h). See commit message on patch 2/5 for details.

nvmochs (Collaborator) commented Mar 12, 2026

@JiandiAnNVIDIA Finished going through this PR and have some questions / comments...

862702c NVIDIA: VR: SAUCE: [Config] CXL config annotations for Type-2 device and RAS support

CONFIG_CXL_BUS:
debian.master/config/annotations:CONFIG_CXL_BUS                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm', 's390x': 'n'}>

CONFIG_CXL_PCI:
debian.master/config/annotations:CONFIG_CXL_PCI                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>

CONFIG_CXL_MEM:
debian.master/config/annotations:CONFIG_CXL_MEM                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>

CONFIG_CXL_PORT:
debian.master/config/annotations:CONFIG_CXL_PORT                                 policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>

CONFIG_FWCTL:
debian.master/config/annotations:CONFIG_FWCTL                                    policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm', 's390x': 'm'}>

CONFIG_ACPI_APEI_EINJ:
debian.master/config/annotations:CONFIG_ACPI_APEI_EINJ                           policy<{'amd64': 'm', 'arm64': 'm'}>

CONFIG_ACPI_APEI_EINJ_CXL:
debian.master/config/annotations:CONFIG_ACPI_APEI_EINJ_CXL                       policy<{'amd64': 'y', 'arm64': 'y'}>

These are already set in the master annotations; why do we need to add them to the nvidia annotations? They must be built-in now?

CONFIG_PCIEAER_CXL:
debian.master/config/annotations:CONFIG_PCIEAER_CXL                              policy<{'amd64': 'y', 'arm64': 'y', 'armhf': 'y', 'riscv64': 'y'}>

Should this be removed from master if no longer in the code?


I confirmed the LKML backports match their origin and appear to have proper tags.


1a7d28e PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h

Why is this pulling in all the PCI_IDE stuff? That is not part of the original commit.

This may pick easier if you also pick "f16469ee733a PCI/IDE: Enumerate Selective Stream IDE capabilities" before it. Or when fixing up the collision, don’t include the PCI_IDE content.


3639a51 cxl/test: Add cxl_test CFMWS support for extended linear cache
ad9e7da cxl: Simplify cxl_rd_ops allocation and handling

Since there was an adjustment with these, need to change “cherry picked from” to “backported from”.


8c829ab cxl/test: Add support for acpi extended linear cache
da588f1 cxl/test: Standardize CXL auto region size
7992ac7 acpi/hmat: Return when generic target is updated
d568723 cxl: Add a cached copy of target_map to cxl_decoder

Did these pick clean? There were some context differences in the diff, but git may have been able to handle the merge okay. Just want to double check.

clsotog (Collaborator) commented Mar 12, 2026

Do we need the last 3 annotations? They were not used in the last PR.

jamieNguyenNVIDIA (Collaborator)

> 1a7d28e PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
>
> Why is this pulling in all the PCI_IDE stuff? That is not part of the original commit.
>
> This may pick easier if you also pick "f16469ee733a PCI/IDE: Enumerate Selective Stream IDE capabilities" before it. Or when fixing up the collision, don’t include the PCI_IDE content.

Nice find. Not including the PCI_IDE content would seem like the cleaner approach here (in my opinion, at least).

Although if you wanted to go with the former, I think you may want this one "254599fc8301 PCI: Add PCIe Device 3 Extended Capability enumeration" in addition to the commit that Matt called out.

Other than that, all I have is this nit that Cursor found:

  1. Commit message typo in 197d61c ("PCI: Update CXL DVSEC definitions"): The conflict resolution note says "PCIE_DVSEC_CXL_CACHE_CAPABLE" but the code correctly uses "PCI_DVSEC_CXL_CACHE_CAPABLE" (no trailing 'E' in PCI). Harmless -- message only, code is correct.

JiandiAnNVIDIA (Author) commented Mar 13, 2026

> @JiandiAnNVIDIA Finished going through this PR and have some questions / comments...
>
> CONFIG_CXL_BUS:
> debian.master/config/annotations:CONFIG_CXL_BUS                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm', 's390x': 'n'}>
>
> CONFIG_CXL_PCI:
> debian.master/config/annotations:CONFIG_CXL_PCI                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
>
> CONFIG_CXL_MEM:
> debian.master/config/annotations:CONFIG_CXL_MEM                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
>
> CONFIG_CXL_PORT:
> debian.master/config/annotations:CONFIG_CXL_PORT                                 policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
>
> CONFIG_FWCTL:
> debian.master/config/annotations:CONFIG_FWCTL                                    policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm', 's390x': 'm'}>
>
> CONFIG_ACPI_APEI_EINJ:
> debian.master/config/annotations:CONFIG_ACPI_APEI_EINJ                           policy<{'amd64': 'm', 'arm64': 'm'}>
>
> CONFIG_ACPI_APEI_EINJ_CXL:
> debian.master/config/annotations:CONFIG_ACPI_APEI_EINJ_CXL                       policy<{'amd64': 'y', 'arm64': 'y'}>
>
> These are already set in the master annotations; why do we need to add them to the nvidia annotations? They must be built-in now?

HWQA and the teams working NVBugs on Type-3 and Type-2 CXL devices need these set to y for testing and debugging.
I used Cursor to help sort through the dependencies in the Kconfig files. Most of the CXL configs default to the value of CXL_BUS.

Why m is NOT okay for CXL_BUS, CXL_PCI, CXL_MEM, CXL_PORT

These are tristate with default CXL_BUS. If CXL_BUS=m, everything defaults to m. With m, the CXL subsystem loads as modules — that's fine for normal memory expansion use cases but not for Type-2 device support, where the CXL infrastructure must be present before driver probe of the accelerator device.

> CONFIG_PCIEAER_CXL:
> debian.master/config/annotations:CONFIG_PCIEAER_CXL                              policy<{'amd64': 'y', 'arm64': 'y', 'armhf': 'y', 'riscv64': 'y'}>
>
> Should this be removed from master if no longer in the code?

I initially removed this and added CONFIG_CXL_RAS to debian.master, thinking this is what the Ubuntu kernel maintainers would do when they move to v7.0 or later, where Terry Bowman's patch that made this replacement is merged. But for now Terry's patches (although already merged in v7.0) are only applied to the linux-nvidia-6.17-next kernel, so I decided not to change debian.master at all and to override in debian.nvidia-6.17 instead.

I can remove it from debian.master without adding CONFIG_CXL_RAS there, and add CONFIG_CXL_RAS only to debian.nvidia-6.17.

> I confirmed the LKML backports match their origin and appear to have proper tags.
>
> 1a7d28e PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
>
> Why is this pulling in all the PCI_IDE stuff? That is not part of the original commit.
>
> This may pick easier if you also pick "f16469ee733a PCI/IDE: Enumerate Selective Stream IDE capabilities" before it. Or when fixing up the collision, don’t include the PCI_IDE content.

Good catch. Will fix. This commit hit multiple conflicts because one of Nicolin's earlier commits, "df59703f696a iommu/arm-smmu-v3: Allow ATS to be always on", added definitions that this commit redefines and moves to a different place in the file. The PCI_IDE_* content was the context anchor of the upstream commit. Since the CXL DVSEC area also had a real conflict (from the NVIDIA SAUCE CXL_DVSEC_CACHE_CAPABLE commit 72bd823), the file had multiple conflict regions. The likely resolution was git checkout --theirs -- include/uapi/linux/pci_regs.h, which accepts the entire upstream file from the 0f7afd8 commit tree, including all the PCI_IDE_* content from f16469e that was never in the cherry-pick list.

> 3639a51 cxl/test: Add cxl_test CFMWS support for extended linear cache
> ad9e7da cxl: Simplify cxl_rd_ops allocation and handling
>
> Since there was an adjustment with these, need to change “cherry picked from” to “backported from”.

Will fix

> 8c829ab cxl/test: Add support for acpi extended linear cache
> da588f1 cxl/test: Standardize CXL auto region size
> 7992ac7 acpi/hmat: Return when generic target is updated
> d568723 cxl: Add a cached copy of target_map to cxl_decoder
>
> Did these pick clean? There were some context differences in the diff, but git may have been able to handle the merge okay. Just want to double check.

These picks did not hit conflicts; git cherry-pick's 3-way merge was able to handle them. The 6.17 HWE branch already has Koba's commit and Vishal's zero-size decoder fix. For example, Vishal's zero-size decoder fix added code that causes line shifts when picking d568723 cxl: Add a cached copy of target_map to cxl_decoder. My conflict-resolution changes on some commits also cause subsequent commits to shift during the 3-way merge.

JiandiAnNVIDIA (Author)

> Other than that, all I have is this nit that Cursor found:
>
> 1. Commit message typo in [197d61c](https://github.com/NVIDIA/NV-Kernels/commit/197d61c9b3ac5086340fac6a6a5764de6192ad27) ("PCI: Update CXL DVSEC definitions"): The conflict resolution note says "PCIE_DVSEC_CXL_CACHE_CAPABLE" but the code correctly uses "PCI_DVSEC_CXL_CACHE_CAPABLE" (no trailing 'E' in PCI). Harmless -- message only, code is correct.

Will fix.

JiandiAnNVIDIA (Author)

> Do we need the last 3 annotations? They were not used in the last PR.

The previous PR picked everything touching drivers/cxl between 6.17.9 and 6.18-rc5, which was the base of Terry Bowman's v13 and Alejandro Lucero's v22.
This PR picks everything touching drivers/cxl between 6.17.9 and the point in 7.0 where Terry Bowman's v14 was merged.

So this PR pulled in more commits, including the following:

c460697 lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
which defines config GENERIC_CPU_CACHE_MAINTENANCE and config ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION in lib/Kconfig and adds the lib/cache_maint.c framework

4d873c5 arm64: Select GENERIC_CPU_CACHE_MAINTENANCE
which adds select GENERIC_CPU_CACHE_MAINTENANCE to arch/arm64/Kconfig, which in turn selects ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION

2ec3b54 cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
which creates drivers/cache/Kconfig with menuconfig CACHEMAINT_FOR_HOTPLUG and the HISI_SOC_HHA driver underneath it

  1. CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION — policy<{'amd64': 'y', 'arm64': 'y'}>
    debian.master only has amd64: y. But cherry-picked commit 4d873c5 (arm64: Select GENERIC_CPU_CACHE_MAINTENANCE) added a select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION to arch/arm64/Kconfig, making it y on arm64 too. We override the policy to include arm64.

  2. CONFIG_GENERIC_CPU_CACHE_MAINTENANCE — policy<{'amd64': '-', 'arm64': 'y'}>
    Entirely new config introduced by the cherry-picked commits, selected by arch/arm64/Kconfig. Not in debian.master at all. x86 does not select it, so amd64: -. arm64 auto-selects it, so arm64: y.

  3. CONFIG_CACHEMAINT_FOR_HOTPLUG — policy<{'amd64': '-', 'arm64': 'n'}>
    New optional config that becomes visible on arm64 once GENERIC_CPU_CACHE_MAINTENANCE is selected, but defaults to n. Not in debian.master. We explicitly set n on arm64 and - on amd64 (where it's hidden). This is the HiSilicon HHA driver menuconfig, which is not needed for NVIDIA platforms.

nvmochs (Collaborator) commented Mar 13, 2026

> @JiandiAnNVIDIA Finished going through this PR and have some questions / comments...
>
> CONFIG_CXL_BUS:
> debian.master/config/annotations:CONFIG_CXL_BUS                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm', 's390x': 'n'}>
>
> CONFIG_CXL_PCI:
> debian.master/config/annotations:CONFIG_CXL_PCI                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
>
> CONFIG_CXL_MEM:
> debian.master/config/annotations:CONFIG_CXL_MEM                                  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
>
> CONFIG_CXL_PORT:
> debian.master/config/annotations:CONFIG_CXL_PORT                                 policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
>
> CONFIG_FWCTL:
> debian.master/config/annotations:CONFIG_FWCTL                                    policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm', 's390x': 'm'}>
>
> CONFIG_ACPI_APEI_EINJ:
> debian.master/config/annotations:CONFIG_ACPI_APEI_EINJ                           policy<{'amd64': 'm', 'arm64': 'm'}>
>
> CONFIG_ACPI_APEI_EINJ_CXL:
> debian.master/config/annotations:CONFIG_ACPI_APEI_EINJ_CXL                       policy<{'amd64': 'y', 'arm64': 'y'}>
>
> These are already set in the master annotations; why do we need to add them to the nvidia annotations? They must be built-in now?
>
> HWQA and the teams working NVBugs on Type-3 and Type-2 CXL devices need these set to y for testing and debugging. I used Cursor to help sort through the dependencies in the Kconfig files. Most of the CXL configs default to the value of CXL_BUS.
>
> Why m is NOT okay for CXL_BUS, CXL_PCI, CXL_MEM, CXL_PORT
>
> These are tristate with default CXL_BUS. If CXL_BUS=m, everything defaults to m. With m, the CXL subsystem loads as modules, which is fine for normal memory expansion use cases but not for Type-2 device support, where the CXL infrastructure must be present before driver probe of the accelerator device.

Understood. Thanks for explaining.


> CONFIG_PCIEAER_CXL:
> debian.master/config/annotations:CONFIG_PCIEAER_CXL                              policy<{'amd64': 'y', 'arm64': 'y', 'armhf': 'y', 'riscv64': 'y'}>
>
> Should this be removed from master if no longer in the code?
>
> I initially removed this and added CONFIG_CXL_RAS to debian.master, thinking this is what the Ubuntu kernel maintainers would do when they move to v7.0 or later, where Terry Bowman's patch that made this replacement is merged. But for now Terry's patches (although already merged in v7.0) are only applied to the linux-nvidia-6.17-next kernel, so I decided not to change debian.master at all and to override in debian.nvidia-6.17 instead.
>
> I can remove it from debian.master without adding CONFIG_CXL_RAS there, and add CONFIG_CXL_RAS only to debian.nvidia-6.17.

Let's try this approach and see what Canonical says during their review.


> 8c829ab cxl/test: Add support for acpi extended linear cache
> da588f1 cxl/test: Standardize CXL auto region size
> 7992ac7 acpi/hmat: Return when generic target is updated
> d568723 cxl: Add a cached copy of target_map to cxl_decoder
> Did these pick clean? There were some context differences in the diff, but git may have been able to handle the merge okay. Just want to double check.
>
> These picks did not hit conflicts; git cherry-pick's 3-way merge was able to handle them. The 6.17 HWE branch already has Koba's commit and Vishal's zero-size decoder fix. For example, Vishal's zero-size decoder fix added code that causes line shifts when picking d568723 cxl: Add a cached copy of target_map to cxl_decoder. My conflict-resolution changes on some commits also cause subsequent commits to shift during the 3-way merge.

Thanks for clarifying.

nirmoy (Collaborator) left a comment


Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Patches look fine to me. Make sure to do enough regression testing on Grace and Spark, as discussed.

Comment thread on drivers/cxl/core/region.c (outdated):
if (cxld->interleave_ways != iw ||
(iw > 1 && cxld->interleave_granularity != ig) ||
!spa_maps_hpa(p, &cxld->hpa_range) ||
!region_res_match_cxl_range(p, &cxld->hpa_range) ||
Collaborator

The latest revision introduces a compilation issue here:

drivers/cxl/core/region.c: In function ‘cxl_port_setup_targets’:
drivers/cxl/core/region.c:1716:22: error: implicit declaration of function ‘region_res_match_cxl_range’ [-Werror=implicit-function-declaration]
 1716 |                     !region_res_match_cxl_range(p, &cxld->hpa_range) ||

This function was renamed by 24366091ed5b

Author

Thanks, will fix this. I'm adding the save/restore and CXL reset series next. I was planning to push as I go, get the interleaving, save/restore, and CXL reset series all applied, and then work through the issues. I wanted to push early because I previously had an accident where the entire directory with all my applied patches and conflict resolutions was deleted and I had to start over.

jamieNguyenNVIDIA (Collaborator)

Latest push looks good to me.

Acked-by: Jamie Nguyen <jamien@nvidia.com>

nirmoy (Collaborator) left a comment


Acked-by: Nirmoy Das <nirmoyd@nvidia.com>

JiandiAnNVIDIA marked this pull request as draft March 20, 2026 14:54
JiandiAnNVIDIA marked this pull request as ready for review March 23, 2026 05:51
JiandiAnNVIDIA changed the title from "[linux-nvidia-6.17-next] Add CXL Type-2 device support and CXL RAS error handling" to "[linux-nvidia-6.17-next] Add CXL Type-2 device support, RAS error handling, reset, and state save/restore" Mar 23, 2026
nvax-r commented Mar 23, 2026

I've backported TLB-related patches onto Jiandi's branch. If anyone is available, please review them for me. Thanks!
JiandiAnNVIDIA#1

nvmochs (Collaborator) commented Mar 24, 2026

@JiandiAnNVIDIA A few additional comments / questions on your latest updates to the PR:

I confirmed that there were only 3 commits from the upstream dependency patches that differed from what I previously reviewed:

  • The two patches that changed the pick tag to "backported from" (thanks for fixing that!)
  • 15609f2 PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h, where I previously commented on its inclusion of the PCI_IDE content.
    • I confirmed that the PCI_IDE content is no longer present, but do have a new question (see below).

I also confirmed that the only change in the existing CONFIG annotations patches was the removal of CONFIG_PCIEAER_CXL from debian.master that I had previously requested (thanks for addressing this as well).

I verified all the NVIDIA:SAUCE patches that were present in my original review remain intact, unmodified.

For the new NVIDIA:SAUCE patches that added the CXL save/restore and cxl-reset support, except for a few commits that I call out below, I verified that the patches either picked clean or (in cases where modifications were made) that the backport notes were accurate.


Followup question on 15609f2 PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h:

In my original review, there was a patch just after this patch, 197d61c PCI: Update CXL DVSEC definitions, that is no longer present in the branch. It appears that the content from that patch was squashed into "15609f2ae03c PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h". Can you confirm this? Can we still include that patch?


0ce1557 NVIDIA: VR: SAUCE: PCI: Add HDM decoder state save/restore

Nit: Can you expand a bit on the backport note to include “why” cxl.h is needed? i.e. because of conflict resolution in "35460d55daed NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h"


689a3a3 NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC reset and capability register definitions
Minor nit, I think it was Nicolin’s patch that originally added PCI_DVSEC_CXL_CACHE_CAPABLE. Consider updating the backport note to reflect this.

JiandiAnNVIDIA (Author)

> Followup question on 15609f2 PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h:
>
> In my original review, there was a patch just after this patch, 197d61c PCI: Update CXL DVSEC definitions, that is no longer present in the branch. It appears that the content from that patch was squashed into "15609f2ae03c PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h". Can you confirm this? Can we still include that patch?

Fixed. I re-included the original patch so the commits match Terry Bowman's merged series one-to-one.

> 0ce1557 NVIDIA: VR: SAUCE: PCI: Add HDM decoder state save/restore
>
> Nit: Can you expand a bit on the backport note to include “why” cxl.h is needed? i.e. because of conflict resolution in "35460d55daed NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h"

Fixed

689a3a3 NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC reset and capability register definitions
Minor nit, I think it was Nicolin’s patch that originally added PCI_DVSEC_CXL_CACHE_CAPABLE. Consider updating the backport note to reflect this.

Fixed.

@nirmoy
Collaborator

nirmoy commented Mar 24, 2026

This patch "PCI: Update CXL DVSEC definitions" missed one rename

nvidia@localhost:/home/nvidia/NV-Kernels$ make
  CALL    scripts/checksyscalls.sh
  CC      drivers/pci/ats.o
drivers/pci/ats.c: In function ‘pci_cxl_ats_always_on’:
drivers/pci/ats.c:221:44: error: ‘CXL_DVSEC_PCIE_DEVICE’ undeclared (first use in this function); did you mean ‘PCI_DVSEC_CXL_DEVICE’?
  221 |                                            CXL_DVSEC_PCIE_DEVICE);
      |                                            ^~~~~~~~~~~~~~~~~~~~~
      |                                            PCI_DVSEC_CXL_DEVICE
drivers/pci/ats.c:221:44: note: each undeclared identifier is reported only once for each function it appears in
drivers/pci/ats.c:225:45: error: ‘CXL_DVSEC_CAP_OFFSET’ undeclared (first use in this function)
  225 |         pci_read_config_word(pdev, offset + CXL_DVSEC_CAP_OFFSET, &cap);
      |                                             ^~~~~~~~~~~~~~~~~~~~
make[4]: *** [scripts/Makefile.build:287: drivers/pci/ats.o] Error 1
make[3]: *** [scripts/Makefile.build:556: drivers/pci] Error 2
make[2]: *** [scripts/Makefile.build:556: drivers] Error 2
make[1]: *** [/home/nvidia/NV-Kernels/Makefile:2016: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2

@nvmochs
Collaborator

nvmochs commented Mar 24, 2026

Followup question on 15609f2 PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h:
In my original review, there was a patch just after this patch, 197d61c PCI: Update CXL DVSEC definitions, that is no longer present in the branch. It appears that the content from that patch was squashed into "15609f2ae03c PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h". Can you confirm this? Can we still include that patch?

Fixed. Included the original patch for a commit-to-commit match with Terry Bowman's merged series.

Thanks for including this patch again. The backports for these two patches look much better now. Only need to address the remaining renames that Nirmoy pointed out.

0ce1557 NVIDIA: VR: SAUCE: PCI: Add HDM decoder state save/restore
Nit: Can you expand a bit on the backport note to include “why” cxl.h is needed? i.e. because of conflict resolution in "35460d55daed NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h”

Fixed

689a3a3 NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC reset and capability register definitions
Minor nit, I think it was Nicolin’s patch that originally added PCI_DVSEC_CXL_CACHE_CAPABLE. Consider updating the backport note to reflect this.

Fixed.

Thanks for addressing these. I confirmed the updated backport notes look good.

@JiandiAnNVIDIA
Author

This patch "PCI: Update CXL DVSEC definitions" missed one rename

Fixed.

SriMNvidia and others added 14 commits April 15, 2026 18:05
… to include/cxl/cxl.h

Move CXL HDM decoder register defines, register map structs
(cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map,
cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(),
enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs()
declarations from internal CXL headers to include/cxl/pci.h.

This makes them accessible to code outside the CXL subsystem, in
particular the PCI core CXL state save/restore support added in a
subsequent patch.

No functional change.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Resolve conflicts by moving certain definitions to include/cxl/cxl.h instead of to include/cxl/pci.h to align with its dependency on Alejandro's series]
Signed-off-by: Jiandi An <jan@nvidia.com>
…state

Add pci_add_virtual_ext_cap_save_buffer() to allocate save buffers
using virtual cap IDs (above PCI_EXT_CAP_ID_MAX) that don't require
a real capability in config space.

The existing pci_add_ext_cap_save_buffer() cannot be used for
CXL DVSEC state because it calls pci_find_saved_ext_cap()
which searches for a matching capability in PCI config space.
The CXL state saved here is a synthetic snapshot (DVSEC+HDM)
and should not be tied to a real extended-cap instance. A
virtual extended-cap save buffer API (cap IDs above
PCI_EXT_CAP_ID_MAX) allows PCI to track this state without
a backing config space capability.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
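
A minimal userspace sketch of the idea, assuming a simple linked list of save buffers keyed by capability ID: IDs above the architected ceiling are accepted without any config-space lookup, and retrieval is purely by ID. `EXT_CAP_ID_MAX_SKETCH` (including its value), `struct save_buf`, `add_virtual_save_buf()` and `find_save_buf()` are hypothetical stand-ins, not the kernel's `pci_add_virtual_ext_cap_save_buffer()` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ceiling standing in for PCI_EXT_CAP_ID_MAX. */
#define EXT_CAP_ID_MAX_SKETCH 0x30u

/* One saved-state buffer, keyed by a (possibly virtual) capability ID. */
struct save_buf {
	uint16_t cap_id;
	size_t size;
	struct save_buf *next;
	uint8_t data[];		/* snapshot storage */
};

/* Per-device list of save buffers. */
struct save_registry {
	struct save_buf *head;
};

/*
 * Register a buffer for a virtual cap ID. Nothing is looked up in
 * config space; the only rule is that the ID must not collide with
 * an architected extended capability ID.
 */
static struct save_buf *add_virtual_save_buf(struct save_registry *reg,
					     uint16_t cap_id, size_t size)
{
	struct save_buf *buf;

	if (cap_id <= EXT_CAP_ID_MAX_SKETCH)
		return NULL;	/* would shadow a real capability */

	buf = calloc(1, sizeof(*buf) + size);
	if (!buf)
		return NULL;
	buf->cap_id = cap_id;
	buf->size = size;
	buf->next = reg->head;
	reg->head = buf;
	return buf;
}

/* Find a saved buffer by ID only, with no config-space access. */
static struct save_buf *find_save_buf(const struct save_registry *reg,
				      uint16_t cap_id)
{
	for (struct save_buf *b = reg->head; b; b = b->next)
		if (b->cap_id == cap_id)
			return b;
	return NULL;
}

static void free_save_bufs(struct save_registry *reg)
{
	while (reg->head) {
		struct save_buf *next = reg->head->next;

		free(reg->head);
		reg->head = next;
	}
}
```

The design point is that lookup never touches config space, so a synthetic DVSEC+HDM snapshot can live alongside real capability buffers without a backing capability.
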
Save and restore CXL DVSEC control registers (CTRL, CTRL2), range
base registers, and lock state across PCI resets.

When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields
become read-only and hardware may have updated them. Blindly
restoring saved values would be silently ignored or conflict
with hardware state. Instead, a read-merge-write approach is
used: current hardware values are read for the RWL
(read-write-when-locked) fields and merged with saved state,
so only writable bits are restored while locked bits retain
their hardware values.

Hooked into pci_save_state()/pci_restore_state() so all PCI reset
paths automatically preserve CXL DVSEC configuration.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Resolve minor conflict in drivers/pci/Makefile due to code line shifts]
Signed-off-by: Jiandi An <jan@nvidia.com>
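
The read-merge-write rule reduces to a single mask operation. This sketch assumes the RWL bits of a register are known as a mask; `dvsec_merge_restore()` is a hypothetical helper, and the mask values in the example are arbitrary rather than the real CTRL or range field layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the value to write back to a DVSEC register on restore.
 *
 * saved:    value captured before the reset
 * hw:       value read from hardware now (after the reset)
 * rwl_mask: read-write-when-locked bits, read-only once CONFIG_LOCK is set
 * locked:   whether CONFIG_LOCK is currently set
 */
static uint16_t dvsec_merge_restore(uint16_t saved, uint16_t hw,
				    uint16_t rwl_mask, bool locked)
{
	if (!locked)
		return saved;	/* all bits writable: restore as saved */

	/* Locked: keep hardware's RWL bits, restore only the others. */
	return (uint16_t)((hw & rwl_mask) | (saved & (uint16_t)~rwl_mask));
}
```

For example, with `rwl_mask = 0x0F0F`, a locked register whose saved value is `0x00FF` and whose current hardware value is `0xFF00` is written back as `0x0FF0`. The RWL nibbles come from hardware and the remaining bits come from the saved snapshot.
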
Save and restore CXL HDM decoder registers (global control,
per-decoder base/size/target-list, and commit state) across PCI
resets. On restore, decoders that were committed are reprogrammed
and recommitted with a 10ms timeout. Locked decoders that are
already committed are skipped, since their state is protected by
hardware and reprogramming them would fail.
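
The per-decoder restore decision and the bounded commit wait can be sketched as follows. The structures are simplified hypotheticals (real decoders also carry base, size, and target-list registers), and time is simulated in whole milliseconds instead of sleeping between MMIO reads. Only the skip rules and the 10 ms bound mirror the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Snapshot of one HDM decoder as captured before reset (simplified). */
struct hdm_snapshot {
	bool committed;		/* decoder was committed at save time */
};

/* Hardware view of the same decoder after reset (simplified). */
struct hdm_hw_view {
	bool locked;
	bool committed;
};

/* Decide whether restore should reprogram and recommit this decoder. */
static bool hdm_should_reprogram(const struct hdm_snapshot *saved,
				 const struct hdm_hw_view *hw)
{
	if (!saved->committed)
		return false;	/* nothing to bring back */
	if (hw->locked && hw->committed)
		return false;	/* hardware-protected; writes would fail */
	return true;
}

/*
 * Poll a "committed" predicate until it is true or the timeout expires.
 * Simulated time: one iteration per millisecond.
 */
static bool hdm_wait_committed(bool (*is_committed)(void *ctx, unsigned int now_ms),
			       void *ctx, unsigned int timeout_ms)
{
	for (unsigned int t = 0; t <= timeout_ms; t++)
		if (is_committed(ctx, t))
			return true;
	return false;
}

/* Fake decoder for the example: reports committed after a delay in ms. */
static bool commits_after(void *ctx, unsigned int now_ms)
{
	return now_ms >= *(unsigned int *)ctx;
}
```

With a 10 ms budget, a fake decoder that commits after 3 ms succeeds and one that needs 25 ms times out.
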

The Register Locator DVSEC is parsed directly via PCI config space
reads rather than calling cxl_find_regblock()/cxl_setup_regs(),
since this code lives in the PCI core and must not depend on CXL
module symbols.
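
A sketch of walking the Register Locator DVSEC with plain 32-bit config reads over a simulated config-space buffer. The field layout is our reading of the CXL Register Locator DVSEC definition and should be checked against the spec: entries start at offset 0x0C and are 8 bytes each, BIR is in bits 2:0, Block Identifier in bits 15:8, offset low bits in 31:16, the DVSEC length in Header 1 bits 31:20, and Block Identifier 1 denotes component registers. The macro and function names are hypothetical, and locating the DVSEC itself is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define REGLOC_BLOCK1_OFFSET  0x0C		/* first Register Block entry */
#define REGLOC_ENTRY_SIZE     8			/* Offset Low + Offset High */
#define REGLOC_BIR_MASK       0x00000007u	/* bits 2:0 */
#define REGLOC_BLOCK_ID_SHIFT 8			/* bits 15:8 */
#define REGLOC_OFF_LOW_MASK   0xFFFF0000u	/* bits 31:16, 64K aligned */
#define REGLOC_ID_COMPONENT   1			/* component register block */

/* Stand-in for a config dword read; host-endian for the sketch. */
static uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
	uint32_t v;

	memcpy(&v, cfg + off, sizeof(v));
	return v;
}

/* Report BAR index and 64-bit offset of the first block with want_id. */
static bool regloc_find_block(const uint8_t *cfg, uint16_t dvsec_off,
			      uint8_t want_id, uint8_t *bir, uint64_t *offset)
{
	/* DVSEC Header 1 (offset 4): bits 31:20 hold the DVSEC length. */
	uint32_t len = cfg_read32(cfg, dvsec_off + 4) >> 20;
	unsigned int n;

	if (len < REGLOC_BLOCK1_OFFSET)
		return false;
	n = (len - REGLOC_BLOCK1_OFFSET) / REGLOC_ENTRY_SIZE;

	for (unsigned int i = 0; i < n; i++) {
		uint16_t e = dvsec_off + REGLOC_BLOCK1_OFFSET + i * REGLOC_ENTRY_SIZE;
		uint32_t lo = cfg_read32(cfg, e);
		uint32_t hi = cfg_read32(cfg, e + 4);

		if (((lo >> REGLOC_BLOCK_ID_SHIFT) & 0xFF) != want_id)
			continue;
		*bir = lo & REGLOC_BIR_MASK;
		*offset = ((uint64_t)hi << 32) | (lo & REGLOC_OFF_LOW_MASK);
		return true;
	}
	return false;
}
```

Because everything goes through a dword read helper, the same walk can be done with PCI core config accessors and no CXL module symbols.
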

MSE is temporarily enabled during save/restore to allow MMIO
access to the HDM decoder register block.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Include <cxl/cxl.h> in drivers/pci/cxl.c due to conflict resolution in "4acbc27592b8 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h"]
Signed-off-by: Jiandi An <jan@nvidia.com>
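
The temporary MSE enable amounts to save, set, and restore of the Command register around the MMIO accesses. In this sketch the device is a hypothetical struct holding only the Command register. `PCI_COMMAND_MEMORY` (0x2) is the standard Memory Space Enable bit, while `with_mse_enabled()` and the callback are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_MEMORY 0x2	/* Memory Space Enable in the Command register */

/* Simulated device: just the 16-bit PCI Command register. */
struct fake_pci_dev {
	uint16_t command;
};

/*
 * Run fn() with Memory Space Enable set, then put the Command register
 * back exactly as found, so a device that had MSE off stays off.
 */
static int with_mse_enabled(struct fake_pci_dev *dev,
			    int (*fn)(struct fake_pci_dev *dev, void *arg),
			    void *arg)
{
	uint16_t orig = dev->command;
	int ret;

	if (!(orig & PCI_COMMAND_MEMORY))
		dev->command = orig | PCI_COMMAND_MEMORY;
	ret = fn(dev, arg);	/* HDM decoder MMIO would happen here */
	dev->command = orig;
	return ret;
}

/* Example callback: succeeds only if MMIO decoding is enabled. */
static int touch_hdm_mmio(struct fake_pci_dev *dev, void *arg)
{
	(void)arg;
	return (dev->command & PCI_COMMAND_MEMORY) ? 0 : -1;
}
```
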
…efinitions

Add CXL DVSEC register definitions needed for CXL device reset per
CXL r3.2 section 8.1.3.1:
- Capability bits: RST_CAPABLE, CACHE_CAPABLE, CACHE_WBI_CAPABLE,
  RST_TIMEOUT, RST_MEM_CLR_CAPABLE
- Control2 register: DISABLE_CACHING, INIT_CACHE_WBI, INIT_CXL_RST,
  RST_MEM_CLR_EN
- Status2 register: CACHE_INV, RST_DONE, RST_ERR
- Non-CXL Function Map DVSEC register offset

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
[jan: Resolve conflicts where PCI_DVSEC_CXL_CACHE_CAPABLE is already added by "72bd823fb4f1 NVIDIA: VR: SAUCE: PCI: Allow ATS to be always on for CXL.cache capable devices"]
Signed-off-by: Jiandi An <jan@nvidia.com>
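
The capability bits above can be decoded as in the following sketch, which also expresses the `cxl_reset` attribute's visibility rule (cache, mem, and reset capable). The bit positions and the CXL Reset Timeout encoding (0 = 10 ms, 1 = 100 ms, 2 = 1 s, 3 = 10 s, 4 = 100 s, others reserved) are our reading of the DVSEC CXL Capability register in CXL r3.x Section 8.1.3 and should be verified against the spec. The `CAP_*` names are hypothetical and are not the names this patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DVSEC CXL Capability register (offset 0x0A); positions assumed. */
#define CAP_CACHE_CAPABLE       (1u << 0)
#define CAP_MEM_CAPABLE         (1u << 2)
#define CAP_CACHE_WBI_CAPABLE   (1u << 6)
#define CAP_RST_CAPABLE         (1u << 7)
#define CAP_RST_TIMEOUT_SHIFT   8		/* bits 10:8 */
#define CAP_RST_TIMEOUT_MASK    (0x7u << CAP_RST_TIMEOUT_SHIFT)
#define CAP_RST_MEM_CLR_CAPABLE (1u << 11)

/* Decode the 3-bit CXL Reset Timeout field to milliseconds (0 = reserved). */
static uint32_t cxl_reset_timeout_ms(uint16_t cap)
{
	static const uint32_t table[] = { 10, 100, 1000, 10000, 100000 };
	unsigned int enc = (cap & CAP_RST_TIMEOUT_MASK) >> CAP_RST_TIMEOUT_SHIFT;

	return enc < sizeof(table) / sizeof(table[0]) ? table[enc] : 0;
}

/* Visibility rule for cxl_reset: cache + mem + reset capable. */
static bool cxl_reset_supported(uint16_t cap)
{
	const uint16_t need = CAP_CACHE_CAPABLE | CAP_MEM_CAPABLE | CAP_RST_CAPABLE;

	return (cap & need) == need;
}
```
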
…_restore()

Export pci_dev_save_and_disable() and pci_dev_restore() so that
subsystems performing non-standard reset sequences (e.g. CXL)
can reuse the PCI core standard pre/post reset lifecycle:
driver reset_prepare/reset_done callbacks, PCI config space
save/restore, and device disable/re-enable.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
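
The pre/post reset lifecycle these exports make available can be modeled as ordered steps with a custom hardware reset in between. This is only an ordering sketch. The functions are hypothetical, and the real `pci_dev_save_and_disable()` and `pci_dev_restore()` do real work (driver callbacks, config-space save/restore, Command register writes) rather than log an event.

```c
#include <assert.h>
#include <string.h>

/* Event log so the ordering of lifecycle steps can be inspected. */
struct lifecycle_log {
	const char *ev[8];
	int n;
};

static void log_ev(struct lifecycle_log *log, const char *ev)
{
	if (log->n < 8)
		log->ev[log->n++] = ev;
}

/* Pre-reset half: notify driver, snapshot config, quiesce the device. */
static void sketch_save_and_disable(struct lifecycle_log *log)
{
	log_ev(log, "reset_prepare");	/* driver callback */
	log_ev(log, "save_state");	/* config space (and CXL state) */
	log_ev(log, "disable");		/* stop DMA and decoding */
}

/* Post-reset half: restore config (re-enabling the device), then notify. */
static void sketch_restore(struct lifecycle_log *log)
{
	log_ev(log, "restore_state");
	log_ev(log, "reset_done");	/* driver callback */
}

/* A non-standard reset slots its own hardware sequence between the halves. */
static void sketch_custom_reset(struct lifecycle_log *log)
{
	sketch_save_and_disable(log);
	log_ev(log, "custom_hw_reset");
	sketch_restore(log);
}
```
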
Add infrastructure for quiescing the CXL data path before reset:

- Memory offlining: check if CXL-backed memory is online and offline
  it via offline_and_remove_memory() before reset, per the CXL
  spec requirement to quiesce all CXL.mem transactions before issuing
  CXL Reset.
- CPU cache flush: invalidate cache lines before reset
  as a safety measure after memory offlining.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
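
Memory hotplug offlines memory in whole memory blocks, so one step is knowing which blocks a CXL HPA range covers. A sketch of that arithmetic, assuming the block size is passed in by the caller (the kernel's value is architecture- and configuration-dependent; 128 MiB in the example is only illustrative). `memblocks_in_range()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the memory blocks of block_size bytes that overlap the
 * half-open physical range [start, start + len). Each of them would
 * need to be checked and offlined before CXL.mem traffic is quiesced.
 */
static uint64_t memblocks_in_range(uint64_t start, uint64_t len,
				   uint64_t block_size, uint64_t *first_block)
{
	uint64_t first, last;

	if (len == 0 || block_size == 0)
		return 0;
	first = start / block_size;
	last = (start + len - 1) / block_size;	/* block of the last byte */
	if (first_block)
		*first_block = first;
	return last - first + 1;
}
```

For a 256 MiB range starting at 4 GiB with 128 MiB blocks, this yields blocks 32 and 33; an unaligned range can straddle an extra block.
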
…XL reset

Add sibling PCI function save/disable/restore coordination for CXL
reset. Before reset, all CXL.cachemem sibling functions are locked,
saved, and disabled; after reset they are restored. The Non-CXL Function
Map DVSEC and per-function DVSEC capability register are consulted to
skip non-CXL and CXL.io-only functions. A global mutex serializes
concurrent resets to prevent deadlocks between sibling functions.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
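
The sibling filter described above reduces to two checks per function. This sketch assumes the Non-CXL Function Map is read into a 256-bit array where bit n marks function n as non-CXL (ARI versus non-ARI numbering is not modeled). It uses the Cache_Capable (bit 0) and Mem_Capable (bit 2) DVSEC capability bits. The global mutex and the actual save/disable calls are left out, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_CACHE_CAPABLE (1u << 0)
#define CAP_MEM_CAPABLE   (1u << 2)

/* 256-bit map: bit n set means function n is NOT a CXL function. */
static bool fn_is_non_cxl(const uint32_t map[8], unsigned int fn)
{
	return (map[fn / 32] >> (fn % 32)) & 1u;
}

/*
 * Decide whether sibling function fn must be saved and disabled around
 * a reset issued on function self. cap is fn's DVSEC Capability value.
 */
static bool sibling_needs_coordination(const uint32_t map[8], unsigned int self,
				       unsigned int fn, uint16_t cap)
{
	if (fn == self)
		return false;	/* handled by the main reset path */
	if (fn_is_non_cxl(map, fn))
		return false;	/* plain PCIe function */
	/* CXL.io-only functions have no cache/mem state to protect. */
	return (cap & (CAP_CACHE_CAPABLE | CAP_MEM_CAPABLE)) != 0;
}
```
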
…ration

cxl_dev_reset() implements the hardware reset sequence:
optionally enable memory clear, initiate reset via
CTRL2, wait for completion, and re-enable caching.
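
The hardware sequence can be sketched against simulated registers. The Control2/Status2 bit positions (Disable Caching bit 0, Initiate CXL Reset bit 2, CXL Reset Mem Clr Enable bit 3; Reset Complete bit 1, Reset Error bit 2) are our reading of CXL r3.x Section 8.1.3 and should be verified. The fake hardware, the simulated millisecond clock, and the return codes are assumptions; a real implementation could derive the timeout from the CXL Reset Timeout capability field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control2 (0x10) and Status2 (0x12) bits; positions assumed. */
#define CTRL2_DISABLE_CACHING (1u << 0)
#define CTRL2_INIT_CXL_RST    (1u << 2)
#define CTRL2_RST_MEM_CLR_EN  (1u << 3)
#define STAT2_RST_DONE        (1u << 1)
#define STAT2_RST_ERR         (1u << 2)

/* Simulated DVSEC registers; real code uses PCI config accessors. */
struct fake_dvsec {
	uint16_t ctrl2;
	uint16_t status2;
	unsigned int done_after_ms;	/* when the fake hardware finishes */
	bool fail;			/* whether it reports RST_ERR */
};

/* Fake hardware: Status2 updates once enough simulated time passes. */
static uint16_t read_status2(struct fake_dvsec *d, unsigned int now_ms)
{
	if ((d->ctrl2 & CTRL2_INIT_CXL_RST) && now_ms >= d->done_after_ms)
		d->status2 = d->fail ? STAT2_RST_ERR : STAT2_RST_DONE;
	return d->status2;
}

/* Returns 0 on success, -1 on reset error, -2 on timeout. */
static int sketch_cxl_dev_reset(struct fake_dvsec *d, bool mem_clr,
				unsigned int timeout_ms)
{
	uint16_t ctrl2 = d->ctrl2;

	if (mem_clr)
		ctrl2 |= CTRL2_RST_MEM_CLR_EN;	/* optional memory clear */
	d->ctrl2 = ctrl2 | CTRL2_INIT_CXL_RST;	/* initiate the reset */

	for (unsigned int t = 0; t <= timeout_ms; t++) {
		uint16_t st = read_status2(d, t);

		if (st & STAT2_RST_ERR)
			return -1;
		if (st & STAT2_RST_DONE) {
			/* Re-enable caching now that the reset completed. */
			d->ctrl2 &= (uint16_t)~CTRL2_DISABLE_CACHING;
			return 0;
		}
	}
	return -2;
}
```
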

cxl_do_reset() orchestrates the full reset flow:
  1. CXL pre-reset: mem offlining and cache flush (when memdev present)
  2. PCI save/disable: pci_dev_save_and_disable() automatically saves
     CXL DVSEC and HDM decoder state via PCI core hooks
  3. Sibling coordination: save/disable CXL.cachemem sibling functions
  4. Execute CXL DVSEC reset
  5. Sibling restore: always runs to re-enable sibling functions
  6. PCI restore: pci_dev_restore() automatically restores CXL state

The CXL-specific DVSEC and HDM save/restore is handled
by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c).

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
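
The ordering guarantee (sibling restore runs even if the DVSEC reset fails) is the part worth getting right, and a step log makes it checkable. This minimal sketch assumes the PCI restore in step 6 also runs unconditionally, does not model failures in the earlier steps, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Records which steps ran, to check the unwind ordering. */
struct flow {
	const char *steps[8];
	int n;
	bool have_memdev;
	int reset_rc;		/* result the fake DVSEC reset returns */
};

static void step(struct flow *f, const char *s)
{
	if (f->n < 8)
		f->steps[f->n++] = s;
}

static int sketch_do_reset(struct flow *f)
{
	int rc;

	if (f->have_memdev)
		step(f, "offline+flush");	/* 1. CXL pre-reset */
	step(f, "save_and_disable");		/* 2. includes DVSEC/HDM save */
	step(f, "siblings_save");		/* 3. CXL.cachemem siblings */
	rc = f->reset_rc;			/* 4. DVSEC reset */
	step(f, rc ? "reset_failed" : "reset_ok");
	step(f, "siblings_restore");		/* 5. runs even if reset failed */
	step(f, "restore");			/* 6. includes DVSEC/HDM restore */
	return rc;
}
```
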
Add a "cxl_reset" sysfs attribute to PCI devices that support CXL
Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on
devices with both CXL.cache and CXL.mem capabilities and the CXL
Reset Capable bit set in the DVSEC.

Writing "1" to the attribute triggers the full CXL reset flow via
cxl_do_reset(). The interface is decoupled from memdev creation:
when a CXL memdev exists, memory offlining and cache flush are
performed; otherwise reset proceeds without the memory management.

The sysfs attribute is managed entirely by the CXL module using
sysfs_create_group() / sysfs_remove_group() rather than the PCI
core's static attribute groups. This avoids cross-module symbol
dependencies between the PCI core (always built-in) and CXL_BUS
(potentially modular).

At module init, existing PCI devices are scanned and a PCI bus
notifier handles hot-plug/unplug. kernfs_drain() makes sure that
any in-flight store() completes before sysfs_remove_group() returns,
preventing use-after-free during module unload.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
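
From userspace, triggering the reset is a write of "1" to the attribute. A small C sketch, assuming the attribute sits in the device's standard `/sys/bus/pci/devices/<BDF>/` directory and the caller has the privileges sysfs requires. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build /sys/bus/pci/devices/<BDF>/cxl_reset for a BDF like "0000:01:00.0". */
static int cxl_reset_path(char *buf, size_t len, const char *bdf)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/cxl_reset", bdf);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Write "1" to the attribute; returns 0 or a negative errno value. */
static int cxl_reset_trigger(const char *path)
{
	int fd = open(path, O_WRONLY);
	ssize_t w;

	if (fd < 0)
		return -errno;
	w = write(fd, "1", 1);
	if (w != 1) {
		int err = w < 0 ? errno : EIO;

		close(fd);
		return -err;
	}
	if (close(fd) < 0)
		return -errno;
	return 0;
}
```

For example, call `cxl_reset_path(buf, sizeof(buf), "0000:01:00.0")` and then `cxl_reset_trigger(buf)`. On a device without the attribute, `open()` fails and the helper returns `-ENOENT`.
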
…tribute

Document the cxl_reset sysfs attribute added to PCI devices that
support CXL Reset.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
…and RAS support

Add Ubuntu kernel config annotations for CXL-related configs introduced
or changed by the following cherry-picked patch series:
  - drivers/cxl changes between v6.17.9 and upstream 7.0 (which includes
    a portion of Terry Bowman's v14 CXL RAS series merged via
    for-7.0/cxl-aer-prep)
  - Alejandro Lucero's v23 CXL Type-2 device support series
  - Smita Koralahalli's v6 patch 3/9 (cxl/region: Skip decoder reset on
    detach for autodiscovered regions)

CONFIG_CXL_BUS:           Enable CXL bus support built-in; required for
                          CXL Type-2 device and RAS support
CONFIG_CXL_PCI:           Enable CXL PCI management built-in; auto-selects
                          CXL_MEM; required for CXL Type-2 device support
CONFIG_CXL_MEM:           Auto-selected by CXL_PCI; required for CXL
                          memory expansion and Type-2 device support
CONFIG_CXL_PORT:          Required for CXL port enumeration; defaults to
                          CXL_BUS value
CONFIG_FWCTL:             Selected by CXL_BUS when CXL_FEATURES is enabled;
                          required for CXL feature mailbox access
CONFIG_CXL_RAS:           New def_bool replacing PCIEAER_CXL (Terry Bowman
                          v14); auto-enabled with ACPI_APEI_GHES+PCIEAER+
                          CXL_BUS for CXL RAS error handling
CONFIG_SFC_CXL:           Solarflare SFC9100-family CXL Type-2 device
                          support; not needed for NVIDIA platforms (n)
CONFIG_ACPI_APEI_EINJ:    Required prerequisite for CONFIG_ACPI_APEI_EINJ_CXL
CONFIG_ACPI_APEI_EINJ_CXL: CXL protocol error injection support via APEI EINJ

CONFIG_PCIEAER_CXL: Remove it from debian.master policy. This config
  was removed from Kconfig by upstream commit d18f1b7
  (PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS), which is included
  in this port.

CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION: Override debian.master
  amd64-only policy to include arm64. Commit 4d873c5 added
  'select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION' to arch/arm64/Kconfig,
  making this y on arm64 as well.

CONFIG_GENERIC_CPU_CACHE_MAINTENANCE: New bool config defined by
  c460697 in lib/Kconfig. Selected by arm64 via 4d873c5;
  not selected by x86. Set arm64: y, amd64: -.

CONFIG_CACHEMAINT_FOR_HOTPLUG: New optional menuconfig defined by
  2ec3b54 in drivers/cache/Kconfig. Depends on
  GENERIC_CPU_CACHE_MAINTENANCE so becomes visible on arm64. Defaults
  to n; HiSilicon HHA driver not needed for NVIDIA platforms.
  Set arm64: n, amd64: -.

Signed-off-by: Jiandi An <jan@nvidia.com>
…memory access

Override debian.master policy (m->y) for DEV_DAX, DEV_DAX_CXL, and
DEV_DAX_KMEM to ensure CXL memory regions are accessible as both raw
DAX devices and hotplugged System-RAM nodes.

debian.master sets these to 'm' (modules). For NVIDIA platforms with
CXL Type-2 devices, built-in (y) is required to ensure CXL memory
regions provisioned early in boot are immediately accessible without
relying on module loading order.

CONFIG_DEV_DAX:     Override m->y; prerequisite for DEV_DAX_CXL and
                    DEV_DAX_KMEM to be built-in; depends on
                    TRANSPARENT_HUGEPAGE (already y in debian.master)

CONFIG_DEV_DAX_CXL: Override m->y; creates /dev/daxX.Y devices for CXL
                    RAM regions not in the default system memory map
                    (Soft Reserved or dynamically provisioned regions);
                    depends on CXL_BUS+CXL_REGION+DEV_DAX (all y)

CONFIG_DEV_DAX_KMEM: Override m->y; onlines CXL DAX devices as System-RAM
                    NUMA nodes via memory hotplug, making CXL memory
                    available for normal kernel and userspace allocation

Signed-off-by: Jiandi An <jan@nvidia.com>
…/restore

Add Ubuntu kernel config annotation for CONFIG_PCI_CXL introduced by
the CXL DVSEC and HDM state save/restore series (Srirangan Madhavan).

CONFIG_PCI_CXL:  Hidden bool in drivers/pci/Kconfig; auto-enabled when
                 CXL_BUS=y. Gates compilation of drivers/pci/cxl.o which
                 saves and restores CXL DVSEC control/range registers and
                 HDM decoder state across PCI resets and link transitions.

Signed-off-by: Jiandi An <jan@nvidia.com>
@JiandiAnNVIDIA
Author

@JiandiAnNVIDIA

Since these three required some fixup when picked (thanks for adding a note to each commit), can you change "cherry picked from commit" to "backported from commit"?

74f2f02 cxl: Test CXL_DECODER_F_LOCK as a bitmask
6d1616b cxl: Check for invalid addresses returned from translation functions
8fc5044 cxl/port: Introduce port_to_host() helper

Fixed.

@nvmochs
Collaborator

nvmochs commented Apr 15, 2026

@JiandiAnNVIDIA
Since these three required some fixup when picked (thanks for adding a note to each commit), can you change "cherry picked from commit" to "backported from commit"?
74f2f02 cxl: Test CXL_DECODER_F_LOCK as a bitmask
6d1616b cxl: Check for invalid addresses returned from translation functions
8fc5044 cxl/port: Introduce port_to_host() helper

Fixed.

Thanks, verified with range-diff.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@jamieNguyenNVIDIA
Collaborator

Just re-adding my ACK from earlier:

Acked-by: Jamie Nguyen <jamien@nvidia.com>

@github-actions
Contributor

⚠️ Patchscan: Missing Fixes Detected

The following upstream fix commits appear to be missing from this PR:

Checking 157 commits...
checking 5c7000239d1f361f5e8048f4e2ce3a7f9f75e619 NVIDIA: VR: SAUCE: [Config] Add PCI_CXL annotation for CXL state save/restore..... no upstream reference
checking 51885043a12289fecc405e84fd7910d2e982fddc NVIDIA: VR: SAUCE: [Config] Enable CXL DAX and KMEM built-in for CXL memory access..... no upstream reference
checking ad33ca9f32214a31e6ee40573d6c844e9557d8ae NVIDIA: VR: SAUCE: [Config] CXL config annotations for Type-2 device and RAS support..... 
E: Commit messages differ: upstream 'PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS' != pick 'NVIDIA: VR: SAUCE: [Config] CXL config annotations for Type-2 device and RAS support'
ERROR: failed to verify commit ad33ca9f32214a31e6ee40573d6c844e9557d8ae
checking fc562f73981b9e22e810e2f1f50612935cefc627 NVIDIA: VR: SAUCE: Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute..... no upstream reference
checking 43a1ddec5c38b4d1bbf32ceb3f9827c662a25793 NVIDIA: VR: SAUCE: cxl: Add cxl_reset sysfs interface for PCI devices..... no upstream reference
checking 32ae763ca5c0f91a78b5f626a8fd5bd224f0e5c0 NVIDIA: VR: SAUCE: cxl: Add CXL DVSEC reset sequence and flow orchestration..... no upstream reference
checking 7d826029d76858e4e43fc8d50918c20f58fcf598 NVIDIA: VR: SAUCE: cxl: Add multi-function sibling coordination for CXL reset..... no upstream reference
checking 382c942a5ed25e8849002dfc2468a55f3be4960e NVIDIA: VR: SAUCE: cxl: Add memory offlining and cache flush helpers..... no upstream reference
checking bf7e3c1043eaabf599739159f9cfd1b028d71c47 NVIDIA: VR: SAUCE: PCI: Export pci_dev_save_and_disable() and pci_dev_restore()..... no upstream reference
checking a79030f65ae7b653f0464b1f61e2bcfab008a943 NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC reset and capability register definitions..... no upstream reference
checking 165f9423cfea5a3e5fd319726e5e82d00a1f4e8a NVIDIA: VR: SAUCE: PCI: Add HDM decoder state save/restore..... no upstream reference
checking 8ea00cbb7a31522db2bbdd8363886f2ce344b11d NVIDIA: VR: SAUCE: PCI: Add cxl DVSEC state save/restore across resets..... no upstream reference
checking e27bd9a02a5cd105c765efa97b7761c923dc8fb4 NVIDIA: VR: SAUCE: PCI: Add virtual extended cap save buffer for CXL state..... no upstream reference
checking 15746114b71f76cf1f859c4412eafc772d9a85d9 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h..... no upstream reference
checking 0456a87fa66d61fc70f3a57809584f9d07f4df21 NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC control, lock, and range register definitions..... no upstream reference
checking f0230f4ca61e95fab234d7187e239ee12ad21ebd NVIDIA: VR: SAUCE: cxl/region: Support multi-level interleaving with smaller granularities for lower levels..... no upstream reference
checking 4feed0efd5643c37d183cd2654a8ec57621f7331 NVIDIA: VR: SAUCE: sfc: support pio mapping based on cxl..... no upstream reference
checking 7d5940081341c4744aeee5663a998f060b98f23e NVIDIA: VR: SAUCE: sfc: create cxl region..... no upstream reference
checking b58b495cc8a6cd61515294fe8fe57b3bbc7b6cc5 NVIDIA: VR: SAUCE: cxl: Avoid dax creation for accelerators..... no upstream reference
checking 390c35a1d7a9d28cd2f91a1beeeb63bdbb997ecb NVIDIA: VR: SAUCE: cxl: Allow region creation by type2 drivers..... no upstream reference
checking 6b6843a439d1c48f0e37e797fe7ce04cf209d149 NVIDIA: VR: SAUCE: cxl/region: Factor out interleave granularity setup..... no upstream reference
checking 6158acac60d2b00702f747668e4eb6eb3a1d9069 NVIDIA: VR: SAUCE: cxl/region: Factor out interleave ways setup..... no upstream reference
checking be7d6835db73627238d0a0f8271361e087ac7875 NVIDIA: VR: SAUCE: cxl: Make region type based on endpoint type..... no upstream reference
checking 82f5b8001ab699e6f14cf1576ded2d27ea212c42 NVIDIA: VR: SAUCE: sfc: get endpoint decoder..... no upstream reference
checking 3d23b74bba39e32646a7d3464830296294a00580 NVIDIA: VR: SAUCE: cxl: Define a driver interface for DPA allocation..... no upstream reference
checking 94a5a966f670f8a7b5b018468d465949d0623e4d NVIDIA: VR: SAUCE: sfc: get root decoder..... no upstream reference
checking 5ca492ae596e0dd910240d4ba326bd33d7b688c1 NVIDIA: VR: SAUCE: cxl: Define a driver interface for HPA free space enumeration..... no upstream reference
checking 7ad1e261a8b18edd00afb505d2cbb38dbc4ba5a3 NVIDIA: VR: SAUCE: sfc: obtain decoder and region if committed by firmware..... no upstream reference
checking 11cb2b50299129a10af2db8385e652d7b275d1e7 NVIDIA: VR: SAUCE: cxl: Export function for unwinding cxl by accelerators..... no upstream reference
checking bd7a321931d0bdf625f746b62cae8fd58a9e9b4b NVIDIA: VR: SAUCE: cxl: Add function for obtaining region range..... no upstream reference
checking e6cc5924b00a91968b2db3575fc4e46a03bb4cd9 NVIDIA: VR: SAUCE: cxl/hdm: Add support for getting region from committed decoder..... no upstream reference
checking ee93b4a8e7e0f40902c30f53716e7d4bebbc2d81 NVIDIA: VR: SAUCE: sfc: create type2 cxl memdev..... no upstream reference
checking 12c3e2e5597ab802ee1dbe452084660f5e86281d NVIDIA: VR: SAUCE: cxl: Prepare memdev creation for type2..... no upstream reference
checking feda1dd5d5c716fcb5541980d910056007ddece2 NVIDIA: VR: SAUCE: cxl/sfc: Initialize dpa without a mailbox..... no upstream reference
checking c0bae6d538eee81a751cc6a59eb240beecb8c9b2 NVIDIA: VR: SAUCE: cxl/sfc: Map cxl component regs..... no upstream reference
checking ee92bb2e9cc2797abaa03c6258ff217db9b73488 NVIDIA: VR: SAUCE: cxl: Move pci generic code..... no upstream reference
checking 4fdd64dba6ce716e3db7d8ec5c904b6f737c9772 NVIDIA: VR: SAUCE: sfc: add cxl support..... no upstream reference
checking 93f199ed3b231eb0589c9533899ec7c9b6fcaf59 NVIDIA: VR: SAUCE: cxl: Add type2 device basic support..... no upstream reference
checking 0e6a077fe6b549500997d02060bba743bd67e983 NVIDIA: VR: SAUCE: cxl/region: Skip decoder reset on detach for autodiscovered regions..... no upstream reference
checking 5b655d7df392a12b123fef61d0f9019437eacefe cxl: Check for invalid addresses returned from translation functions on errors..... found upstream
checking 33ee8a42377dbb0bdcdc2306518851d36f28be06 PCI/AER: Update struct aer_err_info with kernel-doc formatting..... found upstream
checking de8449ec8a3ef47ae5770f5579d9a6676fcf5771 PCI/AER: Report CXL or PCIe bus type in AER trace logging..... found upstream
checking 82c3876c16a65cf281d0de73a36db8638f0fac0b PCI/AER: Use guard() in cxl_rch_handle_error_iter()..... found upstream
checking d82202de040c71213a4c4546f103ada2dcd9b25a PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c..... found upstream
checking b59d933c05652eba0996946ca032fe115d68c0e8 PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()..... found upstream
checking 36bb6ce673713fb273f1d7960d3d6466374ef3cc PCI/AER: Export pci_aer_unmask_internal_errors()..... found upstream
checking 718d42ff9916ce72f4d4b7f0af5fc7ab044f2564 PCI: Replace cxl_error_is_native() with pcie_aer_is_native()..... found upstream
checking d478eef8b7d27633ee02e0abfd87ff10eef9d494 PCI: Introduce pcie_is_cxl()..... found upstream
checking 0a92f7c5374583f043e8918ac42387e4da8e60c4 cxl: Fix premature commit_end increment on decoder commit failure..... found upstream
checking 90b28c03c4fe986be143f8541ba202b49cbe38bd cxl/region: Use do_div() for 64-bit modulo operation..... found upstream
checking e6f66c187594876725f5659b956bc2c939880e63 cxl/region: Translate HPA to DPA and memdev in unaligned regions..... found upstream
checking 161b970c8a42822d6285a90563ec5afb28bdd8f9 cxl/region: Translate DPA->HPA in unaligned MOD3 regions..... found upstream
checking d379dbd3f813915b2175ab85442dcfd874a8ccf3 cxl/core: Fix cxl_dport debugfs EINJ entries..... found upstream
checking 5631756a0fb756557a5a5ed73ce9cb430fd9a4fb cxl/acpi: Remove cxl_acpi_set_cache_size()..... found upstream
checking c7de45e87d56d8813abdaaab41e99d603c2cdd30 cxl/hdm: Fix newline character in dev_err() messages..... found upstream
checking cad86cf8d7ccf8aff0ec373cb061b99da9bcb0c1 cxl/pci: Remove outdated FIXME comment and BUILD_BUG_ON..... found upstream
checking 03eafd49bb4f47f9ebe632e1c7de6a82c1ffa1cb cxl: Update RAS handler interfaces to also support CXL Ports..... found upstream
checking 3dfcbd1da9c4301c14198114205d2b55815ff84b cxl/mem: Clarify @host for devm_cxl_add_nvdimm()..... found upstream
checking 6ea551ce58ff26486d641133fe3925e4ba844be0 cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c..... found upstream
checking 7aed7c401ac52299b234b8d351681dcaef31b4ac PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS..... found upstream
checking 3e0cd1fb7da9e8805ac7a75827fd971879a991b2 cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c..... found upstream
checking 3f5a0a7f1720c3713c0a70ac7576e063a7b9b4c6 cxl/pci: Remove unnecessary CXL RCH handling helper functions..... found upstream
checking 8ac9837931a3c97f5c5068f9aaa90a833871f9db cxl/pci: Remove unnecessary CXL Endpoint handling helper functions..... found upstream
checking 946875b707e4a21bffccfb55cc7901162b107cc0 PCI: Update CXL DVSEC definitions..... found upstream
checking 7c4e96af007dc5ddbafc77c8b37407aab616e1c8 PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h..... found upstream
checking c37eb1c52305303f56f586d36a0158f3b7241712 cxl/memdev: fix deadlock in cxl_memdev_autoremove() on attach failure..... found upstream
checking 7715a7e65aea25941af436b868740c25b87df102 cxl/mbox: Use proper endpoint validity check upon sanitize..... found upstream
checking 3fc9b2595e7c9f71792b43d11890f24a4084a9ea cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation..... found upstream
checking f5453cf4a27ea6c88ad5eea6c820ae56befa01bf cxl/mem: Drop @host argument to devm_cxl_add_memdev()..... found upstream
checking cf51f134d6082970994904afbe655e20b79a3b06 cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup..... found upstream
checking f1aac676a7a4fc191e88c12f1776663c4f0e8bb6 cxl/port: Arrange for always synchronous endpoint attach..... found upstream
checking e398f6a96c07db9535ab92315264f1c8e88fc912 cxl/mem: Arrange for always-synchronous memdev attach..... found upstream
checking 2cfb9396c3f39467c27aaf236be27bdb0a309a70 cxl/mem: Fix devm_cxl_memdev_edac_release() confusion..... found upstream
checking cd57ef87902ddadd1c1497678d25b37755e2d18c soc: fsl: qbman: use kmalloc_array() instead of kmalloc()..... found upstream
checking 471a770fd62eddbff8b0bc1cdc1c6aabe71dd315 soc: fsl: qbman: add WQ_PERCPU to alloc_workqueue users..... found upstream
checking 378100707cc1dc55658779f0d95eae9916842269 MAINTAINERS: Update email address for Christophe Leroy..... found upstream
checking c4c8ed09d6966ae883c8ed596f525e8fcbd1aa5a MAINTAINERS: refer to intended file in STANDALONE CACHE CONTROLLER DRIVERS..... found upstream
checking 40fdee7970273780c017875d30ef4145bef7d74d cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent..... found upstream
checking cf23c0f15b5ae8682adb32160d2712a4ce14afb4 riscv: ERRATA_STARFIVE_JH7100: Fix missing dependency on new CONFIG_CACHEMAINT_FOR_DMA..... found upstream
checking a66e74090d5136a174256bb8ae16215ffd81231b soc: renesas: Fix missing dependency on new CONFIG_CACHEMAINT_FOR_DMA..... found upstream
checking 8539267c9ee0194eb9d372ed5c90a7e998100081 cache: Make top level Kconfig menu a boolean dependent on RISCV..... found upstream
checking 3eb1066c61e7b5c33b9368822aadc8c8de078fc1 MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header..... found upstream
checking 344b36ea0b7caebadf46386264e3b3496d4e9d28 arm64: Select GENERIC_CPU_CACHE_MAINTENANCE..... found upstream
checking ff63c9be3e5c3d9cd0ea01d66131a68ab91b2ad4 lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION..... found upstream
checking 86ba2cec8a4e7a7dd3928bc8c9c20092ecd61555 soc: amlogic: meson-gx-socinfo: add new SoCs id..... found upstream
checking a39d807270a7d9c2adc026957cb16c05cdd04682 dt-bindings: arm: amlogic: meson-gx-ao-secure: support more SoCs..... found upstream
checking 42b11f97817ef874ebd41c1515848477f58da065 memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()..... found upstream
checking aef26673945ce3407749269aceefb7449d91cd74 memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion()..... found upstream
checking b8eb1e0e305b489126291ed07d7278c2af999e07 dt-bindings: cache: sifive,ccache0: add a pic64gx compatible..... found upstream
checking 4bbf0a1394eb0b6b2ea192bcf65ec6d51780a4a0 MAINTAINERS: rename Microchip RISC-V entry..... found upstream
checking 3e618676a561dbf32aafbd9e4152614081a27e51 MAINTAINERS: add new soc drivers to Microchip RISC-V entry..... found upstream
checking 292ca3bbcbbc53920e765af7ddcfb279af357007 soc: microchip: mpfs-control-scb: Fix resource leak on driver unbind..... found upstream
checking 16f3079ccb1d60c65a7cd4fd6d3ffa45328f677e soc: microchip: mpfs-mss-top-sysreg: Fix resource leak on driver unbind..... found upstream
checking cb0ff7e86da5731055e5c7e6f6bda0ecd5e77af4 soc: microchip: add mfd drivers for two syscon regions on PolarFire SoC..... found upstream
checking 9b3e295e7b0c31c077f601020ca368ba4390a7ac dt-bindings: soc: microchip: document the simple-mfd syscon on PolarFire SoC..... found upstream
checking 0e07af0ddb40a280c9977d51cd45b9882d7f7e25 soc: amlogic: canvas: simplify lookup error handling..... found upstream
checking 35a60b6331433c61d19db3b7626b6266b2d5047f soc: amlogic: canvas: fix device leak on lookup..... found upstream
checking 6b908451f6cffddcd358806e719fe7b0d1685719 soc: apple: sart: drop device reference after lookup..... found upstream
checking 98b17aedcc2feb5ebf8a7c381fe6a49c4ae08d51 soc: apple: mailbox: fix device leak on lookup..... found upstream
checking a140739f9664cbcbff194eff033c5081a8421124 cxl/test: Assign overflow_err_count from log->nr_overflow..... found upstream
checking 051e98c72e2835fbae86848e87231a8d81444f47 cxl/test: Remove ret_limit race condition in mock_get_event()..... found upstream
checking f52108012cfaee2b323f426897a9dcd08378b877 cxl/test: remove unused mock function for cxl_rcd_component_reg_phys()..... found upstream
checking fd91786b18d7056876ad203d2f2cb2b7d2bed72f cxl/test: Add support for acpi extended linear cache..... found upstream
checking c4ea11679c433ff8d33a3b7e9627638e9f636a5d cxl/test: Add cxl_test CFMWS support for extended linear cache..... found upstream
checking 6b8e5fed2edf53e6f2570cf7f191b57f07d6d403 cxl/test: Standardize CXL auto region size..... found upstream
checking d17fca3d6b15cf8f88b0861455e1e1b410bedd48 cxl/region: Remove local variable @inc in cxl_port_setup_targets()..... found upstream
checking e54e6239d245b1705d16468b4fd7632c8b0f3344 cxl/acpi: Group xor arithmetric setup code in a single block..... found upstream
checking 9977793b0013f03e3f9ac818e26ed32310cf2a16 cxl: Simplify cxl_rd_ops allocation and handling..... found upstream
checking 6dbac66e307a7ca01d95fcdf983692ee7b9f322c cxl: Clarify comment in spa_maps_hpa()..... found upstream
checking 78810f436d719a8f0f601e81bef3e6d425c0b1e7 cxl: Rename region_res_match_cxl_range() to spa_maps_hpa()..... found upstream
checking 5cbae486a7381f888b06e246a353b3279fa58b9f acpi/hmat: Return when generic target is updated..... found upstream
checking 671a919cdc96cdcaee6208eb12ef039ce4f44d8e cxl: Test CXL_DECODER_F_LOCK as a bitmask..... found upstream
checking 31850f28ca5b5f6922580a5bbcb627739e3c0935 cxl: Add handling of locked CXL decoder..... found upstream
checking 35a684f5f22d8969b531646967e162615fca1964 cxl/region: fix format string for resource_size_t..... found upstream
checking d5c21c37501a989bee37f309d99acb047368f1f4 cxl/region: Fix leakage in __construct_region()..... found upstream
checking fe339ae4157ff44d1d1a6f56ab56888df627b364 cxl/region: Add support to indicate region has extended linear cache..... found upstream
checking b18a8cc53105fcae335618b156c6de0e88641c7b cxl: Adjust extended linear cache failure emission in cxl_acpi..... found upstream
checking 4e80500bf44552fed1d10cec6e81f347c81cacaf cxl/test: Add cxl_translate module for address translation testing..... found upstream
checking 8673ee29a45b501121cd50359a310c324313c6f1 cxl/acpi: Restore HBIW check before dereferencing platform_data..... found upstream
checking 17f9e76a2eb24588e752d9621e44324970f7f8a6 cxl/acpi: Make the XOR calculations available for testing..... found upstream
checking 8a80eca03357036afb84841aef6202aa6be1892b cxl/region: Refactor address translation funcs for testing..... found upstream
checking 9bdbbc833574bb6958dc947ebcf57d5be420fc7c cxl/pci: replace use of system_wq with system_percpu_wq..... found upstream
checking 5cfb4ce0b781dcbc2182fdd6fb84ca93961c0f42 cxl: fix typos in cdat.c comments..... found upstream
checking d3515651850dff81675b6342128dffd13128fc06 cxl/port: Remove devm_cxl_port_enumerate_dports()..... found upstream
checking 7b8cba25228b44a085f12359d30bd547b0dec0d1 Documentation/driver-api/cxl: remove page-allocator quirk section..... found upstream
checking bbfbbca03fe3482b6961d4b2f606ad36ce1a0598 cxl: Adjust offset calculation for poison injection..... found upstream
checking 29482d64200d05b74110c34e6226d4a9dd4c39d6 cxl/region: Use %pa printk format to emit resource_size_t..... found upstream
checking 530a95e597e7ffb6e2bbff2d0c6ef1dae83678ae cxl/port: Avoid missing port component registers setup..... found upstream
checking ea1284fa29e5fd0c629853d0b4f6f4533833e271 cxl: Move port register setup to when first dport appear..... found upstream
checking 95609fb4c90576357db2ea6b67f7a0f25fffd6ef cxl: Change sslbis handler to only handle single dport..... found upstream
checking 14b542d875eb560eb37919412a4c9367cecb13ee cxl/test: Setup target_map for cxl_test decoder initialization..... found upstream
checking 45f89e31951a1e108619a333d9f9b9a8451f260d cxl/test: Adjust the mock version of devm_cxl_switch_port_decoders_setup()..... found upstream
checking e92ff7243ae4a78083abe40d517f67cecdf54ef8 cxl/test: Add mock version of devm_cxl_add_dport_by_dev()..... found upstream
checking 297ac5be3ac8cc18f0b1648c77396d8d67d90bde cxl/port: Fix target list setup for multiple decoders sharing the same dport..... found upstream
checking 93fdb3715c45aa2693bf8fb70da8cba3eac6e4ac cxl/port: Hold port host lock during dport adding...... found upstream
checking 21f970c4142dcef7d9c37fbb889bb28033fc4461 cxl/port: Introduce port_to_host() helper..... found upstream
checking 1fbe3af4a3b2b4a3d10641e7631302e006c63408 cxl: Defer dport allocation for switch ports..... found upstream
checking c084c67298f064c4a45435af8a7722d28184108d cxl/test: Refactor decoder setup to reduce cxl_test burden..... found upstream
checking 92ed4e82d1417ccb033b265644b253b4d7e80f92 cxl: Add a cached copy of target_map to cxl_decoder..... found upstream
checking c80408eb1c9eb1ad11382865c360971804ccaeae cxl: Add helper to delete dport..... found upstream
checking af291e50f786d4c05e60214dfcf5eda5585951c7 cxl: Add helper to detect top of CXL device topology..... found upstream
checking 5558bbefce7015ed16b5450380dfc2a1fbd89e4f cxl: Documentation/driver-api/cxl: Describe the x86 Low Memory Hole solution..... found upstream
checking 4033ff168a2b60256914aeb5d614a75bd18cab08 cxl/acpi: Rename CFMW coherency restrictions..... found upstream
checking 032b8663349a0195bae99ed60e64feaea8a492f1 Documentation/driver-api: Fix typo error in cxl..... found upstream
checking 7887000edf5170c2b0414d20443d65215284ba51 acpi/hmat: Remove now unused hmat_update_target_coordinates()..... found upstream
checking d6482ee86ac2e44b0d6e4f979a3df9aadb523952 cxl, acpi/hmat: Update CXL access coordinates directly instead of through HMAT..... found upstream
checking 5f65dbb2e42c50d5d29268c64e78ea0be0711d04 drivers/base/node: Add a helper function node_update_perf_attrs()..... found upstream
checking 13fd32f4447b0c65a6da4523c41ca1d0ecacce71 mm/memory_hotplug: Update comment for hotplug memory callback priorities..... found upstream
checking b5e179512e7fdc724a41e8b702052ebca88fbbba cxl: Fix emit of type resource_size_t argument for validate_region_offset()..... found upstream
checking 4a6212a9b91f14cbe82d66cc9940606eaf423ddb cxl/region: Add inject and clear poison by region offset..... found upstream
checking b7bed11331d8467f9b471575ef574d1357bfa621 cxl/core: Add locked variants of the poison inject and clear funcs..... found upstream
checking 2cf54d81bbec57fe77bd892b3755c8f8a45c38f6 cxl/region: Introduce SPA to DPA address translation..... found upstream
checking 0942e0e04c720df2a10ac607c9ab85b113828213 cxl: Define a SPA->CXL HPA root decoder callback for XOR Math..... found upstream
checking 60f19cbc4b0bd2c7fd4e5a4907126eaebed9ea34 cxl: Move hpa_to_spa callback to a new root decoder ops structure..... found upstream
checking 889b8fa88a1bfc277cf2409f0ecbcf1122dfb37e cxl/region: use str_enabled_disabled() instead of ternary operator..... found upstream
checking d3d5a3694ce7f264d3bf869dfbffb0369e3e00ff cxl/hdm: Use str_plural() to simplify the code..... found upstream
checking 1c3660a825fef0e4be2f5437903034f35596b552 Revert "NVIDIA: VR: SAUCE: cxl: add support for cxl reset"..... no upstream reference
All fixes:

Please review and consider including them.

@nvmochs
Collaborator

nvmochs commented Apr 16, 2026

nvmochs closed this on Apr 16, 2026
@nirmoy
Collaborator

nirmoy commented Apr 16, 2026

The patchscan did post a comment — scroll up to find the :warning: Patchscan: Missing Fixes Detected comment from github-actions[bot] in this thread.

The All fixes: section in that comment is empty, so no missing Fixes: patches were found for this PR.

The job failed because of a different issue: commit ad33ca9f (NVIDIA: VR: SAUCE: [Config] CXL config annotations for Type-2 device and RAS support) contains a reference to upstream SHA 7aed7c401 in its message body. Patchscan found that SHA in origin/linux, but the commit titles differ:

E: Commit messages differ: upstream 'PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS'
                           != pick 'NVIDIA: VR: SAUCE: [Config] CXL config annotations...'

This is a false positive — the SAUCE config commit isn't a backport of that upstream commit; it just references the upstream SHA for context. No fixes are missing from this PR.
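The false positive comes from a check that compares titles whenever a commit body cites an upstream SHA. The minimal Python sketch below shows that mechanism. It assumes, hypothetically, that the checker treats any 9 to 40 character lowercase hex word in the body as an upstream reference and resolves abbreviated SHAs by prefix. `check_pick`, `SHA_RE` and the prefix rule are illustrative names and choices, not patchscan's real implementation.

```python
import re

# Hypothetical rule: any 9-40 character lowercase hex word in a commit body is
# treated as an upstream SHA reference (an assumption, not patchscan's code).
SHA_RE = re.compile(r"\b[0-9a-f]{9,40}\b")


def check_pick(pick_title: str, pick_body: str,
               upstream_titles: dict[str, str]) -> list[str]:
    """Return E: lines for referenced upstream SHAs whose title differs."""
    errors = []
    for ref in SHA_RE.findall(pick_body):
        # Resolve a possibly abbreviated SHA by prefix against known commits.
        hits = [sha for sha in upstream_titles if sha.startswith(ref)]
        if len(hits) != 1:
            continue  # unknown or ambiguous reference: nothing to compare
        upstream = upstream_titles[hits[0]]
        if upstream != pick_title:
            # A SAUCE commit that only cites upstream for context lands here.
            errors.append(f"E: Commit messages differ: upstream '{upstream}' "
                          f"!= pick '{pick_title}'")
    return errors


if __name__ == "__main__":
    # Titles are taken from the E: output above; the body text is illustrative.
    print(check_pick(
        "NVIDIA: VR: SAUCE: [Config] CXL config annotations...",
        "Enable CXL_RAS (see upstream 7aed7c401).",
        {"7aed7c401": "PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS"},
    ))
```

Because a check like this cannot tell a backport from a commit that merely mentions an upstream SHA, a title mismatch is evidence of a problem only for genuine backports.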

This also exposed a UX bug in the workflow: E: (upstream verification errors) and W: (missing fixes) currently both trigger the same "Missing Fixes Detected" comment title, which is misleading when there are no actual missing fixes. I'll push a fix to distinguish these two cases with separate comment titles.

nirmoy added a commit to nirmoy/NV-Kernels that referenced this pull request Apr 16, 2026
Previously, both E: (upstream commit-message mismatch) and W: (missing
Fixes: patch) set fixes_found=true, causing the "Missing Fixes Detected"
comment to appear even when no Fixes: patches were missing.  PR NVIDIA#342
triggered exactly this: a SAUCE config commit referencing an upstream SHA
with a different title caused E: output, but All fixes: was empty.

Replace the two separate if-blocks (which could overwrite each other via
GITHUB_OUTPUT) with a single mutually-exclusive chain:
  W: / "Fixes for"  → fixes_found=true  (missing Fixes: patches)
  E: / non-zero rc  → fixes_found=error  (upstream verification failure)
  neither           → fixes_found=false  (all-clear)

Update the "error" PR comment title and body to explain this is typically
a false positive from SAUCE commits that reference upstream SHAs in their
message body with a different title.

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
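The mutually exclusive chain in the commit message above can be sketched in Python as a single function that returns exactly one value, so no later write can overwrite an earlier one. This is a minimal sketch that assumes patchscan prints W: and E: at the start of a line and that "Fixes for" may appear anywhere in a line. `classify` and `write_output` are hypothetical names, and the actual workflow step is not shown in this thread.

```python
# Hypothetical sketch of the chain from the commit message above; function
# names and the way markers are matched are assumptions.
def classify(output: str, rc: int) -> str:
    """Map patchscan output and exit status to exactly one fixes_found value."""
    lines = output.splitlines()
    if any(line.startswith("W:") or "Fixes for" in line for line in lines):
        return "true"   # missing Fixes: patches
    if rc != 0 or any(line.startswith("E:") for line in lines):
        return "error"  # upstream verification failure (often a false positive)
    return "false"      # all-clear


def write_output(path: str, value: str) -> None:
    """Append a single fixes_found entry, e.g. to the file named by $GITHUB_OUTPUT."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"fixes_found={value}\n")


if __name__ == "__main__":
    # An E: line with an empty All fixes: list, as seen on this PR.
    log = "E: Commit messages differ: upstream 'a' != pick 'b'\nAll fixes:\n"
    print(classify(log, 1))  # error
```

Checking the W: branch first encodes priority: a run that reports both a missing fix and a title mismatch still gets the "Missing Fixes Detected" comment, and the caller only needs to write fixes_found once.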