Skip to content

[linux-nvidia-6.18-next] PCI: mirror PI7C9X3G606GPC Port 4 BAR0#447

Open
nirmoy wants to merge 187 commits into
NVIDIA:linux-nvidia-6.18-nextfrom
nirmoy:codex/pericom-msix-bar-war-6.18
Open

[linux-nvidia-6.18-next] PCI: mirror PI7C9X3G606GPC Port 4 BAR0#447
nirmoy wants to merge 187 commits into
NVIDIA:linux-nvidia-6.18-nextfrom
nirmoy:codex/pericom-msix-bar-war-6.18

Conversation

@nirmoy
Copy link
Copy Markdown
Collaborator

@nirmoy nirmoy commented May 28, 2026

Summary

  • Add a PCI final/early-resume quirk for the Diodes/Pericom PI7C9X3G606GPC switch.
  • Mirror the immediate upstream port BAR0 value into downstream Port 4 BAR0 after Linux PCI resource assignment and during early resume.
  • Scope the WAR to the Diodes-confirmed OS-visible Tile0/P4 mapping: upstream bus + 1, device 04, function 0.
  • Port 4 BAR0 can read back as zero through normal PCI config space even after a successful write, so the quirk rewrites BAR0 whenever it runs.

Validation

  • Current PR head: b8a7745cab1c3c7be7b25c2dbba0f038662b60f6.
  • Local patch checks passed at current head:
    • scripts/checkpatch.pl --strict --ignore GERRIT_CHANGE_ID --git HEAD
    • git diff --check HEAD~1..HEAD
    • make O=/tmp/nv-kernels-pr447-quirks-build -j$(nproc) drivers/pci/quirks.o
  • Built arm64 packages on the GB200 build host with make bindeb-pkg after removing the 64-bit upstream BAR0 skip.
  • Installed and booted the Quark DUT: OS 172.17.33.143 via jumper 10.22.18.250; BMC 172.17.33.144.
Linux localhost-right 6.18.33-pr447-pericom-no64 #1 SMP PREEMPT_DYNAMIC Fri May 29 19:03:57 UTC 2026 aarch64
  • Confirmed the affected topology and quirk application from the boot journal:
pci 0002:a1:00.0: BAR 0 [mem 0x10300000-0x1037ffff]
pci 0002:a2:04.0: [12d8:c008] type 01 class 0x060400 PCIe Switch Downstream Port
pci 0002:a3:00.0: [1344:51c3] type 00 class 0x010802 PCIe Endpoint
pci 0002:a1:00.0: BAR 0 [mem 0x10300000-0x1037ffff]: assigned
pci 0002:a2:04.0: wrote upstream BAR 0 0x10300000 to Port 4 BAR 0 for PI7C9X3G606GPC BAR0 mirror workaround
  • Rootfs stayed read-write on the NVMe/BTRFS device:
/dev/nvme0n1p2[/@root/versions/v-diagos-snapshot-20260421] btrfs rw,relatime,ssd,discard=async,space_cache=v2,subvolid=264,subvol=/@root/versions/v-diagos-snapshot-20260421
  • Ran a 300 second NVMe-backed rootfs fio smoke test:
pr447-no64-rootfs-smoke: err= 0
READ: bw=251MiB/s, io=73.6GiB, run=300009msec
WRITE: bw=108MiB/s, io=31.6GiB, run=300009msec
  • Post-fio journalctl -b -k scan for BTRFS error|I/O error|nvme.*timeout|device inaccessible|read-only|blk_update_request|Buffer I/O error returned no matches.
  • Did not use the Diodes BMC/I2C BAR0 readback helper in this validation run. That helper uses special CPED/CDEP programming and is not a supported production validation path on this platform.
  • The 24.04 DUT has an unrelated DKMS failure for mods/4.31.0 on test kernels, so DKMS hooks were temporarily bypassed only to finish package configuration/initramfs/grub; hooks were restored afterward.

References

Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2154457

6.17 PR: #442
BOS PR: #443
NVBug: https://nvbugspro.nvidia.com/bug/6205517
NVBug: https://nvbugspro.nvidia.com/bug/6134331

Kartik Rajput and others added 30 commits May 24, 2026 00:58
Introduce a generic macro TEGRA_GPIO_PORT to define SoC specific
ports macros. This simplifies the code and avoids unnecessary
duplication.

Suggested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
(cherry picked from commit f75db6f)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Extend the existing Tegra186 GPIO controller driver with support for
the GPIO controller found on Tegra410. Tegra410 supports two GPIO
controllers referred to as 'COMPUTE' and 'SYSTEM'.

Co-developed-by: Nathan Hartman <nhartman@nvidia.com>
Signed-off-by: Nathan Hartman <nhartman@nvidia.com>
Signed-off-by: Prathamesh Shete <pshete@nvidia.com>
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
(cherry picked from commit 9631a10)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
On Tegra410, Compute and System GPIOs have same port names. This
results in the same GPIO names for both Compute and System GPIOs
during initialization in `tegra186_gpio_probe()`, which results in
following warnings:

  kernel: gpio gpiochip1: Detected name collision for GPIO name 'PA.00'
  kernel: gpio gpiochip1: Detected name collision for GPIO name 'PA.01'
  kernel: gpio gpiochip1: Detected name collision for GPIO name 'PA.02'
  kernel: gpio gpiochip1: Detected name collision for GPIO name 'PB.00'
  kernel: gpio gpiochip1: Detected name collision for GPIO name 'PB.01'
  ...

Add GPIO name prefix in the SoC data and use it to initialize the GPIO
name.

Port names remain unchanged for previous SoCs. On Tegra410, Compute
GPIOs are named COMPUTE-P<PORT>.GPIO, and System GPIOs are named
SYSTEM-P<PORT>.GPIO.

Fixes: 9631a10 ("gpio: tegra186: Add support for Tegra410")
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/20251113163112.885900-1-kkartik@nvidia.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
(cherry picked from commit 67f9b82)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Tegra410 and Tegra241 have deprecated HIDREV register. It is
recommended to use ARM SMCCC calls to get chip_id, major and minor
revisions.

Use ARM SMCCC to get chip_id, major and minor revision.

Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Extract common cleanup code into dedicated helper functions to simplify
the code and improve readability. This refactoring includes:

- tegra_qspi_reset(): Device reset and interrupt cleanup
- tegra_qspi_dma_stop(): DMA termination and disable
- tegra_qspi_pio_stop(): PIO mode disable

No functional changes. This is purely a code reorganization to prepare
for improved timeout handling in subsequent patches.

Signed-off-by: Vishwaroop A <va@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20251028155703.4151791-3-va@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 6022eac)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Under high system load, QSPI interrupts can be delayed or blocked on the
target CPU, causing wait_for_completion_timeout() to report failure even
though the hardware successfully completed the transfer.

When a timeout occurs, check the QSPI_RDY bit in QSPI_TRANS_STATUS to
determine if the hardware actually completed the transfer. If so, manually
invoke the completion handler to process the transfer successfully instead
of failing it.

This distinguishes lost/delayed interrupts from real hardware timeouts,
preventing unnecessary failures of transfers that completed successfully.

Signed-off-by: Vishwaroop A <va@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20251028155703.4151791-4-va@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 380fd29)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
When APEI fails to handle a stage-2 synchronous external abort (SEA),
today KVM injects an asynchronous SError to the VCPU then resumes it,
which usually results in unpleasant guest kernel panic.

One major situation of guest SEA is when vCPU consumes recoverable
uncorrected memory error (UER). Although SError and guest kernel panic
effectively stops the propagation of corrupted memory, guest may
re-use the corrupted memory if auto-rebooted; in worse case, guest
boot may run into poisoned memory. So there is room to recover from
an UER in a more graceful manner.

Alternatively KVM can redirect the synchronous SEA event to VMM to
- Reduce blast radius if possible. VMM can inject a SEA to VCPU via
  KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
  consumption or fault is not from guest kernel, blast radius can be
  limited to the triggering thread in guest userspace, so VM can
  keep running.
- Allow VMM to protect from future memory poison consumption by
  unmapping the page from stage-2, or to interrupt guest of the
  poisoned page so guest kernel can unmap it from stage-1 page table.
- Allow VMM to track SEA events that VM customers care about, to restart
  VM when certain number of distinct poison events have happened,
  to provide observability to customers in log management UI.

Introduce an userspace-visible feature to enable VMM handle SEA:
- KVM_CAP_ARM_SEA_TO_USER. As the alternative fallback behavior
  when host APEI fails to claim a SEA, userspace can opt in this new
  capability to let KVM exit to userspace during SEA if it is not
  owned by host.
- KVM_EXIT_ARM_SEA. A new exit reason is introduced for this.
  KVM fills kvm_run.arm_sea with as much as possible information about
  the SEA, enabling VMM to emulate SEA to guest by itself.
  - Sanitized ESR_EL2. The general rule is to keep only the bits
    useful for userspace and relevant to guest memory.
  - Flags indicating if faulting guest physical address is valid.
  - Faulting guest physical and virtual addresses if valid.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://msgid.link/20251013185903.1372553-2-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
(cherry picked from commit ad9c62b)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Test how KVM handles guest SEA when APEI is unable to claim it, and
KVM_CAP_ARM_SEA_TO_USER is enabled.

The behavior is triggered by consuming recoverable memory error (UER)
injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and
   has provided expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to
   guest and KVM injects expected SEA into the VCPU.

Tested on a data center server running Siryn AmpereOne processor
that has RAS support.

Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
  inject UER. Otherwise the test will be skipped.
- The under-test platform's APEI should be unable to claim the SEA.
  Otherwise the test will be skipped.
- Some platform doesn't support notrigger in EINJ, which may cause
  APEI and GHES to offline the memory before guest can consume
  injected UER, and making test unable to trigger SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
(cherry picked from commit feee9ef)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Document the new userspace-visible features and APIs for handling
synchronous external abort (SEA)
- KVM_CAP_ARM_SEA_TO_USER: How userspace enables the new feature.
- KVM_EXIT_ARM_SEA: exit userspace gets when it needs to handle SEA
  and what userspace gets while taking the SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-4-jiaqiyan@google.com
[ oliver: make documentation concise, remove implementation detail ]
Signed-off-by: Oliver Upton <oupton@kernel.org>
(cherry picked from commit 4debb5e)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
The main use of {LD,ST}64B* is to talk to a device, which is hopefully
directly assigned to the guest and requires no additional handling.

However, this does not preclude a VMM from exposing a virtual device
to the guest, and to allow 64 byte accesses as part of the programming
interface. A direct consequence of this is that we need to be able
to forward such access to userspace.

Given that such a contraption is very unlikely to ever exist, we choose
to offer a limited service: userspace gets (as part of a new exit reason)
the ESR, the IPA, and that's it. It is fully expected to handle the full
semantics of the instructions, deal with ACCDATA, the return values and
increment PC. Much fun.

A canonical implementation can also simply inject an abort and be done
with it. Frankly, don't try to do anything else unless you have time
to waste.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit f174a9f linux-next)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Add a bit of documentation for KVM_EXIT_ARM_LDST64B so that userspace
knows what to expect.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 902eeba linux-next)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…emory

If FEAT_LS64WB not supported, FEAT_LS64* instructions only support
to access Device/Uncacheable memory, otherwise a data abort for
unsupported Exclusive or atomic access (0x35, UAoEF) is generated
per spec. It's implementation defined whether the target exception
level is routed and is possible to implemented as route to EL2 on a
VHE VM according to DDI0487L.b Section C3.2.6 Single-copy atomic
64-byte load/store.

If it's implemented as generate the DABT to the final enabled stage
(stage-2), inject the UAoEF back to the guest after checking the
memslot is valid.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 2937aee linux-next)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Instructions introduced by FEAT_{LS64, LS64_V} is controlled by
HCRX_EL2.{EnALS, EnASR}. Configure all of these to allow usage
at EL0/1.

This doesn't mean these instructions are always available in
EL0/1 if provided. The hypervisor still have the control at
runtime.

Acked-by: Will Deacon <will@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit dea58da linux-next)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Using FEAT_{LS64, LS64_V} instructions in a guest is also controlled
by HCRX_EL2.{EnALS, EnASR}. Enable it if guest has related feature.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 151b92c linux-next)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Armv8.7 introduces single-copy atomic 64-byte loads and stores
instructions and its variants named under FEAT_{LS64, LS64_V}.
These features are identified by ID_AA64ISAR1_EL1.LS64 and the
use of such instructions in userspace (EL0) can be trapped.

As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell
apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later
patch will be exported to userspace, FEAT_LS64_V will be enabled only
in kernel.

In order to support the use of corresponding instructions in userspace:
- Make ID_AA64ISAR1_EL1.LS64 visbile to userspace
- Add identifying and enabling in the cpufeature list
- Expose these support of these features to userspace through HWCAP3
  and cpuinfo

ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for
special memory (device memory) so requires support by the CPU, system
and target memory location (device that support these instructions).
The HWCAP3_LS64, implies the support of CPU and system (since no
identification method from system, so SoC vendors should advertise
support in the CPU if system also support them).

Otherwise for ld64b/st64b the atomicity may not be guaranteed or a
DABT will be generated, so users (probably userspace driver developer)
should make sure the target memory (device) also have the support.
For st64bv 0xffffffffffffffff will be returned as status result for
unsupported memory so user should check it.

Document the restrictions along with HWCAP3_LS64.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
(backported from commit 58ce786 linux-next)
[mochs: Minor context cleanup due to lack of "arm64: Detect FEAT_XNX"]
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Add tests for FEAT_LS64. Issue related instructions if feature
presents, no SIGILL should be received. When such instructions
operate on Device memory or non-cacheable memory, we may received
a SIGBUS during the test (w/o FEAT_LS64WB). Just ignore it since
we only tested whether the instruction itself can be issued as
expected on platforms declaring the support of such features.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 57a9635 linux-next)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…evice-nGnRE

Add CONFIG_ARM64_WORKAROUND_NC_TO_NGNRE configuration option that
enables conversion of MT_NORMAL_NC (Normal Non-Cacheable) memory
attribute to Device-nGnRE memory type in MAIR_EL1 for hardware that
requires stricter memory ordering or has issues with Non-Cacheable
memory mappings.

Key changes:

1. New memory type MT_NORMAL_NC_DMA (Attr5):
   - Introduced specifically for DMA coherent memory mappings
   - Configured with the same Normal Non-Cacheable attribute (0x44)
     as MT_NORMAL_NC (Attr2) by default
   - pgprot_dmacoherent uses MT_NORMAL_NC_DMA when workaround is
     enabled, MT_NORMAL_NC otherwise

2. MAIR_EL1 conversion via alternatives framework:
   - arch/arm64/mm/proc.S uses ARM64 alternatives to patch MAIR_EL1
     during early boot
   - Converts MT_NORMAL_NC (Attr2) from 0x44 to 0x04 (Device-nGnRE)
     using efficient bfi instruction
   - MT_NORMAL_NC_DMA (Attr5) keeps the same attribute value as
     MT_NORMAL_NC originally had
   - Zero performance overhead when workaround is disabled

3. Boot-time configuration:
   - Enabled via kernel command line: mair_el1_nc_to_ngnre=1
   - Boot CPU fixup in enable_nc_to_ngnre() applies conversion before
     alternatives are patched
   - Secondary CPUs automatically use patched alternatives in
     __cpu_setup
   - Runtime changes not supported as alternatives cannot be
     re-patched after boot

4. Errata framework integration:
   - Registered in arm64_errata[] array as ARM64_WORKAROUND_NC_TO_NGNRE
   - Capability type: ARM64_CPUCAP_BOOT_CPU_FEATURE
   - Uses cpucap_is_possible() for build-time capability checking

The workaround preserves pgprot_dmacoherent behavior while allowing
MT_NORMAL_NC to be converted to Device memory type for other mappings
that may be affected by hardware issues. Ensure NC memory attribute
assignment is prevented for passthrough device MMIO regions.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
one of the SMT siblings is not idle on a multi-threaded implementation.
So don't use it on SMT cores.

Introduce topology_core_has_smt() for knowing the SMT implementation and
cached it in arm_pmu::has_smt during allocation.

When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
patch we'll get:
[root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
 Performance counter stats for 'CPU(s) 2-3':

CPU2           2880457316      cycles
CPU3           2880459810      cycles
       1.254688470 seconds time elapsed

With this patch the idle state of CPU3 is observed as expected:
[root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
 Performance counter stats for 'CPU(s) 2-3':

CPU2           2558580492      cycles
CPU3               305749      cycles
       1.113626410 seconds time elapsed

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit c3d78c3)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
…ERIC_ARCH_TOPOLOGY

The arm_pmu driver is using topology_core_has_smt() for retrieving
the SMT implementation which depends on CONFIG_GENERIC_ARCH_TOPOLOGY.
The config is optional on arm platforms so provide a
!CONFIG_GENERIC_ARCH_TOPOLOGY stub for topology_core_has_smt().

Fixes: c3d78c3 ("perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511041757.vuCGOmFc-lkp@intel.com/
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Yicong Yang <yangyccccc@gmail.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 7ab06ea)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Add the part number and MIDR for NVIDIA Olympus.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(cherry picked from commit d5e4c71)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Add NVIDIA Olympus MIDR to neoverse_spe range list.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(backported from commit d852b83)
[mochs: Minor context cleanup due to absence of "perf arm_spe: Add CPU variants supporting common data source packet"]
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Implementer may need to reset a filter config when
stopping a counter, thus adding a callback for this.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit a2573bc)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
The PMIIDR value is composed by the values in PMPIDR registers.
We can use PMPIDR registers as alternative for device
identification for systems that do not implement PMIIDR.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 04330be)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Distinguish NVIDIA devices by revision and variant bits
in PMIIDR register in addition to product id.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 82dfd72)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Support NVIDIA PMU that utilizes the optional event filter2 register.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit decc368)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
The documentation in nvidia-pmu.rst contains PMUs specific
to NVIDIA Tegra241 SoC. Rename the file for this specific
SoC to have better distinction with other NVIDIA SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Adds Unified Coherent Fabric PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Add interface to get ACPI device associated with the
PMU. This ACPI device may contain additional properties
not covered by the standard properties.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Adds PCIE PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Adds PCIE-TGT PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
ltrager and others added 15 commits May 24, 2026 00:58
…cy PMU"

This reverts commit e0ab9dd.

Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…TGT PMU"

This reverts commit 9361be0.

Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…PMU"

This reverts commit 839af7d.

Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
This reverts commit bb2ae52.

Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
This reverts commit 4a814e6.

Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
…a241"

This reverts commit f2a7f36.

Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
The documentation in nvidia-pmu.rst contains PMUs specific
to NVIDIA Tegra241 SoC. Rename the file for this specific
SoC to have better distinction with other NVIDIA SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit d332424)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
The Unified Coherence Fabric (UCF) contains last level cache
and cache coherent interconnect in Tegra410 SOC. The PMU in
this device can be used to capture events related to access
to the last level cache and memory from different sources.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit f5caf26)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Add interface to get ACPI device associated with the
PMU. This ACPI device may contain additional properties
not covered by the standard properties.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit bc86281)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Adds PCIE PMU support in Tegra410 SOC. This PMU is instanced
in each root complex in the SOC and can capture traffic from
PCIE device to various memory types. This PMU can filter traffic
based on the originating root port or BDF and the target memory
types (CPU DRAM, GPU Memory, CXL Memory, or remote Memory).

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit bf585ba)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Adds PCIE-TGT PMU support in Tegra410 SOC. This PMU is
instanced in each root complex in the SOC and it captures
traffic originating from any source towards PCIE BAR and CXL
HDM range. The traffic can be filtered based on the
destination root port or target address range.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 3dd7302)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Adds CPU Memory (CMEM) Latency PMU support in Tegra410 SOC.
The PMU is used to measure latency between the edge of the
Unified Coherence Fabric to the local system DRAM.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 429b763)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Adds NVIDIA C2C PMU support in Tegra410 SOC. This PMU is
used to measure memory latency between the SOC and device
memory, e.g GPU Memory (GMEM), CXL Memory, or memory on
remote Tegra410 SOC.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 2f89b7f)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Add JSON files for NVIDIA Tegra410 Olympus core PMU events.
Also updated the common-and-microarch.json.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(cherry picked from commit 86ff690)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
PMCCNTR_EL0 may continue to increment on NVIDIA Olympus CPUs while the
PE is in WFI/WFE. That does not necessarily match the CPU_CYCLES event
counted by a programmable counter, so using PMCCNTR_EL0 for cycles can
give results that differ from the programmable counter path.

Extend the existing PMCCNTR avoidance decision from the SMT case to
also cover Olympus. Store the result in the common arm_pmu state at
registration time, so arm_pmuv3 can keep using a single flag when
deciding whether CPU_CYCLES may use PMCCNTR_EL0.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(cherry picked from https://lore.kernel.org/all/20260504175204.3122979-1-bwicaksono@nvidia.com/)
Signed-off-by: Lee Trager <ltrager@nvidia.com>
Acked-by: Seth Forshee <sforshee@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
@nirmoy nirmoy added the help wanted Extra attention is needed label May 28, 2026
@nirmoy
Copy link
Copy Markdown
Collaborator Author

nirmoy commented May 28, 2026

Boro review

Summary

No issues found across the reviewed commits.

Findings: no problems found

Latest watcher review: open review

Kernel deb build: successful (download debs, 4 files)

Head: b8a7745cab1c

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

@nirmoy nirmoy removed the help wanted Extra attention is needed label May 28, 2026
@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.18 branch 2 times, most recently from 55587b6 to 7f1b4f6 Compare May 28, 2026 20:42
@nirmoy nirmoy marked this pull request as ready for review May 28, 2026 21:01
@nirmoy nirmoy added the help wanted Extra attention is needed label May 28, 2026
@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.18 branch 2 times, most recently from 8654dc2 to 4e33789 Compare May 29, 2026 19:24
Some Pericom/Diodes PI7C9X3G606GPC switches require downstream
Port 4 BAR0 to mirror BAR0 of the immediate upstream port. Firmware may
apply this during boot, but Linux PCI resource assignment can move the
upstream BAR0 and leave Port 4 without the required mirror.

Diodes confirmed that Tile0/P4 is OS-visible as device 04, function 0 on
the bus below the upstream port. Add a final and early resume quirk for
that downstream function. The quirk verifies that the immediate upstream
bridge is the same switch, then writes Port 4 BAR0 from the upstream
BAR0 after resource assignment and during early resume. Port 4 BAR0 may
read back as zero even after a successful write, so the write must be
validated by platform-specific means.

Change-Id: I01079d67c4f665da5162180db929e2fe43d64ac2
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.18 branch from 4e33789 to b8a7745 Compare May 29, 2026 19:27
Copy link
Copy Markdown
Collaborator

@arighi arighi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Acked-by: Andrea Righi <arighi@nvidia.com>

@github-actions github-actions Bot force-pushed the linux-nvidia-6.18-next branch from a2af04d to 295622c Compare June 2, 2026 01:06
@sforshee
Copy link
Copy Markdown
Collaborator

sforshee commented Jun 3, 2026

I have a similar question about 64-bit bars as was raised on the PR for 6.17. Seems that BAR0 is expected to be 32-bit but the 64-bit BAR check was removed. I'd like to see the resolution to this question before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

has_1_ack help wanted Extra attention is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.