
[26.04_linux-nvidia-bos] PCI: mirror PI7C9X3G606GPC Port 4 BAR0 #443

Open
nirmoy wants to merge 10 commits into NVIDIA:26.04_linux-nvidia-bos from nirmoy:codex/pericom-msix-bar-war-bos

@nirmoy nirmoy commented May 27, 2026

Summary

  • Backport the PI7C9X3G606GPC Port 4 BAR0 workaround from the 6.17 PR.
  • Add a PCI final/early-resume quirk to mirror the upstream BAR0 value into downstream Port 4 BAR0.
  • Scope the workaround (WAR) to the Diodes-confirmed OS-visible Tile0/P4 mapping: upstream bus + 1, device 04, function 0.
  • Port 4 BAR0 may read back as zero through normal PCI config space even after a successful write, so the quirk rewrites BAR0 whenever it runs.
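
The scoping rule in the third bullet amounts to a small address calculation. The following is a minimal Python sketch of that rule, not the kernel quirk itself. The helper name `port4_address` is hypothetical, and the sketch takes "upstream bus + 1" literally as stated above.

```python
import re

# Matches a PCI address of the form domain:bus:device.function, e.g. 0002:a1:00.0.
_BDF_RE = re.compile(r"^([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])$")


def port4_address(upstream: str) -> str:
    """Return the OS-visible address the workaround is scoped to.

    Hypothetical helper: applies the rule from the summary literally
    (upstream bus + 1, device 04, function 0).
    """
    match = _BDF_RE.match(upstream.lower())
    if match is None:
        raise ValueError(f"not a PCI address: {upstream!r}")
    domain, bus = match.group(1), int(match.group(2), 16)
    if bus == 0xFF:
        raise ValueError("no bus number available after bus ff")
    return f"{domain}:{bus + 1:02x}:04.0"


print(port4_address("0002:a1:00.0"))  # prints 0002:a2:04.0
```

For the upstream port 0002:a1:00.0 reported in the validation log, this yields 0002:a2:04.0, which is the device that logs the workaround message.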

Validation

  • Current PR head: a69e886f8becb7608a3ca2e26315d384383c82ab.
  • Local patch checks passed at current head:
    • scripts/checkpatch.pl --strict --ignore GERRIT_CHANGE_ID --git HEAD
    • git diff --check HEAD~1..HEAD
    • make O=/tmp/nv-kernels-pr443-quirks-build -j$(nproc) drivers/pci/quirks.o
  • Previous BOS package validation booted 7.0.0-2007-nvidia-bos-64k on the Quark DUT: OS 172.17.33.143 via jumper 10.22.18.250; BMC 172.17.33.144.
    Linux localhost-right 7.0.0-2007-nvidia-bos-64k #7 SMP PREEMPT_DYNAMIC Thu May 28 19:54:58 UTC 2026 aarch64
  • Quirk/topology evidence from the booted BOS kernel:
    pci 0002:a1:00.0: BAR 0 [mem 0x10300000-0x1037ffff]
    pci 0002:a2:04.0: [12d8:c008] type 01 class 0x060400 PCIe Switch Downstream Port
    pci 0002:a3:00.0: [1344:51c3] type 00 class 0x010802 PCIe Endpoint
    pci 0002:a1:00.0: BAR 0 [mem 0x10300000-0x1037ffff]: assigned
    pci 0002:a2:04.0: wrote upstream BAR 0 0x10300000 to Port 4 BAR 0 for PI7C9X3G606GPC BAR0 mirror workaround
  • Retested the equivalent 6.18 PR code after removing the 64-bit upstream BAR0 skip. The Quark DUT booted 6.18.33-pr447-pericom-no64 and showed the quirk still firing:
    pci 0002:a1:00.0: BAR 0 [mem 0x10300000-0x1037ffff]: assigned
    pci 0002:a2:04.0: wrote upstream BAR 0 0x10300000 to Port 4 BAR 0 for PI7C9X3G606GPC BAR0 mirror workaround
  • Ran a 300s fio randrw smoke on the NVMe-backed rootfs with the no-64-skip test kernel:
    pr447-no64-rootfs-smoke: err= 0
    READ: bw=251MiB/s, io=73.6GiB, run=300009msec
    WRITE: bw=108MiB/s, io=31.6GiB, run=300009msec
  • Post-fio journalctl -b -k scan for BTRFS error, I/O error, nvme.*timeout, device inaccessible, read-only, blk_update_request, and Buffer I/O error returned no matches.
  • Did not use the BMC/I2C BAR0 readback helper for this validation. The Quark platform owner said that helper uses special CPED/CDEP access that is not supported as routine validation on this platform and can put the PCIe switch into a bad state.
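
The post-fio log scan described above could be reproduced along the following lines. This is a hedged sketch rather than the command that was actually run: the pattern list comes from the bullet above, while the function name `scan_kernel_log` and the choice to treat the patterns as case-insensitive regular expressions are assumptions.

```python
import re

# Patterns taken from the post-fio scan bullet. Treating them as
# case-insensitive regular expressions is an assumption of this sketch.
PATTERNS = [
    r"BTRFS error",
    r"I/O error",
    r"nvme.*timeout",
    r"device inaccessible",
    r"read-only",
    r"blk_update_request",
    r"Buffer I/O error",
]
_SCAN_RE = re.compile("|".join(f"(?:{p})" for p in PATTERNS), re.IGNORECASE)


def scan_kernel_log(text: str) -> list[str]:
    """Return every log line that matches at least one pattern."""
    return [line for line in text.splitlines() if _SCAN_RE.search(line)]


# The first line is copied from the validation log; the second is a synthetic
# line, invented here only to show what a hit looks like.
sample = (
    "pci 0002:a2:04.0: wrote upstream BAR 0 0x10300000 to Port 4 BAR 0 "
    "for PI7C9X3G606GPC BAR0 mirror workaround\n"
    "synthetic example: Buffer I/O error on a block device\n"
)
for hit in scan_kernel_log(sample):
    print(hit)
```

Feeding the function the saved output of `journalctl -b -k` and getting an empty list back corresponds to the "no matches" result reported above.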

Notes

  • The 24.04 DUT needed a temporary run-parts compatibility wrapper to run the 26.04 BOS kernel maintscripts, which pass multiple hook directories. The wrapper was removed after package configuration.
  • OFED/DKMS modules on this 24.04 DUT report unsupported test-kernel headers for some test kernels. DKMS hooks were temporarily bypassed only to finish package configuration/initramfs/grub; hooks were restored afterward.

References

Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2154457

6.17 PR: #442
NVBug: https://nvbugspro.nvidia.com/bug/6205517
NVBug: https://nvbugspro.nvidia.com/bug/6134331

jacobmartin0 and others added 9 commits May 22, 2026 11:40
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2153497
Properties: no-test-build
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Some options were not ordered as the annotations tool expected them to
be.

Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
This is necessary to bypass dependencies declared by the nvidia-fs
dkms.conf that are present on the system, detected by the nvidia-fs
build, but not in the source directory used by dkms and so not detected
by dkms.

Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…rnel-versions (adhoc/d2026.05.20)

BugLink: https://bugs.launchpad.net/bugs/1786013
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2147212

Refactor the BPMP driver by moving channel initialization and Device
Tree resource parsing into separate helper functions. This prepares the
driver for ACPI support, where these helpers will be skipped because
channel initialization is handled by ACPI AML methods on ACPI-based
systems.

Signed-off-by: Aniruddha Rao <anrao@nvidia.com>

(backported from V4 internal mail <20260423140823.2848045-2-anrao@nvidia.com>)
[kobak: Preserve threaded channel count/semaphore initialization after
 the helper split and align rx_channel allocation continuation.]

Signed-off-by: Koba Ko <kobak@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2147212

This patch adds the changes required to make the Tegra BPMP driver
compatible with ACPI-based platforms.
On ACPI systems, IPC is handled through an AML method instead of the
core kernel framework that uses mailboxes and IVC.
Clock, reset, and powergate init calls are bypassed because these are not
controlled by the Linux drivers on ACPI-based systems.

Signed-off-by: Aniruddha Rao <anrao@nvidia.com>

(backported from V4 internal mail <20260423140823.2848045-3-anrao@nvidia.com>)
[kobak: Add !ACPI_HANDLE(bpmp->dev) NULL guard around bpmp->soc->ops->init
 because ACPI match driver_data=0 makes bpmp->soc NULL; make BPMP
 debugfs directory per-device on ACPI systems to avoid duplicate
 /sys/kernel/debug/bpmp collision on dual NVDA3001 instances; remove
 unused i; heap allocate the ACPI BMRQ package to avoid the frame-size
 warning; reject short BMRQ replies before copying response data; add
 CONFIG_ACPI stub for the ACPI helper; restore the public
 irqs_disabled() guard before ACPI/DT transport selection.]

Signed-off-by: Koba Ko <kobak@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2147212

Tegra410 exposes memory bandwidth QoS for PCIe and GPU UPHY traffic on the
path to DRAM. Each bandwidth group can cap PCIe read, PCIe write, or
combined GPU UPHY read and write traffic, with target limits.

The memory bandwidth QoS is not exposed as ordinary host MMIO and
cannot be controlled directly from the kernel. Instead, the bandwidth limits
are programmed by sending the corresponding requests (MBWT MRQ) to the BPMP.

On Tegra410, an ACPI-based platform, the Linux BPMP driver does not use the
device-tree mailbox path for communicating with the BPMP firmware.
As a result, there is no existing client driver or interface that can be
used to send the memory bandwidth requests to the BPMP.

This patch exposes a sysfs directory mbwt_control on the tegra-bpmp platform
device with pcie_instance_id, vc_type, and bandwidth.
Writing bandwidth issues an MBWT_SET for the selected group (pcie_instance_id)
and traffic class (vc_type).
A read issues MBWT_GET and returns the bandwidth value reported by firmware.

These attributes are exposed only if the MBWT QUERY probe reports both
the MBWT_SET and MBWT_GET commands as supported.

The ABI is documented in Documentation/ABI/testing/sys-platform-tegra-bpmp.

Signed-off-by: Aniruddha Rao <anrao@nvidia.com>

(backported from V4 internal mail <20260423140823.2848045-4-anrao@nvidia.com>)
[kobak: Keep functional MRQ_SOCHUB_MBWT ABI definitions and sysfs
 interface from V4; condense verbose per-field ABI comments while
 preserving enum/struct layout and Documentation/ABI coverage; use
 refcounted kobject allocation for mbwt_control; validate
 pcie_instance_id/vc_type before staging; return only the bandwidth
 value from bandwidth reads; report BPMP SET rejections to userspace;
 fail BPMP probe on mbwt_control sysfs registration failure so 7.0-bos
 does not silently boot without the requested MBWT interface.]

Signed-off-by: Koba Ko <kobak@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
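
The write-then-read flow that this commit message describes for mbwt_control can be sketched from userspace as follows. This is only an illustration under stated assumptions. The commit message does not give the full sysfs path, the value format, or the bandwidth unit, so the sketch takes the directory as a parameter, writes plain decimal integers, and assumes a read needs the same group and traffic class staged first. The function names and the values 0, 0, and 1000 are placeholders.

```python
import tempfile
from pathlib import Path


def set_bandwidth(control_dir: Path, pcie_instance_id: int, vc_type: int, bandwidth: int) -> None:
    """Select a bandwidth group and traffic class, then write the limit."""
    (control_dir / "pcie_instance_id").write_text(f"{pcie_instance_id}\n")
    (control_dir / "vc_type").write_text(f"{vc_type}\n")
    # Per the commit message, writing bandwidth issues MBWT_SET for the
    # selected pcie_instance_id and vc_type.
    (control_dir / "bandwidth").write_text(f"{bandwidth}\n")


def get_bandwidth(control_dir: Path, pcie_instance_id: int, vc_type: int) -> str:
    """Select a group and traffic class, then read back the bandwidth."""
    (control_dir / "pcie_instance_id").write_text(f"{pcie_instance_id}\n")
    (control_dir / "vc_type").write_text(f"{vc_type}\n")
    # Per the commit message, reading bandwidth issues MBWT_GET. The value is
    # returned as text because its unit is not stated there.
    return (control_dir / "bandwidth").read_text().strip()


# Stand-in directory so the sketch runs without the hardware. On a real
# system control_dir would be the mbwt_control directory of the tegra-bpmp
# platform device. Plain files only echo back what was written.
with tempfile.TemporaryDirectory() as tmp:
    control = Path(tmp)
    set_bandwidth(control, pcie_instance_id=0, vc_type=0, bandwidth=1000)
    print(get_bandwidth(control, pcie_instance_id=0, vc_type=0))  # prints 1000
```

On real hardware the value read back is whatever firmware reports, and a rejected MBWT_SET would presumably surface as a failed write, since the backport notes say SET rejections are reported to userspace.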
nirmoy added the "help wanted" label May 27, 2026
github-actions Bot commented May 27, 2026

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ✅ All checks passed

Details
Checking 1 commits...

Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ a69e886f8bec │ [SAUCE] pci: quirks: mirror pi7c9x3g606gpc port 4 bar0           │ N/A        │ N/A     │ nirmoyd                   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.


nirmoy commented May 27, 2026

Boro review

Summary

No issues found across the reviewed commits.

Findings: no problems found

Latest watcher review: open review

Kernel deb build: successful (download debs, 4 files)

Head: a69e886f8bec

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

nirmoy force-pushed the codex/pericom-msix-bar-war-bos branch from ed7f4ba to 5f3ea04 May 28, 2026 13:22
nirmoy marked this pull request as ready for review May 28, 2026 14:17
nirmoy marked this pull request as draft May 28, 2026 15:00
nirmoy removed the "help wanted" label May 28, 2026
nirmoy force-pushed the codex/pericom-msix-bar-war-bos branch 2 times, most recently from b836068 to f171d75 May 28, 2026 20:42
nirmoy marked this pull request as ready for review May 28, 2026 21:01
nirmoy added the "help wanted" label May 28, 2026
nirmoy force-pushed the codex/pericom-msix-bar-war-bos branch 2 times, most recently from 0b64a56 to 24d68b9 May 29, 2026 19:24
Some Pericom/Diodes PI7C9X3G606GPC switches require downstream
Port 4 BAR0 to mirror BAR0 of the immediate upstream port. Firmware may
apply this during boot, but Linux PCI resource assignment can move the
upstream BAR0 and leave Port 4 without the required mirror.

Diodes confirmed that Tile0/P4 is OS-visible as device 04, function 0 on
the bus below the upstream port. Add a final and early resume quirk for
that downstream function. The quirk verifies that the immediate upstream
bridge is the same switch, then writes Port 4 BAR0 from the upstream
BAR0 after resource assignment and during early resume. Port 4 BAR0 may
read back as zero even after a successful write, so the write must be
validated by platform-specific means.

Change-Id: I25f2390bf686109487b60d85d1573f8883e7ad28
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
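
Because Port 4 BAR0 may read back as zero, one lightweight check available from the OS side is to compare the value the quirk logs against the BAR 0 address the kernel assigned to the upstream port. The sketch below is a hedged illustration of that log comparison only. It does not prove the hardware latched the value. The message formats are taken from the validation log in this PR, while the function name and the assumption that the upstream port sits at device 00, function 0 on the previous bus are hypothetical.

```python
import re

_DEV = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"
ASSIGNED_RE = re.compile(
    rf"pci (?P<dev>{_DEV}): BAR 0 \[mem (?P<start>0x[0-9a-f]+)-0x[0-9a-f]+\]: assigned"
)
WROTE_RE = re.compile(
    rf"pci (?P<dev>{_DEV}): wrote upstream BAR 0 (?P<value>0x[0-9a-f]+) to Port 4 BAR 0"
)


def check_mirror_writes(log_text: str) -> list[tuple[str, bool]]:
    """For each logged mirror write, report whether the written value equals
    the BAR 0 start most recently assigned to the assumed upstream port."""
    assigned = {}  # device address -> last assigned BAR 0 start
    results = []
    for line in log_text.splitlines():
        m = ASSIGNED_RE.search(line)
        if m:
            assigned[m["dev"]] = int(m["start"], 16)
            continue
        m = WROTE_RE.search(line)
        if m:
            domain, bus, _ = m["dev"].split(":")
            # Assumption: upstream port is device 00, function 0, one bus up.
            upstream = f"{domain}:{int(bus, 16) - 1:02x}:00.0"
            results.append((m["dev"], assigned.get(upstream) == int(m["value"], 16)))
    return results


SAMPLE_LOG = """\
pci 0002:a1:00.0: BAR 0 [mem 0x10300000-0x1037ffff]: assigned
pci 0002:a2:04.0: wrote upstream BAR 0 0x10300000 to Port 4 BAR 0 for PI7C9X3G606GPC BAR0 mirror workaround
"""
print(check_mirror_writes(SAMPLE_LOG))  # prints [('0002:a2:04.0', True)]
```

A mismatch, or a write with no preceding assignment line for the upstream port, shows up as `False` and would point to the quirk running before resource assignment or against the wrong bridge.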
nirmoy force-pushed the codex/pericom-msix-bar-war-bos branch from 24d68b9 to a69e886 May 29, 2026 19:27
nvidia-bfigg force-pushed the 26.04_linux-nvidia-bos branch from 4a0dd52 to b35ada9 June 1, 2026 23:53