NVIDIA: SAUCE: iommu/arm-smmu-v3: Use identity domain for ASPEED BMC devices #371
405bb46 to 53b7614
✅ Patchscan: No Missing Fixes. All cherry-picked commits have been checked; no missing upstream fixes found.
clsotog
left a comment
Acked-by: Carol L Soto <csoto@nvidia.com>
    (pdev->bus->self->dev_flags &
     PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES) &&
    pdev->bus->self->vendor == PCI_VENDOR_ID_ASPEED)
        return IOMMU_DOMAIN_IDENTITY;
This will hide future ASPEED firmware bugs. We should check for the affected device ID, such as the AST1150 device ID, and report this to ASPEED.
I forwarded an email with the contacts for the ASPEED folks. Did this regression happen after the patch that added PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES? PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES fixed a real issue, and it was working on Grace and Vera. So I wonder whether a firmware or HWE kernel update introduced this regression.
Okay, ack.
Yes, after it was applied, the issue is observed on lego and Vera.
Do you know the ASPEED contacts?
I have forwarded you an email discussion that I had about PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES.
Did you not observe the F_TRANSLATION?
This will hide future ASPEED firmware bug we should do affected device ID check like AST1150 device ID and report this to ASPEED.
Added device check.
There was one F_TRANSLATION at boot time, but lsusb showed all the USB devices properly. At that time I didn't think the F_TRANSLATION was related to the USB controller, but now I think it was.
I thought the regression came from PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES, but on the lego system I went back to the 6.14 kernel and with iommu.passthrough=0 I still see the translation errors. With iommu.passthrough=1 I do not see them.
On the Vera eboard, when I tried iommu.passthrough=1 last month I was seeing another error, but last night I did not see it.
I think PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES just exposed this firmware issue but I am not sure why I didn't see this
53b7614 to a21698b
nirmoy
left a comment
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
No issues with the patch.
@KobaKoNvidia - What are your plans for upstreaming this? Do we need to discuss with ASPEED first if they can (or plan to) address in firmware?
Retested.
I sent mail to the ASPEED contact, so let's have a discussion first.
8b07926 to 80bac29
…devices

ASPEED BMC devices behind an AST1150 PCIe-to-PCI bridge receive DMA from BMC firmware using host physical addresses that bypass the kernel's DMA API entirely. When these devices are assigned a DMA translated domain, the SMMU generates F_TRANSLATION faults because the BMC's physical addresses have no corresponding IOVA mappings in the SMMU page tables.

Fix this by returning IOMMU_DOMAIN_IDENTITY for PCI devices whose parent bridge has both the PCI_BRIDGE_NO_ALIASES flag and an ASPEED vendor ID, so the SMMU passes BMC DMA transactions through untranslated.

Signed-off-by: Koba Ko <kobak@nvidia.com>
a21698b to 55f7351
PR Validation Report
Patchscan ✅ No Missing Fixes
All cherry-picked commits checked; no missing upstream fixes found.
PR Lint
Merged, closing PR.
[Description]
ASPEED BMC devices (VGA, USB) behind an AST1150 PCIe-to-PCI bridge receive
DMA from BMC firmware using host physical addresses that bypass the kernel's
DMA API. When these devices are assigned a DMA translated domain, the SMMU
generates F_TRANSLATION faults because the BMC's physical addresses have no
corresponding IOVA mappings in the SMMU page tables.
Without the AST1150 NO_ALIAS quirk (commit 550a190), USB probe failed
in arm_smmu_insert_master() due to StreamID collision with VGA, leaving USB
in SMMU bypass mode and masking the BMC DMA issue. With the quirk applied,
both devices probe successfully into a translated domain, exposing the
underlying BMC DMA problem.
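The collision described above can be modeled in userspace. This is a rough sketch, not kernel code: every mock_* name is hypothetical, the VGA address 02:00.0 used in the note below is an assumption (only the USB address 0008:02:02.0 appears in this PR), and the StreamID is simplified to the PCI requester ID. The model assumes that, without the quirk, devices behind a PCIe-to-PCI bridge are seen with an alias of devfn 00.0 on the bridge's secondary bus.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct mock_dev {
    const char *name;
    uint8_t bus;
    uint8_t devfn;              /* (slot << 3) | function */
};

#define MOCK_DEVFN(slot, fn) ((uint8_t)(((slot) << 3) | (fn)))

/* PCI requester ID: bus number in the high byte, devfn in the low byte. */
static uint16_t mock_rid(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/*
 * ID the SMMU is assumed to see for a device on the bridge's secondary bus.
 * Assumption: without the NO_ALIASES quirk every such device is aliased to
 * devfn 00.0 on that bus; with the quirk each device keeps its own ID.
 */
static uint16_t mock_stream_id(const struct mock_dev *d, bool bridge_no_aliases)
{
    if (bridge_no_aliases)
        return mock_rid(d->bus, d->devfn);
    return mock_rid(d->bus, MOCK_DEVFN(0, 0));
}

/* Tiny stand-in for a per-SMMU StreamID table: the first claimant wins. */
struct mock_smmu {
    uint16_t sids[8];
    int n;
};

/* Returns false when the ID is already owned (or the table is full). */
static bool mock_insert_master(struct mock_smmu *s, uint16_t sid)
{
    for (int i = 0; i < s->n; i++)
        if (s->sids[i] == sid)
            return false;
    if (s->n >= 8)
        return false;
    s->sids[s->n++] = sid;
    return true;
}
```

In this model, inserting VGA (02:00.0) and then USB (02:02.0) with bridge_no_aliases set to false rejects the second insert because both resolve to the same ID. With it set to true, both inserts succeed, which corresponds to both devices landing in a translated domain.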
Fix this by returning IOMMU_DOMAIN_IDENTITY in arm_smmu_def_domain_type()
for PCI devices whose parent bridge has both the PCI_BRIDGE_NO_ALIASES flag
and an ASPEED vendor ID. An identity domain passes BMC DMA through untranslated.
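The decision described above can be sketched as a standalone userspace model. It is illustrative only and is not the patch itself: the mock_* types, the flag bit value, and the enum values are placeholders, and the NULL check on the parent bridge is an assumption about how a device on a root bus would be handled. Only the ASPEED vendor ID value (0x1a03) is meant to match the real constant.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for a userspace model; the real definitions live in
 * the kernel headers and the flag bit below is NOT the kernel's value. */
#define MOCK_PCI_VENDOR_ID_ASPEED        0x1a03
#define MOCK_FLAG_PCI_BRIDGE_NO_ALIASES  (1u << 0)

enum mock_domain_type {
    MOCK_DOMAIN_NO_PREFERENCE = 0,  /* let the IOMMU core pick the default */
    MOCK_DOMAIN_IDENTITY      = 1,  /* DMA passes through untranslated */
};

struct mock_pci_dev {
    uint16_t vendor;
    unsigned int dev_flags;
    const struct mock_pci_dev *parent_bridge;  /* models pdev->bus->self */
};

/*
 * Mirrors the condition quoted in the review: the parent bridge must carry
 * both the NO_ALIASES flag and the ASPEED vendor ID.
 */
static int mock_def_domain_type(const struct mock_pci_dev *pdev)
{
    const struct mock_pci_dev *bridge = pdev->parent_bridge;

    /* Assumption: a device with no upstream bridge never matches. */
    if (bridge &&
        (bridge->dev_flags & MOCK_FLAG_PCI_BRIDGE_NO_ALIASES) &&
        bridge->vendor == MOCK_PCI_VENDOR_ID_ASPEED)
        return MOCK_DOMAIN_IDENTITY;

    return MOCK_DOMAIN_NO_PREFERENCE;
}
```

Keying on the parent bridge rather than on each endpoint means every function behind the flagged bridge gets the same treatment, at the cost raised in review that a vendor-wide match could mask unrelated firmware problems.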
Bug: 5918716
[How to verify and results]
Platform: Vera
Kernel: 6.17.13-kobak-no64dma+ based on 24.04_linux-nvidia-6.17-next
Boot cmdline: iommu.passthrough=0
Boot without patch, check dmesg:
$ sudo dmesg | grep F_TRANSLATION
Expected: 8+ F_TRANSLATION faults for client 0008:02:02.0 (ASPEED USB)
Apply patch, rebuild, reboot, check dmesg:
$ sudo dmesg | grep F_TRANSLATION
Expected: Zero F_TRANSLATION for ASPEED devices
Verify USB and VGA functional:
$ lsusb # ASPEED virtual hub visible
$ sudo dmesg | grep cdc_ether # CDC Ethernet registered
$ sudo dmesg | grep ast.*drm # AST DRM loaded
Results:
Verified by:
LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2150470