NVIDIA: SAUCE: iommu/arm-smmu-v3: Use identity domain for ASPEED BMC devices #371

Closed
kobak2026 wants to merge 1 commit into NVIDIA:24.04_linux-nvidia-6.17-next from kobak2026:bug-5918716/smmu-identity-aspeed

@kobak2026 kobak2026 commented Apr 16, 2026

[Description]

ASPEED BMC devices (VGA, USB) behind an AST1150 PCIe-to-PCI bridge receive
DMA from BMC firmware using host physical addresses that bypass the kernel's
DMA API. When these devices are assigned a DMA translated domain, the SMMU
generates F_TRANSLATION faults because the BMC's physical addresses have no
corresponding IOVA mappings in the SMMU page tables.

Without the AST1150 NO_ALIAS quirk (commit 550a190), USB probe failed
in arm_smmu_insert_master() due to StreamID collision with VGA, leaving USB
in SMMU bypass mode and masking the BMC DMA issue. With the quirk applied,
both devices probe successfully into a translated domain, exposing the
underlying BMC DMA problem.

Fix this by returning IOMMU_DOMAIN_IDENTITY in arm_smmu_def_domain_type()
for PCI devices whose parent bridge has both the PCI_BRIDGE_NO_ALIASES flag
and an ASPEED vendor ID. Identity domain passes BMC DMA through untranslated.
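
The check described above can be sketched in plain user-space C with mock types. This is only an illustration under stated assumptions, not the patch itself: the `sketch_*` structs mirror just the fields named in this PR (`dev_flags`, `vendor`, `bus->self`), the flag bit and the domain-type values are placeholders rather than the kernel's real definitions, and `sketch_def_domain_type()` is a hypothetical name. 0x1a03 is ASPEED's PCI vendor ID.

```c
#include <stddef.h>

/* Placeholder values for illustration; the real ones live in kernel headers. */
#define SKETCH_VENDOR_ID_ASPEED  0x1a03     /* ASPEED's PCI vendor ID */
#define SKETCH_BRIDGE_NO_ALIASES (1u << 0)  /* stand-in for PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES */
#define SKETCH_DOMAIN_DEFAULT    0          /* stand-in for "no preference" */
#define SKETCH_DOMAIN_IDENTITY   1          /* stand-in for IOMMU_DOMAIN_IDENTITY */

struct sketch_pci_bus;

/* Mock of the few struct pci_dev fields the check reads. */
struct sketch_pci_dev {
    unsigned short vendor;
    unsigned int dev_flags;
    struct sketch_pci_bus *bus;
};

/* Mock of struct pci_bus: 'self' is the bridge that leads to this bus. */
struct sketch_pci_bus {
    struct sketch_pci_dev *self; /* NULL when there is no parent bridge */
};

/*
 * Return the identity domain type when the device's parent bridge carries
 * the no-aliases flag and has the ASPEED vendor ID; otherwise express no
 * preference so the default (translated) domain would be used.
 */
int sketch_def_domain_type(const struct sketch_pci_dev *pdev)
{
    const struct sketch_pci_dev *bridge;

    if (!pdev || !pdev->bus || !pdev->bus->self)
        return SKETCH_DOMAIN_DEFAULT;

    bridge = pdev->bus->self;
    if ((bridge->dev_flags & SKETCH_BRIDGE_NO_ALIASES) &&
        bridge->vendor == SKETCH_VENDOR_ID_ASPEED)
        return SKETCH_DOMAIN_IDENTITY;

    return SKETCH_DOMAIN_DEFAULT;
}
```

Both conditions are tested on the parent bridge rather than on the endpoint, so a device that merely has an ASPEED vendor ID but sits behind a different bridge would keep the default domain in this sketch.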

Bug: 5918716

[How to verify and results]

Platform: Vera
Kernel: 6.17.13-kobak-no64dma+ based on 24.04_linux-nvidia-6.17-next
Boot cmdline: iommu.passthrough=0

  1. Boot without patch, check dmesg:

    $ sudo dmesg | grep F_TRANSLATION

    Expected: 8+ F_TRANSLATION faults for client 0008:02:02.0 (ASPEED USB)

  2. Apply patch, rebuild, reboot, check dmesg:

    $ sudo dmesg | grep F_TRANSLATION

    Expected: Zero F_TRANSLATION for ASPEED devices

  3. Verify USB and VGA functional:

    $ lsusb # ASPEED virtual hub visible
    $ sudo dmesg | grep cdc_ether # CDC Ethernet registered
    $ sudo dmesg | grep ast.*drm # AST DRM loaded

Results:

| Test | ASPEED F_TRANSLATION | USB | VGA |
|------|----------------------|-----|-----|
| Without fix (passthrough=0) | 8+ faults per boot | Works | Works |
| iommu.passthrough=1 | Zero | Works | Works |
| With identity domain patch | Zero | Works | Works |

Verified by:

  • Koba Ko: 2026-04-15 on Vera
  • Carol Soto: 2026-04-15 on Vera eboard — "the patch is ok"

LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2150470

@kobak2026 kobak2026 force-pushed the bug-5918716/smmu-identity-aspeed branch from 405bb46 to 53b7614 April 16, 2026 16:43
@kobak2026 kobak2026 requested review from clsotog, nirmoy and nvmochs April 16, 2026 16:43

github-actions Bot commented Apr 16, 2026

✅ Patchscan: No Missing Fixes

All cherry-picked commits have been checked — no missing upstream fixes found.

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>

(pdev->bus->self->dev_flags &
PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES) &&
pdev->bus->self->vendor == PCI_VENDOR_ID_ASPEED)
return IOMMU_DOMAIN_IDENTITY;

This will hide future ASPEED firmware bugs. We should check for the affected device ID, such as the AST1150 device ID, and report this to ASPEED.

I forwarded an email that has contacts for the ASPEED folks. Did this regression happen after the patch that added PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES? That flag actually fixed a real issue, and it was working on Grace and Vera, so I wonder whether a firmware/HWE kernel update introduced this regression.

@kobak2026 kobak2026 Apr 16, 2026

Okay, ack.
Yes, after it was applied, the issue is observed on lego and Vera.
Do you know the ASPEED contacts?

I have forwarded you an email discussion that I had about PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES.

Did you not observe the F_TRANSLATION faults?

> This will hide future ASPEED firmware bugs. We should check for the affected device ID, such as the AST1150 device ID, and report this to ASPEED.

Added device check.
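
The updated diff is not shown on this page, so the following is only a guess at the shape of a narrower check, written as user-space C with mock types. Everything prefixed `narrow_` is hypothetical, the flag bit and domain-type values are placeholders, and the AST1150 bridge device ID of 0x1150 is an assumption that should be confirmed against the actual patch.

```c
#include <stddef.h>

/* Placeholder values for illustration only. */
#define NARROW_VENDOR_ID_ASPEED  0x1a03     /* ASPEED's PCI vendor ID */
#define NARROW_DEVICE_ID_AST1150 0x1150     /* assumed AST1150 bridge device ID */
#define NARROW_BRIDGE_NO_ALIASES (1u << 0)  /* stand-in for PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES */
#define NARROW_DOMAIN_DEFAULT    0          /* stand-in for "no preference" */
#define NARROW_DOMAIN_IDENTITY   1          /* stand-in for IOMMU_DOMAIN_IDENTITY */

struct narrow_pci_bus;

/* Mock of the struct pci_dev fields the check reads. */
struct narrow_pci_dev {
    unsigned short vendor;
    unsigned short device;
    unsigned int dev_flags;
    struct narrow_pci_bus *bus;
};

/* Mock of struct pci_bus: 'self' is the parent bridge, or NULL. */
struct narrow_pci_bus {
    struct narrow_pci_dev *self;
};

/* Identity domain only behind a flagged ASPEED bridge with the AST1150 ID. */
int narrow_def_domain_type(const struct narrow_pci_dev *pdev)
{
    const struct narrow_pci_dev *bridge;

    if (!pdev || !pdev->bus || !pdev->bus->self)
        return NARROW_DOMAIN_DEFAULT;

    bridge = pdev->bus->self;
    if ((bridge->dev_flags & NARROW_BRIDGE_NO_ALIASES) &&
        bridge->vendor == NARROW_VENDOR_ID_ASPEED &&
        bridge->device == NARROW_DEVICE_ID_AST1150)
        return NARROW_DOMAIN_IDENTITY;

    return NARROW_DOMAIN_DEFAULT;
}
```

Matching on the device ID as well keeps other ASPEED bridges in a translated domain, so a similar firmware problem on different hardware would still surface as faults instead of being silently bypassed.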

There was one F_TRANSLATION fault at boot time, but lsusb showed all the USB devices properly. At that time I didn't think the F_TRANSLATION fault was related to the USB controller, but now I think it was.

I thought the regression happened with PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES, but on the lego system I went back to the 6.14 kernel, and with iommu.passthrough=0 I see the translation errors. With iommu.passthrough=1 I do not see them.
On the Vera eboard, when I tried iommu.passthrough=1 last month I was seeing another error, but last night I did not see it.

I think PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES just exposed this firmware issue, but I am not sure why I didn't see this.

@kobak2026 kobak2026 force-pushed the bug-5918716/smmu-identity-aspeed branch from 53b7614 to a21698b April 16, 2026 17:57

@nirmoy nirmoy left a comment

Acked-by: Nirmoy Das <nirmoyd@nvidia.com>

nvmochs commented Apr 16, 2026

No issues with the patch.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@KobaKoNvidia - What are your plans for upstreaming this? Do we need to discuss with ASPEED first if they can (or plan to) address in firmware?

clsotog commented Apr 16, 2026

Retested. Acked-by: Carol L Soto <csoto@nvidia.com>

@kobak2026

> @KobaKoNvidia - What are your plans for upstreaming this? Do we need to discuss with ASPEED first if they can (or plan to) address in firmware?

I have sent mail to the ASPEED contact, so let's have that discussion first.

…devices

ASPEED BMC devices behind an AST1150 PCIe-to-PCI bridge receive DMA
from BMC firmware using host physical addresses that bypass the
kernel's DMA API entirely.

When these devices are assigned a DMA translated domain, the SMMU
generates F_TRANSLATION faults because the BMC's physical addresses
have no corresponding IOVA mappings in the SMMU page tables.

Fix this by returning IOMMU_DOMAIN_IDENTITY for PCI devices whose
parent bridge has both the PCI_BRIDGE_NO_ALIASES flag and an ASPEED
vendor ID, so the SMMU passes BMC DMA transactions through
untranslated.

Signed-off-by: Koba Ko <kobak@nvidia.com>
@kobak2026 kobak2026 force-pushed the bug-5918716/smmu-identity-aspeed branch from a21698b to 55f7351 April 27, 2026 15:59
@github-actions

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ⚠️ Warnings

Details
Checking 1 commits...

Cherry-pick digest:
┌──────────────┬───────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject           │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼───────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 55f7351fc1d1 │ [SAUCE] iommu/arm-smmu-v3: use identity domai │ N/A        │ N/A     │ kobak                     │
└──────────────┴───────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

PR metadata:
W: PR title missing [<branch>] prefix: "NVIDIA: SAUCE: iommu/arm-smmu-v3: Use identity domain for ASPEED BMC devices"

nvmochs commented Apr 27, 2026

Merged, closing PR.

738fff0e2060 (nnoble/nvidia-6.17-next) NVIDIA: SAUCE: iommu/arm-smmu-v3: Use identity domain for ASPEED BMC devices

@nvmochs nvmochs closed this Apr 27, 2026