[for 26.04_linux-nvidia-bos]: NVIDIA: SAUCE: vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC#382
Closed
nirmoy wants to merge 1 commit into
…ck via CXL DVSEC

Add a CXL DVSEC-based readiness check for Blackwell-Next GPUs alongside the existing legacy BAR0 polling path.

On probe and after reset, the driver reads the CXL Device DVSEC capability to determine whether the GPU memory is valid. This is checked by polling on the Memory_Active bit based on the Memory_Active_Timeout. Also check if MEM_INFO_VALID is set within 1 second per CXL spec 4.0 Tables 8-13. If not, return error.

A static inline wrapper dispatches to the appropriate readiness check based on whether the CXL DVSEC capability is present.

Add PCI_DVSEC_CXL_MEM_ACTIVE_TIMEOUT to pci_regs.h for the timeout field encoding.

cc: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
(backported from https://lore.kernel.org/all/20260416014504.63067-1-ankita@nvidia.com/)
[nirmoy: kept both egm_node (existing EGM SAUCE) and cxl_dvsec in struct to avoid conflict with EGM backport]
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
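The polling described in the commit message can be modelled in user space roughly as follows. This is a minimal sketch, not the patch itself: the bit positions (Memory_Info_Valid in bit 0, Memory_Active in bit 1, Memory_Active_Timeout in bits 15:13) and the timeout encoding (1 s, 4 s, 16 s, 64 s, 256 s) are taken from the CXL Device DVSEC Range Size Low layout as I understand it, and every name below (`RANGE_*`, `read_reg_fn`, `poll_bit`, `check_ready`, `fake_dev`) is hypothetical. The order of the two checks is also an assumption; the real driver would sleep between reads instead of counting simulated time.

```c
#include <stdint.h>

/* Assumed layout of the CXL Device DVSEC Range 1 Size Low register. */
#define RANGE_MEM_INFO_VALID            (1u << 0)
#define RANGE_MEM_ACTIVE                (1u << 1)
#define RANGE_MEM_ACTIVE_TIMEOUT_SHIFT  13
#define RANGE_MEM_ACTIVE_TIMEOUT_MASK   (0x7u << RANGE_MEM_ACTIVE_TIMEOUT_SHIFT)

/* Hypothetical stand-in for a PCI config-space read of that register. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/* Decode Memory_Active_Timeout: 0 -> 1 s, 1 -> 4 s, ... 4 -> 256 s. */
static int timeout_field_to_ms(uint32_t reg, uint32_t *ms)
{
    uint32_t field = (reg & RANGE_MEM_ACTIVE_TIMEOUT_MASK) >>
                     RANGE_MEM_ACTIVE_TIMEOUT_SHIFT;

    if (field > 4)
        return -1;                      /* reserved encoding */
    *ms = 1000u << (2 * field);         /* each step multiplies by 4 */
    return 0;
}

/* Poll until `bit` is set or `timeout_ms` of simulated time has elapsed. */
static int poll_bit(read_reg_fn read, void *ctx, uint32_t bit,
                    uint32_t timeout_ms, uint32_t step_ms)
{
    uint32_t waited = 0;

    if (step_ms == 0)
        step_ms = 1;                    /* avoid an endless loop */
    for (;;) {
        if (read(ctx) & bit)
            return 0;
        if (waited >= timeout_ms)
            return -1;
        waited += step_ms;              /* a driver would sleep here */
    }
}

/* Assumed order: Memory_Info_Valid within 1 s, then Memory_Active. */
static int check_ready(read_reg_fn read, void *ctx)
{
    uint32_t timeout_ms;

    if (poll_bit(read, ctx, RANGE_MEM_INFO_VALID, 1000, 10))
        return -1;
    if (timeout_field_to_ms(read(ctx), &timeout_ms))
        return -1;
    return poll_bit(read, ctx, RANGE_MEM_ACTIVE, timeout_ms, 10);
}

/* Fake device: bits become set after a given number of register reads. */
struct fake_dev {
    uint32_t base;          /* static bits, e.g. the timeout field */
    unsigned reads;
    unsigned valid_after;
    unsigned active_after;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_dev *d = ctx;
    uint32_t v = d->base;

    d->reads++;
    if (d->reads > d->valid_after)
        v |= RANGE_MEM_INFO_VALID;
    if (d->reads > d->active_after)
        v |= RANGE_MEM_ACTIVE;
    return v;
}
```

With the fake device, `check_ready(fake_read, &dev)` returns 0 once both bits appear inside their windows and -1 otherwise, which mirrors the "If not, return error" behaviour in the message.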
Contributor
✅ Patchscan: No Missing Fixes
All cherry-picked commits have been checked; no missing upstream fixes found.
77eb99a to 0224667
Collaborator
Verified this matches v3 on LKML.
Collaborator
Applied: Closing PR.
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-7.0/+bug/2148701
Summary
Backport of the vfio/nvgrace-gpu Blackwell-Next (VR) GPU readiness check (v3) from LKML to 26.04_linux-nvidia-bos.
Patch: https://lore.kernel.org/all/20260416014504.63067-1-ankita@nvidia.com/
Blackwell-Next (VR) GPUs report device readiness via the CXL DVSEC Range 1 Low register instead of the BAR0 HBM training register used by GB200. The patch adds runtime detection: it checks whether the CXL DVSEC capability is present and routes to the new method if so; otherwise it falls back to the legacy BAR0 approach.
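To illustrate the runtime detection, here is a minimal user-space sketch that walks a simulated PCIe extended configuration space looking for the CXL Device DVSEC and picks a readiness path from the result. It assumes the usual extended-capability layout (capabilities start at 0x100, DVSEC capability ID 0x23, vendor ID in DVSEC header 1, DVSEC ID in header 2) and the CXL vendor ID 0x1E98 with DVSEC ID 0. All function and enum names are hypothetical; in the kernel the lookup would presumably go through `pci_find_dvsec_capability()`, and the actual patch may be structured differently.

```c
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE       4096
#define EXT_CAP_START        0x100
#define EXT_CAP_ID_DVSEC     0x23
#define VENDOR_ID_CXL        0x1E98
#define DVSEC_ID_CXL_DEVICE  0x0

/* Hypothetical labels for the two readiness methods. */
enum ready_path { READY_PATH_LEGACY_BAR0, READY_PATH_CXL_DVSEC };

/* Read/write a dword in the simulated config space (host byte order). */
static uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
    uint32_t v;

    memcpy(&v, cfg + off, sizeof(v));
    return v;
}

static void cfg_write32(uint8_t *cfg, uint16_t off, uint32_t v)
{
    memcpy(cfg + off, &v, sizeof(v));
}

/* Return the offset of the matching DVSEC, or 0 if it is absent. */
static uint16_t find_dvsec(const uint8_t *cfg, uint16_t vendor, uint16_t id)
{
    uint16_t pos = EXT_CAP_START;
    int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;  /* loop guard */

    while (pos >= EXT_CAP_START && pos <= CFG_SPACE_SIZE - 12 && ttl-- > 0) {
        uint32_t hdr = cfg_read32(cfg, pos);

        if (hdr == 0)
            break;                       /* no (more) capabilities */
        if ((hdr & 0xffff) == EXT_CAP_ID_DVSEC) {
            uint32_t h1 = cfg_read32(cfg, pos + 4);
            uint32_t h2 = cfg_read32(cfg, pos + 8);

            if ((h1 & 0xffff) == vendor && (h2 & 0xffff) == id)
                return pos;
        }
        pos = (hdr >> 20) & 0xffc;       /* next capability pointer */
    }
    return 0;
}

/* Dispatch: use the DVSEC path only when the capability exists. */
static enum ready_path select_ready_path(const uint8_t *cfg)
{
    return find_dvsec(cfg, VENDOR_ID_CXL, DVSEC_ID_CXL_DEVICE)
               ? READY_PATH_CXL_DVSEC
               : READY_PATH_LEGACY_BAR0;
}
```

A device without the capability (for example, an all-zero extended config space in this model) selects `READY_PATH_LEGACY_BAR0`, so existing GB200 behaviour is unchanged in the sketch.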
Jira: https://jirasw.nvidia.com/browse/DGX-16091
Tested with: GPU passthrough test on Blackwell-Next (VR) hardware.