[for 26.04_linux-nvidia]: NVIDIA: SAUCE: vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC#380
Closed
nirmoy wants to merge 1 commit into
…ck via CXL DVSEC

Blackwell-Next GPUs report device readiness via the CXL DVSEC Range 1 Low register (offset 0x1C) instead of the BAR0 HBM training register used by GB200. GPU memory readiness is checked by polling the Memory_Active bit (bit 1) for up to the duration given by Memory_Active_Timeout (bits 15:13).

Add runtime detection by checking for the presence of the DVSEC register. Route to the new method if it is present; otherwise continue using the legacy approach.

(backported from https://lore.kernel.org/all/20260416014504.63067-1-ankita@nvidia.com/)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Contributor
✅ Patchscan: No Missing Fixes. All cherry-picked commits have been checked; no missing upstream fixes found.
Collaborator
@nirmoy I know this is in draft state but took a look and have a few comments:
Collaborator
Author
Closing: wrong target branch. Replaced by #382 targeting 26.04_linux-nvidia-bos.
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-7.0/+bug/2148701
Summary
Backport of the vfio/nvgrace-gpu Blackwell-Next GPU readiness check (v3) from LKML to 26.04_linux-nvidia.
Patch: https://lore.kernel.org/all/20260416014504.63067-1-ankita@nvidia.com/
Blackwell-Next GPUs report device readiness via the CXL DVSEC Range 1 Low register (offset 0x1C) instead of the BAR0 HBM training register used by GB200. Adds runtime detection by checking the presence of the DVSEC register and routes to the new method if present, otherwise falls back to the legacy approach.
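The runtime detection described above amounts to looking for a DVSEC in the device's PCIe extended capability list. The sketch below does this over an in-memory copy of config space so that it runs in user space. It assumes the patch looks for the CXL device DVSEC (vendor ID 0x1E98, DVSEC ID 0), which the PR text does not state; `cfg_read32`, `cfg_write32`, `find_dvsec` and `pick_readiness_method` are hypothetical helpers, not functions from the patch or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SPACE_EXP_SIZE  4096
#define PCI_EXT_CAP_START       0x100
#define PCI_EXT_CAP_ID_DVSEC    0x23
/* ASSUMPTION: the readiness register lives in the CXL device DVSEC. */
#define CXL_VENDOR_ID           0x1E98
#define CXL_DEVICE_DVSEC_ID     0x0000

/* Little-endian 32-bit accessors over a config-space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

static void cfg_write32(uint8_t *cfg, uint16_t off, uint32_t val)
{
    cfg[off]     = (uint8_t)(val & 0xFF);
    cfg[off + 1] = (uint8_t)((val >> 8) & 0xFF);
    cfg[off + 2] = (uint8_t)((val >> 16) & 0xFF);
    cfg[off + 3] = (uint8_t)((val >> 24) & 0xFF);
}

/* Return the offset of the matching DVSEC, or 0 if it is absent. */
static uint16_t find_dvsec(const uint8_t *cfg, uint16_t vendor,
                           uint16_t dvsec_id)
{
    assert(cfg != NULL);

    uint16_t pos = PCI_EXT_CAP_START;
    int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_EXT_CAP_START) / 8; /* loop guard */

    while (pos >= PCI_EXT_CAP_START &&
           pos <= PCI_CFG_SPACE_EXP_SIZE - 12 && ttl-- > 0) {
        uint32_t hdr = cfg_read32(cfg, pos);

        if (hdr == 0 || hdr == 0xFFFFFFFFu)
            break;
        if ((hdr & 0xFFFF) == PCI_EXT_CAP_ID_DVSEC) {
            uint32_t hdr1 = cfg_read32(cfg, pos + 4); /* vendor ID in 15:0 */
            uint32_t hdr2 = cfg_read32(cfg, pos + 8); /* DVSEC ID in 15:0 */

            if ((hdr1 & 0xFFFF) == vendor && (hdr2 & 0xFFFF) == dvsec_id)
                return pos;
        }
        pos = (hdr >> 20) & 0xFFC; /* next capability offset */
    }
    return 0;
}

enum readiness_method {
    READY_VIA_BAR0_LEGACY, /* HBM training register, as on GB200 */
    READY_VIA_CXL_DVSEC,   /* Memory_Active polling */
};

static enum readiness_method pick_readiness_method(const uint8_t *cfg)
{
    return find_dvsec(cfg, CXL_VENDOR_ID, CXL_DEVICE_DVSEC_ID)
               ? READY_VIA_CXL_DVSEC
               : READY_VIA_BAR0_LEGACY;
}
```

In kernel code the lookup would more likely go through `pci_find_dvsec_capability()`; the manual walk is shown only to make the detection step concrete.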
One minor conflict resolved: the nvgrace_gpu_pci_core_device struct already had int egm_node from the existing EGM SAUCE in this branch; both fields are retained.
Test
Patch applies cleanly on top of 26.04_linux-nvidia. Boot test on Vera is pending; functional validation requires Blackwell-Next hardware.
Todo:
[ ] VM passthrough test