Skip to content

amdgpu drm file dump cleanups#3015

Open
tursulin wants to merge 8 commits into
checkpoint-restore:criu-devfrom
tursulin:drm-dump-cleanups
Open

amdgpu drm file dump cleanups#3015
tursulin wants to merge 8 commits into
checkpoint-restore:criu-devfrom
tursulin:drm-dump-cleanups

Conversation

@tursulin
Copy link
Copy Markdown

@tursulin tursulin commented May 5, 2026

Just a bunch of cleanups and improvements to amdgpu_plugin_drm_dump_file(), hopefully all agreeable.

No major bugs fixed apart a false positive exit with no buffers dumped in the case when kernel does not support DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES.

Improvements are less allocations and less re-initializing of libdrm. Moving both from once per buffer object to either once, or until the big enough buffer is re-allocated.

cc @fdavid-amd

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.22%. Comparing base (9a1453b) to head (42b4aaa).
⚠️ Report is 1 commits behind head on criu-dev.

Additional details and impacted files
@@            Coverage Diff            @@
##           criu-dev    #3015   +/-   ##
=========================================
  Coverage     57.22%   57.22%           
=========================================
  Files           154      154           
  Lines         40440    40440           
  Branches       8863     8863           
=========================================
  Hits          23142    23142           
  Misses        17034    17034           
  Partials        264      264           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@avagin avagin requested a review from fdavid-amd May 5, 2026 23:59
@avagin avagin added the gpu/amd label May 5, 2026
@avagin
Copy link
Copy Markdown
Member

avagin commented May 19, 2026

@fdavid-amd friendly ping

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Cleanups and improvements to amdgpu_plugin_drm_dump_file(): reduces allocations and libdrm re-initialization (moving them out of the per-BO loop), simplifies the GEM-list-handles retry into a single do/while, and changes the unsupported-ioctl case to fail rather than silently dumping zero buffers.

Changes:

  • Refactor handles-list retrieval into a do/while loop with reusable entries allocation; treat EINVAL from DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES as an error instead of continuing with zero entries.
  • Hoist amdgpu_device_initialize and the posix_memalign buffer out of the per-BO loop, reusing the buffer when large enough; consolidate cleanup at the exit: label.
  • Use fd directly for drmPrimeHandleToFD (instead of amdgpu_device_get_fd) and simplify local variable scoping.

Comment thread plugins/amdgpu/amdgpu_plugin_drm.c Outdated
Comment thread plugins/amdgpu/amdgpu_plugin_drm.c Outdated
Comment thread plugins/amdgpu/amdgpu_plugin_drm.c
Tvrtko Ursulin added 8 commits May 20, 2026 09:24
If the protobuf image allocation fails we can simply return immediately
since there aren't any other things to clean up at this point.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
The DRM minor local variable is shadowed between two blocks inside
amdgpu_plugin_drm_dump_file(). If at the top level we access the minor via
the copy stored in the protobuf image, we can simply drop this copy and
so avoid the buffer object dump loop shadowing it.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
… loop

We can save a few lines of code and end up with a potentially more
readable function if we replace the open-coded try-retry of getting the
list of buffer objects with a loop.

While refactoring we also remove the needless copy of the entry as the
dump loop iterates the handles, and also stop returning a false success
with a made up zero buffer objects saved in cases when the kernel does not
support the DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES ioctl.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
…mage name

amdgpu_plugin_drm_dump_file() already defines a large temporary stack
buffer for the purpose of generating the file level protobuf image name.

Instead of the buffer object dumping loop defining a separate temporary
stack buffer for the buffer content protobuf images we can simply use the
top level one.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
When saving buffer object content the current code constantly allocates
and frees aligned memory where the GPU will copy each buffer one by one.
We can optimise this path by keeping the buffer around and only grow it if
is too small for the current object.

While doing this we also fix a theoretically incorrect freeing of memory
allocated via posix_memalign with xfree.

An alternative solution could be to probe for the maximum size before hand
by adding a pre-scan loop.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Instead of initializing and de-initializing libdrm for every buffer object
we dump (each of which consists of a few system calls), we can simply
initialize it once and by doing so simplify a bunch of the error handling
blocks.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Simplify a bunch of error handling blocks by freeing the vm_info_entries
array as soon as we are done accessing it.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
There is no need to use the fd libdrm duplicated from the original fd.
Just use the original one.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
@tursulin tursulin force-pushed the drm-dump-cleanups branch from 42b4aaa to 9c329bf Compare May 20, 2026 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants