
AMDGPU: harden hotswap ELF growth and descriptors #3145

Open
harsh-amd wants to merge 1 commit into ROCm:users/harsh/comgr-hotswap-redundancy-cleanup from harsh-amd:comgr-hotswap-elf-hardening


@harsh-amd


Stacked on #3138.

Summary:

  • Key entry trampoline descriptor rewrites by descriptor virtual address so duplicate kernel descriptor names update the intended descriptor.
  • Make SGPR metadata fallback explicit: missing or malformed per-kernel SGPR metadata logs a warning and falls back to the descriptor, while corrupt metadata remains a hard error.
  • Keep descriptor SGPR fields in sync with the effective metadata count.
  • Adjust grown ELF sections, program headers, and symbols using virtual-address ordering while preserving file-offset movement by file order.
  • Keep debug symbol/line trampoline inputs limited to B0/A0 trampolines while shifting debug addresses by total .text growth.
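
To illustrate the fourth bullet, here is a minimal sketch of why virtual addresses and file offsets are shifted by separate comparisons. It is not the code from this PR: `Section`, `Growth`, and `applyGrowth` are hypothetical names, and the model assumes a single insertion point described by one virtual address, one file offset, and one byte count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an ELF section header.
struct Section {
  uint64_t Addr;   // virtual address (sh_addr)
  uint64_t Offset; // file offset (sh_offset)
};

// Hypothetical description of one growth event (for example, .text growing).
struct Growth {
  uint64_t AtAddr;   // entries at or above this virtual address move
  uint64_t AtOffset; // entries at or above this file offset move
  uint64_t Delta;    // number of bytes added
};

// Shift addresses by virtual-address order and offsets by file order.
// The two comparisons are independent because a section can sit later in
// the file than another section while being mapped at a lower address.
void applyGrowth(std::vector<Section> &Sections, const Growth &G) {
  assert(G.Delta != 0 && "growth is expected to add bytes");
  for (Section &S : Sections) {
    if (S.Addr >= G.AtAddr)
      S.Addr += G.Delta;
    if (S.Offset >= G.AtOffset)
      S.Offset += G.Delta;
  }
}
```

In this model a section mapped below the growth point but stored after it in the file keeps its address and only has its offset moved. Program headers and symbols would presumably follow the same address comparison.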

Tests:

  • build/bin/HotswapElfTests
  • build/bin/HotswapMCTests
  • build/bin/llvm-lit -sv build/tools/comgr/test-lit --filter hotswap
  • build-asan/bin/HotswapElfTests
  • build-asan/bin/HotswapMCTests
  • build-asan/bin/llvm-lit -sv build-asan/tools/comgr/test-lit --filter hotswap

@harsh-amd harsh-amd requested a review from lamb-j as a code owner July 1, 2026 12:48
@harsh-amd harsh-amd added the comgr Related to Code Object Manager label Jul 1, 2026
@harsh-amd harsh-amd requested a review from chinmaydd as a code owner July 1, 2026 12:48
@harsh-amd harsh-amd added the hotswap Related to the Comgr Hotswap feature label Jul 1, 2026
@harsh-amd harsh-amd force-pushed the users/harsh/comgr-hotswap-redundancy-cleanup branch from efa8111 to bc4e2f9 July 1, 2026 13:09
Key entry trampoline descriptor rewrites by descriptor virtual address so duplicate kernel descriptor names update the intended descriptor.
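
A rough sketch of the idea, assuming a simplified descriptor record: the types and the `indexByVAddr` and `rewriteEntry` helpers are hypothetical and are not taken from the comgr sources. Looking descriptors up by virtual address stays unambiguous even when two descriptors carry the same name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified kernel descriptor record.
struct Descriptor {
  std::string Name;     // symbol name; may be duplicated
  uint64_t VAddr;       // virtual address of the descriptor; unique
  uint64_t EntryOffset; // field the trampoline rewrite updates
};

// Build a lookup table keyed by virtual address instead of by name.
std::map<uint64_t, std::size_t>
indexByVAddr(const std::vector<Descriptor> &Descs) {
  std::map<uint64_t, std::size_t> Index;
  for (std::size_t I = 0; I < Descs.size(); ++I) {
    bool Inserted = Index.emplace(Descs[I].VAddr, I).second;
    assert(Inserted && "two descriptors cannot share a virtual address");
    (void)Inserted; // silence unused-variable warnings in NDEBUG builds
  }
  return Index;
}

// Rewrite the entry of the descriptor at VAddr; returns false if none exists.
bool rewriteEntry(std::vector<Descriptor> &Descs,
                  const std::map<uint64_t, std::size_t> &Index,
                  uint64_t VAddr, uint64_t NewEntry) {
  auto It = Index.find(VAddr);
  if (It == Index.end())
    return false;
  Descs[It->second].EntryOffset = NewEntry;
  return true;
}
```

With a name-keyed table, the second of two identically named descriptors would either overwrite the first in the table or never be found; the address key avoids both outcomes.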

Make SGPR metadata fallback explicit, keep descriptor SGPR fields in sync with the effective metadata count, and adjust grown ELF addresses using virtual address ordering.
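
The fallback policy can be sketched as below, under stated assumptions: the `MetaStatus` classification, the `SgprResult` struct, and `effectiveSgprCount` are hypothetical names, and how the real code distinguishes malformed from corrupt metadata is not shown here. The sketch assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical classification of the per-kernel SGPR metadata.
enum class MetaStatus { Valid, Missing, Malformed, Corrupt };

struct SgprResult {
  uint32_t Count; // effective SGPR count to use
  bool Warned;    // true when the descriptor fallback was taken
};

// Pick the effective SGPR count. Missing or malformed metadata falls back
// to the descriptor value and flags a warning; corrupt metadata yields
// std::nullopt, which the caller would treat as a hard error.
std::optional<SgprResult> effectiveSgprCount(MetaStatus Status,
                                             uint32_t MetaCount,
                                             uint32_t DescCount) {
  switch (Status) {
  case MetaStatus::Valid:
    return SgprResult{MetaCount, false};
  case MetaStatus::Missing:
  case MetaStatus::Malformed:
    return SgprResult{DescCount, true};
  case MetaStatus::Corrupt:
    return std::nullopt;
  }
  assert(false && "unhandled MetaStatus");
  return std::nullopt;
}
```

A caller following this pattern would write the returned `Count` back into the descriptor's SGPR fields so that both sources agree.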

Add unit coverage for duplicate descriptors, metadata fallback, vaddr-based ELF growth, and entry-only debug growth.
@harsh-amd harsh-amd force-pushed the comgr-hotswap-elf-hardening branch from 4bb40fe to dd686a2 July 1, 2026 13:09