
Support compressed ELF sections #252

Draft
GHF wants to merge 3 commits into jeremy-rifkin:main from GHF:elfcompress


@GHF
Contributor

GHF commented May 28, 2025

Note

Work in progress

For extremely lightweight debug symbolication (lighter than MiniDebugInfo), we've been using debug-stripped binaries with intact .symtab (generated with an objcopy --keep-symbols invocation). However, .strtab + .symtab contents still comprise the majority of the file footprint.

I realized that .symtab and .strtab should be extremely compressible and there's no rule that non-.debug ELF sections can't be SHF_COMPRESSED, so I gave it a try:

> elfutils-elfcompress --force --permissive --verbose --type=zstd --name=.strtab --name=.symtab a.out
processing: a.out
[47] .symtab compressed (11185272 => 2504475 22.39%)
[48] .strtab compressed (66983971 => 4424907 6.61%)
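Taken together, the two tables shrink far more than either percentage alone suggests, because .strtab dominates. The sketch below only redoes the arithmetic on the byte counts printed by elfcompress above. Nothing is re-measured, and the struct and function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Before/after sizes in bytes, copied from the elfcompress output above.
struct SectionSizes {
    const char* name;
    std::uint64_t before;
    std::uint64_t after;
};

constexpr SectionSizes kSections[] = {
    {".symtab", 11185272, 2504475},
    {".strtab", 66983971, 4424907},
};

// Compressed size as a percentage of the original size.
double percent(std::uint64_t after, std::uint64_t before) {
    assert(before != 0);
    return 100.0 * static_cast<double>(after) / static_cast<double>(before);
}

// Sum both tables, printing the per-section and combined ratios.
SectionSizes combined() {
    SectionSizes total{"combined", 0, 0};
    for (const auto& s : kSections) {
        total.before += s.before;
        total.after += s.after;
        std::printf("%s: %.2f%%\n", s.name, percent(s.after, s.before));
    }
    std::printf("%s: %llu => %llu (%.2f%%)\n", total.name,
                static_cast<unsigned long long>(total.before),
                static_cast<unsigned long long>(total.after),
                percent(total.after, total.before));
    return total;
}
```

Combined, the symbol tables go from 78169243 to 6929382 bytes, about 8.86% of their original size. This is the I/O that the loading side avoids on slow storage.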

This produces a binary that is actually still symbolicable by lldb (but not gdb: BFD: BFD (GNU Binutils) 2.42.50 internal error, aborting at /usr/src/debug/gdb-cross-canadian-x86-64/15.1/bfd/bfd.c:1236 in _bfd_doprnt).

I wrote up some patches to load from compressed tables, but I haven't written tests or a CMake feature flag yet. Initial tests with my input on a few IO-constrained embedded devices show that the zstd-compressed symbols actually load much faster, while zlib is right around break-even.
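Before decompressing an SHF_COMPRESSED section, a loader has to read the ELF compression header (Elf32_Chdr or Elf64_Chdr) at the start of the section data. That header gives the algorithm (ch_type), the uncompressed size (ch_size), and the alignment. The following is a minimal sketch of that step, handling both bit widths and both byte orders. It is not the code from this PR, and `CompressionHeader`, `read_uint`, and `parse_chdr` are hypothetical names. The ELFCOMPRESS_* values and struct layouts follow the ELF gABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Values from the ELF gABI (also available from <elf.h> on Linux).
constexpr std::uint32_t kElfCompressZlib = 1;  // ELFCOMPRESS_ZLIB
constexpr std::uint32_t kElfCompressZstd = 2;  // ELFCOMPRESS_ZSTD

// Hypothetical parsed form of Elf32_Chdr / Elf64_Chdr.
struct CompressionHeader {
    std::uint32_t type;          // ch_type
    std::uint64_t size;          // ch_size: uncompressed size in bytes
    std::uint64_t addralign;     // ch_addralign
    std::size_t payload_offset;  // where the compressed stream begins
};

// Read an unsigned integer of `width` bytes in the file's byte order.
std::uint64_t read_uint(const std::uint8_t* p, std::size_t width, bool big_endian) {
    assert(width <= 8);
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < width; i++) {
        // Visit bytes from most to least significant.
        std::size_t idx = big_endian ? i : width - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

// Parse the header at the start of an SHF_COMPRESSED section's data.
// Elf64_Chdr: ch_type(4) ch_reserved(4) ch_size(8) ch_addralign(8) = 24 bytes
// Elf32_Chdr: ch_type(4) ch_size(4) ch_addralign(4)                 = 12 bytes
std::optional<CompressionHeader> parse_chdr(const std::vector<std::uint8_t>& data,
                                            bool is_64, bool big_endian) {
    const std::size_t hdr = is_64 ? 24 : 12;
    if (data.size() < hdr) return std::nullopt;
    const std::uint8_t* p = data.data();
    CompressionHeader h{};
    h.type = static_cast<std::uint32_t>(read_uint(p, 4, big_endian));
    if (is_64) {
        h.size = read_uint(p + 8, 8, big_endian);
        h.addralign = read_uint(p + 16, 8, big_endian);
    } else {
        h.size = read_uint(p + 4, 4, big_endian);
        h.addralign = read_uint(p + 8, 4, big_endian);
    }
    h.payload_offset = hdr;
    // Only the two algorithms defined by the gABI are accepted.
    if (h.type != kElfCompressZlib && h.type != kElfCompressZstd) return std::nullopt;
    return h;
}
```

After this step, the `data.size() - payload_offset` bytes that follow the header would go to a decompressor (for example zstd's `ZSTD_decompress` or zlib's `uncompress`, depending on ch_type) with an output buffer of `size` bytes.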

What do you think of the concept? AFAIK, while this is legal per the ELF spec, there's no compiler support to generate compressed table sections. On the flip side, ecosystem support has to start somewhere, and I happen to have a decent use case for it right here. I don't think there's much cost to adding and maintaining this feature, but it's your call.

GHF added 3 commits May 23, 2025 11:25
Add parsing for the sh_flags ELF section header field, detecting whether
SHF_COMPRESSED is set on any table sections detected.
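The first commit's job is to read sh_flags from a section header and test the SHF_COMPRESSED bit (0x800). The field is at offset 8 in both Elf32_Shdr and Elf64_Shdr, right after sh_name and sh_type. It is 4 bytes wide in 32-bit files and 8 bytes wide in 64-bit files, so both the ELF class and the byte order matter. The following is a minimal sketch under those assumptions. The helper names are hypothetical and not taken from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint64_t kShfCompressed = 0x800;  // SHF_COMPRESSED (1 << 11)

// Read an unsigned integer of `width` bytes in the file's byte order.
std::uint64_t read_uint(const std::uint8_t* p, std::size_t width, bool big_endian) {
    assert(width <= 8);
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < width; i++) {
        std::size_t idx = big_endian ? i : width - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

// sh_flags follows sh_name (4 bytes) and sh_type (4 bytes) in both
// Elf32_Shdr and Elf64_Shdr; it is 4 bytes in ELFCLASS32, 8 in ELFCLASS64.
std::uint64_t section_flags(const std::uint8_t* shdr, bool is_64, bool big_endian) {
    return read_uint(shdr + 8, is_64 ? 8 : 4, big_endian);
}

// True if the raw section header has SHF_COMPRESSED set.
bool is_compressed(const std::uint8_t* shdr, bool is_64, bool big_endian) {
    return (section_flags(shdr, is_64, big_endian) & kShfCompressed) != 0;
}
```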
@jeremy-rifkin
Owner

Thanks for taking the time to put this together and contribute to the project!

I have some initial thoughts / considerations:

  • Firstly, I'm in general happy to support anything that's useful to people for debugging/diagnostics. I do have complexity and testing in mind as well, which it sounds like you do too
  • This adds some notable complexity and it's something I'd want to be able to thoroughly test, which might be tricky given tools don't currently take advantage of compressed strtabs
  • This adds a dependency on zlib and zstd to cpptrace directly, which might not be a massive lift given that libdwarf needs those libraries but it does complicate cpptrace's cmake a bit more
  • Cpptrace might be configured to not use libdwarf as a back-end and that complicates things a bit with regards to zlib/zstd

I think overall I'm open to this, I just have concerns about added complexity. I'd be much more eager to accept the complexity if tools were actually using this (and if gdb didn't error about it, I take that as a sign this is pretty niche). Overall I'm impressed that this PR (even though it's a draft) isn't as huge as I might have expected. Especially if it's behind a cmake feature flag that makes it easier to justify.

jeremy-rifkin force-pushed the main branch 2 times, most recently from c9a56a3 to bfe123d (June 12, 2025 16:17)
@jeremy-rifkin
Owner

Hello again, I wanted to circle back on this PR. Is this still something you'd find useful for cpptrace to support and are you still interested in working on this?

@GHF
Contributor Author

GHF commented Mar 2, 2026

Hello again, I wanted to circle back on this PR. Is this still something you'd find useful for cpptrace to support and are you still interested in working on this?

Hey, I appreciate the patience with me just dumping a PR and ghosting. I no longer work at the shop where we used this code, but the functionality is pretty useful for adding low-overhead self-symbolication to an executable. It's less powerful than a proper split debuginfo server, but there's no need to store and account for TiBs of split debuginfo for large projects that are undergoing rapid development.

With very slow disk IO on some not-so-uncommon hardware, I've measured that it's significantly faster to symbolicate from compressed symtab/strtab than uncompressed.

That said, I personally don't have any use for this now and don't have much bandwidth to develop it. For anyone else interested, the remaining tasks, as far as I can see, are:

  • Develop new tests for new code: the compression functions that integrate zlib/zstd with base_file and bspan, the ELF header parsing (including bit width and endianness), and probably an integration test with test files
  • Fix the build for Windows and Mac (I'm pretty sure the CMake doesn't account for this)
  • Decouple the CMake zlib and zstd usage from the libdwarf flag
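For the ELF header tests in the first task, the bit width and endianness both come from e_ident: EI_CLASS (index 4) and EI_DATA (index 5), behind the 0x7f 'E' 'L' 'F' magic. The following is a minimal sketch of that decoding. Its assertions exercise a little-endian 64-bit identity, a big-endian 32-bit identity, and malformed input. `ElfLayout` and `parse_ident` are hypothetical names, and the constants are the gABI values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// e_ident indices and values from the ELF gABI.
constexpr std::size_t kEiNident = 16;                      // EI_NIDENT
constexpr std::size_t kEiClass = 4;                        // EI_CLASS
constexpr std::size_t kEiData = 5;                         // EI_DATA
constexpr std::uint8_t kElfClass32 = 1, kElfClass64 = 2;   // ELFCLASS32 / ELFCLASS64
constexpr std::uint8_t kElfDataLsb = 1, kElfDataMsb = 2;   // ELFDATA2LSB / ELFDATA2MSB

// Hypothetical summary of the two properties every later field read depends on.
struct ElfLayout {
    bool is_64;
    bool big_endian;
};

// Validate the magic and decode EI_CLASS / EI_DATA from the identification bytes.
std::optional<ElfLayout> parse_ident(const std::uint8_t* ident, std::size_t len) {
    if (len < kEiNident) return std::nullopt;
    if (ident[0] != 0x7f || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F')
        return std::nullopt;
    ElfLayout layout{};
    switch (ident[kEiClass]) {
        case kElfClass32: layout.is_64 = false; break;
        case kElfClass64: layout.is_64 = true; break;
        default: return std::nullopt;  // ELFCLASSNONE or unknown
    }
    switch (ident[kEiData]) {
        case kElfDataLsb: layout.big_endian = false; break;
        case kElfDataMsb: layout.big_endian = true; break;
        default: return std::nullopt;  // ELFDATANONE or unknown
    }
    return layout;
}
```

Feeding synthetic byte arrays like these keeps the width and endianness tests independent of whatever binaries the test machine happens to produce.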

@jeremy-rifkin I wouldn't blame you for just closing this out considering the value/cost tradeoff. You might want to merge the very first commit though.

  • Firstly, I'm in general happy to support anything that's useful to people for debugging/diagnostics. I do have complexity and testing in mind as well, which it sounds like you do too

Agreed. I try to analyze complexity and testing in terms of the new code paths added. While there are obviously many new branches, all of the new complexity is ultimately gated on the SHF_COMPRESSED flag being set in a binary's section headers. The "new" runtime dependencies (zlib and zstd) add no extra cost if libdwarf is already linked.
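The gating argument can be sketched as a single dispatch point. Sections without SHF_COMPRESSED take the existing path untouched. Compressed sections are routed by ch_type, and each codec is available only when it is compiled in. `CPPTRACE_HAS_ZLIB` and `CPPTRACE_HAS_ZSTD` are hypothetical macros that a CMake feature flag could define. They are not existing cpptrace options, and `choose_load_path` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kShfCompressed = 0x800;  // SHF_COMPRESSED
constexpr std::uint32_t kElfCompressZlib = 1;    // ELFCOMPRESS_ZLIB
constexpr std::uint32_t kElfCompressZstd = 2;    // ELFCOMPRESS_ZSTD

enum class LoadPath { Direct, Zlib, Zstd, Unsupported };

// Decide how a .symtab/.strtab section would be loaded.
// sh_flags comes from the section header; ch_type from the compression
// header (only meaningful when SHF_COMPRESSED is set).
// CPPTRACE_HAS_ZLIB / CPPTRACE_HAS_ZSTD are hypothetical feature macros.
LoadPath choose_load_path(std::uint64_t sh_flags, std::uint32_t ch_type) {
    if ((sh_flags & kShfCompressed) == 0) {
        return LoadPath::Direct;  // existing code path, unchanged
    }
    switch (ch_type) {
#ifdef CPPTRACE_HAS_ZLIB
        case kElfCompressZlib: return LoadPath::Zlib;
#endif
#ifdef CPPTRACE_HAS_ZSTD
        case kElfCompressZstd: return LoadPath::Zstd;
#endif
        default: return LoadPath::Unsupported;  // codec compiled out or unknown
    }
}
```

With both macros undefined, a build pays nothing for the feature. A compressed table is then reported as unsupported instead of being misread as raw symbols.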

  • This adds some notable complexity and it's something I'd want to be able to thoroughly test, which might be tricky given tools don't currently take advantage of compressed strtabs

As mentioned, lldb is perfectly happy with these binaries, but binutils-based tools (as opposed to LLVM-based ones) are not. Aside from lldb, IIRC llvm-objcopy could write but not read this format, while ld.lld could write it. And of course elfutils also supports it.

I believe the support that exists now came from intentional development, but I'm not sure how active that development remains. The most relevant ticket is from @MaskRay, who laid much of the groundwork leading to this state of the art (arbitrary compressed ELF sections): https://discourse.llvm.org/t/rfc-compressed-sht-symtab-sht-strtab-for-elf/77608 (note the GCC Bugzilla link within, about binutils support)

@MaskRay I'm not sure if you'll see this ping but you're probably the best person to assess whether the tooling for compressed strtab/symtab is going anywhere any time soon. I believe you're no longer at Google working on SHF_COMPRESSED? (I'm a former Fuchsia dev btw)

  • This adds a dependency on zlib and zstd to cpptrace directly, which might not be a massive lift given that libdwarf needs those libraries but it does complicate cpptrace's cmake a bit more

  • Cpptrace might be configured to not use libdwarf as a back-end and that complicates things a bit with regards to zlib/zstd

Right, it's a direct dependency rather than an indirect one through libdwarf.

I think overall I'm open to this, I just have concerns about added complexity. I'd be much more eager to accept the complexity if tools were actually using this (and if gdb didn't error about it, I take that as a sign this is pretty niche). Overall I'm impressed that this PR (even though it's a draft) isn't as huge as I might have expected. Especially if it's behind a cmake feature flag that makes it easier to justify.

Appreciate the kind words. I want to give you credit for structuring the code in such a way that adding new functionality, especially platform-specific functionality, fits neatly into the abstractions that comprise the project.

@MaskRay

MaskRay commented Mar 2, 2026

I believe the support that exists now is through intentional development but I'm not sure how active that development remains. The most relevant ticket is from @MaskRay who laid much of the groundwork leading to this state of the art (arbitrary compressed ELF sections): discourse.llvm.org/t/rfc-compressed-sht-symtab-sht-strtab-for-elf/77608 (note the GCC Bugzilla link within about the binutils support)

Hello!

With recent lld (llvm/llvm-project#84855), ld.lld --compress-sections .strtab=zstd --compress-sections .symtab=zstd ... produces output with compressed .strtab and .symtab.

(Related: compact section header table https://groups.google.com/g/generic-abi/c/9DPPniRXFa8 However, some folks favor a compressed section header table instead.)
