Support compressed ELF sections #252
Add parsing for the sh_flags ELF section header field to detect whether SHF_COMPRESSED is set on any of the table sections found.
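For readers following along, here is a minimal sketch of what that check can look like. It is not the code from this PR: the struct and function names (SectionHeader64, read_section_header, is_compressed_table) are hypothetical, the layout mirrors the 64-bit section header from the ELF specification, and it assumes a 64-bit ELF file whose byte order matches the host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of Elf64_Shdr, declared locally so the sketch
// does not depend on <elf.h> being available on the host platform.
struct SectionHeader64 {
    std::uint32_t sh_name;
    std::uint32_t sh_type;
    std::uint64_t sh_flags;
    std::uint64_t sh_addr;
    std::uint64_t sh_offset;
    std::uint64_t sh_size;
    std::uint32_t sh_link;
    std::uint32_t sh_info;
    std::uint64_t sh_addralign;
    std::uint64_t sh_entsize;
};
static_assert(sizeof(SectionHeader64) == 64, "unexpected section header size");

// Values from the ELF specification.
constexpr std::uint32_t kShtSymtab = 2;         // SHT_SYMTAB
constexpr std::uint32_t kShtStrtab = 3;         // SHT_STRTAB
constexpr std::uint64_t kShfCompressed = 0x800; // SHF_COMPRESSED

// Copy one section header out of a raw byte buffer.
// Assumption: the file's byte order matches the host's.
bool read_section_header(const unsigned char* data, std::size_t size,
                         std::size_t offset, SectionHeader64& out) {
    if (offset > size || size - offset < sizeof(SectionHeader64)) {
        return false; // header would run past the end of the buffer
    }
    std::memcpy(&out, data + offset, sizeof(out));
    return true;
}

// True when a symbol or string table section carries SHF_COMPRESSED.
bool is_compressed_table(const SectionHeader64& sh) {
    const bool is_table = sh.sh_type == kShtSymtab || sh.sh_type == kShtStrtab;
    return is_table && (sh.sh_flags & kShfCompressed) != 0;
}
```

A caller would run is_compressed_table on each header while walking the section header table and only take the decompression path when it returns true, so uncompressed binaries keep their existing code path.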
Thanks for taking the time to put this together and contribute to the project! I have some initial thoughts / considerations:
I think overall I'm open to this, I just have concerns about added complexity. I'd be much more eager to accept the complexity if tools were actually using this (and if gdb didn't error about it; I take that as a sign this is pretty niche). Overall I'm impressed that this PR (even though it's a draft) isn't as huge as I might have expected. Putting it behind a CMake feature flag would make it even easier to justify.
c9a56a3 to bfe123d
Hello again, I wanted to circle back on this PR. Is this still something you'd find useful for cpptrace to support and are you still interested in working on this?
Hey, appreciate the patience with me just dumping a PR and ghosting. I actually no longer work at the shop where we used this code, but the functionality is pretty useful for adding low-overhead self-symbolication to an executable. It's less powerful than a proper split debuginfo server, but there's no need to store and account for TiBs of split debuginfo for large projects that are undergoing rapid development. With very slow disk IO on some not-so-uncommon hardware, I've measured that it's significantly faster to symbolicate from compressed symtab/strtab than uncompressed. That said, I personally don't have any use for this now and not much bandwidth to develop this. For anyone else interested, the remaining tasks, as far as I see, are:
@jeremy-rifkin I wouldn't blame you for just closing this out considering the value/cost tradeoff. You might want to merge the very first commit though.
Agreed. I try to analyze complexity and testing in terms of the new code paths added. While there are obviously many new branches added, the new complexity is all ultimately gated on binaries'
As mentioned, I believe the support that exists now is through intentional development but I'm not sure how active that development remains. The most relevant ticket is from @MaskRay who laid much of the groundwork leading to this state of the art (arbitrary compressed ELF sections): https://discourse.llvm.org/t/rfc-compressed-sht-symtab-sht-strtab-for-elf/77608 (note the GCC Bugzilla link within about the binutils support) @MaskRay I'm not sure if you'll see this ping but you're probably the best person to assess whether the tooling for compressed strtab/symtab is going anywhere any time soon. I believe you're no longer at Google working on
Right, it's a direct dependency rather than the indirect dep of libdwarf.
Appreciate the kind words. I want to give you credit for structuring the code in such a way that adding new functionality, especially platform-specific functionality, fits neatly into the abstractions that comprise the project.
Hello! With recent lld (llvm/llvm-project#84855) (Related: compact section header table https://groups.google.com/g/generic-abi/c/9DPPniRXFa8 However, some folks favor a compressed section header table instead.)
Note
Work in progress
For extremely lightweight debug symbolication (lighter than MiniDebugInfo), we've been using debug-stripped binaries with intact .symtab (generated with an objcopy --keep-symbols invocation). However, .strtab + .symtab contents still comprise the majority of the file footprint.
I realized that .symtab and .strtab should be extremely compressible and there's no rule that non-.debug ELF sections can't be SHF_COMPRESSED, so I gave it a try:
This produces a binary that is actually still symbolicable by lldb (but not gdb: BFD: BFD (GNU Binutils) 2.42.50 internal error, aborting at /usr/src/debug/gdb-cross-canadian-x86-64/15.1/bfd/bfd.c:1236 in _bfd_doprnt).
I wrote up some patches to load from compressed tables but I haven't written tests or a CMake feature flag. Initial tests with my input on a few IO-constrained embedded devices show that the zstd-compressed symbols actually load much faster while zlib is right around break even.
What do you think of the concept? AFAIK while this is legal per the ELF spec, there's no compiler support to generate compressed table sections. On the flip side, the ecosystem support has to start somewhere and I happen to have a decent use case for it right here. I don't think there's much cost to adding and maintaining this feature but it's your call.
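To make the loading side concrete: per the ELF specification, a section flagged SHF_COMPRESSED begins with a compression header that names the codec and the uncompressed size, followed by the compressed payload. The sketch below only parses that header; it is not the patch from this PR, the names (CompressionHeader64, read_compression_header, codec_of) are hypothetical, and it assumes a 64-bit ELF file in host byte order. The actual inflate step through zlib or zstd is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of Elf64_Chdr, the header that prefixes the
// contents of every SHF_COMPRESSED section in a 64-bit ELF file.
struct CompressionHeader64 {
    std::uint32_t ch_type;      // compression algorithm
    std::uint32_t ch_reserved;
    std::uint64_t ch_size;      // size of the data once decompressed
    std::uint64_t ch_addralign; // alignment of the decompressed data
};
static_assert(sizeof(CompressionHeader64) == 24, "unexpected chdr size");

// Values from the ELF specification.
constexpr std::uint32_t kElfCompressZlib = 1; // ELFCOMPRESS_ZLIB
constexpr std::uint32_t kElfCompressZstd = 2; // ELFCOMPRESS_ZSTD

enum class Codec { zlib, zstd, unknown };

// Copy the compression header from the start of a section's raw bytes.
// Assumption: the file's byte order matches the host's.
bool read_compression_header(const unsigned char* section, std::size_t size,
                             CompressionHeader64& out) {
    if (size < sizeof(CompressionHeader64)) {
        return false; // section too small to hold a compression header
    }
    std::memcpy(&out, section, sizeof(out));
    return true;
}

// Map ch_type to the codec the loader would need to dispatch to.
Codec codec_of(const CompressionHeader64& header) {
    switch (header.ch_type) {
        case kElfCompressZlib: return Codec::zlib;
        case kElfCompressZstd: return Codec::zstd;
        default:               return Codec::unknown;
    }
}
```

After a successful parse, a loader would allocate ch_size bytes and hand the payload that starts at offset sizeof(CompressionHeader64) to the matching decompressor, rejecting the section when the codec is unknown or was not compiled in.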