Use libunwind for jemalloc memory profiling stack traces#5602
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
a006f3b to
9a3f534
Compare
0ec146a to
14f2a9f
Compare
9a3f534 to
3cf1a03
Compare
14f2a9f to
23c683e
Compare
…ces (#5792) ## Motivation Backport of #5602, stacked on the backport of #5601. Switches jemalloc's stack trace collection from the default (frame pointers) to libunwind, which produces complete stack traces in release builds without requiring frame pointers. ## Proposal Cherry-pick of `9a3f534cbdb` onto the backport of #5601. ## Test Plan CI
23c683e to
ec6dcd1
Compare
3cf1a03 to
432b59f
Compare
ec6dcd1 to
1489741
Compare
432b59f to
05f5482
Compare
1489741 to
2490b0e
Compare
05f5482 to
eca652a
Compare
2490b0e to
cc79847
Compare
eca652a to
d2f41a8
Compare
| thiserror = "1.0.65" | ||
| thiserror-context = "0.1.1" | ||
| tikv-jemallocator = "0.6.0" | ||
| tikv-jemallocator = { git = "https://github.com/linera-io/jemallocator.git", rev = "16f3e75fb145e12f5ae15ed232d1368d4539fc14" } |
There was a problem hiding this comment.
if we use git dependencies we won't be able to publish to crates.
There was a problem hiding this comment.
Ah good point, we might have to publish this fork to crates.io then. I have a PR to hopefully merge this upstream tikv/jemallocator#159, but until that gets merged (if it ever does), we'll need to use our published fork
There was a problem hiding this comment.
Ah nvm, maybe if we just add a version it'll work?
d2f41a8 to
6e26770
Compare
## Motivation The `test-crates-and-docrs` CI job has been failing on `testnet_conway` since the backport of #5602 (#5792). `cargo package` requires all dependencies to have a `version` field — the backport included the git spec but dropped `version = "0.6"`. ## Proposal Add `version = "0.6"` to the `tikv-jemallocator` workspace dependency, matching what #5602 has on `main`. The `[patch.crates-io]` section still ensures the git fork is used at build time; the version is only needed so `cargo package` knows what to reference when stripping git specs for packaging. Also temporarily adds a `pull_request` trigger to the Documentation workflow so the fix can be verified on this PR before merging. **This trigger should be removed before or after merge.** ## Test Plan - CI — the `test-crates-and-docrs` job will run on this PR via the temporary trigger
| thiserror = "1.0.65" | ||
| thiserror-context = "0.1.1" | ||
| tikv-jemallocator = "0.6.0" | ||
| tikv-jemallocator = { version = "0.6", git = "https://github.com/linera-io/jemallocator.git", rev = "16f3e75fb145e12f5ae15ed232d1368d4539fc14" } |
There was a problem hiding this comment.
Is this not gonna break the crates' publishing step? If there's a version for it, can't we use one from crates?
There was a problem hiding this comment.
I don't think so. AFAIK cargo publish will strip the git/rev stuff and keep just the version
There was a problem hiding this comment.
Mmm based on this theory what would happen when someone runs cargo install?
There was a problem hiding this comment.
That's where I was getting at – yes :)
There was a problem hiding this comment.
Ah good point, compilation might fail :(
I might have to just bite the bullet then and publish the fork crate
There was a problem hiding this comment.
Compilation will fail if someone tries to cargo install with the jemalloc feature, but like I said in the other thread here we could split the profiling into a separate feature, then people could still install binaries with jemalloc just not with profiling
| [patch.crates-io] | ||
| tikv-jemallocator = { git = "https://github.com/linera-io/jemallocator.git", rev = "16f3e75fb145e12f5ae15ed232d1368d4539fc14" } | ||
| tikv-jemalloc-sys = { git = "https://github.com/linera-io/jemallocator.git", rev = "16f3e75fb145e12f5ae15ed232d1368d4539fc14" } |
There was a problem hiding this comment.
I believe cargo publish will also strip the patches
There was a problem hiding this comment.
I don't understand then. Either we don't need it or the published crate will contain a different deps than the non-published code?
There was a problem hiding this comment.
So the published crate will use the vanilla jemallocator version without our fork changes. Because SDK users probably don't care about jemalloc and memory profiling? And then our own binaries when we build them will have the fork changes
There was a problem hiding this comment.
This seems like accidental outcome and not planned 🤔 maybe that's ok.
There was a problem hiding this comment.
Can we have a crate linera-jemallocator?
There was a problem hiding this comment.
So according to Claude's analysis, if we go the published crates route, we need to actually publish 4 crates:
- linera-jemalloc-sys — fork of tikv-jemalloc-sys
- linera-jemalloc-ctl — fork of tikv-jemalloc-ctl, depends on linera-jemalloc-sys
- linera-jemallocator — fork of tikv-jemallocator, depends on linera-jemalloc-sys
- linera-jemalloc-pprof — fork of jemalloc_pprof, depends on linera-jemalloc-ctl
If we decide not to publish crates, I could split the profiling into another jemalloc-profiling feature or something, and if people try to cargo install with profiling, the build will fail. I think that's the tradeoff
6e26770 to
ae7c5be
Compare

Motivation
The default libgcc DWARF unwinder (
_Unwind_Find_FDE) used by jemalloc for profilingstack traces has a known livelock
(jemalloc/jemalloc#2282) — one
thread hangs in
_Unwind_Find_FDEand all others block on futex_wait. We've hit this inproduction on validator shards.
Proposal
Switch jemalloc's stack unwinder to libunwind by building with
--enable-prof-libunwind.tikv-jemallocatoratlinera-io/jemallocator fork which adds a
profiling_libunwindfeature (upstream doesn't support it yet —tikv/jemallocator#146)
profiling_libunwindin thejemallocfeaturelibunwind-dev(builder) andlibunwind8(runtime) todocker/DockerfileTest Plan
Release Plan
testnet_conway.Links
tikv/jemallocator#146
jemalloc/jemalloc#2282