Add LSAN suppressions for client-side shutdown leaks #13255
Open
saintstack wants to merge 2 commits into
Follow-on to b0055b8 ("Remove explicit __lsan_do_leak_check() from stopImmediately()"). That change fixed the LSAN deadlock/timeout but LSAN still runs at process exit and reports unsuppressed shutdown leaks (12,792 bytes in 84 allocations), causing ctests to fail.

Add targeted suppressions for each specific actor function that leaks at shutdown. These are all background actors (recurring timers, proxy monitors, status fetchers, etc.) that are still running when the process exits because FDB does not cancel all actors during shutdown.

Each suppression names the exact function in the stack to avoid masking real leaks. A real leak (e.g., per-request allocation that grows unboundedly) would have application-specific functions in the stack that would not match any of these suppressions.

Verified: with these suppressions, LSAN reports zero unsuppressed leaks in fdb_c_upgrade_to_future_version and multi_process_fdbcli_tests with USE_ASAN=ON. The tests still fail due to a separate pre-existing issue: ASAN makecontext/swapcontext warning logged as Severity=40 by fdbmonitor, which the test runner treats as a test failure.
Pull request overview
Adds additional LeakSanitizer suppressions to prevent client-side test failures caused by known shutdown-time “leaks” from background actors that are still running when processes exit.
Changes:
- Extend contrib/lsan.suppressions with suppressions for multiple client-side background actors observed in LSAN reports at process shutdown.
- Add explanatory comments describing why these shutdown-time allocations are currently not cleaned up and how the suppressions are intended to be narrowly targeted.
Address review feedback:
- Replace toolchain-specific pattern (recurring<std::__1::__bind_front_t<void (DatabaseContext::) with the stable FDB symbol DatabaseContext::expireThrottles. This is portable across toolchains and does not mask unrelated DatabaseContext recurring leaks.
- Fix comment: the actual recurring actor is throttleExpirer (binding expireThrottles), not updateLatencyBandConfig.
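For readers unfamiliar with the file format, the entry described in this commit might look roughly like the fragment below. This is a sketch based only on the commit message, not a copy of the actual contents of contrib/lsan.suppressions; the comment wording is an assumption. LSAN suppression files take one `leak:<pattern>` rule per line, lines starting with `#` are comments, and a pattern matches if it occurs in the function, source file, or module name of any frame in the leak's allocation stack.

```text
# Assumed comment text, for illustration only.
# throttleExpirer is a recurring actor (binding expireThrottles) that is
# still running when the client process exits, so its allocation is
# reported at shutdown.
leak:DatabaseContext::expireThrottles
```

A file like this is normally handed to the sanitizer runtime through `LSAN_OPTIONS=suppressions=<path>`; how the FDB test harness sets that variable is not shown in this conversation.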
Result of foundationdb-pr-clang-ide on Linux RHEL 9
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
Result of foundationdb-pr-clang-arm on Linux CentOS 7
Result of foundationdb-pr-clang on Linux RHEL 9
Result of foundationdb-pr on Linux RHEL 9
Result of foundationdb-pr-macos on macOS Ventura 13.x
Result of foundationdb-pr-cluster-tests on Linux RHEL 9