Skip to content

Commit 7ed37d8

Browse files
AlnisMmeta-codesync[bot]
authored andcommitted
Fix use-after-unmap in markMovingForSlabRelease
Summary: I noticed that in coredumper there have been crashes in fookeep for some time which should be fixed with this diff. Fix a race condition where `toString()` accesses slab memory after `abortSlabRelease()` has restored the slab to normal operation. Another thread (e.g. MemoryMonitor) can pick up the same slab, release it, and call `adviseSlab()` which does `madvise(MADV_REMOVE)`, unmapping the pages before the first thread reads from them — causing SIGSEGV. The fix captures the item description before aborting, while the slab is still marked for release and protected from being picked by other threads. I didn't test if this actually fixes the issue because I am not sure how to reproduce it. We can check coredumper in a few weeks to see if the issue is really fixed. Reviewed By: pbhandar2, rlyerly Differential Revision: D97321920 fbshipit-source-id: 05c2929389393ca5355037e9325f3d6c2f45e3db
1 parent 84f0b13 commit 7ed37d8

1 file changed

Lines changed: 27 additions & 13 deletions

File tree

cachelib/allocator/CacheAllocator.h

Lines changed: 27 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -5432,6 +5432,31 @@ bool CacheAllocator<CacheTrait>::markMovingForSlabRelease(
54325432
});
54335433
};
54345434

5435+
// Snapshot the item string under processAllocForRelease() protection.
5436+
// Reading alloc directly here races with concurrent free(), while doing it
5437+
// after abortSlabRelease() can race with the slab being advised away.
5438+
auto captureItemStringForAbort = [&]() {
5439+
std::string itemStr = "item <already freed before abort>";
5440+
try {
5441+
allocator_->processAllocForRelease(ctx, alloc, [&](void* memory) {
5442+
itemStr = static_cast<Item*>(memory)->toString();
5443+
});
5444+
} catch (const std::exception& e) {
5445+
itemStr =
5446+
folly::sformat("<failed to capture item for abort: {}>", e.what());
5447+
}
5448+
return itemStr;
5449+
};
5450+
5451+
auto abortWithMessage = [&](const std::string& reason) {
5452+
auto itemStr = captureItemStringForAbort();
5453+
allocator_->abortSlabRelease(ctx);
5454+
throw exception::SlabReleaseAborted(
5455+
folly::sformat("Slab Release aborted {} while still trying to mark"
5456+
" as moving for Item: {}. Pool: {}, Class: {}.",
5457+
reason, itemStr, ctx.getPoolId(), ctx.getClassId()));
5458+
};
5459+
54355460
auto startTime = util::getCurrentTimeMs();
54365461
while (true) {
54375462
allocator_->processAllocForRelease(ctx, alloc, fn);
@@ -5449,25 +5474,14 @@ bool CacheAllocator<CacheTrait>::markMovingForSlabRelease(
54495474
itemFreed = true;
54505475

54515476
if (isShutdownInProgress()) {
5452-
allocator_->abortSlabRelease(ctx);
5453-
throw exception::SlabReleaseAborted(
5454-
folly::sformat("Slab Release aborted while still trying to mark"
5455-
" as moving for Item: {}. Pool: {}, Class: {}.",
5456-
static_cast<Item*>(alloc)->toString(), ctx.getPoolId(),
5457-
ctx.getClassId()));
5477+
abortWithMessage("due to shutdown");
54585478
}
54595479

54605480
if (config_.slabRebalanceTimeout.count() > 0) {
54615481
auto elapsedTime = util::getCurrentTimeMs() - startTime;
54625482
if (elapsedTime >
54635483
static_cast<uint64_t>(config_.slabRebalanceTimeout.count())) {
5464-
allocator_->abortSlabRelease(ctx);
5465-
throw exception::SlabReleaseAborted(
5466-
folly::sformat("Slab Release aborted after {} ms while still"
5467-
" trying to mark as moving for Item: {}. Pool: {},"
5468-
" Class: {}.",
5469-
elapsedTime, static_cast<Item*>(alloc)->toString(),
5470-
ctx.getPoolId(), ctx.getClassId()));
5484+
abortWithMessage(folly::sformat("after {} ms", elapsedTime));
54715485
}
54725486
}
54735487

0 commit comments

Comments
 (0)