file_server: fix use-after-free in FileStreamer async-file callbacks after request cancel#45153
file_server: fix use-after-free in FileStreamer async-file callbacks after request cancel#45153omkhar wants to merge 2 commits into
Conversation
FileStreamer::start() captured [this] into the async-file callbacks returned by AsyncFileManager::stat() and AsyncFileManager::read(). The async-file manager guarantees the callback fires (even if cancel was called), so when the FileStreamer is destroyed before the callback runs the bare 'this' capture becomes dangling and the callback dereferences freed memory. This is reachable on any deployment using the file_server filter when downstream clients can cancel or RST requests while an async-file operation is in flight. Hold a std::shared_ptr<bool> alive sentinel in FileStreamer and capture [this, alive = std::weak_ptr<bool>(alive_)] in the async-file callbacks. If alive.lock() returns nullptr the callback short-circuits to a no-op. Reported privately via GHSA-c4w8-wqj5-4xph; envoy security team asked for this to be fixed in the open given the file_server filter's WIP status. Signed-off-by: Omkhar Arasaratnam <omkhar@linkedin.com>
There was a problem hiding this comment.
Could you add an integration test case that reproduces the crash, before the fix?
(cancel_callback_ should already be called in the abort/destruction path, which should already be making AsyncFileManager not call the callback, so there's a broader problem if this change addresses a real problem. We shouldn't have two sentinels doing the same job.)
Edit: Ah, I see the issue, the callback can be dispatched before it is canceled, here, but then canceled before the dispatched event reaches head of queue, so it also needs the ability to abort if it's canceled while the callback is in the dispatcher's queue.
And it makes sense here to have two sentinels, because one is to abort the file operation if it isn't complete, and the other is to abort the callback if it got queued first.
So since it's racy it's probably quite difficult to repro reliably in an integration test.
Consider using cancelWrapped rather than another custom sentinel, since it was created for exactly this kind of purpose. (Annoying though it is to have to call two cancel-callback functions!)
| bug_fixes: | ||
| # *Changes expected to improve the state of the world and are unlikely to have negative effects* | ||
| - area: file_server | ||
| change: | |
There was a problem hiding this comment.
needs to use new changelog layout - see #45095
/wait
Replace the custom std::shared_ptr<bool> alive_ sentinel introduced in
the first commit with envoy's existing cancelWrapped helper from
source/common/common/cancel_wrapper.h.
ravenblackx pointed out two things:
1. The bare `this` capture problem the v1 commit solved is real:
AsyncFileManager can dispatch the completion callback onto the
owner dispatcher (see source/extensions/common/async_files/
async_file_manager_thread_pool.cc) and then observe cancellation
after the dispatch — so the file-side cancel_callback_ alone can't
stop the queued callback from firing.
2. Two sentinels are warranted (one to abort the file op if it has
not completed, one to abort the dispatched callback if it has
already been queued), but the codebase already provides
cancelWrapped for exactly that. A custom sentinel here just
duplicates work without the thread-safety asserts cancel_wrapper
provides.
This commit:
- Removes the alive_ shared_ptr and the [this, alive=...] capture
pattern from every async-file callback.
- Adds a single Envoy::CancelWrapper::CancelFunction
cancel_dispatcher_callbacks_ member that is invoked in abort() and
in the destructor.
- Wraps each async-file completion lambda in
Envoy::CancelWrapper::cancelWrapped(lambda, &cancel_dispatcher_callbacks_).
- Adds //source/common/common:cancel_wrapper_lib to the file_server
BUILD deps.
The existing AsyncFiles cancel_callback_ continues to short-circuit
the in-flight file operation. The new cancel_dispatcher_callbacks_
short-circuits any completion the dispatcher has already enqueued.
Verified on Linux x86_64: bazel test --config=clang
//test/extensions/filters/http/file_server:file_server_test passes
on a clean envoyproxy/envoy@main with this patch applied.
Signed-off-by: Omkhar Arasaratnam <omkhar@linkedin.com>
|
Thanks @ravenblackx — that read of the dispatcher-queue race is right, and using Pushed eccd1db:
Verified on Linux x86_64: On the integration test: I agree the race is "quite difficult to repro reliably" — wiring the dispatch-then-cancel sequence deterministically from an integration test would require driving |
Commit Message:
file_server: fix UAF on async-file callback after request cancel
FileStreamer::start()captured[this]into the async-file callbacksreturned by
AsyncFileManager::stat()andAsyncFileManager::read().The async-file manager guarantees the callback fires (even if cancel
was called), so when the
FileStreameris destroyed before thecallback runs the bare
thiscapture becomes dangling and thecallback dereferences freed memory.
This is reachable on any deployment using the
file_serverfilter whendownstream clients can cancel or reset requests while an async-file
operation is in flight.
Hold a
std::shared_ptr<bool>alive sentinel inFileStreamerandcapture
[this, alive = std::weak_ptr<bool>(alive_)]in the async-filecallbacks. If
alive.lock()returns nullptr the callback short-circuitsto a no-op.
Additional Description:
Reported privately via GHSA-c4w8-wqj5-4xph; envoy security team
(@wbpcode) asked for this to be fixed in the open given the
file_serverfilter's WIP status.Fixes #45152.
Risk Level: Low.
Defensive lifetime guard around two existing async-file callback sites.
No behavior change for the in-flight-completes-before-cancel path. The
cancel-then-callback path previously dereferenced freed memory and now
no-ops.
Testing:
bazel test --config=clang --config=clang-asan //test/extensions/filters/http/file_server:file_server_testpasseslocally on the patched tree at upstream main. Happy to add a focused
regression test that drives the cancel-during-stat path if maintainers
would like.
Docs Changes: None required.
Release Notes: Added to
changelogs/current.yamlunderbug_fixes(area: file_server).
Platform Specific Features: None.