Skip to content

Commit 1fc6500

Browse files
committed
Merge #237: Made SpawnProcess() behavior safe post fork()
5205a87 test: check SpawnProcess post-fork safety (Sjors Provoost) 69652f0 Precompute argv before fork in SpawnProcess (Sjors Provoost) 30a8681 SpawnProcess: avoid fd leak on close failure (Sjors Provoost) 14e926a refactor: extract MakeArgv helper (Sjors Provoost) Pull request description: Bitcoin Core functional tests for the `echoipc` RPC method were occasionally timing out, see bitcoin/bitcoin#34187. A helpful LLM quickly figured out it was due to doing unsafe things after `fork()`. https://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html Fix it by having `SpawnProcess` evaluate `fd_to_args` and build `argv` in the parent _before_ `fork()`. `SpawnProcess` now avoids `ExecProcess`. I intentionally did not fix the latter, not just to reduce churn, but also because it's now only used by `mpgen` and it doesn't seem worth losing the debug errors it (unsafely) throws. The last commit adds a test which reproduces the issue. ACKs for top commit: ryanofsky: Code review ACK 5205a87, just tweaking error codes and output since last review and fixing potential (pre-existing) descriptor leak on failure bug. Tree-SHA512: 812a053c7ca1ed06e29e8f845bf62d888b44fd4b8b486ab0e735b8ebe9368aec0989ad5b094c8745c301264247351955633d147b9caf17643a54d88c2930d0cd
2 parents d0fc108 + 5205a87 commit 1fc6500

4 files changed

Lines changed: 163 additions & 15 deletions

File tree

include/mp/util.h

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -221,13 +221,16 @@ using FdToArgsFn = std::function<std::vector<std::string>(int fd)>;
221221

222222
//! Spawn a new process that communicates with the current process over a socket
223223
//! pair. Returns pid through an output argument, and file descriptor for the
224-
//! local side of the socket. Invokes fd_to_args callback with the remote file
225-
//! descriptor number which returns the command line arguments that should be
226-
//! used to execute the process, and which should have the remote file
227-
//! descriptor embedded in whatever format the child process expects.
224+
//! local side of the socket.
225+
//! The fd_to_args callback is invoked in the parent process before fork().
226+
//! It must not rely on child pid/state, and must return the command line
227+
//! arguments that should be used to execute the process. Embed the remote file
228+
//! descriptor number in whatever format the child process expects.
228229
int SpawnProcess(int& pid, FdToArgsFn&& fd_to_args);
229230

230231
//! Call execvp with vector args.
232+
//! Not safe to call in a post-fork child of a multi-threaded process.
233+
//! Currently only used by mpgen at build time.
231234
void ExecProcess(const std::vector<std::string>& args);
232235

233236
//! Wait for a process to exit and return its exit code.

src/mp/util.cpp

Lines changed: 45 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
#include <pthread.h>
1515
#include <sstream>
1616
#include <string>
17+
#include <sys/types.h>
1718
#include <sys/resource.h>
1819
#include <sys/socket.h>
1920
#include <sys/wait.h>
@@ -36,6 +37,17 @@ namespace fs = std::filesystem;
3637
namespace mp {
3738
namespace {
3839

40+
std::vector<char*> MakeArgv(const std::vector<std::string>& args)
41+
{
42+
std::vector<char*> argv;
43+
argv.reserve(args.size() + 1);
44+
for (const auto& arg : args) {
45+
argv.push_back(const_cast<char*>(arg.c_str()));
46+
}
47+
argv.push_back(nullptr);
48+
return argv;
49+
}
50+
3951
//! Return highest possible file descriptor.
4052
size_t MaxFd()
4153
{
@@ -111,35 +123,57 @@ int SpawnProcess(int& pid, FdToArgsFn&& fd_to_args)
111123
throw std::system_error(errno, std::system_category(), "socketpair");
112124
}
113125

126+
// Evaluate the callback and build the argv array before forking.
127+
//
128+
// The parent process may be multi-threaded and holding internal library
129+
// locks at fork time. In that case, running code that allocates memory or
130+
// takes locks in the child between fork() and exec() can deadlock
131+
// indefinitely. Precomputing arguments in the parent avoids this.
132+
const std::vector<std::string> args{fd_to_args(fds[0])};
133+
const std::vector<char*> argv{MakeArgv(args)};
134+
114135
pid = fork();
115136
if (pid == -1) {
116137
throw std::system_error(errno, std::system_category(), "fork");
117138
}
118-
// Parent process closes the descriptor for socket 0, child closes the descriptor for socket 1.
139+
// Parent process closes the descriptor for socket 0, child closes the
140+
// descriptor for socket 1. On failure, the parent throws, but the child
141+
// must _exit(126) (post-fork child must not throw).
119142
if (close(fds[pid ? 0 : 1]) != 0) {
120-
throw std::system_error(errno, std::system_category(), "close");
143+
if (pid) {
144+
(void)close(fds[1]);
145+
throw std::system_error(errno, std::system_category(), "close");
146+
}
147+
static constexpr char msg[] = "SpawnProcess(child): close(fds[1]) failed\n";
148+
const ssize_t writeResult = ::write(STDERR_FILENO, msg, sizeof(msg) - 1);
149+
(void)writeResult;
150+
_exit(126);
121151
}
152+
122153
if (!pid) {
123-
// Child process must close all potentially open descriptors, except socket 0.
154+
// Child process must close all potentially open descriptors, except
155+
// socket 0. Do not throw, allocate, or do non-fork-safe work here.
124156
const int maxFd = MaxFd();
125157
for (int fd = 3; fd < maxFd; ++fd) {
126158
if (fd != fds[0]) {
127159
close(fd);
128160
}
129161
}
130-
ExecProcess(fd_to_args(fds[0]));
162+
163+
execvp(argv[0], argv.data());
164+
// NOTE: perror() is not async-signal-safe; calling it here in a
165+
// post-fork child may deadlock in multithreaded parents.
166+
// TODO: Report errors to the parent via a pipe (e.g. write errno)
167+
// so callers can get diagnostics without relying on perror().
168+
perror("execvp failed");
169+
_exit(127);
131170
}
132171
return fds[1];
133172
}
134173

135174
void ExecProcess(const std::vector<std::string>& args)
136175
{
137-
std::vector<char*> argv;
138-
argv.reserve(args.size());
139-
for (const auto& arg : args) {
140-
argv.push_back(const_cast<char*>(arg.c_str()));
141-
}
142-
argv.push_back(nullptr);
176+
const std::vector<char*> argv{MakeArgv(args)};
143177
if (execvp(argv[0], argv.data()) != 0) {
144178
perror("execvp failed");
145179
if (errno == ENOENT && !args.empty()) {
@@ -152,7 +186,7 @@ void ExecProcess(const std::vector<std::string>& args)
152186
int WaitProcess(int pid)
153187
{
154188
int status;
155-
if (::waitpid(pid, &status, 0 /* options */) != pid) {
189+
if (::waitpid(pid, &status, /*options=*/0) != pid) {
156190
throw std::system_error(errno, std::system_category(), "waitpid");
157191
}
158192
return status;

test/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ if(BUILD_TESTING AND TARGET CapnProto::kj-test)
2626
${MP_PROXY_HDRS}
2727
mp/test/foo-types.h
2828
mp/test/foo.h
29+
mp/test/spawn_tests.cpp
2930
mp/test/test.cpp
3031
)
3132
include(${PROJECT_SOURCE_DIR}/cmake/TargetCapnpSources.cmake)

test/mp/test/spawn_tests.cpp

Lines changed: 110 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,110 @@
1+
// Copyright (c) The Bitcoin Core developers
2+
// Distributed under the MIT software license, see the accompanying
3+
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
4+
5+
#include <mp/util.h>
6+
7+
#include <kj/test.h>
8+
9+
#include <chrono>
10+
#include <compare>
11+
#include <condition_variable>
12+
#include <csignal>
13+
#include <cstdlib>
14+
#include <mutex>
15+
#include <string>
16+
#include <sys/wait.h>
17+
#include <thread>
18+
#include <unistd.h>
19+
#include <vector>
20+
21+
namespace {
22+
23+
// Poll for child process exit using waitpid(..., WNOHANG) until the child exits
24+
// or timeout expires. Returns true if the child exited and status_out was set.
25+
// Returns false on timeout or error.
26+
static bool WaitPidWithTimeout(int pid, std::chrono::milliseconds timeout, int& status_out)
27+
{
28+
const auto deadline = std::chrono::steady_clock::now() + timeout;
29+
while (std::chrono::steady_clock::now() < deadline) {
30+
const int r = ::waitpid(pid, &status_out, WNOHANG);
31+
if (r == pid) return true;
32+
if (r == 0) {
33+
std::this_thread::sleep_for(std::chrono::milliseconds{1});
34+
continue;
35+
}
36+
// waitpid error
37+
return false;
38+
}
39+
return false;
40+
}
41+
42+
} // namespace
43+
44+
KJ_TEST("SpawnProcess does not run callback in child")
45+
{
46+
// This test is designed to fail deterministically if fd_to_args is invoked
47+
// in the post-fork child: a mutex held by another parent thread at fork
48+
// time appears locked forever in the child.
49+
std::mutex target_mutex;
50+
std::mutex control_mutex;
51+
std::condition_variable control_cv;
52+
bool locked{false};
53+
bool release{false};
54+
55+
// Holds target_mutex until the releaser thread updates release
56+
std::thread locker([&] {
57+
std::unique_lock<std::mutex> target_lock(target_mutex);
58+
{
59+
std::lock_guard<std::mutex> g(control_mutex);
60+
locked = true;
61+
}
62+
control_cv.notify_one();
63+
64+
std::unique_lock<std::mutex> control_lock(control_mutex);
65+
control_cv.wait(control_lock, [&] { return release; });
66+
});
67+
68+
// Wait for target_mutex to be held by the locker thread.
69+
{
70+
std::unique_lock<std::mutex> l(control_mutex);
71+
control_cv.wait(l, [&] { return locked; });
72+
}
73+
74+
// Release the lock shortly after SpawnProcess starts.
75+
std::thread releaser([&] {
76+
// In the unlikely event a CI machine overshoots this delay, a
77+
// regression could be missed. This is preferable to spurious
78+
// test failures.
79+
std::this_thread::sleep_for(std::chrono::milliseconds{50});
80+
{
81+
std::lock_guard<std::mutex> g(control_mutex);
82+
release = true;
83+
}
84+
control_cv.notify_one();
85+
});
86+
87+
int pid{-1};
88+
const int fd{mp::SpawnProcess(pid, [&](int child_fd) -> std::vector<std::string> {
89+
// If this callback runs in the post-fork child, target_mutex appears
90+
// locked forever (the owning thread does not exist), so this deadlocks.
91+
std::lock_guard<std::mutex> g(target_mutex);
92+
return {"true", std::to_string(child_fd)};
93+
})};
94+
::close(fd);
95+
96+
int status{0};
97+
// Give the child up to 1 second to exit. If it does not, terminate it and
98+
// reap it to avoid leaving a zombie behind.
99+
const bool exited{WaitPidWithTimeout(pid, std::chrono::milliseconds{1000}, status)};
100+
if (!exited) {
101+
::kill(pid, SIGKILL);
102+
::waitpid(pid, &status, /*options=*/0);
103+
}
104+
105+
releaser.join();
106+
locker.join();
107+
108+
KJ_EXPECT(exited, "Timeout waiting for child process to exit");
109+
KJ_EXPECT(WIFEXITED(status) && WEXITSTATUS(status) == 0);
110+
}

0 commit comments

Comments
 (0)