Skip to content

Commit eed7965

Browse files
authored
fix: handle fork in otel process ctx (#1650)
Depends #1640 # What does this PR do? This PR is a follow up of #1585 and #1640 which implement OTel process context publication. It adds a proper handling of the case of publishing a new context after a `fork`. # Motivation Some language runtimes (e.g. Python or Ruby) resorts to `fork`. In this case, the OTel process context of the child must be re-published (the process context of the parent is explicitly NOT inherited through `MADVISE_DONTFORK`). However, since they share the same copy of the static handler, prior to this PR, the publication would try to access the non-inherited mapping, potentially causing a crash. This PR properly handles this case by storing the PID of the publisher, so that upon update, we can detect if there's been a fork since (and we are a child), in which case we can re-create a mapping from scratch. # How to test the change? There is currently no test, because having tests that `fork` is not trivial to set up in the current Rust test framework (it is admittedly possible, but is a larger question that should be treated separately IMHO). However, two follow-up PRs are coming with an FFI and libdatadog-side protobuf encoding of the payload, which will make it possible to test the fork behavior e.g. from Ruby, once they land. Co-authored-by: yann.hamdaoui <yann.hamdaoui@datadoghq.com>
1 parent d9048c6 commit eed7965

1 file changed

Lines changed: 43 additions & 6 deletions

File tree

libdd-library-config/src/otel_process_ctx.rs

Lines changed: 43 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,7 @@ pub mod linux {
3030
use rustix::{
3131
fs::{ftruncate, memfd_create, MemfdFlags},
3232
mm::{madvise, mmap, mmap_anonymous, munmap, Advice, MapFlags, ProtFlags},
33-
process::set_virtual_memory_region_name,
33+
process::{getpid, set_virtual_memory_region_name, Pid},
3434
};
3535

3636
/// Current version of the process context format
@@ -213,6 +213,9 @@ pub mod linux {
213213
/// or drop).
214214
#[allow(unused)]
215215
payload: Vec<u8>,
216+
/// The process id of the last publisher. This is useful to detect forks(), and publish a
217+
/// new context accordingly.
218+
pid: Pid,
216219
}
217220

218221
impl ProcessContextHandle {
@@ -264,7 +267,11 @@ pub mod linux {
264267

265268
let _ = mapping.set_name();
266269

267-
Ok(ProcessContextHandle { mapping, payload })
270+
Ok(ProcessContextHandle {
271+
mapping,
272+
payload,
273+
pid: getpid(),
274+
})
268275
}
269276

270277
/// Updates the context after initial publication.
@@ -341,15 +348,45 @@ pub mod linux {
341348

342349
/// Publishes or updates the process context for it to be visible by external readers.
343350
///
344-
/// If this is the first publication, or if [unpublish] has been called last, this will follow
345-
/// the Publish protocol of the process context specification.
351+
/// If any of the following condition holds:
352+
///
353+
/// - this is the first publication
354+
/// - [unpublish] has been called last
355+
/// - the previous context has been published from a different process id (that is, a `fork()`
356+
/// happened and we're the child process)
357+
///
358+
/// Then we follow the Publish protocol of the OTel process context specification (allocating a
359+
/// fresh mapping).
360+
///
361+
/// Otherwise, if a context has been previously published from the same process and hasn't been
362+
/// unpublished since, we follow the Update protocol.
363+
///
364+
/// # Fork safety
346365
///
347-
/// Otherwise, the context is updated following the Update protocol.
366+
/// If we're a forked children of the original publisher, we are extremely restricted in the
367+
/// set of operations that we can do (we must be async-signal-safe). On paper, heap allocation
368+
/// is Undefined Behavior, for example. We assume that a forking runtime (such as Python or
369+
/// Ruby) that doesn't follow with an immediate `exec` is already "taking that risk", so to
370+
/// speak (typically, if no thread is ever spawned before the fork, things are mostly fine).
348371
pub fn publish(payload: Vec<u8>) -> anyhow::Result<()> {
349372
let mut guard = lock_context_handle()?;
350373

351374
match &mut *guard {
352-
Some(handler) => handler.update(payload),
375+
Some(handler) if handler.pid == getpid() => handler.update(payload),
376+
Some(handler) => {
377+
let mut local_handler = ProcessContextHandle::publish(payload)?;
378+
// If we've been forked, we need to prevent the mapping from being dropped
379+
// normally, as it would try to unmap a region that isn't mapped anymore in the
380+
// child process, or worse, could have been remapped to something else in the
381+
// meantime.
382+
//
383+
// To do so, we get the old handler back in `local_handler` and prevent `mapping`
384+
// from being dropped specifically.
385+
std::mem::swap(&mut local_handler, handler);
386+
let _: ManuallyDrop<MemMapping> = ManuallyDrop::new(local_handler.mapping);
387+
388+
Ok(())
389+
}
353390
None => {
354391
*guard = Some(ProcessContextHandle::publish(payload)?);
355392
Ok(())

0 commit comments

Comments
 (0)