Commit 9c963ee

Auto merge of rust-lang#148799 - ohadravid:windows-thread-local-dtors-using-fls, r=ChrisDenton
Switch the destructors implementation for thread locals on Windows to use FLS

## Summary

Switch the thread local **destructors** implementation on Windows to use the _Fiber Local Storage_ APIs, which provide native support for setting a callback to be called on thread termination, replacing the current `tls_callback` symbol-based implementation.

_Except for some spellchecking, no LLMs were used to produce code / comments / text in this PR._

## Current Implementation

On Windows, in order to support thread locals with destructors, the standard library uses a special `tls_callback` symbol that is used to call the `destructors::run()` hook on thread termination.

This has two downsides:

1. It is not well documented, and seems to cause some problems [1] [2] [3].
2. It disallows some synchronization operations, as mentioned in [`LocalKey`'s documentation](https://doc.rust-lang.org/std/thread/struct.LocalKey.html#synchronization-in-thread-local-destructors).

[1]: rust-lang#144234
[2]: rust-lang#145154
[3]: rust-lang#140798

As an example of point 2, this code, which uses `JoinHandle::join` in a thread local `Drop` impl, will deadlock on stable:

<details>
<summary>Join-on-Drop Deadlock Example</summary>

```rust
use std::thread::JoinHandle;

struct JoinOnDrop(Option<JoinHandle<()>>);

impl Drop for JoinOnDrop {
    fn drop(&mut self) {
        self.0.take().unwrap().join().unwrap();
    }
}

thread_local! {
    static HANDLE: JoinOnDrop = {
        let thread = std::thread::spawn(|| {
            println!("Starting...");
            // std::thread::sleep(std::time::Duration::from_secs(3));
            println!("Done");
        });
        JoinOnDrop(Some(thread))
    };
}

fn main() {
    let thread = std::thread::spawn(|| {
        HANDLE.with(|_| {
            println!("Some other thread");
        })
    });
    thread.join().unwrap();
    println!("Done");
}
```

</details>

## Proposed Change

We can use the `Fls{Alloc,Set,Get,Free}` functions (see https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=102989) to implement the dtor callback needed for thread locals that have a `Drop` implementation.
We allocate a single key, and use its destructor callback to run all the registered destructors when a thread is shutting down.

With this implementation, the above code sample will not deadlock (but it still might not be a good idea to do this!).

## Safety and Compatibility

We use the common `thread_local` + atomic pattern to only set a single FLS key. The destructor callback is only called when that value is non-zero.

Destructors will only run at thread exit: we verify that we are not running in a fiber during the destructors callback. **This means that using fibers (which is very rare) will result in thread locals being leaked**, unless the fiber is converted back to a thread using `ConvertFiberToThread` before thread termination. This is not ideal but should be OK, as destructors are not guaranteed to run; it does need to be documented.

It might be possible for the user to use something like the current `tls_callback` to observe already-freed thread locals, which is something that can also happen in the current implementation.

~Destructors will only run on the correct thread: Fibers cannot be moved between threads.~

Destructors will only run on the correct thread: the hook uses a `#[thread_local]` list, so fiber movement between threads does not change which thread executes the destructors.

Destructors will only run once: even if the hook is called multiple times, the `#[thread_local]` list is cleared after the first run.

Users cannot observe different locals because they are using fibers: because we only use an FLS local marker to trigger the destructors callback, we don't change anything about how users interact with "normal" thread locals and fiber locals.

### DLL Unloading

It is possible to build a `cdylib` which uses thread locals and unload it dynamically using `FreeLibrary`.
This can cause the OS to call into an unmapped cleanup hook, so we use `atexit` to manually free the special FLS key, which will also trigger the cleanup hook for each registered thread.

This is safe because, similar to thread shutdown, no user code can run after this point, and only the destructors of the running thread will run.

See `tests/run-make/dynamic-loading-cdylib/load_and_unload.rs`.

## Other Notes

The implementation is based on the `key::racy` and `guard::apple` code, because we need a `LazyKey`-like racy static and an `enable` function.

While TLS slots are [limited to 1088](https://devblogs.microsoft.com/oldnewthing/20170712-00/?p=96585), FLS slots are currently [limited to 4000](https://devblogs.microsoft.com/windows-music-dev/effectively-removing-the-fls-slot-allocation-limit-in-windows-10/) per process.

### Miri

Because Miri is aware of the thread local implementation, I also implemented these functions and support for them in the interpreter here: https://github.com/rust-lang/miri/compare/master...ohadravid:miri:windows-fls-support?expand=1

I guess that this will need to be merged before this PR (if this is accepted). Let me know and I'll open that PR as well.

### Targets without `target_thread_local`

In `*-gnu` Windows targets, the `target_thread_local` feature is unavailable.

We could also change the "key" (non-`target_thread_local`) Windows impl at `library\std\src\sys\thread_local\key\windows.rs` to be based on the Fls functions. I can add it to this PR, or as a separate PR, if you think this is preferable.

`Cell` in a `#[thread_local]` is used to store the resulting key, like the other implementations. When `target_thread_local` isn't available, we always fetch the atomic and set the FLS key's value.
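The "destructors will only run once" property described under Safety and Compatibility can be illustrated with a small, portable sketch. It does not call any Windows API: `DTORS`, `register`, and `cleanup_hook` are hypothetical stand-ins for std's internal per-thread destructor list and for the FLS callback, and the stable `thread_local!` macro is used here in place of `#[thread_local]`.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

thread_local! {
    // Hypothetical per-thread destructor list (assumption: the real list
    // lives inside std and is not reproduced here).
    static DTORS: RefCell<Vec<Box<dyn FnOnce()>>> = RefCell::new(Vec::new());
}

// Hypothetical helper: record a destructor for the current thread.
fn register(dtor: impl FnOnce() + 'static) {
    DTORS.with(|d| d.borrow_mut().push(Box::new(dtor)));
}

// Stand-in for the FLS callback: drains the current thread's list.
// Each destructor is popped before it is called, so a second invocation
// of the hook finds an empty list and does nothing.
fn cleanup_hook() {
    loop {
        // Release the borrow before running the destructor, so that a
        // destructor may itself register further destructors.
        let next = DTORS.with(|d| d.borrow_mut().pop());
        match next {
            Some(dtor) => dtor(),
            None => break,
        }
    }
}

// Registers two destructors on a worker thread, calls the hook twice,
// and returns how many destructor invocations happened in total.
fn run_demo() -> usize {
    let runs = Arc::new(AtomicUsize::new(0));
    let worker_runs = Arc::clone(&runs);
    std::thread::spawn(move || {
        for _ in 0..2 {
            let r = Arc::clone(&worker_runs);
            register(move || {
                r.fetch_add(1, Ordering::SeqCst);
            });
        }
        cleanup_hook();
        cleanup_hook(); // the list is already empty here
    })
    .join()
    .unwrap();
    runs.load(Ordering::SeqCst)
}

fn main() {
    println!("destructor invocations: {}", run_demo());
}
```

Because the list is per-thread, the hook only ever touches the destructors of the thread it is currently running on, which is the same reason given above for why fiber movement does not change which thread executes them.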
2 parents e164200 + 336dc57 commit 9c963ee

15 files changed

Lines changed: 875 additions & 113 deletions

library/std/src/sys/pal/windows/c.rs

Lines changed: 5 additions & 1 deletion
@@ -5,7 +5,7 @@
 #![unstable(issue = "none", feature = "windows_c")]
 #![allow(clippy::style)]
 
-use core::ffi::{CStr, c_uint, c_ulong, c_ushort, c_void};
+use core::ffi::{CStr, c_int, c_uint, c_ulong, c_ushort, c_void};
 use core::ptr;
 
 mod windows_sys;
@@ -241,3 +241,7 @@ cfg_select! {
 // Only available starting with Windows 8.
 #[cfg(not(target_vendor = "win7"))]
 windows_link::link!("ws2_32.dll" "system" fn GetHostNameW(name : PWSTR, namelen : i32) -> i32);
+
+unsafe extern "C" {
+    pub fn atexit(cb: unsafe extern "C" fn()) -> c_int;
+}

library/std/src/sys/pal/windows/c/bindings.txt

Lines changed: 7 additions & 0 deletions
@@ -2130,6 +2130,11 @@ FindExSearchNameMatch
 FindFirstFileExW
 FindNextFileW
 FIONBIO
+FLS_OUT_OF_INDEXES
+FlsAlloc
+FlsFree
+FlsGetValue
+FlsSetValue
 FlushFileBuffers
 FORMAT_MESSAGE_ALLOCATE_BUFFER
 FORMAT_MESSAGE_ARGUMENT_ARRAY
@@ -2258,6 +2263,7 @@ IPV6_DROP_MEMBERSHIP
 IPV6_MREQ
 IPV6_MULTICAST_LOOP
 IPV6_V6ONLY
+IsThreadAFiber
 LINGER
 listen
 LocalFree
@@ -2318,6 +2324,7 @@ OPEN_ALWAYS
 OPEN_EXISTING
 OpenProcessToken
 OVERLAPPED
+PFLS_CALLBACK_FUNCTION
 PIPE_ACCEPT_REMOTE_CLIENTS
 PIPE_ACCESS_DUPLEX
 PIPE_ACCESS_INBOUND

library/std/src/sys/pal/windows/c/windows_sys.rs

Lines changed: 8 additions & 0 deletions
@@ -27,6 +27,10 @@ windows_link::link!("kernel32.dll" "system" fn ExitProcess(uexitcode : u32) -> !
 windows_link::link!("kernel32.dll" "system" fn FindClose(hfindfile : HANDLE) -> BOOL);
 windows_link::link!("kernel32.dll" "system" fn FindFirstFileExW(lpfilename : PCWSTR, finfolevelid : FINDEX_INFO_LEVELS, lpfindfiledata : *mut core::ffi::c_void, fsearchop : FINDEX_SEARCH_OPS, lpsearchfilter : *const core::ffi::c_void, dwadditionalflags : FIND_FIRST_EX_FLAGS) -> HANDLE);
 windows_link::link!("kernel32.dll" "system" fn FindNextFileW(hfindfile : HANDLE, lpfindfiledata : *mut WIN32_FIND_DATAW) -> BOOL);
+windows_link::link!("kernel32.dll" "system" fn FlsAlloc(lpcallback : PFLS_CALLBACK_FUNCTION) -> u32);
+windows_link::link!("kernel32.dll" "system" fn FlsFree(dwflsindex : u32) -> BOOL);
+windows_link::link!("kernel32.dll" "system" fn FlsGetValue(dwflsindex : u32) -> *mut core::ffi::c_void);
+windows_link::link!("kernel32.dll" "system" fn FlsSetValue(dwflsindex : u32, lpflsdata : *const core::ffi::c_void) -> BOOL);
 windows_link::link!("kernel32.dll" "system" fn FlushFileBuffers(hfile : HANDLE) -> BOOL);
 windows_link::link!("kernel32.dll" "system" fn FormatMessageW(dwflags : FORMAT_MESSAGE_OPTIONS, lpsource : *const core::ffi::c_void, dwmessageid : u32, dwlanguageid : u32, lpbuffer : PWSTR, nsize : u32, arguments : *const *const i8) -> u32);
 windows_link::link!("kernel32.dll" "system" fn FreeEnvironmentStringsW(penv : PCWSTR) -> BOOL);
@@ -67,6 +71,7 @@ windows_link::link!("kernel32.dll" "system" fn GetWindowsDirectoryW(lpbuffer : P
 windows_link::link!("kernel32.dll" "system" fn InitOnceBeginInitialize(lpinitonce : *mut INIT_ONCE, dwflags : u32, fpending : *mut BOOL, lpcontext : *mut *mut core::ffi::c_void) -> BOOL);
 windows_link::link!("kernel32.dll" "system" fn InitOnceComplete(lpinitonce : *mut INIT_ONCE, dwflags : u32, lpcontext : *const core::ffi::c_void) -> BOOL);
 windows_link::link!("kernel32.dll" "system" fn InitializeProcThreadAttributeList(lpattributelist : LPPROC_THREAD_ATTRIBUTE_LIST, dwattributecount : u32, dwflags : u32, lpsize : *mut usize) -> BOOL);
+windows_link::link!("kernel32.dll" "system" fn IsThreadAFiber() -> BOOL);
 windows_link::link!("kernel32.dll" "system" fn LocalFree(hmem : HLOCAL) -> HLOCAL);
 windows_link::link!("kernel32.dll" "system" fn LockFileEx(hfile : HANDLE, dwflags : LOCK_FILE_FLAGS, dwreserved : u32, nnumberofbytestolocklow : u32, nnumberofbytestolockhigh : u32, lpoverlapped : *mut OVERLAPPED) -> BOOL);
 windows_link::link!("kernel32.dll" "system" fn MoveFileExW(lpexistingfilename : PCWSTR, lpnewfilename : PCWSTR, dwflags : MOVE_FILE_FLAGS) -> BOOL);
@@ -2667,6 +2672,7 @@ impl Default for FLOATING_SAVE_AREA {
         unsafe { core::mem::zeroed() }
     }
 }
+pub const FLS_OUT_OF_INDEXES: u32 = 4294967295u32;
 pub const FORMAT_MESSAGE_ALLOCATE_BUFFER: FORMAT_MESSAGE_OPTIONS = 256u32;
 pub const FORMAT_MESSAGE_ARGUMENT_ARRAY: FORMAT_MESSAGE_OPTIONS = 8192u32;
 pub const FORMAT_MESSAGE_FROM_HMODULE: FORMAT_MESSAGE_OPTIONS = 2048u32;
@@ -3037,6 +3043,8 @@ pub struct OVERLAPPED_0_0 {
 }
 pub type PCSTR = *const u8;
 pub type PCWSTR = *const u16;
+pub type PFLS_CALLBACK_FUNCTION =
+    Option<unsafe extern "system" fn(lpflsdata: *const core::ffi::c_void)>;
 pub type PIO_APC_ROUTINE = Option<
     unsafe extern "system" fn(
         apccontext: *mut core::ffi::c_void,
Lines changed: 200 additions & 85 deletions
@@ -1,103 +1,218 @@
 //! Support for Windows TLS destructors.
 //!
-//! Unfortunately, Windows does not provide a nice API to provide a destructor
-//! for a TLS variable. Thus, the solution here ended up being a little more
-//! obscure, but fear not, the internet has informed me [1][2] that this solution
-//! is not unique (no way I could have thought of it as well!). The key idea is
-//! to insert some hook somewhere to run arbitrary code on thread termination.
-//! With this in place we'll be able to run anything we like, including all
-//! TLS destructors!
+//! Windows has an API to provide a destructor for a FLS (fiber local storage) variable,
+//! which behaves similarly to a TLS variable for our purpose [1].
 //!
-//! In order to realize this, all TLS destructors are tracked by *us*, not the
-//! Windows runtime. This means that we have a global list of destructors for
+//! All TLS destructors are tracked by *us*, not the Windows runtime.
+//! This means that we have a global list of destructors for
 //! each TLS key or variable that we know about.
 //!
-//! # What's up with CRT$XLB?
-//!
-//! For anything about TLS destructors to work on Windows, we have to be able
-//! to run *something* when a thread exits. To do so, we place a very special
-//! static in a very special location. If this is encoded in just the right
-//! way, the kernel's loader is apparently nice enough to run some function
-//! of ours whenever a thread exits! How nice of the kernel!
-//!
-//! Lots of detailed information can be found in source [1] above, but the
-//! gist of it is that this is leveraging a feature of Microsoft's PE format
-//! (executable format) which is not actually used by any compilers today.
-//! This apparently translates to any callbacks in the ".CRT$XLB" section
-//! being run on certain events.
-//!
-//! So after all that, we use the compiler's `#[link_section]` feature to place
-//! a callback pointer into the magic section so it ends up being called.
-//!
-//! # What's up with this callback?
-//!
-//! The callback specified receives a number of parameters from... someone!
-//! (the kernel? the runtime? I'm not quite sure!) There are a few events that
-//! this gets invoked for, but we're currently only interested on when a
-//! thread or a process "detaches" (exits). The process part happens for the
-//! last thread and the thread part happens for any normal thread.
-//!
-//! # The article mentions weird stuff about "/INCLUDE"?
-//!
-//! It sure does! Specifically we're talking about this quote:
-//!
-//! ```quote
-//! The Microsoft run-time library facilitates this process by defining a
-//! memory image of the TLS Directory and giving it the special name
-//! “__tls_used” (Intel x86 platforms) or “_tls_used” (other platforms). The
-//! linker looks for this memory image and uses the data there to create the
-//! TLS Directory. Other compilers that support TLS and work with the
-//! Microsoft linker must use this same technique.
-//! ```
-//!
-//! Basically what this means is that if we want support for our TLS
-//! destructors/our hook being called then we need to make sure the linker does
-//! not omit this symbol. Otherwise it will omit it and our callback won't be
-//! wired up.
-//!
-//! We don't actually use the `/INCLUDE` linker flag here like the article
-//! mentions because the Rust compiler doesn't propagate linker flags, but
-//! instead we use a shim function which performs a volatile 1-byte load from
-//! the address of the _tls_used symbol to ensure it sticks around.
-//!
-//! [1]: https://www.codeproject.com/Articles/8113/Thread-Local-Storage-The-C-Way
-//! [2]: https://github.com/ChromiumWebApps/chromium/blob/master/base/threading/thread_local_storage_win.cc#L42
+//! [1]: https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=102989
 
 use core::ffi::c_void;
+use core::sync::atomic::{AtomicBool, AtomicU32, Ordering, fence};
 
+use crate::cell::Cell;
 use crate::ptr;
-use crate::sys::c;
+use crate::sys::c::{self, FLS_OUT_OF_INDEXES};
+
+pub type Key = u32;
+
+unsafe fn create(dtor: c::PFLS_CALLBACK_FUNCTION) -> Key {
+    let key_result = unsafe { c::FlsAlloc(dtor) };
+
+    if key_result == c::FLS_OUT_OF_INDEXES {
+        rtabort!("out of FLS keys");
+    }
 
-unsafe extern "C" {
-    #[link_name = "_tls_used"]
-    static TLS_USED: u8;
+    key_result
 }
-pub fn enable() {
-    // When destructors are used, we need to add a reference to the _tls_used
-    // symbol provided by the CRT, otherwise the TLS support code will get
-    // GC'd by the linker and our callback won't be called.
-    unsafe { ptr::from_ref(&TLS_USED).read_volatile() };
-    // We also need to reference CALLBACK to make sure it does not get GC'd
-    // by the compiler/LLVM. The callback will end up inside the TLS
-    // callback array pointed to by _TLS_USED through linker shenanigans,
-    // but as far as the compiler is concerned, it looks like the data is
-    // unused, so we need this hack to prevent it from disappearing.
-    unsafe { ptr::from_ref(&CALLBACK).read_volatile() };
+
+unsafe fn set(key: Key, ptr: *const c_void) {
+    let result = unsafe { c::FlsSetValue(key, ptr) };
+
+    if result == c::FALSE {
+        rtabort!("failed to set FLS value");
+    }
+}
+
+fn is_thread_a_fiber() -> bool {
+    let res = unsafe { c::IsThreadAFiber() };
+    res == c::TRUE
+}
+
+static KEY: AtomicU32 = AtomicU32::new(FLS_OUT_OF_INDEXES);
+
+/// Used to track whether we are currently in the critical section of `enable`.
+/// For miri, these atomic operations cause synchronization that can mask user bugs,
+/// and they are not needed as `atexit` is anyway not supported, so we can skip them.
+struct EnableGuard;
+static AT_EXIT_HOOK_CALLED: AtomicBool = AtomicBool::new(false);
+static ACTIVE_ENABLE_CALLS: AtomicU32 = AtomicU32::new(0);
+
+impl EnableGuard {
+    // Mark the start of an `enable` call, returning whether the `atexit` hook has already been called or not.
+    fn new() -> (Self, bool) {
+        if cfg!(miri) {
+            return (Self, false);
+        }
+        ACTIVE_ENABLE_CALLS.fetch_add(1, Ordering::Relaxed);
+
+        // Both `new` and `start_exit` publish state to one atomic and inspect the other.
+        // `AcqRel` is insufficient because neither read is required to observe the other's publication,
+        // so we could create the guard but `start_exit` would not see any active enable calls.
+        // `SeqCst` ensures that there's a single global order between the publish and check,
+        // so at least one side must observe the other and bail.
+        fence(Ordering::SeqCst);
+
+        let at_exit_called = AT_EXIT_HOOK_CALLED.load(Ordering::Relaxed);
+
+        (Self, at_exit_called)
+    }
+
+    /// Mark the start of process exit, returning whether we should free the FLS key or not.
+    fn start_exit() -> bool {
+        // After this hook starts, new destructor registration will be skipped,
+        // causing TLS destructors initialized after this point to leak.
+        if AT_EXIT_HOOK_CALLED.swap(true, Ordering::Relaxed) {
+            // Cleanup already started, there is nothing else to do.
+            return false;
+        }
+
+        fence(Ordering::SeqCst);
+
+        let any_active_enabled_called = ACTIVE_ENABLE_CALLS.load(Ordering::Relaxed) != 0;
+
+        if any_active_enabled_called {
+            // If another thread is currently in `enable`, it may already have loaded this key and may be about to call `FlsSetValue`.
+            // So we must *not* free the FLS key.
+            //
+            // During real process exit this is harmless because the `cleanup` hook is always available,
+            // and the FLS callback will be triggered normally by the OS.
+            //
+            // During DLL unload, the unloader cannot safely have threads running code from the DLL except for the destructors,
+            // so there must not be any `enable` calls active anyway.
+            return false;
+        }
+
+        return true;
+    }
 }
 
-#[unsafe(link_section = ".CRT$XLB")]
-#[cfg_attr(miri, used)] // Miri only considers explicitly `#[used]` statics for `lookup_link_section`
-pub static CALLBACK: unsafe extern "system" fn(*mut c_void, u32, *mut c_void) = tls_callback;
+#[cfg(not(miri))]
+impl Drop for EnableGuard {
+    fn drop(&mut self) {
+        ACTIVE_ENABLE_CALLS.fetch_sub(1, Ordering::Relaxed);
+    }
+}
 
-unsafe extern "system" fn tls_callback(_h: *mut c_void, dw_reason: u32, _pv: *mut c_void) {
-    if dw_reason == c::DLL_THREAD_DETACH || dw_reason == c::DLL_PROCESS_DETACH {
-        unsafe {
-            #[cfg(target_thread_local)]
-            super::super::destructors::run();
-            #[cfg(not(target_thread_local))]
-            super::super::key::run_dtors();
+pub fn enable() {
+    let registered = if cfg!(target_thread_local) {
+        #[thread_local]
+        static REGISTERED: Cell<bool> = Cell::new(false);
+        REGISTERED.replace(true)
+    } else {
+        // `#[thread_local]` is unavailable on windows-gnu (`target_thread_local` is off),
+        // but setting the FLS key's value is about as expensive as `TlsGet`, so we don't bother tracking registration separately.
+        false
+    };
 
-            crate::rt::thread_cleanup();
+    if !registered {
+        // We are in a critical section where we are trying to register a destructor for the current thread.
+        // We need to avoid racing with the `atexit` hook that frees the FLS slot, which would cause us to call `FlsSetValue` on a freed key,
+        // or calling `atexit` during process shutdown, which would cause a deadlock.
+        let (_guard, at_exit_called) = EnableGuard::new();
+
+        if at_exit_called {
+            // We are exiting and don't want to race with the `atexit` hook, so we won't be able to run the destructors for this thread.
+            return;
         }
+
+        let current_key = KEY.load(Ordering::Acquire);
+
+        // If we already allocated a key, we only need to set it to a non-null value so that the destructors hook is run for this thread.
+        let key = if current_key != FLS_OUT_OF_INDEXES {
+            current_key
+        } else {
+            // Otherwise, we try to allocate a key.
+            let new_key = unsafe { create(Some(cleanup)) };
+
+            // Now we need to set this key to be used by everyone else.
+            // If we won the race, our key is the right one and we can set it to non-null value.
+            // If we lost, we'll use the winning key and free our losing key.
+            match KEY.compare_exchange(current_key, new_key, Ordering::Release, Ordering::Acquire) {
+                Ok(_) => {
+                    // If the current DLL is unloaded, the registered `cleanup` hook will not be available later during thread exit,
+                    // triggering a `STATUS_ACCESS_VIOLATION`. To avoid this, we use the `atexit` hook, which is called during DLL unload
+                    // to manually free the FLS slot, triggering the destructors.
+                    //
+                    // However, calling `atexit` during process exit can cause a deadlock.
+                    // In a Rust binary, `enable` is called during the main thread startup and before any user code,
+                    // and we checked using `at_exit_called` that we aren't in process shutdown.
+                    //
+                    // In a Rust DLL, dynamic unloading can only happen safely when no other threads are
+                    // concurrently executing Rust code, so if we are here we cannot be unloading yet.
+                    //
+                    // If a main non-Rust binary is exiting, it must not trigger the `enable` guard
+                    // for the first time during process shutdown.
+                    let res = unsafe { c::atexit(free_fls_key_at_exit) };
+                    if res != 0 {
+                        rtabort!("failed to register fls atexit hook");
+                    }
+
+                    new_key
+                }
+                Err(other_key) => {
+                    unsafe { c::FlsFree(new_key) };
+                    other_key
+                }
+            }
+        };
+
+        // Setting the key's value to non-zero will cause the dtor callback to be called when the thread exits.
+        unsafe { set(key, ptr::without_provenance(1)) };
+    }
+}
+
+extern "C" fn free_fls_key_at_exit() {
+    // The main purpose of this hook is to free the FLS slot during DLL unload.
+    // However, this hook will also be called during normal process exit, while other Rust threads are still running,
+    // so we must be careful to avoid races with `enable`.
+    let should_free_key = EnableGuard::start_exit();
+    if !should_free_key {
+        return;
+    }
+
+    let current_key = KEY.swap(c::FLS_OUT_OF_INDEXES, Ordering::AcqRel);
+    if current_key != c::FLS_OUT_OF_INDEXES {
+        // Calling `FlsFree` will cause the OS to call the `cleanup` hook, in the current thread, *for each thread* (or fiber) with a value in this FLS slot.
+        // `cleanup` is safe to run repeatedly: it only drains the current thread's TLS destructor list, and we check that we are not running in a fiber before doing so.
+        // We only call this when no `enable` call is active, so it cannot race with `FlsSetValue` using this key.
+        // Destructors of thread locals in other threads will not run and therefore leak, which is allowed since we are exiting or unloading.
+        unsafe { c::FlsFree(current_key) };
+    }
+}
+
+unsafe extern "system" fn cleanup(_ptr: *const c_void) {
+    // Avoid running the hook if we are in a fiber.
+    // This will cause destructors of thread locals to not run, leaking them.
+    // Thread-local runtime state will not be cleaned.
+    //
+    // We need to verify that we won't run the destructors *before* the thread exits,
+    // but if the fiber that registered the callback is deleted, the thread might still be running other fibers.
+    //
+    // By checking that we are not running in a fiber here, we are guaranteed that the hook is only running during the thread's exit.
+    // See also the `fiber_does_not_trigger_dtor` test.
+    if is_thread_a_fiber() {
+        return;
     }
+
+    unsafe {
+        #[cfg(target_thread_local)]
+        super::super::destructors::run();
+        #[cfg(not(target_thread_local))]
+        super::super::key::run_dtors();
+    }
+
+    crate::rt::thread_cleanup();
 }
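
The comment in `EnableGuard::new` argues that a `SeqCst` fence between "publish my flag" and "read the other flag" guarantees that at least one side observes the other. The following is a minimal, portable sketch of that handshake, not the code from this commit: `Flags`, `enable_side`, `exit_side`, and `count_missed_handshakes` are hypothetical names, and the sketch omits the decrement that the real guard performs on drop.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

// Hypothetical pair of flags, one published by each side.
struct Flags {
    active_enable_calls: AtomicU32,
    exit_started: AtomicBool,
}

// Same shape as `EnableGuard::new`: publish, fence, then inspect the other flag.
// Returns true if this side observed that exit has already started.
fn enable_side(f: &Flags) -> bool {
    f.active_enable_calls.fetch_add(1, Ordering::Relaxed);
    fence(Ordering::SeqCst);
    f.exit_started.load(Ordering::Relaxed)
}

// Same shape as `start_exit`: publish, fence, then inspect the other flag.
// Returns true if this side observed an active `enable` call.
fn exit_side(f: &Flags) -> bool {
    f.exit_started.swap(true, Ordering::Relaxed);
    fence(Ordering::SeqCst);
    f.active_enable_calls.load(Ordering::Relaxed) != 0
}

// Races the two sides `rounds` times and counts the rounds in which
// neither side observed the other. With both fences in place the
// sequentially consistent fence order forbids that outcome.
fn count_missed_handshakes(rounds: usize) -> usize {
    let mut missed = 0;
    for _ in 0..rounds {
        let flags = Arc::new(Flags {
            active_enable_calls: AtomicU32::new(0),
            exit_started: AtomicBool::new(false),
        });
        let barrier = Arc::new(Barrier::new(2));

        let (f, b) = (Arc::clone(&flags), Arc::clone(&barrier));
        let enabler = thread::spawn(move || {
            b.wait(); // start both sides as close together as possible
            enable_side(&f)
        });

        barrier.wait();
        let exit_saw_enable = exit_side(&flags);
        let enable_saw_exit = enabler.join().unwrap();

        if !exit_saw_enable && !enable_saw_exit {
            missed += 1;
        }
    }
    missed
}

fn main() {
    println!("missed handshakes: {}", count_missed_handshakes(200));
}
```

If the two fences were removed, the "store buffering" outcome where both loads return the old value would be permitted by the memory model, which is the case the comment about `AcqRel` being insufficient describes.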
