Directly join owned threads on cleanup#92

Open
GregoryComer wants to merge 1 commit into google:main from GregoryComer:win-cleanup-fix

@GregoryComer commented Apr 30, 2026

On Windows, the OS can terminate the threadpool threads during process shutdown before they have a chance to signal exit. pthreadpool_destroy then hangs indefinitely in wait_on_num_recruited_threads. I ran into this while updating PyTorch and ExecuTorch to use google/pthreadpool, in preparation for updating the XNNPACK dependency.

Here's a backtrace for the hang. Note that all threadpool threads have already been killed at this point.

      frame #0: 0x00007ff89ea256a4 ntdll.dll`NtWaitForAlertByThreadId + 20
      frame #1: 0x00007ff89e954c1e ntdll.dll`RtlSleepConditionVariableSRW + 478
      frame #2: 0x00007ff89b452818 KernelBase.dll`SleepConditionVariableSRW + 56
      frame #3: 0x00007ff8767119e8 vcruntime140_threads.dll`cnd_wait + 72
      frame #4: 0x00007fffe7191133 _portable_lib.cp312-win_amd64.pyd`wait_on_num_recruited_threads(threadpool=0x0000020c1b9e20c0, expected=<unavailable>) at threadpool-atomics.h:122
      frame #5: 0x00007fffe7191011 _portable_lib.cp312-win_amd64.pyd`pthreadpool_destroy(threadpool=0x0000020c1b9e20c0) at pthreads.c:810
      frame #6: 0x00007fffe715ff2b _portable_lib.cp312-win_amd64.pyd`std::unique_ptr<pthreadpool,void (*)(pthreadpool *)>::~unique_ptr(this=<unavailable>) at memory:3456 [inlined]
      frame #7: 0x00007fffe715ff1f _portable_lib.cp312-win_amd64.pyd`executorch::extension::threadpool::ThreadPool::~ThreadPool(this=0x0000020c1ba4dd90) [inlined]
      frame #8: 0x00007fffe715ff1f _portable_lib.cp312-win_amd64.pyd`std::default_delete<executorch::extension::threadpool::ThreadPool>::operator(this=<unavailable>, _Ptr=0x0000020c1ba4dd90) [inlined]
      frame #9: 0x00007fffe715ff1f _portable_lib.cp312-win_amd64.pyd`std::unique_ptr<executorch::extension::threadpool::ThreadPool,std::default_delete<executorch::extension::threadpool::ThreadPool> >::~unique_ptr(this=<unavailable>) at memory:3456 [inlined]
      frame #10: 0x00007fffe715ff0b _portable_lib.cp312-win_amd64.pyd``dynamic atexit destructor for 'threadpool' at threadpool.cpp:159
      frame #11: 0x00007ff89c30bc75 ucrtbase.dll`bsearch_s + 597
      frame #12: 0x00007ff89c30b897 ucrtbase.dll`_execute_onexit_table + 135
      frame #13: 0x00007ff89c30b84d ucrtbase.dll`_execute_onexit_table + 61
      frame #14: 0x00007fffe762e23d _portable_lib.cp312-win_amd64.pyd`dllmain_crt_process_detach(is_terminating=<unavailable>) at dll_dllmain.cpp:180
      frame #15: 0x00007fffe762e3d1 _portable_lib.cp312-win_amd64.pyd`dllmain_dispatch(instance=0x00007fffe6ca0000, reason=<unavailable>, reserved=0x0000000000000001) at dll_dllmain.cpp:293
      frame #16: 0x00007ff89ea1f6fe ntdll.dll`RtlEncodeRemotePointer + 206
      frame #17: 0x00007ff89e8cbcae ntdll.dll`RtlRaiseException + 5486
      frame #18: 0x00007ff89e94d37f ntdll.dll`LdrShutdownProcess + 383
      frame #19: 0x00007ff89e94c54e ntdll.dll`RtlExitUserProcess + 158
      frame #20: 0x00007ff89ca918ab kernel32.dll`ExitProcess + 11
      frame #21: 0x00007ff89c360093 ucrtbase.dll`logbf + 707
      frame #22: 0x00007ff770c914fb python.exe`__scrt_common_main_seh at exe_common.inl:295
      frame #23: 0x00007ff89ca7e8d7 kernel32.dll`BaseThreadInitThunk + 23
      frame #24: 0x00007ff89e94c40c ntdll.dll`RtlUserThreadStart + 44

The hang reproduces 100% of the time for me whenever a pthreadpool is destroyed at process exit on Windows (for example, from the destructor of a static global). I'm using clang-cl and running inside a native Python extension; I haven't tried MSVC.

To fix it, I've reworked the non-executor thread path to join the threads directly. Cleanup still calls wait_on_num_recruited_threads when using executor-provided threads.
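As a rough illustration of the join-based cleanup idea, here is a minimal sketch using POSIX threads. It is not the code from this patch: `demo_pool`, `demo_worker`, `demo_pool_init`, and `demo_pool_destroy` are hypothetical names, and the sketch does not reproduce the Windows shutdown behavior. It only shows the shape of a teardown that waits on the thread handles themselves rather than on a counter that the workers are expected to update before exiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pool that owns its worker threads. */
#define DEMO_MAX_THREADS 8

struct demo_pool {
  pthread_t threads[DEMO_MAX_THREADS];
  size_t num_threads; /* number of threads successfully created */
  pthread_mutex_t lock;
  pthread_cond_t wake;
  bool shutting_down;
};

/* Worker: sleeps until the pool is told to shut down, then returns. */
static void* demo_worker(void* arg) {
  struct demo_pool* pool = arg;
  pthread_mutex_lock(&pool->lock);
  while (!pool->shutting_down) {
    pthread_cond_wait(&pool->wake, &pool->lock);
  }
  pthread_mutex_unlock(&pool->lock);
  return NULL;
}

/* Returns 0 on success, -1 if too many threads are requested, or the
 * pthread_create error code. On failure the pool can still be destroyed. */
int demo_pool_init(struct demo_pool* pool, size_t num_threads) {
  assert(pool != NULL);
  pool->num_threads = 0;
  pool->shutting_down = false;
  pthread_mutex_init(&pool->lock, NULL);
  pthread_cond_init(&pool->wake, NULL);
  if (num_threads > DEMO_MAX_THREADS) {
    return -1;
  }
  for (size_t i = 0; i < num_threads; i++) {
    int err = pthread_create(&pool->threads[i], NULL, demo_worker, pool);
    if (err != 0) {
      return err;
    }
    pool->num_threads++;
  }
  return 0;
}

/* Signals shutdown, then joins every owned thread directly. No exit counter
 * is consulted, so teardown does not depend on workers running any
 * "I am exiting" bookkeeping. Returns the number of threads joined. */
size_t demo_pool_destroy(struct demo_pool* pool) {
  assert(pool != NULL);
  pthread_mutex_lock(&pool->lock);
  pool->shutting_down = true;
  pthread_cond_broadcast(&pool->wake);
  pthread_mutex_unlock(&pool->lock);

  size_t joined = 0;
  for (size_t i = 0; i < pool->num_threads; i++) {
    if (pthread_join(pool->threads[i], NULL) == 0) {
      joined++;
    }
  }
  pthread_cond_destroy(&pool->wake);
  pthread_mutex_destroy(&pool->lock);
  return joined;
}
```

A caller would pair `demo_pool_init(&pool, 4)` with `demo_pool_destroy(&pool)`, which returns 4 once all four workers have been joined.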

In theory, executor-owned threads could hit the same bug, so that wait should maybe have a timeout. Alternatively, if you are willing to tolerate some Windows-specific code, you could check whether the thread is still alive.
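The timeout idea could look roughly like the sketch below, again with POSIX threads and hypothetical names (`demo_counter`, `demo_counter_init`, `demo_counter_decrement`, `demo_counter_wait_for`); it is not pthreadpool's actual wait_on_num_recruited_threads. It assumes a counter that each thread decrements on exit, and bounds the wait with `pthread_cond_timedwait` so that a thread killed before decrementing causes a timeout instead of a permanent hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical counter of threads that have not yet signaled exit. */
struct demo_counter {
  pthread_mutex_t lock;
  pthread_cond_t changed;
  size_t value;
};

void demo_counter_init(struct demo_counter* c, size_t initial) {
  assert(c != NULL);
  pthread_mutex_init(&c->lock, NULL);
  pthread_cond_init(&c->changed, NULL);
  c->value = initial;
}

/* Called by a thread as it exits. */
void demo_counter_decrement(struct demo_counter* c) {
  pthread_mutex_lock(&c->lock);
  c->value--;
  pthread_cond_broadcast(&c->changed);
  pthread_mutex_unlock(&c->lock);
}

/* Waits until the counter equals `expected` or `timeout_ms` elapses.
 * Returns true if the expected value was reached, false otherwise. */
bool demo_counter_wait_for(struct demo_counter* c, size_t expected,
                           long timeout_ms) {
  /* Default condition variables measure deadlines on CLOCK_REALTIME. */
  struct timespec deadline;
  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec += 1;
    deadline.tv_nsec -= 1000000000L;
  }

  pthread_mutex_lock(&c->lock);
  bool reached = true;
  while (c->value != expected) {
    int err = pthread_cond_timedwait(&c->changed, &c->lock, &deadline);
    if (err != 0) {
      /* Timed out (or failed): re-check once, then give up. */
      reached = (c->value == expected);
      break;
    }
  }
  pthread_mutex_unlock(&c->lock);
  return reached;
}
```

What the caller does after a timeout (for example, skipping further cleanup of that thread's resources) is a separate design question that this sketch leaves open.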

Test Plan

I verified that this patch works on PyTorch core and ExecuTorch. There are a few red jobs, but they are unrelated.
pytorch/pytorch#178201
pytorch/executorch#19237
