Skip to content

common: Intentionally leak logger instance to fix hanging on Windows #22273

Open
rillomas wants to merge 10 commits intoggml-org:masterfrom
rillomas:fix-hang-on-windows
Open

common: Intentionally leak logger instance to fix hanging on Windows #22273
rillomas wants to merge 10 commits intoggml-org:masterfrom
rillomas:fix-hang-on-windows

Conversation

@rillomas
Copy link
Copy Markdown
Contributor

@rillomas rillomas commented Apr 23, 2026

Overview

Added workaround for #22142. There are three points in this PR:

  • Intentional leak of logger instance
    • ~common_log() called at DLL teardown phase was causing hanging on Windows. DLL teardown phase seems to be a fragile timing to do system calls like mutex lock, cond notify, thread join, etc. which did not provide sane results. We are working around this by intentionally leaking the logger instance to skip cleanup.
  • Using static array's instead of std::vector in logger
    • We were seeing crashes on Linux when logger thread outlived the std::vector teardown, causing the thread to touch released items on std::vector. Using static arrays prevents this.
  • Calling common_log_free at test-log
    • Even if we skipped the logger destructor, we were still seeing hangs on Windows after adding common_log_flush. This seemed like a timing issue where the OS is running DLL teardown while the runtime tries to start the logger thread asynchronously (which was started by common_log_flush). We added an explicit cleanup in the end of the unit test to avoid this.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES (Copilot GPT5.4 medium). Used for issue analysis and solution recommendation.

@ggerganov ggerganov added the merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. label Apr 23, 2026
@ggerganov ggerganov removed the merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. label Apr 24, 2026
@ggerganov
Copy link
Copy Markdown
Member

Does not seem to work - the tests are failing

@rillomas
Copy link
Copy Markdown
Contributor Author

Let me have a look again.

Using std::vector will cause g_col to be released before the logger thread exits, causing the logger thread to touch freed memory causing a crash
@rillomas rillomas marked this pull request as ready for review April 24, 2026 08:27
@rillomas
Copy link
Copy Markdown
Contributor Author

rillomas commented Apr 24, 2026

I've fixed the crashing on Linux. It seems like g_col being a vector was causing the heap to get freed at teardown phase while the logger thread is running, causing the crash. Previously the common_log destructor was (presumably) joining the logger thread first assuring g_col was never touched after release.

@rillomas rillomas requested a review from ggerganov as a code owner April 24, 2026 08:37
@github-actions github-actions Bot added the testing Everything test related label Apr 24, 2026
@rillomas rillomas marked this pull request as draft April 25, 2026 00:18
@rillomas
Copy link
Copy Markdown
Contributor Author

rillomas commented Apr 27, 2026

Seems like even if we skip the logger's destruction, calling common_log_flush just before the program exits will cause hanging in the same way. It looks like a timing issue where the OS tries to clean the threads and the logger tries to run the logger thread because the hanging goes away when I add printf which changes timing.

I guess I'll just remove the intentional leaking and fix the unit test to call common_log_free since we'll need it anyway, and it's safer to call it from the app rather than during DLL teardown. Forgot that the hanging was happening on other apps so I'll need to keep the intentional leaking. I'll also need the common_log_free in the unit test.

@rillomas rillomas changed the title common: Changed to leak logger singleton to prevent hanging on Windows tests: Fix hanging on logger unit test Apr 27, 2026
@rillomas rillomas changed the title tests: Fix hanging on logger unit test common: Intentionally leak logger instance to fix hanging on Windows Apr 27, 2026
@rillomas rillomas marked this pull request as ready for review April 27, 2026 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Everything test related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants