Skip to content

subprocess-posix: use clone in Linux and rfork_thread in FreeBSD#14973

Open
ruihe774 wants to merge 4 commits intompv-player:masterfrom
ruihe774:subprocess-clone
Open

subprocess-posix: use clone in Linux and rfork_thread in FreeBSD#14973
ruihe774 wants to merge 4 commits intompv-player:masterfrom
ruihe774:subprocess-clone

Conversation

@ruihe774
Copy link
Copy Markdown
Contributor

@ruihe774 ruihe774 commented Oct 1, 2024

This PR introduces platform-specific implementations, namely clone in Linux and rfork_thread in FreeBSD, to optimize subprocess creation. vfork semantics is used in these implementations: virtual memory is shared between the parent and the child, and the execution of the parent suspends until the child terminates. (However, vfork is not used because its behavior varies across platforms, and it's difficult to be used properly in C.) Besides lower overhead, this simplifies the result passing: no pipe is needed.

Comment thread osdep/subprocess-posix.c Outdated
@sfan5
Copy link
Copy Markdown
Member

sfan5 commented Oct 1, 2024

I feel like introducing such additional complex, platform-specific code is not worth it for a vague "lower overhead" in process creation (when is this a bottleneck??).

@kasper93
Copy link
Copy Markdown
Member

kasper93 commented Oct 1, 2024

when is this a bottleneck??

I agree, we should evaluate whether it is needed using some test cases that show the actual difference. mpv, under normal operation, shouldn't spawn subprocesses, especially not ones that are performance critical.

@ruihe774
Copy link
Copy Markdown
Contributor Author

ruihe774 commented Oct 1, 2024

I feel like introducing such additional complex, platform-specific code is not worth it for a vague "lower overhead" in process creation (when is this a bottleneck??).

Well, I don't think it's complex. clone and rfork_thread share most of the logic; the only difference is their API names. And most code line addition results from the move of spawn_process into spawn_process_inner and child_main; most code blocks are reused from current implementation. I've tried my best to keep it neat.

You can also have a look at the new handling of detach. Previously, it is handled using redundant code, and now it shares the same code path.

These APIs are also used by posix_spawn, which is used before 5309497. So you can think of this PR a restore of performance.

@na-na-hi
Copy link
Copy Markdown
Contributor

na-na-hi commented Oct 1, 2024

Agree with what's already said. I think there is no reason to maintain 2 additional platform-specific implementations for little benefit when the existing one works. Additionally from FreeBSD documentation: "The rfork_thread() function has been deprecated in favor of pthread_create(3)."

You can also have a look at the new handling of detach. Previously, it is handled using redundant code, and now it shares the same code path.

That's a separate issue and does not require the new process creation functions.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Oct 1, 2024

Download the artifacts for this pull request:

Windows
macOS

@ruihe774
Copy link
Copy Markdown
Contributor Author

ruihe774 commented Oct 1, 2024

Additionally from FreeBSD documentation: "The rfork_thread() function has been deprecated in favor of pthread_create(3)."

Not related IMO. The latter is for creating threads. The functionality used in the code cannot be achieved by pthread_create. rfork_thread is just a little wrapper of rfork, which is not deprecated. And the FreeBSD libc also uses it.

maintain 2 additional platform-specific implementations

I don't consider they as two separated platform-specific implementations as they only differ in one single line of code of calling the API.

As Linux has the de facto dominance, I think it's acceptable to optimize for it with little additional code complexity. You can consider the FreeBSD part as a side product.

@ruihe774
Copy link
Copy Markdown
Contributor Author

ruihe774 commented Oct 2, 2024

A microbenchmark:

local utils = require 'mp.utils'
local start_time = mp.get_time()
for i = 1,1000 do
    utils.subprocess_detached{
       args = {"/bin/echo", "test"}
    }
end
local end_time = mp.get_time()
print(end_time - start_time)

Tested on Linux 6.11.0:
Current: 2.881334s
This PR: 0.126008s

Tested on FreeBSD 13.4:
Current: 5.720950s
This PR: 0.457035s

I think the performance improvement is nontrivial.

@avih
Copy link
Copy Markdown
Member

avih commented Oct 2, 2024

I think the performance improvement is nontrivial.

It is non trivial, but, without commenting on the PR itself, on its own it doesn't necessarily carry a lot of weight.

The use cases of mpv where hundreds of processes per second are not enough but (many) thousands is just what the doctor ordered are probably not all that common.

You have a track record of good code, so that's a good start, but the couter argument is that it's currently (probably) good enough and known to work where needed, while new code does carry risks.

But if the reviewers find the new code great, then such a perf improvement is certainly nice.

@ruihe774
Copy link
Copy Markdown
Contributor Author

ruihe774 commented Oct 2, 2024

I think the performance improvement is nontrivial.

It is non trivial, but, without commenting on the PR itself, on its own it doesn't necessarily carry a lot of weight.

The use cases of mpv where hundreds of processes per second are not enough but (many) thousands is just what the doctor ordered are probably not all that common.

You have a track record of good code, so that's a good start, but the couter argument is that it's currently (probably) good enough and known to work where needed, while new code does carry risks.

But if the reviewers find the new code great, then such a perf improvement is certainly nice.

Yeah, we can put this aside.

@ruihe774 ruihe774 changed the title subprocess-posix: use clone in Linux and rfork_thread in FreeBSD [RFC] subprocess-posix: use clone in Linux and rfork_thread in FreeBSD Oct 2, 2024
@q3cpma
Copy link
Copy Markdown

q3cpma commented Oct 10, 2024

These APIs are also used by posix_spawn, which is used before 5309497. So you can think of this PR a restore of performance.

Worth noting that the new POSIX supports this use case: https://www.austingroupbugs.net/view.php?id=1044

@ruihe774
Copy link
Copy Markdown
Contributor Author

ruihe774 commented Oct 10, 2024

These APIs are also used by posix_spawn, which is used before 5309497. So you can think of this PR a restore of performance.

Worth noting that the new POSIX supports this use case: https://www.austingroupbugs.net/view.php?id=1044

Yes. But we still need to perform a double-fork in order to detach a child process. Is posix_spawn going to be async-signal-safe? If not, we still cannot use it after the first fork (or clone); rfork does not have this issue because RFSPAWN clears all signal handlers. We can actually use clone3 with CLONE_CLEAR_SIGHAND on Linux to achieve the similar; but clone3 is not wrapped by glibc and CLONE_CLEAR_SIGHAND requires Linux 5.5+.

BTW in fact we can use pidfd with poll in one separated thread to reap detached child processes in Linux, so that double-fork can be avoided. However, such an implementation will grow too complicated.

@kasper93 kasper93 added the priority:on-ice may be revisited later label Dec 11, 2024
Comment thread osdep/subprocess-posix.c Outdated
@N-R-K
Copy link
Copy Markdown
Contributor

N-R-K commented Jan 8, 2025

But in any case, I agree with the rest. Seems not worth it unless there's actual use cases where this matters.

@N-R-K
Copy link
Copy Markdown
Contributor

N-R-K commented Jan 10, 2025

Worth noting that the new POSIX supports this use case: https://www.austingroupbugs.net/view.php?id=1044

Yes. But we still need to perform a double-fork in order to detach a child process.

setsid already "detaches" the child process, or is mpv expecting some other semantic that setsid doesn't provide?

With setsid() you shouldn't need double fork. And with posix_spawn + POSIX_SPAWN_SETSID you shouldn't need fork at all.

Also, posix_spawn has an option to reset all signal-handlers to default (posix_spawnattr_{setflags,setsigdefault}), so you wouldn't need the reset all signals in a loop that we currently have either.

If POSIX_SPAWN_SETSID is available on targets we care about, then pursuing that seems like it'd be worthwhile.

@ruihe774
Copy link
Copy Markdown
Contributor Author

ruihe774 commented Jan 10, 2025

setsid already "detaches" the child process, or is mpv expecting some other semantic that setsid doesn't provide?

The problem is that we still need to reap the zombies if we do not double-fork.

@ruihe774
Copy link
Copy Markdown
Contributor Author

FWIW for anyone who is still interested in subprocess creation stuff, you may be also interested in my another PR: ziglang/zig#22368.

@N-R-K
Copy link
Copy Markdown
Contributor

N-R-K commented Jan 10, 2025

The problem is that we still need to reap the zombies if we do not double-fork.

Ah, yeah... the zombies. Looked around a bit but it doesn't seem like posix_spawn has any flags/option to solve this.

Maybe there's a better way to reap the child so we don't need to reparent it to pid 1? (Nothing that comes to mind though...)

@ruihe774
Copy link
Copy Markdown
Contributor Author

Maybe there's a better way to reap the child so we don't need to reparent it to pid 1? (Nothing that comes to mind though...)

We can waitpid(-1, ...) in a separated thread. An extra thread always has overhead, though. No silver bullet.

BTW we cannot assume orphans are reparented to pid 1. It's implementation-defined. And systemd handles orphans in a different way. See https://stackoverflow.com/a/40424702

@N-R-K
Copy link
Copy Markdown
Contributor

N-R-K commented Jan 10, 2025

We can waitpid(-1, ...) in a separated thread.

I thought about doing it via SIGCHLD handler. But in either case, that's probably not an option since we'd also need to consider libmpv, I think.

One other thing I thought was to collect the detached pids into a dynamic array and attempt to reap them every once in a while with WNOHANG.

Not sure where the reaping code should go. Would require the dynamic array to be global which has the usual threading issues too. So, yeah...

@ruihe774
Copy link
Copy Markdown
Contributor Author

I have no motivation to make it upstream anymore. Closing it.

@ruihe774 ruihe774 closed this Mar 14, 2025
@llyyr
Copy link
Copy Markdown
Contributor

llyyr commented Mar 29, 2026

Could you revive this, or at least the first commit? It fixes #17669

@ruihe774
Copy link
Copy Markdown
Contributor Author

Could you revive this, or at least the first commit? It fixes #17669

Okay. I can reopen this.

@ruihe774 ruihe774 reopened this Mar 29, 2026
@N-R-K
Copy link
Copy Markdown
Contributor

N-R-K commented Mar 29, 2026

That spawning hundreds of processes per second is a common use case

The yt-thumbnailer which triggered the issue for @llyyr was limited (by default) to 4 parallel processes running at a time. Not thousands.

Wanting to run N (where N isn't absurdly large) processes in parallel isn't some niche use case nor "bad programming practice", it's fairly common way to do multiprocess parallelism.

@na-na-hi
Copy link
Copy Markdown
Contributor

If your audio buffer is small enough, even one subprocess is enough to cause crackle effects. You can test yourself by setting audio buffer to be as small as it can be on your system under normal circumstances, then a single subprocess call is enough to cause underruns

At this point, many factors outside mpv's control can cause underruns. mpv might be preempted out for a few milliseconds by process scheduling.

The yt-thumbnailer which triggered the issue for @llyyr was limited (by default) to 4 parallel processes running at a time. Not thousands.

4 running at a time does not mean 4 processes had ever spawned. The spawned processes die quickly and new ones spawn. So there are still lots of spawned processes.

it's fairly common way to do multiprocess parallelism.

Yes. And they spawn a few persistent processes at start and use IPC to communicate with them and they do not constantly spawn new processes. It has nothing to do with what you script was doing.

@N-R-K
Copy link
Copy Markdown
Contributor

N-R-K commented Mar 29, 2026

Yes. And they spawn a few persistent processes at start and use IPC to communicate with them and they do not constantly spawn new processes.

Just run the mpv build with ninja or w/e. It's obviously not going to spawn N "persistent gcc/clang" processes. It'll spawn one per compilation unit.

I'm not interested in derailing this PR further but this is simply not some sort of esoteric user-case/bad-programming practice like you're claiming it to be.

@na-na-hi
Copy link
Copy Markdown
Contributor

na-na-hi commented Mar 29, 2026

Just run the mpv build with ninja or w/e.

Does ninja provide an interactive UI and playback audio? Spawning lots processes working for build system does not mean it is universally acceptable.

llyyr added a commit to llyyr/mpv that referenced this pull request Mar 29, 2026
On master, this takes around 4000ms for me. While with mpv-player#14973
it completes in 10ms
@llyyr
Copy link
Copy Markdown
Contributor

llyyr commented Mar 29, 2026

This PR causes stack buffer overflow with https://github.com/mpv-player/mpv/blob/master/TOOLS/lua/command-test.lua

Test with: mpv --scripts=TOOLS/lua/command-test.lua --idle --force-window av://lavfi:testsrc

#0 0x000000c174ec in spawn_process_inner ../osdep/subprocess-posix.c:157
    #1 0x000000c17ec6 in child_main ../osdep/subprocess-posix.c:117
    #2 0x7ff7365209a3 in __GI___clone (/lib64/libc.so.6+0x1209a3) (BuildId: 1877ff923c3d7eea4bc6aacc6c37c1bd67bb166d)

Address 0x7bf6ddf30468 is located in stack of thread T61 at offset 40 in frame
    #0 0x000000c17caf in child_main ../osdep/subprocess-posix.c:106

  This frame has 1 object(s):
    [32, 40) 'child_stack' (line 111) <== Memory access at offset 40 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T61 created by T0 here:
    #0 0x7ff73a722e2e in pthread_create (/lib64/libasan.so.8+0x122e2e) (BuildId: 79d1a160a6e20cb67a17f4791acae429421faac5)
    #1 0x00000080ab1a in add_thread ../misc/thread_pool.c:146
    #2 0x00000080c88b in thread_pool_add ../misc/thread_pool.c:198
    #3 0x00000090d608 in run_command ../player/command.c:5645
    #4 0x0000007fc711 in mp_dispatch_queue_process ../misc/dispatch.c:300
    #5 0x00000095fccf in mp_wait_events ../player/playloop.c:64
    #6 0x00000096f822 in run_playloop ../player/playloop.c:1313
    #7 0x00000094389a in play_current_file ../player/loadfile.c:1952
    #8 0x000000948ec7 in mp_play_files ../player/loadfile.c:2144
    #9 0x00000094d634 in mpv_main ../player/main.c:458
    #10 0x7ff73642b2fa in __libc_start_call_main (/lib64/libc.so.6+0x2b2fa) (BuildId: 1877ff923c3d7eea4bc6aacc6c37c1bd67bb166d)
    #11 0x7fff70b6a1e3  (<unknown module>)
    #12 0x4f543d7374706971  (<unknown module>)

SUMMARY: AddressSanitizer: stack-buffer-overflow ../osdep/subprocess-posix.c:157 in spawn_process_inner

Though I'm not going to tell you to fix it, because it seems like this PR might get blocked to death anyway.

@ruihe774
Copy link
Copy Markdown
Contributor Author

This PR causes stack buffer overflow with https://github.com/mpv-player/mpv/blob/master/TOOLS/lua/command-test.lua

Test with: mpv --scripts=TOOLS/lua/command-test.lua --idle --force-window av://lavfi:testsrc

#0 0x000000c174ec in spawn_process_inner ../osdep/subprocess-posix.c:157
    #1 0x000000c17ec6 in child_main ../osdep/subprocess-posix.c:117
    #2 0x7ff7365209a3 in __GI___clone (/lib64/libc.so.6+0x1209a3) (BuildId: 1877ff923c3d7eea4bc6aacc6c37c1bd67bb166d)

Address 0x7bf6ddf30468 is located in stack of thread T61 at offset 40 in frame
    #0 0x000000c17caf in child_main ../osdep/subprocess-posix.c:106

  This frame has 1 object(s):
    [32, 40) 'child_stack' (line 111) <== Memory access at offset 40 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T61 created by T0 here:
    #0 0x7ff73a722e2e in pthread_create (/lib64/libasan.so.8+0x122e2e) (BuildId: 79d1a160a6e20cb67a17f4791acae429421faac5)
    #1 0x00000080ab1a in add_thread ../misc/thread_pool.c:146
    #2 0x00000080c88b in thread_pool_add ../misc/thread_pool.c:198
    #3 0x00000090d608 in run_command ../player/command.c:5645
    #4 0x0000007fc711 in mp_dispatch_queue_process ../misc/dispatch.c:300
    #5 0x00000095fccf in mp_wait_events ../player/playloop.c:64
    #6 0x00000096f822 in run_playloop ../player/playloop.c:1313
    #7 0x00000094389a in play_current_file ../player/loadfile.c:1952
    #8 0x000000948ec7 in mp_play_files ../player/loadfile.c:2144
    #9 0x00000094d634 in mpv_main ../player/main.c:458
    #10 0x7ff73642b2fa in __libc_start_call_main (/lib64/libc.so.6+0x2b2fa) (BuildId: 1877ff923c3d7eea4bc6aacc6c37c1bd67bb166d)
    #11 0x7fff70b6a1e3  (<unknown module>)
    #12 0x4f543d7374706971  (<unknown module>)

SUMMARY: AddressSanitizer: stack-buffer-overflow ../osdep/subprocess-posix.c:157 in spawn_process_inner

Though I'm not going to tell you to fix it, because it seems like this PR might get blocked to death anyway.

It's the AddressSanitizer. I believe it's a false positive.

@sfan5 sfan5 removed the priority:on-ice may be revisited later label Mar 29, 2026
@kasper93
Copy link
Copy Markdown
Member

I don't have opinion on this, would have to look into what are the issues and what can we do about them. I think I asked @llyyr is to update command_test.lua to add kind of benchmark there, so we can evaluate possible solutions on objective basis and across the platforms, without needing some audio cracking checks. I asked @llyyr on IRC, but if anyone want's to pick this up, it's fine. I just think we should have some tangible numbers to compare against. And agree how much subprocess spawning is fine for mpv use.

For the first question, it is worth pointing out spawning lots of processes is not necessary for a thumbnailer. The thumbnailer can implement everything by calling native functions, so creating process is not needed. Also thumbfast can do thumbnailing without spawning lots of processes. I think spawning lots of processes is a bad programming practice and there is no reason for mpv to encourage it.

I agree that, we shouldn't spam processes. If we need some separate native process to process thumbnails, it should be spawned once and asked using IPC to do the work. Of course this all is moving the bar higher and is more complex design, but we can see that processes spam is simply not working either. It's not acceptable to stall main playback process for UI related candy. Regardless if it affects audio playback only for "some" users. We identified this limitation among few people testing PR, imagine what will happen if this goes mainline.

llyyr added a commit to llyyr/mpv that referenced this pull request Mar 29, 2026
On master, this takes around 4000ms for me. While with mpv-player#14973
it completes in 10ms
llyyr added a commit to llyyr/mpv that referenced this pull request Mar 29, 2026
On master, this takes around 4000ms for me. While with mpv-player#14973
it completes in 10ms
llyyr added a commit to llyyr/mpv that referenced this pull request Mar 29, 2026
On master, this takes around 4000ms for me. While with mpv-player#14973
it completes in 10ms
Copy link
Copy Markdown
Contributor

@N-R-K N-R-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we isolate/reduce the ifdefs?

Also, please update the relavant part with proper attributes to silence the FP ASan warning.

Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c
Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c Outdated
@ruihe774 ruihe774 requested a review from N-R-K April 9, 2026 18:23
@sfan5 sfan5 self-requested a review April 9, 2026 19:05
Copy link
Copy Markdown
Contributor

@N-R-K N-R-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using #17670 for testing, this is what I get with this PR (playback works fine, no freezing):

[command_test] done rapid subprocess dispatch in 113.893ms

And this is what I get without it (playback has freezes):

[command_test] done rapid subprocess dispatch in 1179.846ms

Looks good overall. Two small changes needed, commented below:

Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c
@N-R-K
Copy link
Copy Markdown
Contributor

N-R-K commented Apr 10, 2026

Oh, and also, please rebase on top of current master. Thanks.

@ruihe774
Copy link
Copy Markdown
Contributor Author

Oh, and also, please rebase on top of current master. Thanks.

✔️

Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c Outdated
Comment thread osdep/subprocess-posix.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants