Use work-stealing scheduler for parallel runs #1419
Closed
janedbal wants to merge 2 commits into PHPCSStandards:4.x from
Conversation
The previous parallel implementation pre-divided files into N equal batches up front, one per worker. With heterogeneous file processing times this caused load imbalance — the worker with the slowest slice held everyone up while the others sat idle. Replace the static batching with a PHPStan-style work-stealing scheduler:

- Master keeps an ordered queue of paths and a Unix socket pair per worker (`stream_socket_pair`).
- Workers loop "send ready (with progress for last chunk) → receive next chunk → process". When the queue drains, master sends "done" and the worker writes its totals + cache entries to the temp file as before.
- Chunk size is fixed at 20 (matching PHPStan's `parallel.jobSize`).
- Per-file progress dots/percentages now appear during parallel runs; previously you got one dot per batch.

On the project's own src/ tree, a 4-worker run goes from 8.12s to 2.36s (3.4×) on my machine. Existing testPhpcsParallel suite passes.
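The ready → chunk → done protocol above can be sketched as follows. This is illustrative only: Python threads and `socket.socketpair` stand in for the PR's forked PHP workers and `stream_socket_pair`, and all names are mine, not the PR's.

```python
import json
import select
import socket
import threading

CHUNK_SIZE = 3  # the PR fixes this at 20, matching PHPStan's parallel.jobSize

def master(queue, worker_socks):
    # Hand the next chunk to whichever worker reports "ready" first.
    active = list(worker_socks)
    while active:
        readable, _, _ = select.select(active, [], [])
        for sock in readable:
            if not sock.recv(4096):      # worker hung up unexpectedly
                active.remove(sock)
                continue
            chunk, queue = queue[:CHUNK_SIZE], queue[CHUNK_SIZE:]
            sock.sendall(json.dumps(chunk).encode() + b"\n")
            if not chunk:                # an empty chunk doubles as "done"
                active.remove(sock)

def worker(sock, process, out):
    # Worker loop: send "ready", receive the next chunk, process, repeat.
    while True:
        sock.sendall(b"ready")
        buf = b""
        while not buf.endswith(b"\n"):
            buf += sock.recv(4096)
        chunk = json.loads(buf)
        if not chunk:
            return
        out.extend(process(path) for path in chunk)

def run(files, num_workers, process):
    # One socket pair per worker, as in the PR's design.
    pairs = [socket.socketpair() for _ in range(num_workers)]
    outs = [[] for _ in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(child, process, out))
        for (_, child), out in zip(pairs, outs)
    ]
    for t in threads:
        t.start()
    master(list(files), [parent for parent, _ in pairs])
    for t in threads:
        t.join()
    return outs
```

The key property: a fast worker simply comes back for another chunk while a slow one is still busy, so no one waits on a pre-assigned batch.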
Member
@janedbal While I appreciate what you are trying to do, please keep in mind that this repo strictly does not accept any AI contributions (as per the contributing guide).
The fixture for the bashunit E2E tests sets `parallel=75` in the ruleset, which used to be harmless because the old static-batching parallel branch fetched files via `$todo->current()` — that returns the in-memory `DummyFile` populated with the piped content. The new work-stealing worker only receives paths and does `new LocalFile($path, ...)`, which fails `Common::isReadable` for `'STDIN'`, marks the file ignored, and exits with 0. Match the existing `runPHPCBF` behavior and force `parallel=1` when stdin is set; STDIN is a single file, so parallelizing was meaningless anyway.
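The resulting guard is tiny. A hedged sketch of the decision (the function name and shape are mine, not PHPCS's actual config code):

```python
def effective_parallel(requested: int, stdin_content) -> int:
    # STDIN is a single in-memory "file": a forked worker cannot re-open it
    # by path, and there is nothing to split across workers anyway.
    if stdin_content is not None:
        return 1
    return max(1, requested)
```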
Author
@jrfnl Well, it is up to you, but this PR:
I believe the value it brings is significant.
janedbal
commented
Apr 28, 2026
    // Same default as PHPStan's parallel scheduler — small enough that
    // a fast worker keeps coming back for more, large enough to amortize
    // the IPC round-trip per file.
    $chunkSize = 20;
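To see the amortization the comment describes: each chunk handed out costs one ready/receive round trip, so a rough count (my own helper, ignoring the final per-worker "done" exchange) is:

```python
from math import ceil

def ipc_round_trips(num_files: int, chunk_size: int) -> int:
    # One ready/chunk exchange per chunk the master hands out.
    return ceil(num_files / chunk_size)
```

At chunk size 20, the ~22,400-file run from the PR description needs ~1,120 round trips rather than 22,400 at chunk size 1.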
Author
The existing bash e2e tests only exercise `--parallel=2` with 2 files — each worker gets exactly one chunk, so the work-stealing loop is never actually exercised.
If we ever continue with this POC, this should be addressed.
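A quick predicate for the gap described above (my own helper, not part of the PR): stealing only kicks in when at least one worker has to come back for a second chunk.

```python
def exercises_stealing(num_files: int, num_workers: int, chunk_size: int = 20) -> bool:
    # With num_files <= num_workers * chunk_size, every worker can be
    # satisfied by its first chunk, so the steal loop never re-enters.
    return num_files > num_workers * chunk_size
```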
Author
I think this idea is better - same results, much lower complexity. Closing.
Summary
The current `--parallel` implementation pre-divides files into `N` equal batches up front, one per worker, then forks. With heterogeneous file processing times the worker that drew the slowest slice holds everyone up while the others sit idle.

This PR swaps the static batching for a PHPStan-style work-stealing scheduler:

- Master keeps an ordered queue of paths and a `stream_socket_pair` to each worker.
- Workers loop: send "ready" (with progress for last chunk) → receive next chunk → process. When the queue drains, master sends "done" and the worker writes its totals + cache entries to the temp file as before.
- Chunk size is fixed at `20`, matching PHPStan's `parallel.jobSize` default.
- `processChildProcs` still merges totals and cache entries; only progress reporting moved into the new `dispatchWork`.

Numbers
Private monorepo backend, ~22,400 PHP files, custom rulesets, 32-core machine, `--no-cache --parallel=24`:

Even at 24 workers on a 32-core box — plenty of CPU to absorb mild imbalance — the static-batching scheme leaves most workers idle once the unlucky ones hit slow files.
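The imbalance claim is easy to sanity-check with a toy makespan model. This is a sketch under my own assumptions (contiguous equal-count batches for the old scheme, greedy per-file assignment for stealing), not the PR's benchmark code:

```python
import heapq

def makespan_static(times, workers):
    # Old scheme: pre-divide into equal-count contiguous batches up front;
    # wall time is the slowest batch.
    n = len(times)
    size = -(-n // workers)  # ceil division
    batches = [times[i:i + size] for i in range(0, n, size)]
    return max(sum(b) for b in batches)

def makespan_stealing(times, workers):
    # Work stealing (chunk size 1): a freed worker immediately takes the
    # next file, so load balances automatically.
    load = [0.0] * workers
    heapq.heapify(load)
    for t in times:
        heapq.heappush(load, heapq.heappop(load) + t)
    return max(load)
```

For one 10s file plus thirty 1s files on 4 workers, the static scheme finishes in 17s (the batch holding the slow file drags), while stealing finishes in 10s.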
Test plan
- `vendor/bin/phpunit --filter testPhpcsParallel` — 22/22 pass
- `vendor/bin/phpunit --filter ExitCode` — 66/66 pass
- `vendor/bin/phpunit tests/` — only failure is a pre-existing unrelated `.phpt` fixture
- The `endtoend.xml.dist` fixture sets `parallel=75`, which exposed that the new worker can't see the in-memory `DummyFile` holding piped content; matched the existing `runPHPCBF` behavior of forcing `parallel=1` when `stdin` is set
- `bin/phpcs --parallel=4 src/ --cache=...` then re-run — second run is ~180 ms (cache merged correctly across workers)
- `bin/phpcs src/Runner.php` — passes the project's own coding standard

Co-Authored-By: Claude Code