
[Cherry-Pick][CI] Set --workers=1 to avoid intermittent timeout failures (#7846) #7848

Merged
EmmonsCurse merged 1 commit into PaddlePaddle:release/2.6 from EmmonsCurse:cherry-pick/7846/release/2.6
May 19, 2026

@EmmonsCurse
Collaborator

Cherry-pick of #7846 (authored by @EmmonsCurse) to release/2.6.

devPR: #7846


Motivation

Under the configuration:

  • --max-concurrency 5000
  • --max-waiting-time 1

the test intermittently fails due to insufficient successful responses:

  • concurrent requests: 1333
  • expected successful responses: >= 1024
  • actual successful responses: around 1011
  • timeout failures (HTTP 500): around 322

The issue stems from per-worker semaphore allocation:

self.semaphore = StatefulSemaphore((FD_SUPPORT_MAX_CONNECTIONS + workers - 1) // workers)

Since FD_SUPPORT_MAX_CONNECTIONS defaults to 1024:

  • workers = 1 → semaphore size = 1024
  • workers = 4 → semaphore size = 256 per worker
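The allocation formula above is a ceiling division of the connection limit across workers. A minimal Python sketch, assuming only the formula and the 1024 default quoted in this PR (the function name `per_worker_permits` is hypothetical, not the project's code):

```python
# Hypothetical re-implementation of the per-worker permit computation quoted above.
# FD_SUPPORT_MAX_CONNECTIONS defaults to 1024 according to this PR.
FD_SUPPORT_MAX_CONNECTIONS = 1024


def per_worker_permits(max_connections: int, workers: int) -> int:
    """Split max_connections across workers, rounding up (ceiling division)."""
    return (max_connections + workers - 1) // workers


for workers in (1, 2, 3, 4):
    size = per_worker_permits(FD_SUPPORT_MAX_CONNECTIONS, workers)
    print(f"workers={workers}: {size} permits per worker, {size * workers} in total")
```

Because the division rounds up, the total can slightly exceed 1024 (342 permits per worker and 1026 in total with workers=3), but no single worker can admit more than its own share.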

Although the total theoretical capacity remains unchanged, requests are not evenly distributed across Gunicorn workers in practice.

With workers = 4, some workers may receive significantly more requests than others. Those workers exhaust their local semaphores, and the excess requests time out before they can enter inference execution, even if other workers still have free permits.
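The local-exhaustion path can be sketched with `asyncio.Semaphore` standing in for `StatefulSemaphore`. This assumes, without confirmation from the source, that a request waits at most `--max-waiting-time` seconds for a permit and is rejected with a 500 otherwise. `handle_request` and `run_worker` are hypothetical names, and the timings are scaled down so the sketch runs quickly.

```python
import asyncio


async def handle_request(sem: asyncio.Semaphore, max_waiting_time: float, latency: float) -> int:
    """Return 200 if a permit is obtained within max_waiting_time, else 500."""
    try:
        await asyncio.wait_for(sem.acquire(), timeout=max_waiting_time)
    except asyncio.TimeoutError:
        return 500  # rejected before reaching inference
    try:
        await asyncio.sleep(latency)  # stand-in for inference holding the permit
        return 200
    finally:
        sem.release()


async def run_worker(permits: int, requests: int, max_waiting_time: float, latency: float) -> list[int]:
    # One worker with its own semaphore; permits are not shared with other workers.
    sem = asyncio.Semaphore(permits)
    return await asyncio.gather(
        *(handle_request(sem, max_waiting_time, latency) for _ in range(requests))
    )


statuses = asyncio.run(run_worker(permits=4, requests=6, max_waiting_time=0.05, latency=0.2))
print(statuses.count(200), statuses.count(500))
```

With 4 permits and 6 simultaneous requests, the first 4 hold their permits for 0.2 s, so the remaining 2 give up after 0.05 s and return 500, regardless of how idle any other worker's semaphore is.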

This issue is intermittent because it depends on:

  • OS-level socket accept scheduling
  • runtime request distribution across workers
  • inference latency fluctuations under GPU load
  • boundary-state concurrency conditions
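The dependence on request distribution can be illustrated with a simplified model (hypothetical, not the project's code): all 1333 requests arrive at once, and every admitted request holds its permit longer than `--max-waiting-time`, so each worker admits at most `min(load, permits)` requests. The model ignores permits freed within the wait window. With workers=1 the outcome does not depend on distribution; with workers=4 it reaches 1024 only if every worker receives at least 256 requests. The skewed split below is made up to show how totals like the ones reported above can arise; it is not measured data.

```python
def admitted(loads: list[int], permits_per_worker: int) -> int:
    """Requests that obtain a permit when no permit frees up within the wait limit."""
    return sum(min(load, permits_per_worker) for load in loads)


TOTAL = 1333
single = admitted([TOTAL], 1024)                # workers=1: one shared pool of 1024
balanced = admitted([334, 333, 333, 333], 256)  # workers=4, near-even split
skewed = admitted([243, 360, 380, 350], 256)    # workers=4, made-up uneven split

for name, ok in (("workers=1", single), ("balanced", balanced), ("skewed", skewed)):
    print(f"{name}: {ok} succeeded, {TOTAL - ok} timed out, pass={ok >= 1024}")
```

In the skewed case, one under-loaded worker leaves 13 of its permits unused while the other three reject their overflow, so the total drops below the 1024 threshold.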

To preserve the existing test behavior and avoid unintended variability, the test configuration must explicitly set --workers=1.

Modifications

  • Explicitly sets --workers=1 in the affected test configuration.
  • Avoids worker-level semaphore fragmentation caused by uneven request distribution across workers.
  • Improves concurrency stability and reduces intermittent timeout-related assertion failures.

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

  • Add at least one tag in the PR title.
  • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, first submit it to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@EmmonsCurse
Collaborator Author

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci build_xpu
/skip-ci coverage
/skip-ci stable_test
/skip-ci pre_ce_test
/skip-ci logprob_test
/skip-ci gpu_4cards_test

@paddle-bot

paddle-bot Bot commented May 19, 2026

Thanks for your contribution!

EmmonsCurse merged commit ab3c5f4 into PaddlePaddle:release/2.6 on May 19, 2026
35 of 37 checks passed
EmmonsCurse deleted the cherry-pick/7846/release/2.6 branch on May 19, 2026 03:31
