feat: add configurable scheduling policy #119

Open
Alise-svg wants to merge 2 commits into sgl-project:main from Alise-svg:feature/configurable-scheduling-policy

@Alise-svg

Add --schedule-policy command line argument to allow users to choose
between different scheduling strategies for batch formation.

Supported policies:

  • prefill_first (default): Prioritizes prefill requests, reducing
    Time To First Token (TTFT) for online serving scenarios where
    users wait for the first token.
  • decode_first: Prioritizes decode requests, improving throughput
    for offline batch inference where maximizing token generation
    rate is more important than latency.

Usage:
python -m minisgl --model "Qwen/Qwen3-0.6B" --schedule-policy decode_first
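
For reviewers, here is a minimal sketch of how a flag like this could be declared with `argparse`. It is not the code from this PR: the `build_parser` helper and the `SCHEDULE_POLICIES` tuple are hypothetical names used only for illustration, and only the flag name, the two policy values, and the default come from the description above.

```python
import argparse

# Hypothetical constant: the two policy names come from the PR description,
# but this tuple and build_parser() are illustrative, not minisgl's code.
SCHEDULE_POLICIES = ("prefill_first", "decode_first")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="minisgl")
    parser.add_argument("--model", required=True, help="Model name or path.")
    parser.add_argument(
        "--schedule-policy",
        choices=SCHEDULE_POLICIES,  # argparse rejects any other value
        default="prefill_first",    # default stated in the PR description
        help="Which request type the scheduler prioritizes when forming a batch.",
    )
    return parser


# Parse the same arguments as the usage line above.
args = build_parser().parse_args(
    ["--model", "Qwen/Qwen3-0.6B", "--schedule-policy", "decode_first"]
)
print(args.schedule_policy)
```

Using `choices` means a typo such as `--schedule-policy decode-first` fails at startup with a usage error instead of silently falling back to the default.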

@DarkSharpness added the enhancement (New feature or request) label May 10, 2026
self.prefill_manager.schedule_next_batch(self.prefill_budget)
or self.decode_manager.schedule_next_batch()
)
if self.schedule_policy == "decode_first":
Collaborator

A decode-first policy should:

  1. Form a decode batch first.
  2. Try to schedule a prefill batch with the remaining token budget (prefill_budget - decode_tokens).
    This is effectively mixed prefill-decode batching.
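
The two steps above could be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the scheduler in this repository: `Batch` and `schedule_decode_first` are hypothetical names, each running decode request is assumed to consume one token of budget per step, and prefill requests are admitted whole in arrival order (no chunked prefill).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Batch:
    """Hypothetical batch record used only in this sketch."""
    phase: str  # "decode", "prefill", or "mixed"
    request_ids: List[int] = field(default_factory=list)
    num_tokens: int = 0


def schedule_decode_first(
    decode_ids: List[int],
    pending_prefills: List[Tuple[int, int]],  # (request_id, prompt_len), FIFO
    prefill_budget: int,
) -> Optional[Batch]:
    # Step 1: form the decode batch first.
    # Assumption: every running request contributes exactly one token.
    batch = Batch("decode", list(decode_ids), len(decode_ids))

    # Step 2: spend the remaining budget (prefill_budget - decode_tokens)
    # on waiting prefill requests.
    remaining = prefill_budget - batch.num_tokens
    for req_id, prompt_len in pending_prefills:
        if prompt_len > remaining:
            break  # preserve arrival order; stop at the first request that does not fit
        batch.request_ids.append(req_id)
        batch.num_tokens += prompt_len
        remaining -= prompt_len
        batch.phase = "mixed" if decode_ids else "prefill"

    # Nothing to run if there were no decode requests and no prefill fit.
    return batch if batch.request_ids else None


# Three running requests and a budget of 10 tokens leave 7 tokens for prefill:
# request 10 (5 tokens) fits, request 11 (4 tokens) does not.
batch = schedule_decode_first([1, 2, 3], [(10, 5), (11, 4), (12, 2)], 10)
print(batch)
```

Stopping at the first prefill that does not fit keeps admission order simple; a real scheduler might instead skip ahead or split the prompt into chunks.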

Alise-svg added 2 commits May 12, 2026 22:45
  - Add --schedule-policy command line argument
  - Support 'prefill_first' (default) and 'decode_first' policies
  - prefill_first reduces TTFT for online serving
  - decode_first improves throughput for offline inference
@Alise-svg force-pushed the feature/configurable-scheduling-policy branch from 20dbc44 to 3d221e8 May 12, 2026 14:46
