You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This guide provides a comprehensive reference for all load generator CLI options in AIPerf, including a compatibility matrix showing which options work together.
Request Scheduling Options
AIPerf determines how to schedule requests based on which CLI options you specify:
CLI Option
Use Case
Description
--request-rate
Rate-based load testing
Schedule requests at a target QPS with configurable arrival patterns
--concurrency (alone)
Saturation/throughput testing
Send requests as fast as possible within concurrency limits
--fixed-schedule
Trace replay
Replay requests at exact timestamps from dataset
--user-centric-rate
KV cache benchmarking
Per-user rate limiting with consistent turn gaps
Option Priority
When multiple options are specified, AIPerf uses this priority:
--fixed-schedule or mooncake_trace dataset → Timestamp-based scheduling
--user-centric-rate → Per-user turn gap scheduling
--request-rate → Rate-based scheduling with arrival patterns
--concurrency only → Burst mode (as fast as possible within limits)
Compatibility Matrix
Legend
✅ Compatible - Option works with this configuration
⚠️Conditional - Works with restrictions (see notes)
❌ Incompatible - Option conflicts or is ignored
🔧 Required - Option is required for this configuration
Scheduling Options
Option
--request-rate
--fixed-schedule
--user-centric-rate
Notes
--request-rate
✅
❌
❌
Conflicts with --user-centric-rate
--user-centric-rate
❌
❌
🔧
Requires --num-users
--fixed-schedule
❌
🔧
❌
Requires trace dataset with timestamps
--num-users
❌
❌
🔧
Required with --user-centric-rate; raises error otherwise
--request-rate-ramp-duration
✅
❌
❌
Raises error with --fixed-schedule or --user-centric-rate
Stop Conditions (at least one required)
Option
--request-rate
--fixed-schedule
--user-centric-rate
Notes
--request-count
✅
✅
✅
Mutually exclusive with --num-sessions
--num-sessions
✅
✅
✅
Mutually exclusive with --request-count
--benchmark-duration
✅
✅
✅
Enables --benchmark-grace-period
Arrival Pattern Options
Option
--request-rate
--fixed-schedule
--user-centric-rate
Notes
--arrival-pattern
✅
❌
❌
Conflicts with --user-centric-rate; values: constant, poisson, gamma
--arrival-smoothness
⚠️
❌
❌
Only with --arrival-pattern gamma
Arrival Pattern Values:
constant - Fixed inter-arrival times (1/rate)
poisson - Exponential inter-arrivals (default with --request-rate)
gamma - Tunable smoothness via --arrival-smoothness
concurrency_burst - As fast as possible within concurrency limits (auto-set when no rate specified)
Concurrency Options
Option
--request-rate
--fixed-schedule
--user-centric-rate
Notes
--concurrency
✅
✅
✅
Limits concurrent sessions with any scheduling option
--prefill-concurrency
⚠️
⚠️
⚠️
Requires --streaming; must be ≤ --concurrency
--concurrency-ramp-duration
✅
✅
✅
Works with any scheduling option
--prefill-concurrency-ramp-duration
⚠️
⚠️
⚠️
Requires --streaming; works with any scheduling option
Concurrency behavior by configuration:
With --request-rate: Concurrency acts as a ceiling; requests scheduled by rate are blocked if at limit
With --concurrency only (no rate options): Concurrency is the primary driver; sends as fast as possible within limit
With --fixed-schedule: Concurrency acts as a ceiling; requests fire at scheduled times but blocked if at limit
With --user-centric-rate: Concurrency acts as a ceiling; user turns fire based on turn_gap but blocked if at limit
Important: If --concurrency is not set, session concurrency limiting is disabled (unlimited). For --user-centric-rate mode, consider setting --concurrency to at least --num-users to ensure all users can have in-flight requests.
With --num-users 15 and --user-centric-rate 1.0, each user has 15 seconds between their turns.
For complete KV cache benchmarking, also configure shared system prompts and user context prompts. See the User-Centric Timing Tutorial for full configuration including --shared-system-prompt-length, --user-context-prompt-length, and other prompt options.
Common Validation Errors
Error
Cause
Solution
--user-centric-rate cannot be used together with --request-rate or --arrival-pattern
Conflicting options
Use only one scheduling option
--user-centric-rate requires --num-users to be set