This document describes the design of the requests generator, which models a stream of user requests to a given endpoint over time.
Following the AsyncFlow philosophy, we accept a small set of input parameters to drive a “what-if” analysis in a pre-production environment. These inputs let you explore reliability and cost implications under different traffic scenarios.
- **Average Concurrent Users** (`avg_active_users`): Expected number of simultaneous active users (or sessions) interacting with the system.
  - Modeled as a random variable (`RVConfig`).
  - Allowed distributions: Poisson or Normal.
- **Average Requests per Minute per User** (`avg_request_per_minute_per_user`): Average request rate per user, expressed in requests per minute.
  - Modeled as a random variable (`RVConfig`).
  - Must use the Poisson distribution.
- **User Sampling Window** (`user_sampling_window`): Time interval (in seconds) over which active users are resampled.
  - Constrained between `MIN_USER_SAMPLING_WINDOW` and `MAX_USER_SAMPLING_WINDOW`.
  - Defaults to `USER_SAMPLING_WINDOW`.

Random variables:

- Concurrent users and requests per minute per user are independent random variables.
- Each is configured via the `RVConfig` model, which specifies:
  - mean (mandatory, must be numeric and positive),
  - distribution (default: Poisson),
  - variance (optional; defaults to `mean` for Normal and Log-Normal distributions).

Supported joint sampling cases:

- Poisson (users) × Poisson (requests)
- Normal (users) × Poisson (requests)

Other combinations are currently unsupported.

Variance handling:

- If the distribution is Normal or Log-Normal and `variance` is not provided, it is automatically set to the `mean`.

Validation rules:

- `avg_request_per_minute_per_user`:
  - Must be Poisson-distributed.
  - Validation enforces this constraint.
- `avg_active_users`:
  - Must be either Poisson or Normal.
  - Validation enforces this constraint.
- `mean` in `RVConfig`:
  - Must be a positive number (int or float).
  - Automatically coerced to `float`.

```python
from pydantic import BaseModel, field_validator, model_validator

# `Distribution` is the project's enum of supported distributions
# (import omitted here).


class RVConfig(BaseModel):
    """Configure a random variable."""

    mean: float
    distribution: Distribution = Distribution.POISSON
    variance: float | None = None

    @field_validator("mean", mode="before")
    def ensure_mean_is_numeric_and_positive(
        cls,  # noqa: N805
        v: float,
    ) -> float:
        """Ensure `mean` is a positive number, then coerce it to float."""
        if not isinstance(v, (float, int)):
            err_msg = "mean must be a number (int or float)"
            raise ValueError(err_msg)  # noqa: TRY004
        # The documented rule also requires positivity, so enforce it here.
        if v <= 0:
            err_msg = "mean must be positive"
            raise ValueError(err_msg)
        return float(v)

    @model_validator(mode="after")
    def default_variance(self) -> "RVConfig":
        """Set variance = mean when the distribution requires it and it is missing."""
        needs_variance: set[Distribution] = {
            Distribution.NORMAL,
            Distribution.LOG_NORMAL,
        }
        if self.variance is None and self.distribution in needs_variance:
            self.variance = self.mean
        return self
```

```python
from pydantic import BaseModel, Field, field_validator

# `SystemNodes`, `TimeDefaults` and `Distribution` are project constants
# (imports omitted here).


class RqsGeneratorInput(BaseModel):
    """Define the expected variables for the simulation."""

    id: str
    type: SystemNodes = SystemNodes.GENERATOR
    avg_active_users: RVConfig
    avg_request_per_minute_per_user: RVConfig

    user_sampling_window: int = Field(
        default=TimeDefaults.USER_SAMPLING_WINDOW,
        ge=TimeDefaults.MIN_USER_SAMPLING_WINDOW,
        le=TimeDefaults.MAX_USER_SAMPLING_WINDOW,
        description=(
            "Sampling window in seconds "
            f"({TimeDefaults.MIN_USER_SAMPLING_WINDOW}-"
            f"{TimeDefaults.MAX_USER_SAMPLING_WINDOW})."
        ),
    )

    @field_validator("avg_request_per_minute_per_user", mode="after")
    def ensure_avg_request_is_poisson(
        cls,  # noqa: N805
        v: RVConfig,
    ) -> RVConfig:
        """
        Force the request-rate distribution to be Poisson: at the moment
        the joint sampler covers only the Poisson-Poisson and
        Gaussian-Poisson cases.
        """
        if v.distribution != Distribution.POISSON:
            msg = "At the moment the variable avg request must be Poisson"
            raise ValueError(msg)
        return v

    @field_validator("avg_active_users", mode="after")
    def ensure_avg_user_is_poisson_or_gaussian(
        cls,  # noqa: N805
        v: RVConfig,
    ) -> RVConfig:
        """
        Force the active-user distribution to be Poisson or Gaussian: at
        the moment the joint sampler covers only the Poisson-Poisson and
        Gaussian-Poisson cases.
        """
        if v.distribution not in {Distribution.POISSON, Distribution.NORMAL}:
            msg = "At the moment the variable active user must be Poisson or Gaussian"
            raise ValueError(msg)
        return v
```

From the two random inputs we define the per-second aggregate rate

$$
\Lambda = \frac{U\,\lambda_r}{60},
$$

where $U$ is the number of active users in the current window and $\lambda_r$ is the mean number of requests per minute per user.

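A Poisson process of rate $\lambda$ has a Poisson-distributed number of arrivals, with mean $\lambda t$, in any interval of length $t$.
Define $T$ as the waiting time until the next arrival. Then

$$
P(T > t) = P(\text{no arrivals in } [0, t]) = e^{-\lambda t},
$$

so the CDF is

$$
F(t) = 1 - e^{-\lambda t}, \qquad t \ge 0,
$$

and the density

$$
f(t) = \lambda\, e^{-\lambda t},
$$

and by memorylessness every inter-arrival gap $\Delta t$ is i.i.d. $\mathrm{Exp}(\lambda)$.

To draw $\Delta t$ by inverse-CDF sampling (here $U$ denotes a uniform draw, not the user count):

- Sample $U\sim\mathcal U(0,1)$.
- Solve $U = 1 - e^{-\lambda\,\Delta t}$ $\;\Rightarrow\;$ $\Delta t = -\ln(1-U)/\lambda$.
- Equivalent compact form: $\displaystyle \Delta t = -\ln(U)/\lambda$, since $1-U$ is also uniform on $(0,1)$.
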
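As a minimal sketch of the inverse-CDF draw described above, using only Python's standard library: the function name `draw_gap` and the rate of 5 events per second are illustrative and not part of AsyncFlow.

```python
import math
import random


def draw_gap(rate: float, rng: random.Random) -> float:
    """Draw one exponential inter-arrival gap (seconds) by inverse-CDF sampling."""
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    u = rng.random()  # uniform on [0, 1)
    # 1 - u lies in (0, 1], so the logarithm is always finite.
    return -math.log(1.0 - u) / rate


rng = random.Random(42)
gaps = [draw_gap(5.0, rng) for _ in range(100_000)]
mean_gap = sum(gaps) / len(gaps)  # should be close to 1 / 5.0 = 0.2 s
```

Using `1 - u` rather than `u` inside the logarithm avoids `log(0)`, because `random()` can return 0.0 but never 1.0.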
| Symbol | Meaning | Law |
|---|---|---|
| $U$ | active users in current 1-minute window | Poisson |
| $R_i$ | requests per minute by user $i$ | Poisson |
| $N = \sum_{i=1}^{U} R_i$ | total requests in that minute | compound |
| $\Lambda = N/60$ | aggregate rate (requests / second) | compound |

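The procedure relies heavily on the independence of our random variables.

Given $U=u$, the total $N$ is a sum of $u$ independent $\mathrm{Poisson}(\lambda_r)$ variables, so

$$
N \mid U=u \;\sim\; \mathrm{Poisson}(u\,\lambda_r).
$$

By the law of total probability:

$$
P(N=n) = \sum_{u=0}^{\infty} \frac{e^{-u\lambda_r}\,(u\lambda_r)^{n}}{n!}\;\frac{e^{-\lambda_u}\,\lambda_u^{u}}{u!}.
$$

This is the Poisson–Poisson compound distribution, also known as the Neyman Type A distribution.
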
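The mixture can be checked numerically. The sketch below assumes $U\sim\mathrm{Poisson}(\lambda_u)$ and $N \mid U=u \sim \mathrm{Poisson}(u\,\lambda_r)$, truncates the infinite sum over $u$ at a hypothetical cutoff `u_max`, and compares the first two moments with the values given by the laws of total expectation and total variance, $\lambda_u\lambda_r$ and $\lambda_u\lambda_r(1+\lambda_r)$. Function names and parameter values are illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean); a zero mean puts all mass on k = 0."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    # Work in log space to avoid overflow in mean**k and k!.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def compound_pmf(n: int, lam_u: float, lam_r: float, u_max: int = 200) -> float:
    """P(N = n) by the law of total probability, truncating the sum at u_max."""
    return sum(
        poisson_pmf(n, u * lam_r) * poisson_pmf(u, lam_u)
        for u in range(u_max + 1)
    )


lam_u, lam_r = 3.0, 2.0  # illustrative values
pmf = [compound_pmf(n, lam_u, lam_r) for n in range(80)]

total = sum(pmf)                                             # close to 1
mean_n = sum(n * p for n, p in enumerate(pmf))               # close to lam_u * lam_r
var_n = sum((n - mean_n) ** 2 * p for n, p in enumerate(pmf))  # close to lam_u * lam_r * (1 + lam_r)
```

The variance exceeds the mean, so the compound count is over-dispersed relative to a plain Poisson with the same mean.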
Rather than invert the discrete CDF above, we exploit the conditional structure:
```python
# Hierarchical sampler code snippet
import math
from collections.abc import Iterator

# `poisson_variable_generator` and `uniform_variable_generator` are the
# project's sampling helpers; `rng` is the random generator they expect.
# The wrapper name `sample_gaps` is illustrative.


def sample_gaps(
    mean_concurrent_user,
    mean_req_per_sec_per_user,
    sampling_window_s,
    simulation_time,
    rng,
) -> Iterator[float]:
    """Yield inter-arrival gaps (s) until the simulation horizon."""
    now = 0.0          # virtual clock (s)
    window_end = 0.0   # end of the current user window
    Lambda = 0.0       # aggregate rate Λ (req/s)

    while now < simulation_time:
        # (Re)sample U at the start of each window
        if now >= window_end:
            window_end = now + float(sampling_window_s)
            users = poisson_variable_generator(mean_concurrent_user, rng)
            Lambda = users * mean_req_per_sec_per_user

        # No users → fast-forward to next window
        if Lambda <= 0.0:
            now = window_end
            continue

        # Exponential gap from a protected uniform value
        u_raw = max(uniform_variable_generator(rng), 1e-15)
        delta_t = -math.log(1.0 - u_raw) / Lambda

        # End simulation if the next event exceeds the horizon
        if now + delta_t > simulation_time:
            break

        # If the gap crosses the window boundary, jump to it
        if now + delta_t >= window_end:
            now = window_end
            continue

        now += delta_t
        yield delta_t
```

Because each conditional step matches the exact Poisson→Exponential law, this two-stage algorithm reproduces the same joint distribution as analytically inverting the compound CDF, but with minimal computation.
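For experimentation outside the project, the hierarchical loop above can be reproduced with the standard library alone. This is a sketch under stated assumptions: `knuth_poisson` is a stand-in for `poisson_variable_generator` (Knuth's multiplication method, adequate only for small means), `random.Random.random` stands in for `uniform_variable_generator`, and the function name `gaps` and all parameter values are illustrative.

```python
import math
import random
from collections.abc import Iterator


def knuth_poisson(mean: float, rng: random.Random) -> int:
    """Stand-in Poisson sampler (Knuth's method); fine for small means only."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def gaps(
    mean_users: float,
    req_per_min_per_user: float,
    window_s: float,
    horizon_s: float,
    rng: random.Random,
) -> Iterator[float]:
    """Yield inter-arrival gaps (s), resampling the user count every window."""
    now, window_end, rate = 0.0, 0.0, 0.0
    while now < horizon_s:
        if now >= window_end:
            window_end = now + window_s
            rate = knuth_poisson(mean_users, rng) * req_per_min_per_user / 60.0
        if rate <= 0.0:
            now = window_end  # no users: skip to the next window
            continue
        delta_t = -math.log(1.0 - rng.random()) / rate
        if now + delta_t > horizon_s:
            break
        if now + delta_t >= window_end:
            now = window_end  # gap crosses the boundary: resample first
            continue
        now += delta_t
        yield delta_t


# 10 users on average, 30 requests per minute each, 60 s windows, 10 minutes.
sample = list(gaps(10.0, 30.0, 60.0, 600.0, random.Random(7)))
```

With these values the mean aggregate rate is about 5 requests per second, so the run should yield on the order of a few thousand gaps.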
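The validity of the hierarchical sampler relies on a structural property of the model:

$$
N = \sum_{i=1}^{U} R_i,
$$

where each $R_i \sim \mathrm{Poisson}(\lambda_r)$ is independent of the others and of $U$. Because a sum of independent Poisson variables is again Poisson, $N \mid U=u \sim \mathrm{Poisson}(u\,\lambda_r)$.

This result has two important consequences:
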
- **Deterministic conditional rate** – Given $U=u$, the aggregate request arrivals constitute a homogeneous Poisson process with the deterministic rate

  $$
  \Lambda = \frac{u\,\lambda_r}{60}.
  $$

  All inter-arrival gaps are therefore i.i.d. exponential with parameter $\Lambda$, allowing us to use the standard inverse-CDF formula for each gap.

- **Layered uncertainty handling** – The randomness associated with $U$ is handled in an outer step (sampling $U$ once per window), while the inner step leverages the well-known Poisson→Exponential correspondence. This two-level construction reproduces exactly the joint distribution obtained by first drawing $\Lambda = N/60$ from the compound Poisson law and then drawing gaps conditional on $\Lambda$.

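As a worked example with illustrative numbers (assumed here, not taken from the source), take $u = 120$ active users and $\lambda_r = 30$ requests per minute per user:

$$
\Lambda = \frac{120 \times 30}{60} = 60\ \text{req/s},
\qquad
\mathbb{E}[\Delta t] = \frac{1}{\Lambda} = \frac{1}{60}\ \text{s} \approx 16.7\ \text{ms}.
$$

Within that window, every gap is drawn from the same $\mathrm{Exp}(60)$ law until $U$ is resampled.
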
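If the total count could not be written as a sum of independent Poisson variables, the conditional distribution of $N$ given $U=u$ would not be Poisson, and the gaps within a window could not be drawn from a single exponential law.

By the law of total probability, for any event set $A$ of inter-arrival gaps,

$$
P(\text{gaps} \in A) = \sum_{u=0}^{\infty} P(\text{gaps} \in A \mid U=u)\,P(U=u).
$$

Step 1 samples $U$ from its marginal law; the remaining steps sample the gaps from the conditional law given $U=u$, so the two stages together sample from the joint distribution.

If concurrent users follow a truncated Normal, steps 2–3 remain unchanged; only step 1 draws $U$ from a Normal distribution, truncated at zero and rounded, instead of from a Poisson.

The sampling window length (`user_sampling_window`) governs how often we re-sample $U$, and therefore how often the aggregate rate $\Lambda$ changes.
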
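For the Gaussian variant, the user draw might look like the following sketch. It assumes the variance defaults to the mean (as `RVConfig` does for Normal distributions) and that negatives are clipped to zero before rounding; the exact order of clipping and rounding in AsyncFlow, and the name `sample_users_normal`, are assumptions.

```python
import random


def sample_users_normal(mean: float, variance: float, rng: random.Random) -> int:
    """Draw a user count from a Normal law, clipped at zero and rounded."""
    draw = rng.gauss(mean, variance ** 0.5)  # gauss() takes a standard deviation
    return round(max(0.0, draw))


rng = random.Random(0)
# A small mean makes the clipping visible: a noticeable share of draws land on 0.
counts = [sample_users_normal(2.0, 2.0, rng) for _ in range(50_000)]
clipped_share = sum(c == 0 for c in counts) / len(counts)
```

The clipping matters mostly when the mean is small relative to the standard deviation; for large means almost no draws fall below zero.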
- **Independence assumption**: Assumes per-user streams and $U$ are independent. Real traffic often exhibits user-behavior correlations (e.g., flash crowds).
- **Exponential inter-arrival times**: Implies memorylessness; cannot capture self-throttling or long-range dependence found in real workloads.
- **No diurnal/trend component**: User count $U$ is IID per window. To model seasonality or trends, you must vary $\lambda_u(t)$ externally.
- **No burst-control or rate-limiting**: Does not simulate client-side throttling or server back-pressure. Any rate-limit logic must be added externally.
- **Gaussian truncation artifacts**: In the Gaussian–Poisson variant, truncating negatives to zero and rounding can under-estimate extreme user counts.

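As one way to vary $\lambda_u(t)$ externally, the sketch below computes a mean user count per sampling window from a hypothetical sinusoidal daily profile. The function `diurnal_mean_users`, its parameters, and the 60-second window are all illustrative; how such per-window means would be fed to the generator is outside what the source describes.

```python
import math


def diurnal_mean_users(t_s: float, base: float = 100.0, amplitude: float = 0.5) -> float:
    """Hypothetical daily profile: mean users oscillate around `base` with a 24 h period."""
    day_s = 24 * 60 * 60
    return base * (1.0 + amplitude * math.sin(2.0 * math.pi * t_s / day_s))


# One mean user count per 60 s sampling window across a full day.
window_s = 60
window_means = [diurnal_mean_users(k * window_s) for k in range(24 * 60)]
```

Each entry would replace the constant mean for one window, so the user count stays Poisson (or Normal) within a window while the mean drifts over the day.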
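Key takeaway: By structuring the generator as an outer draw of $U$ per window and an inner stream of exponential gaps at rate $\Lambda$, the sampler reproduces the compound traffic law without inverting its CDF.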