dstack features auto-scaling for services published via the gateway. The general flow is:
- STEP 1: `dstack-gateway` parses nginx `access.log` to collect per-second statistics about requests to the service and request times.
- STEP 2: `dstack-gateway` aggregates statistics over a 1-minute window.
- STEP 3: The server keeps gateway connections alive in the scheduled `process_gateways_connections` task and continuously collects stats from active gateways. This is separate from `GatewayPipeline`, which handles gateway provisioning and deletion.
- STEP 4: When `RunPipeline` processes a service run, it loads the latest collected gateway stats for that service.
- STEP 5: The autoscaler (configured via `dstack.yml`) computes the desired replica count for each replica group.
- STEP 6: `RunPipeline` applies that desired state.
    - For scale-up, it creates new `SUBMITTED` jobs. `JobSubmittedPipeline` then assigns existing capacity or provisions new capacity for them.
    - For scale-down, it marks the least-important active replicas as `TERMINATING` with `SCALED_DOWN`. `JobTerminatingPipeline` unregisters and cleans them up.
- STEP 7: If the service is in rolling deployment, `RunPipeline` handles that in the same active-run processing path.
    - It allows only a limited surge of replacement replicas.
    - It delays teardown of old replicas until replacement capacity is available.
    - It also cleans up replicas that belong to replica groups removed from the configuration.
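
Steps 1 and 2 can be illustrated with a minimal sketch of windowed aggregation. The `SecondStat` record and the `aggregate_window` helper are hypothetical names used only for illustration; they are not taken from the dstack codebase, and the real gateway may store and aggregate its statistics differently.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class SecondStat:
    """Hypothetical per-second record derived from the access log."""

    timestamp: int        # Unix time, whole seconds
    requests: int         # requests completed during that second
    request_time: float   # summed request time for that second, in seconds


def aggregate_window(
    stats: Iterable[SecondStat], now: int, window: int = 60
) -> Tuple[float, float]:
    """Return (average RPS, mean request time) over the last `window` seconds."""
    # Keep only records that fall inside the (now - window, now] interval.
    recent = [s for s in stats if now - window < s.timestamp <= now]
    total_requests = sum(s.requests for s in recent)
    total_time = sum(s.request_time for s in recent)
    # Seconds without any record count as zero requests.
    rps = total_requests / window
    avg_time = total_time / total_requests if total_requests else 0.0
    return rps, avg_time
```

Dividing by the full window length, rather than by the number of seconds that have records, treats idle seconds as zero traffic. That is an assumption of this sketch, not a documented property of `dstack-gateway`.
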
`RPSAutoscaler` implements simple target-tracking scaling. The target value is the desired number of requests per second per replica, measured over a 1-minute window.
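
As a rough illustration of target tracking, the sketch below derives a replica count from the observed request rate. The function name `desired_replicas`, the ceiling rounding, and the `min_replicas`/`max_replicas` bounds are assumptions made for this example and may not match the actual `RPSAutoscaler` logic.

```python
import math


def desired_replicas(
    rps: float, target: float, min_replicas: int, max_replicas: int
) -> int:
    """Replica count that keeps per-replica RPS at or below `target`."""
    # Assumption: round up so that no replica exceeds the target rate.
    raw = math.ceil(rps / target)
    # Assumption: clamp the result to the configured replica range.
    return max(min_replicas, min(max_replicas, raw))
```

For example, with a target of 10 requests per second per replica and an observed rate of 25 requests per second, this sketch asks for 3 replicas, provided the allowed range includes 3.
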
`scale_up_delay` is the minimum time that must pass since the last scaling event (up or down) before the next scale-up. `scale_down_delay` is the minimum time that must pass since the last scaling event (up or down) before the next scale-down.
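
The effect of the two delays can be sketched as a gate applied to the desired replica count. The helper `apply_delays` and its parameters are hypothetical; times are plain seconds here, and the handling of a run that has never been scaled (`last_scaled_at is None`) is an assumption of this example.

```python
from typing import Optional


def apply_delays(
    current: int,
    desired: int,
    now: float,
    last_scaled_at: Optional[float],
    scale_up_delay: float,
    scale_down_delay: float,
) -> int:
    """Return the replica count to apply, honoring the scaling delays."""
    # Assumption: with no previous scaling event, nothing blocks the change.
    if last_scaled_at is None:
        return desired
    # Both delays are measured from the last scaling event in either direction.
    elapsed = now - last_scaled_at
    if desired > current and elapsed < scale_up_delay:
        return current  # too early to scale up
    if desired < current and elapsed < scale_down_delay:
        return current  # too early to scale down
    return desired
```

Because both delays are counted from the same event, a longer `scale_down_delay` than `scale_up_delay` makes the service quicker to add replicas than to remove them.
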