
Commit 787b512 (merge commit, 2 parents: 489b253 + 450b6a1)

Merge remote-tracking branch 'origin/master' into issue_3740_make_runner_submission_process_more_lenient

File tree: 91 files changed, +1728 −15367 lines


contributing/AUTOSCALING.md

Lines changed: 10 additions & 7 deletions
````diff
@@ -4,13 +4,16 @@
 
 - STEP 1: `dstack-gateway` parses nginx `access.log` to collect per-second statistics about requests to the service and request times.
 - STEP 2: `dstack-gateway` aggregates statistics over a 1-minute window.
-- STEP 3: The dstack server pulls all service statistics in the `process_gateways` background task.
-- STEP 4: The `process_runs` background task passes statistics and current replicas to the autoscaler.
-- STEP 5: The autoscaler (configured via the `dstack.yml` file) returns the replica change as an int.
-- STEP 6: `process_runs` calls `scale_run_replicas` to add or remove replicas.
-- STEP 7: `scale_run_replicas` terminates or starts replicas.
-  - `SUBMITTED` and `PROVISIONING` replicas get terminated before `RUNNING`.
-  - Replicas are terminated by descending `replica_num` and launched by ascending `replica_num`.
+- STEP 3: The server keeps gateway connections alive in the scheduled `process_gateways_connections` task and continuously collects stats from active gateways. This is separate from `GatewayPipeline`, which handles gateway provisioning and deletion.
+- STEP 4: When `RunPipeline` processes a service run, it loads the latest collected gateway stats for that service.
+- STEP 5: The autoscaler (configured via `dstack.yml`) computes the desired replica count for each replica group.
+- STEP 6: `RunPipeline` applies that desired state.
+  - For scale-up, it creates new `SUBMITTED` jobs. `JobSubmittedPipeline` then assigns existing capacity or provisions new capacity for them.
+  - For scale-down, it marks the least-important active replicas as `TERMINATING` with `SCALED_DOWN`. `JobTerminatingPipeline` unregisters and cleans them up.
+- STEP 7: If the service is in rolling deployment, `RunPipeline` handles that in the same active-run processing path.
+  - It allows only a limited surge of replacement replicas.
+  - It delays teardown of old replicas until replacement capacity is available.
+  - It also cleans up replicas that belong to replica groups removed from the configuration.
 
 ## RPSAutoscaler
 
````

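The scaling decision described above boils down to a pure function from aggregated request stats to a desired replica count. Below is a minimal sketch of an RPS-based rule; the function name and the exact formula are assumptions for illustration, not dstack's actual `RPSAutoscaler` implementation.

```python
import math


def desired_replicas(
    avg_rps: float,
    target_rps: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Hypothetical RPS-based scaling rule: run enough replicas so each one
    serves at most `target_rps` requests per second, clamped to the
    configured [min_replicas, max_replicas] range."""
    if target_rps <= 0:
        raise ValueError("target_rps must be positive")
    wanted = math.ceil(avg_rps / target_rps)  # 0 when there is no traffic
    return max(min_replicas, min(max_replicas, wanted))
```

With `min_replicas=0` and no traffic this naturally scales the service to zero, matching the scale-to-zero behavior mentioned for services.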
contributing/RUNS-AND-JOBS.md

Lines changed: 45 additions & 42 deletions
````diff
@@ -17,31 +17,38 @@ A run can spawn one or multiple jobs, depending on the configuration. A task tha
 
 ## Run's Lifecycle
 
-- STEP 1: The user submits the run. `services.runs.submit_run` creates jobs with status `SUBMITTED`. Now the run has status `SUBMITTED`.
-- STEP 2: `background.tasks.process_runs` periodically pulls unfinished runs and processes them:
-  - If any job is `RUNNING`, the run becomes `RUNNING`.
-  - If any job is `PROVISIONING` or `PULLING`, the run becomes `PROVISIONING`.
-  - If any job fails and cannot be retried, the run becomes `TERMINATING`, and after processing, `FAILED`.
-  - If all jobs are `DONE`, the run becomes `TERMINATING`, and after processing, `DONE`.
-  - If any job fails, can be retried, and there is any other active job, the failed job will be resubmitted in-place.
-  - If any jobs in a replica fail and can be retried and there are other active replicas, the jobs of the failed replica are resubmitted in-place (without stopping other replicas). But if some jobs in a replica fail, then all the jobs in the replica are terminated and resubmitted. This includes multi-node tasks that represent one replica with multiple jobs.
-  - If all jobs fail and can be resubmitted, the run becomes `PENDING`.
-- STEP 3: If the run is `TERMINATING`, the server makes all jobs `TERMINATING`. `background.tasks.process_runs` sets their status to `TERMINATING`, assigns `JobTerminationReason`, and sends a graceful stop command to `dstack-runner`. `process_terminating_jobs` then ensures that jobs are terminated and assigns a finished status.
-- STEP 4: Once all jobs are finished, the run becomes `TERMINATED`, `DONE`, or `FAILED` based on `RunTerminationReason`.
-- STEP 0: If the run is `PENDING`, `background.tasks.process_runs` will resubmit jobs. The run becomes `SUBMITTED` again.
-
-> Use `switch_run_status()` for all status transitions. Do not set `RunModel.status` directly.
-
-> No one must assign the finished status to the run, except `services.runs.process_terminating_run`. To terminate the run, assign `TERMINATING` status and `RunTerminationReason`.
+- STEP 1: The user submits the run. `services.runs.submit_run` creates jobs with status `SUBMITTED`. The run starts in `SUBMITTED`.
+- STEP 2: `RunPipeline` continuously processes unfinished runs.
+  - For active runs, it derives the run status from the latest job states in priority order:
+    1. If any non-retryable failure is present, the run becomes `TERMINATING` with the relevant `RunTerminationReason`.
+    2. If `stop_criteria == MASTER_DONE` and the master job is done, the run becomes `TERMINATING` with `ALL_JOBS_DONE`.
+    3. Otherwise, if any job is `RUNNING`, the run becomes `RUNNING`.
+    4. Otherwise, if any job is `PROVISIONING` or `PULLING`, the run becomes `PROVISIONING`.
+    5. Otherwise, if jobs are still waiting for placement or provisioning, the run stays `SUBMITTED`.
+    6. Otherwise, if all contributing jobs are `DONE`, the run becomes `TERMINATING` with `ALL_JOBS_DONE`.
+    7. Otherwise, if no active replicas remain and the run should be retried, the run becomes `PENDING`.
+  - Retryable replica failures are handled before the final transition is applied:
+    - If a replica fails with a retryable reason while other replicas are still active, `RunPipeline` creates a new `SUBMITTED` submission for that replica and terminates the old jobs in that replica.
+    - If all remaining work is retryable, the run ends up in `PENDING`.
+- STEP 3: If the run is `PENDING`, `RunPipeline` processes it in the pending phase.
+  - For retrying runs, it waits for an exponential backoff before resubmitting.
+  - For scheduled runs, it waits until `next_triggered_at`.
+  - For scaled-to-zero services, it can keep the run in `PENDING` until autoscaling wants replicas again.
+  - Once the run is ready to continue, `RunPipeline` creates new `SUBMITTED` jobs and moves the run back to `SUBMITTED`.
+- STEP 4: If the run is `TERMINATING`, `RunPipeline` marks active jobs as `TERMINATING` and assigns the corresponding `JobTerminationReason`.
+- STEP 5: Once all jobs are finished, the terminating phase of `RunPipeline` either:
+  - assigns the final run status (`TERMINATED`, `DONE`, or `FAILED`), or
+  - for scheduled runs that were not stopped or aborted by the user, returns the run to `PENDING` and computes a new `next_triggered_at`.
 
````

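The priority order above amounts to a first-match-wins fold over the job states. A simplified sketch, with statuses as plain strings and the non-retryable-failure check passed in as a flag (the function name is hypothetical, and `stop_criteria` and retry bookkeeping are omitted):

```python
def derive_run_status(job_statuses: list[str], any_nonretryable_failure: bool) -> str:
    """Return the run status implied by job states; first matching rule wins."""
    if any_nonretryable_failure:
        return "TERMINATING"  # with the relevant RunTerminationReason
    if "RUNNING" in job_statuses:
        return "RUNNING"
    if any(s in ("PROVISIONING", "PULLING") for s in job_statuses):
        return "PROVISIONING"
    if "SUBMITTED" in job_statuses:
        return "SUBMITTED"  # still waiting for placement or provisioning
    if job_statuses and all(s == "DONE" for s in job_statuses):
        return "TERMINATING"  # ALL_JOBS_DONE
    return "PENDING"  # no active replicas remain; retry if allowed
```

For example, a run with one `DONE` and one `RUNNING` job is `RUNNING`, while a run whose jobs are all `DONE` moves to `TERMINATING`.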
````diff
 ### Services
 
-Services' lifecycle has some modifications:
+Services' run lifecycle has some modifications:
 
-- During STEP 1, the service is registered on the gateway. If the gateway is not accessible or the domain name is taken, the run submission fails.
-- During STEP 2, downscaled jobs are ignored.
-- During STEP 4, the service is unregistered on the gateway.
-- During STEP 0, the service can stay in `PENDING` status if it was downscaled to zero (WIP).
+- During STEP 1, the service itself is registered on the gateway or the in-server proxy. If the gateway is not accessible or the domain name is taken, submission fails.
+- During STEP 2, active run processing also computes desired replica counts from gateway stats and handles scale-up, scale-down, rolling deployment, and cleanup of removed replica groups.
+- During STEP 2, jobs already marked `SCALED_DOWN` do not contribute to the run status.
+- During STEP 3, a service can stay in `PENDING` when autoscaling currently wants zero replicas.
+- During STEP 5, the terminating phase of `RunPipeline` unregisters the service from the gateway.
 
 ### When can the job be retried?
````

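The exponential backoff mentioned for the pending phase can be sketched as a capped doubling delay. The base delay and cap below are assumptions for illustration; dstack's actual retry parameters may differ.

```python
def retry_delay_seconds(attempt: int, base: float = 15.0, cap: float = 600.0) -> float:
    """Delay before resubmitting a PENDING run: base * 2^attempt, capped.

    `base` and `cap` are assumed values, not dstack's real configuration.
    """
    return min(cap, base * (2**attempt))
```

Attempts 0 through 5 would wait 15, 30, 60, 120, 240, and 480 seconds, after which the cap keeps the delay at 600 seconds.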
````diff
@@ -54,29 +61,25 @@
 ## Job's Lifecycle
 
 - STEP 1: A newly submitted job has status `SUBMITTED`. It is not assigned to any instance yet.
-- STEP 2: `background.tasks.process_submitted_jobs` tries to assign an existing instance or provision a new one.
-  - On success, the job becomes `PROVISIONING`.
-  - On failure, the job becomes `TERMINATING`, and after processing, `FAILED` because of `FAILED_TO_START_DUE_TO_NO_CAPACITY`.
-- STEP 3: `background.tasks.process_running_jobs` periodically pulls unfinished jobs and processes them.
-  - While `dstack-shim`/`dstack-runner` is not responding, the job stays `PROVISIONING`.
-  - Once `dstack-shim` (for VM-featured backends) becomes available, it submits the docker image name, and the job becomes `PULLING`.
-  - Once `dstack-runner` inside a docker container becomes available, it submits the code and the job spec, and the job becomes `RUNNING`.
-  - If `dstack-shim` or `dstack-runner` don't respond for a long time or fail to respond after successful connection and multiple retries, the job becomes `TERMINATING`, and after processing, `FAILED`.
-- STEP 4: `background.tasks.process_running_jobs` processes `RUNNING` jobs, pulling job logs, runner logs, and job status.
-  - If the pulled status is `DONE`, the job becomes `TERMINATING`, and after processing, `DONE`.
-  - Otherwise, the job becomes `TERMINATING`, and after processing, `FAILED`.
-- STEP 5: `background.tasks.process_terminating_jobs` processes `TERMINATING` jobs.
-  - If the job has `remove_at` in the future, nothing happens. This is to give the job some time for a graceful stop.
-  - Once `remove_at` is in the past, it stops the container via `dstack-shim`, detaches instance volumes, and releases the instance. The job becomes `TERMINATED`, `DONE`, `FAILED`, or `ABORTED` based on `JobTerminationReason`.
-  - If some volumes fail to detach, it keeps the job `TERMINATING` and checks volumes attachment status.
-
-> Use `switch_job_status()` for all status transitions. Do not set `JobModel.status` directly.
-
-> No one must assign the finished status to the job, except `services.jobs.process_terminating_job`. To terminate the job, assign `TERMINATING` status and `JobTerminationReason`.
+- STEP 2: `JobSubmittedPipeline` tries to assign an existing instance or provision new capacity.
+  - On success, the job becomes `PROVISIONING`.
+  - On failure, the job becomes `TERMINATING`. `JobTerminatingPipeline` later assigns the final failed status.
+- STEP 3: `JobRunningPipeline` processes `PROVISIONING`, `PULLING`, and `RUNNING` jobs.
+  - While `dstack-shim` / `dstack-runner` is not responding, the job stays `PROVISIONING`.
+  - Once `dstack-shim` (for VM-featured backends) becomes available, the pipeline submits the image and the job becomes `PULLING`.
+  - Once `dstack-runner` inside the container becomes available, the pipeline uploads the code and job spec, and the job becomes `RUNNING`.
+  - While the job is `RUNNING`, the pipeline keeps collecting logs and runner status.
+  - If startup, runner communication, or replica registration fails, the job becomes `TERMINATING`.
+- STEP 4: Once the job is actually ready, `JobRunningPipeline` initializes probes.
+- STEP 5: `JobTerminatingPipeline` processes `TERMINATING` jobs.
+  - If the job has `remove_at` in the future, it waits. This gives the job time for a graceful stop.
+  - Once `remove_at` is in the past, it stops the container, detaches volumes, unregisters service replicas if needed, and releases the instance assignment.
+  - If some volumes are not detached yet, the job stays `TERMINATING` and is retried.
+  - When cleanup is complete, the job becomes `TERMINATED`, `DONE`, `FAILED`, or `ABORTED` based on `JobTerminationReason`.
 
````

````diff
 ### Services' Jobs
 
 Services' jobs lifecycle has some modifications:
 
-- During STEP 3, once the job becomes `RUNNING`, it is registered on the gateway as a replica. If the gateway is not accessible, the job fails.
-- During STEP 5, the job is unregistered on the gateway (WIP).
+- During STEP 3, once the primary job of a replica is `RUNNING` and ready to receive traffic, `JobRunningPipeline` registers that replica on the gateway. If the gateway is not accessible, the job fails with a gateway-related termination reason.
+- During STEP 5, `JobTerminatingPipeline` unregisters the replica from receiving requests before the job is fully cleaned up.
````

docs/docs/guides/server-deployment.md

Lines changed: 27 additions & 18 deletions
````diff
@@ -135,6 +135,11 @@ To store the server state in Postgres, set the `DSTACK_DATABASE_URL` environment
 $ DSTACK_DATABASE_URL=postgresql+asyncpg://user:password@db-host:5432/dstack dstack server
 ```
 
+The minimum requirements for the DB instance are 2 CPU, 2GB of RAM, and at least 50 `max_connections` per server replica
+or a configured connection pooler to handle that many connections.
+If you're using a smaller DB instance, you may need to set lower `DSTACK_DB_POOL_SIZE` and `DSTACK_DB_MAX_OVERFLOW`, e.g.
+`DSTACK_DB_POOL_SIZE=10` and `DSTACK_DB_MAX_OVERFLOW=0`.
+
 ??? info "Migrate from SQLite to PostgreSQL"
     You can migrate the existing state from SQLite to PostgreSQL using `pgloader`:
 
````

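The 50-connections-per-replica guidance above follows from simple arithmetic: each server replica may open up to `DSTACK_DB_POOL_SIZE + DSTACK_DB_MAX_OVERFLOW` connections, and Postgres `max_connections` must cover all replicas. A quick budget check; the headroom for superuser and maintenance sessions is an assumed value:

```python
def required_max_connections(
    replicas: int,
    pool_size: int = 20,      # DSTACK_DB_POOL_SIZE default
    max_overflow: int = 20,   # DSTACK_DB_MAX_OVERFLOW default
    headroom: int = 10,       # assumed reserve for admin/maintenance sessions
) -> int:
    """Postgres max_connections needed for `replicas` dstack server replicas."""
    return replicas * (pool_size + max_overflow) + headroom
```

With the defaults, one replica needs 40 connections plus headroom, consistent with the "at least 50 `max_connections` per server replica" requirement; `DSTACK_DB_POOL_SIZE=10` and `DSTACK_DB_MAX_OVERFLOW=0` shrink the budget for smaller DB instances.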
````diff
@@ -349,6 +354,22 @@ The bucket must be created beforehand. `dstack` won't try to create it.
     storage.objects.update
     ```
 
+## SSH proxy
+
+[`dstack-sshproxy`](https://github.com/dstackai/sshproxy) is an optional component that provides direct SSH access to workloads.
+
+Without SSH proxy, in order to connect to a job via SSH or use an IDE URL, the `dstack attach` CLI command must be used, which configures the user's SSH client in a backend-specific way for each job.
+
+When SSH proxy is deployed, there is one well-known entry point – a proxy address – for all `dstack` jobs, which can be used for SSH access without any additional steps on the user's side (such as installing `dstack` and executing `dstack attach` each time). All the user has to do is upload their public key to the `dstack` server once – there is a dedicated “SSH keys” tab on the user's page of the control plane UI.
+
+To deploy SSH proxy, see the `dstack-sshproxy` [Deployment guide](https://github.com/dstackai/sshproxy/blob/main/DEPLOYMENT.md).
+
+To enable SSH proxy integration on the `dstack` server side, set the following environment variables:
+
+* `DSTACK_SSHPROXY_API_TOKEN` – a token used to authenticate SSH proxy API requests; must be the same value as when deploying `dstack-sshproxy`.
+* `DSTACK_SERVER_SSHPROXY_ADDRESS` – the address where SSH proxy is available to `dstack` users, in the `HOSTNAME[:PORT]` form, where `HOSTNAME` is a domain name or an IP address, and `PORT`, if not specified, defaults to 22.
+
 ## Encryption
 
 By default, `dstack` stores data in plaintext. To enforce encryption, you
````
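Putting the two SSH proxy variables above together, server startup might look like the following; the token and address are placeholders, not real values:

```shell
# Must match the token configured when deploying dstack-sshproxy.
export DSTACK_SSHPROXY_API_TOKEN="<random-secret-token>"
# HOSTNAME[:PORT]; PORT defaults to 22 when omitted.
export DSTACK_SERVER_SSHPROXY_ADDRESS="ssh.example.com:22"
dstack server
```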
````diff
@@ -456,26 +477,14 @@ Backward compatibility is maintained based on these principles:
 
 ## Server limits
 
-A single `dstack` server replica can support:
-
-* Up to 150 active runs.
-* Up to 150 active jobs.
-* Up to 150 active instances.
+A single `dstack` server replica can support at least:
 
-Having more active resources will work but can affect server performance.
-If you hit these limits, consider using Postgres with multiple server replicas.
-You can also increase processing rates of a replica by setting the `DSTACK_SERVER_BACKGROUND_PROCESSING_FACTOR` environment variable.
-You should also increase `DSTACK_DB_POOL_SIZE` and `DSTACK_DB_MAX_OVERFLOW` proportionally.
-For example, to increase processing rates 4 times, set:
-
-```
-export DSTACK_SERVER_BACKGROUND_PROCESSING_FACTOR=4
-export DSTACK_DB_POOL_SIZE=80
-export DSTACK_DB_MAX_OVERFLOW=80
-```
+* 1000 active instances
+* 1000 active runs
+* 1000 active jobs
 
-You have to ensure your Postgres installation supports that many connections by
-configuring [`max_connections`](https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS) and/or using a connection pooler.
+If you hit server performance limits, try scaling up server instances and/or configuring Postgres with multiple server replicas.
+Also, please [submit a GitHub issue](https://github.com/dstackai/dstack/issues) describing your setup – we strive to improve `dstack` scalability and efficiency.
 
 ## Server upgrades
 
````

docs/docs/reference/environment-variables.md

Lines changed: 0 additions & 1 deletion
````diff
@@ -130,7 +130,6 @@ For more details on the options below, refer to the [server deployment](../guide
 - `DSTACK_SERVER_GCS_BUCKET`{ #DSTACK_SERVER_GCS_BUCKET } - The bucket that repo diffs will be uploaded to if set. If unset, diffs are uploaded to the database.
 - `DSTACK_DB_POOL_SIZE`{ #DSTACK_DB_POOL_SIZE } - The client DB connections pool size. Defaults to `20`.
 - `DSTACK_DB_MAX_OVERFLOW`{ #DSTACK_DB_MAX_OVERFLOW } - The client DB connections pool allowed overflow. Defaults to `20`.
-- `DSTACK_SERVER_BACKGROUND_PROCESSING_FACTOR`{ #DSTACK_SERVER_BACKGROUND_PROCESSING_FACTOR } - The number of background jobs for processing server resources. Increase if you need to process more resources per server replica quickly. Defaults to `1`.
 - `DSTACK_SERVER_BACKGROUND_PROCESSING_DISABLED`{ #DSTACK_SERVER_BACKGROUND_PROCESSING_DISABLED } - Disables background processing if set to any value. Useful to run only web frontend and API server.
 - `DSTACK_SERVER_MAX_PROBES_PER_JOB`{ #DSTACK_SERVER_MAX_PROBES_PER_JOB } - Maximum number of probes allowed in a run configuration. Validated at apply time.
 - `DSTACK_SERVER_MAX_PROBE_TIMEOUT`{ #DSTACK_SERVER_MAX_PROBE_TIMEOUT } - Maximum allowed timeout for a probe. Validated at apply time.
````

src/dstack/_internal/cli/commands/export.py

Lines changed: 1 addition & 2 deletions
````diff
@@ -1,5 +1,4 @@
 import argparse
-from typing import Any, Union
 
 from rich.table import Table
 
@@ -148,7 +147,7 @@ def print_exports_table(exports: list[Export]):
     )
     importers = ", ".join([i.project_name for i in export.imports]) if export.imports else "-"
 
-    row: dict[Union[str, int], Any] = {
+    row = {
         "NAME": export.name,
         "FLEETS": fleets,
         "IMPORTERS": importers,
````

src/dstack/_internal/cli/commands/import_.py

Lines changed: 1 addition & 2 deletions
````diff
@@ -1,5 +1,4 @@
 import argparse
-from typing import Any, Union
 
 from rich.table import Table
 
@@ -44,7 +43,7 @@ def print_imports_table(imports: list[Import]):
         else "-"
     )
 
-    row: dict[Union[str, int], Any] = {
+    row = {
         "NAME": name,
         "FLEETS": fleets,
     }
````

src/dstack/_internal/cli/commands/init.py

Lines changed: 16 additions & 5 deletions
````diff
@@ -11,6 +11,8 @@
     register_init_repo_args,
 )
 from dstack._internal.cli.utils.common import console
+from dstack._internal.core.errors import CLIError, RepoInvalidCredentialsError
+from dstack._internal.core.services.repos import get_repo_creds_and_default_branch
 from dstack.api import Client
 
 
@@ -55,10 +57,19 @@ def _command(self, args: argparse.Namespace):
             repo = get_repo_from_dir(repo_path)
         else:
             assert False, "should not reach here"
+
+        try:
+            repo_creds, _ = get_repo_creds_and_default_branch(
+                repo_url=repo.repo_url,
+                identity_file=args.git_identity_file,
+                oauth_token=args.gh_token,
+            )
+        except RepoInvalidCredentialsError:
+            raise CLIError(
+                "No valid default Git credentials found. Pass valid `--token` or `--git-identity`."
+            )
+
         api = Client.from_config(project_name=args.project)
-        api.repos.init(
-            repo=repo,
-            git_identity_file=args.git_identity_file,
-            oauth_token=args.gh_token,
-        )
+        api.repos.init(repo=repo, creds=repo_creds)
+
         console.print("OK")
````

src/dstack/_internal/cli/commands/metrics.py

Lines changed: 3 additions & 3 deletions
````diff
@@ -1,6 +1,6 @@
 import argparse
 import time
-from typing import Any, Dict, List, Optional, Union
+from typing import Any, List, Optional
 
 from rich.live import Live
 from rich.table import Table
@@ -79,7 +79,7 @@ def _get_metrics_table(run: Run, metrics: List[JobMetrics]) -> Table:
     table.add_column("MEMORY")
     table.add_column("GPU")
 
-    run_row: Dict[Union[str, int], Any] = {"NAME": run.name, "STATUS": run.status.value}
+    run_row = {"NAME": run.name, "STATUS": run.status.value}
     if len(run._run.jobs) != 1:
         add_row_from_dict(table, run_row)
 
@@ -117,7 +117,7 @@ def _get_metrics_table(run: Run, metrics: List[JobMetrics]) -> Table:
         )
         gpu_metrics += f" util={gpu_util_percent}%"
 
-    job_row: Dict[Union[str, int], Any] = {
+    job_row = {
         "NAME": f" replica={job.job_spec.replica_num} job={job.job_spec.job_num}",
         "STATUS": job.job_submissions[-1].status.value,
         "CPU": cpu_usage or "-",
````
