0.20.16

@peterschmidt85 released this 06 Apr 12:03 (commit fc9afa9)

Server

Performance

This release introduces a major overhaul of dstack server background processing. A single server
replica can now handle ~10x more resources, supporting at least 1000 active instances and runs. In
benchmarks, we observed 2x-10x faster processing (see #3551).

  • Provisioning 200 instances: 12 minutes -> 4 minutes.
  • Running a 200-node task: >25 minutes -> 4 minutes.
  • Terminating 50 instances: 60 seconds -> 10 seconds.
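
As a quick sanity check, the speedups implied by the figures above can be computed directly. This is only arithmetic on the numbers quoted in these notes; the ">25 minutes" figure is a lower bound, so its ratio is a lower bound too. The `speedup` helper and the dictionary labels are illustrative names, not part of dstack.

```python
# Before/after figures quoted in the release notes (same unit within each pair).
benchmarks = {
    "Provisioning 200 instances (minutes)": (12, 4),
    "Running a 200-node task (minutes, 'before' is a lower bound)": (25, 4),
    "Terminating 50 instances (seconds)": (60, 10),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster the 'after' measurement is."""
    return before / after


for name, (before, after) in benchmarks.items():
    print(f"{name}: {speedup(before, after):.2f}x")
```

All three ratios fall inside the 2x-10x range reported for the benchmarks.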

The performance gains come from a new, more efficient background processing architecture. Server
hardware requirements and memory consumption remain the same.

If you need to temporarily revert this behavior, set
DSTACK_FF_PIPELINE_PROCESSING_DISABLED=1 before starting the server.

Upgrade notes

Warning

This release includes significant internal changes to the dstack server. Test in a staging
environment before upgrading production whenever possible.

Warning

Rolling upgrades from 0.20.13 or older directly to 0.20.16 are not supported. Do not run
replicas on 0.20.13 (or older) and 0.20.16 at the same time. Upgrade to 0.20.15 first, or
scale server replicas down to 1 before upgrading.
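
The rule above can be expressed as a small pre-upgrade check. The sketch below is not part of dstack; the function names are hypothetical, and it assumes plain `major.minor.patch` version strings. It also assumes that versions newer than 0.20.13 (that is, 0.20.14 and 0.20.15) can be rolled directly, which is an inference from the wording of the warning.

```python
# Newest version from which a direct rolling upgrade to 0.20.16 is NOT supported.
LAST_UNSUPPORTED = (0, 20, 13)


def parse_version(version: str) -> tuple:
    """Turn '0.20.13' into (0, 20, 13) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def rolling_upgrade_supported(current: str) -> bool:
    """True if replicas on `current` may briefly run alongside 0.20.16."""
    return parse_version(current) > LAST_UNSUPPORTED


def upgrade_plan(current: str, replicas: int) -> str:
    """Suggest an upgrade path to 0.20.16 (hypothetical helper)."""
    # With a single replica, old and new versions never run at the same time.
    if rolling_upgrade_supported(current) or replicas <= 1:
        return "upgrade directly to 0.20.16"
    return "upgrade to 0.20.15 first, or scale down to 1 replica, then upgrade to 0.20.16"


print(upgrade_plan("0.20.12", replicas=3))
print(upgrade_plan("0.20.15", replicas=3))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "0.20.9" as newer than "0.20.13".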

SSH proxy

Servers can enforce proxy-only SSH access by enabling SSH proxy together with the new
DSTACK_SERVER_SSHPROXY_ENFORCED flag. When the flag is set, runs omit user-provided keys from
authorized lists, and clients are expected to connect via the proxy endpoint exposed in the run
details. For more details, see the server deployment guide.

Note

SSH proxy is experimental, and behavior may change in future releases.

UI

SSH keys

User settings now include an SSH keys tab where you can upload OpenSSH public keys, view their fingerprints, and remove keys you no longer use. Uploaded keys let you open SSH sessions without relying on the client key that dstack attach manages automatically. Duplicate keys are rejected with a clear error.
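
For readers unfamiliar with key fingerprints, the sketch below shows how a standard OpenSSH SHA256 fingerprint is derived from a public key line, and how a fingerprint could be used to detect duplicates. How dstack itself computes fingerprints or detects duplicates is not stated in these notes, so `add_key` and the in-memory `keys` store are purely hypothetical, and the sample key blob is a placeholder rather than a real key.

```python
import base64
import hashlib


def openssh_fingerprint(public_key_line: str) -> str:
    """SHA256 fingerprint of a '<type> <base64-blob> [comment]' public key line."""
    parts = public_key_line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    blob = base64.b64decode(parts[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as unpadded base64 with a 'SHA256:' prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def add_key(keys: dict, public_key_line: str) -> str:
    """Store a key by fingerprint and reject duplicates (hypothetical logic)."""
    fingerprint = openssh_fingerprint(public_key_line)
    if fingerprint in keys:
        raise ValueError(f"key already uploaded: {fingerprint}")
    keys[fingerprint] = public_key_line.strip()
    return fingerprint


# Placeholder blob for illustration only; not a usable key.
sample_blob = base64.b64encode(b"placeholder-key-material").decode("ascii")
keys = {}
print(add_key(keys, f"ssh-ed25519 {sample_blob} alice@laptop"))
```

Because the fingerprint depends only on the key blob, the same key uploaded with a different comment yields the same fingerprint.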

CLI

dstack attach

When SSH proxy is enabled on the server, dstack attach now routes through the proxy automatically, taking the proxy host, port, and upstream ID from the run connection info. Servers can opt into proxy-only access by setting DSTACK_SERVER_SSHPROXY_ENFORCED in the server environment, which stops embedding direct SSH keys in runs.

export DSTACK_SERVER_SSHPROXY_ENFORCED=1

Backends

RunPod

RunPod backends can now provision on-demand CPU offerings in secure cloud regions, so jobs that request gpu: 0 are scheduled successfully without workarounds. Disk size checks now respect the per-offer limits that RunPod publishes.

resources:
  gpu: 0
  cpu: 8
  memory: 32GB

Verda

Verda startup scripts and SSH keys are now generated per instance and removed reliably on teardown, preventing stale credentials and improving cleanup when a rollout provisions multiple machines.

Major bug fixes

  • Improved Git-related CLI repo errors with actionable messages for missing credentials, detached HEAD state, and non-repository directories (#3730).

What's changed

  • [Internal] Don't reload server on cli package changes by @un-def in #3706
  • Fix SELinux denials and "Text file busy" on SSH fleet provisioning by @peterschmidt85 in #3712
  • Add support for user-provided SSH public keys by @un-def in #3688
  • Move stop_runner() to JobTerminating pipeline by @r4victor in #3714
  • Add web UI for user public keys by @un-def in #3713
  • [Landing] Update headings and descriptions for clarity in README, installation, and quickstart guides to amplify agentic orchestration (WIP) by @peterschmidt85 in #3710
  • Add pipelines optimizations by @r4victor in #3719
  • Reject user interaction in runner_ssh_tunnel by @un-def in #3716
  • Use sshproxy for CLI attach if enabled by @un-def in #3711
  • Enable pipelines by default by @r4victor in #3728
  • Do not wait in VerdaCompute.create_instance by @jvstme in #3723
  • Pass delete_permanently when deleting Verda instances by @peterschmidt85 in #3734
  • Fix pipelines not running on Python <= 3.10 by @r4victor in #3736
  • Tests: bump pytest-asyncio>=0.25.2 by @un-def in #3733
  • Fix docs Swagger UI rendering for REST API pages by @peterschmidt85 in #3729
  • Guard cached get_offers with an execution lock by @r4victor in #3738
  • Fix JobRunningPipeline not reclaiming stale jobs for terminating runs by @r4victor in #3741
  • runpod: support on-demand CPU offers and provisioning by @peterschmidt85 in #3726
  • Add JobMetricsPoint.job_id index by @r4victor in #3742
  • Fix SENTRY_TRACES_BACKGROUND_SAMPLE_RATE not respected by @r4victor in #3744
  • Update Server Deployment guide for pipelines by @r4victor in #3745
  • [Docs] Add dstack-sshproxy deployment guide by @un-def in #3720
  • Revamp repo errors handling by @un-def in #3730
  • [chore]: Fix add_row_from_dict() typing issues by @jvstme in #3739
  • Handle concurrent repo blob/file archive uploads by @un-def in #3737
  • Verda: make startup script and SSH key lifecycle per-instance with reliable cleanup by @peterschmidt85 in #3718

Full changelog: 0.20.15...0.20.16