0.19.0-v1
Simplified backend integration
To provide the best multi-cloud experience and GPU availability, dstack integrates with many cloud GPU providers, including AWS, Azure, GCP, RunPod, Lambda, Vultr, and others. As we'd like to see even more GPU providers supported by dstack, this release comes with a major internal refactoring aimed at simplifying the process of adding new integrations. See the Backend integration guide for more details. Join our Discord if you have any questions about the integration process.
MPI workloads and NCCL tests
dstack now configures internode SSH connectivity for distributed tasks. You can log in to any node from any node via SSH with a simple ssh <node_ip> command. The out-of-the-box SSH connectivity also allows running mpirun. See the NCCL Tests example.
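As a rough illustration, a two-node task that launches mpirun from the first node might look like the sketch below. It assumes an image with Open MPI installed (the image name is a placeholder) and that the DSTACK_NODE_RANK and DSTACK_NODES_IPS environment variables behave as described in the dstack documentation. The NCCL Tests example remains the reference configuration.

```yaml
type: task
name: mpi-hello                      # hypothetical run name
nodes: 2                             # distributed task across two nodes
image: my-registry/openmpi:latest    # placeholder: any image with Open MPI
commands:
  - |
    if [ "$DSTACK_NODE_RANK" -eq 0 ]; then
      # Assumption: DSTACK_NODES_IPS holds one node IP per line
      echo "$DSTACK_NODES_IPS" > hostfile
      # Start one process per node over the preconfigured SSH connectivity
      mpirun --allow-run-as-root --hostfile hostfile -N 1 hostname
    else
      # Other nodes just stay alive so that mpirun can reach them over SSH
      sleep infinity
    fi
```

In this sketch the non-master nodes sleep indefinitely, so the run has to be stopped manually (for example with dstack stop) once the output has been printed.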
Cost and usage metrics
In addition to DCGM metrics, dstack now exports a set of Prometheus metrics for cost and usage tracking. Here's how it may look in the Grafana dashboard:
See the documentation for a full list of metrics and labels.
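To collect these metrics, Prometheus has to scrape the dstack server. A minimal scrape configuration might look like the sketch below. The job name, the /metrics path, and the server address are assumptions, so check the documentation for the actual endpoint and any server settings required to enable it.

```yaml
scrape_configs:
  - job_name: dstack                 # hypothetical job name
    metrics_path: /metrics           # assumed path; verify against the docs
    scrape_interval: 30s
    static_configs:
      - targets: ["localhost:3000"]  # assumed dstack server address
```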
Cursor IDE support
dstack can now launch Cursor dev environments. Just specify ide: cursor in the run configuration:
type: dev-environment
ide: cursor
Deprecations
- The Python API methods get_plan(), exec_plan(), and submit() are deprecated in favor of get_run_plan(), apply_plan(), and apply_configuration(). The deprecated methods had clumsy signatures with many top-level parameters. The new signatures align better with the CLI and HTTP API.
Breaking changes
The 0.19.0 release drops several previously deprecated or undocumented features. There are no other significant breaking changes. The 0.19.0 server continues to support 0.18.x CLI versions, but the 0.19.0 CLI does not work with older 0.18.x servers, so update the server first or update the server and the clients simultaneously.
- Drop the dstack run CLI command.
- Drop the --attach mode for the dstack logs CLI command.
- Drop Pools functionality:
  - The dstack pool CLI commands.
  - The /api/project/{project_name}/runs/get_offers, /api/project/{project_name}/runs/create_instance, /api/pools/list_instances, /api/project/{project_name}/pool/* API endpoints.
  - The pool_name and instance_name parameters in profiles and run configurations.
- Remove retry_policy from profiles.
- Remove termination_idle_time and termination_policy from profiles and fleet configurations.
- Drop RUN_NAME and REPO_ID run environment variables.
- Drop the /api/backends/config_values endpoint used for interactive configuration.
- The API accepts and returns azure_config["regions"] instead of azure_config["locations"] (unified with server/config.yml).
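For profiles that still use the removed fields, an updated profile might look like the sketch below. It assumes that retry and idle_duration are the replacements for retry_policy and termination_idle_time. These release notes do not say so, so verify the field names against the profiles reference before relying on them. The profile name and values are placeholders.

```yaml
profiles:
  - name: my-profile           # hypothetical profile name
    # Assumed replacement for the removed retry_policy
    retry:
      on_events: [no-capacity]
      duration: 1h
    # Assumed replacement for the removed termination_idle_time
    idle_duration: 30m
```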
What's Changed
- Fix gateways with a previously used IP address by @jvstme in dstackai/dstack#2388
- Simplify backend configurators and models by @r4victor in dstackai/dstack#2389
- Store BackendType as string instead of enum in the DB by @r4victor in dstackai/dstack#2393
- Introduce ComputeWith classes to detect compute features by @r4victor in dstackai/dstack#2392
- Move backend/compute configs from config.py to models.py by @r4victor in dstackai/dstack#2395
- Provide default run_job implementation for VM backends by @r4victor in dstackai/dstack#2396
- Configure inter-node SSH on multi-node tasks by @un-def in dstackai/dstack#2394
- [Blog] Using SSH fleets with TensorWave's private AMD cloud by @peterschmidt85 in dstackai/dstack#2391
- Add script to generate boilerplate code for new backend by @r4victor in dstackai/dstack#2397
- Add datacenter-gpu-manager-4-proprietary to CUDA images by @un-def in dstackai/dstack#2399
- Drop pools by @r4victor in dstackai/dstack#2401
- Transition high-level Python runs API to new methods by @r4victor in dstackai/dstack#2403
- Drop dstack run by @r4victor in dstackai/dstack#2404
- Drop dstack logs --attach by @r4victor in dstackai/dstack#2405
- Remove retry_policy from profiles by @r4victor in dstackai/dstack#2406
- Remove termination_idle_time and termination_policy by @r4victor in dstackai/dstack#2407
- Clean up models backward compatibility code by @r4victor in dstackai/dstack#2408
- Restore removed models fields for compatibility with 0.18 clients by @r4victor in dstackai/dstack#2414
- Clean up legacy repo fields by @jvstme in dstackai/dstack#2411
- Switch AWS gateways from t2.micro to t3.micro by @r4victor in dstackai/dstack#2416
- Remove old client excludes by @r4victor in dstackai/dstack#2417
- Use new JobTerminationReason values by @r4victor in dstackai/dstack#2418
- Drop RUN_NAME and REPO_ID env vars by @r4victor in dstackai/dstack#2419
- Drop irrelevant Nebius backend implementation by @jvstme in dstackai/dstack#2421
- [Feature]: Support the cursor IDE #2412 by @peterschmidt85 in dstackai/dstack#2413
- Simplify implementation of new backends #2372 by @olgenn in dstackai/dstack#2423
- Support multiple domains with Entra login by @r4victor in dstackai/dstack#2424
- Support setting project members by email by @r4victor in dstackai/dstack#2429
- Fix json schema reference and invalid properties errors by @r4victor in dstackai/dstack#2433
- [Blog]: DeepSeek R1 inference performance: MI300X vs. H200 by @peterschmidt85 in dstackai/dstack#2425
- Add new metrics by @un-def in dstackai/dstack#2434
- Add instance and job cost/usage Prometheus metrics by @un-def in dstackai/dstack#2432
- [Docker] Add dstackai/efa image by @un-def in dstackai/dstack#2422
- Restore fleet termination_policy for 0.18 backward compatibility by @r4victor in dstackai/dstack#2436
- [Bug]: Search over users doesn't work by @olgenn in dstackai/dstack#2439
- [Feature]: Support activating/deactivating users via the UI by @olgenn in dstackai/dstack#2440
- [Feature]: Display Assigned Gateway Information on Run Pages by @olgenn in dstackai/dstack#2438
- [Docs]: Update the Metrics guide by @peterschmidt85 in dstackai/dstack#2441
- [Examples] Update nccl-tests by @un-def in dstackai/dstack#2415
Full Changelog: dstackai/dstack@0.18.44...0.19.0
