Merged
Changes from 1 commit
6 changes: 6 additions & 0 deletions docs/docs/concepts/dev-environments.md
@@ -187,10 +187,16 @@ resources:

</div>

The `cpu` property also allows you to specify the CPU architecture, `x86` or `arm`. Examples:
`x86:16` (16 x86-64 cores), `arm:8..` (at least 8 ARM64 cores).
If the architecture is not specified, `dstack` tries to infer it from the `gpu` specification
using `x86` as the fallback value.
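
As a rough illustration of the `cpu` syntax described above, the following Python sketch splits a spec such as `x86:16` or `arm:8..` into an architecture and a core-count range. The `parse_cpu_spec` helper and its return shape are hypothetical and are not `dstack`'s actual parser; only the example strings come from the text. An unspecified architecture is returned as `None`, leaving the inference step to the caller.

```python
from typing import Optional, Tuple


# Hypothetical helper for illustration only; not part of dstack.
def parse_cpu_spec(spec: str) -> Tuple[Optional[str], Optional[int], Optional[int]]:
    """Return (arch, min_cores, max_cores) for specs like `x86:16` or `arm:8..`."""
    arch: Optional[str] = None
    count = spec
    head, sep, tail = spec.partition(":")
    if head in ("x86", "arm"):
        arch, count = head, tail
    elif sep:
        raise ValueError(f"Unknown architecture: {head}")
    if not count:
        # Architecture only: no constraint on the core count.
        return arch, None, None
    if ".." in count:
        low, _, high = count.partition("..")
        return arch, int(low) if low else None, int(high) if high else None
    # A single number means exactly that many cores.
    return arch, int(count), int(count)


print(parse_cpu_spec("x86:16"))   # ('x86', 16, 16)
print(parse_cpu_spec("arm:8.."))  # ('arm', 8, None)
print(parse_cpu_spec("4..8"))     # (None, 4, 8)
```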

The `gpu` property allows specifying not only memory size but also GPU vendor, names
and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10G,A100` (either A10G or A100),
`A100:80GB` (one A100 of 80GB), `A100:2` (two A100), `24GB..40GB:2` (two GPUs between 24GB and 40GB),
`A100:40GB:2` (two A100 GPUs of 40GB).
If the vendor is not specified, `dstack` tries to infer it from the GPU name using `nvidia` as the fallback value.

??? info "Google Cloud TPU"
    To use TPUs, specify its architecture via the `gpu` property.
6 changes: 6 additions & 0 deletions docs/docs/concepts/services.md
@@ -325,10 +325,16 @@ resources:

</div>

The `cpu` property also allows you to specify the CPU architecture, `x86` or `arm`. Examples:
`x86:16` (16 x86-64 cores), `arm:8..` (at least 8 ARM64 cores).
If the architecture is not specified, `dstack` tries to infer it from the `gpu` specification
using `x86` as the fallback value.

The `gpu` property allows specifying not only memory size but also GPU vendor, names
and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10G,A100` (either A10G or A100),
`A100:80GB` (one A100 of 80GB), `A100:2` (two A100), `24GB..40GB:2` (two GPUs between 24GB and 40GB),
`A100:40GB:2` (two A100 GPUs of 40GB).
If the vendor is not specified, `dstack` tries to infer it from the GPU name using `nvidia` as the fallback value.

??? info "Google Cloud TPU"
    To use TPUs, specify its architecture via the `gpu` property.
6 changes: 6 additions & 0 deletions docs/docs/concepts/tasks.md
@@ -204,10 +204,16 @@ resources:

</div>

The `cpu` property also allows you to specify the CPU architecture, `x86` or `arm`. Examples:
`x86:16` (16 x86-64 cores), `arm:8..` (at least 8 ARM64 cores).
If the architecture is not specified, `dstack` tries to infer it from the `gpu` specification
using `x86` as the fallback value.

The `gpu` property allows specifying not only memory size but also GPU vendor, names
and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10G,A100` (either A10G or A100),
`A100:80GB` (one A100 of 80GB), `A100:2` (two A100), `24GB..40GB:2` (two GPUs between 24GB and 40GB),
`A100:40GB:2` (two A100 GPUs of 40GB).
If the vendor is not specified, `dstack` tries to infer it from the GPU name using `nvidia` as the fallback value.

??? info "Google Cloud TPU"
    To use TPUs, specify its architecture via the `gpu` property.
11 changes: 11 additions & 0 deletions docs/docs/reference/api/python/index.md
@@ -136,10 +136,21 @@ finally:
      show_root_toc_entry: false
      heading_level: 4
      item_id_mapping:
        cpu: dstack.api.CPU
        gpu: dstack.api.GPU
        memory: dstack.api.Memory
        Range: dstack.api.Range

### `dstack.api.CPU` { #dstack.api.CPU data-toc-label="CPU" }

#SCHEMA# dstack.api.CPU
    overrides:
      show_root_heading: false
      show_root_toc_entry: false
      heading_level: 4
      item_id_mapping:
        Range: dstack.api.Range

### `dstack.api.GPU` { #dstack.api.GPU data-toc-label="GPU" }

#SCHEMA# dstack.api.GPU
8 changes: 8 additions & 0 deletions docs/docs/reference/dstack.yml/dev-environment.md
@@ -35,6 +35,14 @@ The `dev-environment` configuration type allows running [dev environments](../..
        required: true
      item_id_prefix: resources-

#### `resources.cpu` { #resources-cpu data-toc-label="cpu" }

#SCHEMA# dstack._internal.core.models.resources.CPUSpec
    overrides:
      show_root_heading: false
      type:
        required: true

#### `resources.gpu` { #resources-gpu data-toc-label="gpu" }

#SCHEMA# dstack._internal.core.models.resources.GPUSpec
12 changes: 10 additions & 2 deletions docs/docs/reference/dstack.yml/fleet.md
@@ -46,15 +46,23 @@ The `fleet` configuration type allows creating and updating fleets.
        required: true
      item_id_prefix: resources-

#### `resouces.gpu` { #resources-gpu data-toc-label="gpu" }
#### `resources.cpu` { #resources-cpu data-toc-label="cpu" }

#SCHEMA# dstack._internal.core.models.resources.CPUSpec
    overrides:
      show_root_heading: false
      type:
        required: true

#### `resources.gpu` { #resources-gpu data-toc-label="gpu" }

#SCHEMA# dstack._internal.core.models.resources.GPUSpec
    overrides:
      show_root_heading: false
      type:
        required: true

#### `resouces.disk` { #resources-disk data-toc-label="disk" }
#### `resources.disk` { #resources-disk data-toc-label="disk" }

#SCHEMA# dstack._internal.core.models.resources.DiskSpec
    overrides:
12 changes: 10 additions & 2 deletions docs/docs/reference/dstack.yml/service.md
@@ -129,15 +129,23 @@ The `service` configuration type allows running [services](../../concepts/servic
        required: true
      item_id_prefix: resources-

#### `resouces.gpu` { #resources-gpu data-toc-label="gpu" }
#### `resources.cpu` { #resources-cpu data-toc-label="cpu" }

#SCHEMA# dstack._internal.core.models.resources.CPUSpec
    overrides:
      show_root_heading: false
      type:
        required: true

#### `resources.gpu` { #resources-gpu data-toc-label="gpu" }

#SCHEMA# dstack._internal.core.models.resources.GPUSpec
    overrides:
      show_root_heading: false
      type:
        required: true

#### `resouces.disk` { #resources-disk data-toc-label="disk" }
#### `resources.disk` { #resources-disk data-toc-label="disk" }

#SCHEMA# dstack._internal.core.models.resources.DiskSpec
    overrides:
12 changes: 10 additions & 2 deletions docs/docs/reference/dstack.yml/task.md
@@ -35,15 +35,23 @@ The `task` configuration type allows running [tasks](../../concepts/tasks.md).
        required: true
      item_id_prefix: resources-

#### `resouces.gpu` { #resources-gpu data-toc-label="gpu" }
#### `resources.cpu` { #resources-cpu data-toc-label="cpu" }

#SCHEMA# dstack._internal.core.models.resources.CPUSpec
    overrides:
      show_root_heading: false
      type:
        required: true

#### `resources.gpu` { #resources-gpu data-toc-label="gpu" }

#SCHEMA# dstack._internal.core.models.resources.GPUSpec
    overrides:
      show_root_heading: false
      type:
        required: true

#### `resouces.disk` { #resources-disk data-toc-label="disk" }
#### `resources.disk` { #resources-disk data-toc-label="disk" }

#SCHEMA# dstack._internal.core.models.resources.DiskSpec
    overrides:
7 changes: 5 additions & 2 deletions docs/docs/reference/environment-variables.md
@@ -117,8 +117,11 @@ For more details on the options below, refer to the [server deployment](../guide
* `DSTACK_SERVER_MAX_OFFERS_TRIED` - Sets how many instance offers to try when starting a job.
Setting a high value can degrade server performance.
* `DSTACK_RUNNER_VERSION` – Sets exact runner version for debug. Defaults to `latest`. Ignored if `DSTACK_RUNNER_DOWNLOAD_URL` is set.
* `DSTACK_RUNNER_DOWNLOAD_URL` – Overrides `dstack-runner` binary download URL.
* `DSTACK_SHIM_DOWNLOAD_URL` – Overrides `dstack-shim` binary download URL.
* `DSTACK_RUNNER_DOWNLOAD_URL` – Overrides `dstack-runner` binary download URL. The URL can contain `{version}` and/or `{arch}` placeholders,
Contributor

We also need to update

  1. runner/README.md
  2. runner/.just (currently it only builds/uploads one arch)

Collaborator Author

I'd rather open another PR for justfile — I have some ideas for improvements.

where `{version}` is `dstack` version in the `X.Y.Z` format or `latest`, and `{arch}` is either `amd64` or `arm64`, for example,
`https://dstack.example.com/{arch}/{version}/dstack-runner`.
* `DSTACK_SHIM_DOWNLOAD_URL` – Overrides `dstack-shim` binary download URL. The URL can contain `{version}` and/or `{arch}` placeholders,
see `DSTACK_RUNNER_DOWNLOAD_URL` for the details.
* `DSTACK_DEFAULT_CREDS_DISABLED` – Disables default credentials detection if set. Defaults to `None`.
* `DSTACK_LOCAL_BACKEND_ENABLED` – Enables local backend for debug if set. Defaults to `None`.
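
The placeholder expansion described for `DSTACK_RUNNER_DOWNLOAD_URL` and `DSTACK_SHIM_DOWNLOAD_URL` can be pictured with Python's `str.format`. The sketch below is illustrative only: `expand_download_url` is a hypothetical helper, and the version `0.19.0` is a made-up example value.

```python
def expand_download_url(template: str, version: str, arch: str) -> str:
    """Fill the `{version}` and `{arch}` placeholders of a download URL template."""
    # str.format ignores keyword arguments that the template does not use,
    # so a URL with only one placeholder, or none, is also accepted.
    return template.format(version=version, arch=arch)


template = "https://dstack.example.com/{arch}/{version}/dstack-runner"
print(expand_download_url(template, "0.19.0", "arm64"))
# https://dstack.example.com/arm64/0.19.0/dstack-runner
print(expand_download_url("https://dstack.example.com/dstack-runner", "latest", "amd64"))
# https://dstack.example.com/dstack-runner
```

With this approach, a template that contains other literal braces would need them doubled (`{{`, `}}`), since `str.format` treats single braces as placeholders.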

28 changes: 27 additions & 1 deletion src/dstack/_internal/cli/services/configurators/run.py
@@ -6,6 +6,7 @@
from typing import Dict, List, Optional, Set, Tuple

import gpuhunt
from pydantic import parse_obj_as

import dstack._internal.core.models.resources as resources
from dstack._internal.cli.services.args import disk_spec, gpu_spec, port_mapping
@@ -39,6 +40,7 @@
    TaskConfiguration,
)
from dstack._internal.core.models.repos.base import Repo
from dstack._internal.core.models.resources import CPUSpec
from dstack._internal.core.models.runs import JobSubmission, JobTerminationReason, RunStatus
from dstack._internal.core.services.configs import ConfigManager
from dstack._internal.core.services.diff import diff_models
@@ -72,6 +74,7 @@ def apply_configuration(
    ):
        self.apply_args(conf, configurator_args, unknown_args)
        self.validate_gpu_vendor_and_image(conf)
        self.validate_cpu_arch_and_image(conf)
        if repo is None:
            repo = self.api.repos.load(Path.cwd())
        config_manager = ConfigManager()
@@ -342,7 +345,7 @@ def interpolate_env(self, conf: BaseRunConfiguration):

    def validate_gpu_vendor_and_image(self, conf: BaseRunConfiguration) -> None:
        """
        Infers `resources.gpu.vendor` if not set, requires `image` if the vendor is AMD.
        Infers and sets `resources.gpu.vendor` if not set, requires `image` if the vendor is AMD.
        """
        gpu_spec = conf.resources.gpu
        if gpu_spec is None:
@@ -400,6 +403,29 @@ def validate_gpu_vendor_and_image(self, conf: BaseRunConfiguration) -> None:
                    "`image` is required if `resources.gpu.vendor` is `tenstorrent`"
                )

    def validate_cpu_arch_and_image(self, conf: BaseRunConfiguration) -> None:
        """
        Infers `resources.cpu.arch` if not set, requires `image` if the architecture is ARM.
        """
        # TODO: Remove in 0.20. Use conf.resources.cpu directly
        cpu_spec = parse_obj_as(CPUSpec, conf.resources.cpu)
        arch = cpu_spec.arch
        if arch is None:
            gpu_spec = conf.resources.gpu
            if (
                gpu_spec is not None
                and gpu_spec.vendor == gpuhunt.AcceleratorVendor.NVIDIA
                and gpu_spec.name
                and any(map(gpuhunt.is_nvidia_superchip, gpu_spec.name))
            ):
                arch = gpuhunt.CPUArchitecture.ARM
            else:
                arch = gpuhunt.CPUArchitecture.X86
        # NOTE: We don't set the inferred resources.cpu.arch for compatibility with older servers.
        # Servers with ARM support set the arch using the same logic.
        if arch == gpuhunt.CPUArchitecture.ARM and conf.image is None:
            raise ConfigurationError("`image` is required if `resources.cpu.arch` is `arm`")
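
The inference rule in `validate_cpu_arch_and_image` can be sketched without `dstack` or `gpuhunt` installed. In the sketch below, `SUPERCHIPS` is an assumed stand-in for `gpuhunt.is_nvidia_superchip` (the real list may differ), and `infer_cpu_arch` and `check_image` are hypothetical names that use plain strings instead of the `gpuhunt` enums.

```python
from typing import List, Optional

# Assumption: stand-in for gpuhunt.is_nvidia_superchip; the real set may differ.
SUPERCHIPS = {"GH200", "GB200"}


def infer_cpu_arch(
    cpu_arch: Optional[str],
    gpu_vendor: Optional[str],
    gpu_names: Optional[List[str]],
) -> str:
    """Return the explicit arch, or infer `arm` for NVIDIA superchips, else `x86`."""
    if cpu_arch is not None:
        return cpu_arch
    if gpu_vendor == "nvidia" and gpu_names and any(n.upper() in SUPERCHIPS for n in gpu_names):
        return "arm"
    return "x86"


def check_image(arch: str, image: Optional[str]) -> None:
    """Mirror the validation step: ARM runs need an explicit `image`."""
    if arch == "arm" and image is None:
        raise ValueError("`image` is required if `resources.cpu.arch` is `arm`")


print(infer_cpu_arch(None, "nvidia", ["GH200"]))  # arm
print(infer_cpu_arch(None, "nvidia", ["A100"]))   # x86
print(infer_cpu_arch("arm", None, None))          # arm
```

As in the method above, an explicitly configured architecture always wins; the GPU name is consulted only when `arch` is unset.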


class RunWithPortsConfigurator(BaseRunConfigurator):
    @classmethod
84 changes: 59 additions & 25 deletions src/dstack/_internal/core/backends/base/compute.py
@@ -6,7 +6,7 @@
from abc import ABC, abstractmethod
from functools import lru_cache
from pathlib import Path
from typing import Dict, List, Optional
from typing import Dict, List, Literal, Optional

import git
import requests
@@ -44,6 +44,8 @@
DSTACK_SHIM_BINARY_NAME = "dstack-shim"
DSTACK_RUNNER_BINARY_NAME = "dstack-runner"

GoArchType = Literal["amd64", "arm64"]


class Compute(ABC):
"""
@@ -483,13 +485,14 @@ def get_shim_env(
    base_path: Optional[PathLike] = None,
    bin_path: Optional[PathLike] = None,
    backend_shim_env: Optional[Dict[str, str]] = None,
    arch: Optional[str] = None,
) -> Dict[str, str]:
    log_level = "6"  # Trace
    envs = {
        "DSTACK_SHIM_HOME": get_dstack_working_dir(base_path),
        "DSTACK_SHIM_HTTP_PORT": str(DSTACK_SHIM_HTTP_PORT),
        "DSTACK_SHIM_LOG_LEVEL": log_level,
        "DSTACK_RUNNER_DOWNLOAD_URL": get_dstack_runner_download_url(),
        "DSTACK_RUNNER_DOWNLOAD_URL": get_dstack_runner_download_url(arch),
        "DSTACK_RUNNER_BINARY_PATH": get_dstack_runner_binary_path(bin_path),
        "DSTACK_RUNNER_HTTP_PORT": str(DSTACK_RUNNER_HTTP_PORT),
        "DSTACK_RUNNER_SSH_PORT": str(DSTACK_RUNNER_SSH_PORT),
@@ -509,16 +512,19 @@ def get_shim_commands(
    base_path: Optional[PathLike] = None,
    bin_path: Optional[PathLike] = None,
    backend_shim_env: Optional[Dict[str, str]] = None,
    arch: Optional[str] = None,
) -> List[str]:
    commands = get_shim_pre_start_commands(
        base_path=base_path,
        bin_path=bin_path,
        arch=arch,
    )
    shim_env = get_shim_env(
        authorized_keys=authorized_keys,
        base_path=base_path,
        bin_path=bin_path,
        backend_shim_env=backend_shim_env,
        arch=arch,
    )
    for k, v in shim_env.items():
        commands += [f'export "{k}={v}"']
@@ -539,35 +545,63 @@ def get_dstack_runner_version() -> str:
    return version or "latest"


def get_dstack_runner_download_url() -> str:
    if url := os.environ.get("DSTACK_RUNNER_DOWNLOAD_URL"):
        return url
    build = get_dstack_runner_version()
    if settings.DSTACK_VERSION is not None:
        bucket = "dstack-runner-downloads"
    else:
        bucket = "dstack-runner-downloads-stgn"
    return (
        f"https://{bucket}.s3.eu-west-1.amazonaws.com/{build}/binaries/dstack-runner-linux-amd64"
    )


def get_dstack_shim_download_url() -> str:
    if url := os.environ.get("DSTACK_SHIM_DOWNLOAD_URL"):
        return url
    build = get_dstack_runner_version()
    if settings.DSTACK_VERSION is not None:
        bucket = "dstack-runner-downloads"
    else:
        bucket = "dstack-runner-downloads-stgn"
    return f"https://{bucket}.s3.eu-west-1.amazonaws.com/{build}/binaries/dstack-shim-linux-amd64"
def normalize_arch(arch: Optional[str] = None) -> GoArchType:
    """
    Converts the given free-form architecture string to the Go GOARCH format.
    Only 64-bit x86 and ARM are supported. If the word size is not specified (e.g., `x86`, `arm`),
    64-bit is implied.
    If the arch is not specified, falls back to `amd64`.
    """
    if not arch:
        return "amd64"
    arch_lower = arch.lower()
    if "32" in arch_lower or arch_lower in ["i386", "i686"]:
        raise ValueError(f"32-bit architectures are not supported: {arch}")
    if arch_lower.startswith("x86") or arch_lower.startswith("amd"):
        return "amd64"
    if arch_lower.startswith("arm") or arch_lower.startswith("aarch"):
        return "arm64"
    raise ValueError(f"Unsupported architecture: {arch}")
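
To make the mapping concrete, here is a small standalone check of `normalize_arch`. The function body is copied from the change above so that the snippet runs on its own; the sample inputs are illustrative and were picked for this example.

```python
from typing import Literal, Optional

GoArchType = Literal["amd64", "arm64"]


def normalize_arch(arch: Optional[str] = None) -> GoArchType:
    # Copied from the diff above so that the snippet is self-contained.
    if not arch:
        return "amd64"
    arch_lower = arch.lower()
    if "32" in arch_lower or arch_lower in ["i386", "i686"]:
        raise ValueError(f"32-bit architectures are not supported: {arch}")
    if arch_lower.startswith("x86") or arch_lower.startswith("amd"):
        return "amd64"
    if arch_lower.startswith("arm") or arch_lower.startswith("aarch"):
        return "arm64"
    raise ValueError(f"Unsupported architecture: {arch}")


# Free-form spellings collapse to the two supported GOARCH values.
for raw in [None, "x86", "x86_64", "AMD64", "arm", "aarch64", "ARM64"]:
    print(repr(raw), "->", normalize_arch(raw))

# 32-bit and unknown architectures are rejected.
for raw in ["i386", "arm32", "riscv64"]:
    try:
        normalize_arch(raw)
    except ValueError as e:
        print(repr(raw), "->", e)
```

One consequence of the prefix matching is that names encoding 32-bit ARM without the digits `32`, such as `armv7l`, start with `arm` and therefore map to `arm64` under this logic.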


def get_dstack_runner_download_url(arch: Optional[str] = None) -> str:
    url_template = os.environ.get("DSTACK_RUNNER_DOWNLOAD_URL")
    if not url_template:
        if settings.DSTACK_VERSION is not None:
            bucket = "dstack-runner-downloads"
        else:
            bucket = "dstack-runner-downloads-stgn"
        url_template = (
            f"https://{bucket}.s3.eu-west-1.amazonaws.com"
            "/{version}/binaries/dstack-runner-linux-{arch}"
        )
    version = get_dstack_runner_version()
    arch = normalize_arch(arch)
    return url_template.format(version=version, arch=arch)


def get_dstack_shim_download_url(arch: Optional[str] = None) -> str:
    url_template = os.environ.get("DSTACK_SHIM_DOWNLOAD_URL")
    if not url_template:
        if settings.DSTACK_VERSION is not None:
            bucket = "dstack-runner-downloads"
        else:
            bucket = "dstack-runner-downloads-stgn"
        url_template = (
            f"https://{bucket}.s3.eu-west-1.amazonaws.com"
            "/{version}/binaries/dstack-shim-linux-{arch}"
        )
    version = get_dstack_runner_version()
    arch = normalize_arch(arch)
    return url_template.format(version=version, arch=arch)


def get_shim_pre_start_commands(
    base_path: Optional[PathLike] = None,
    bin_path: Optional[PathLike] = None,
    arch: Optional[str] = None,
) -> List[str]:
    url = get_dstack_shim_download_url()
    url = get_dstack_shim_download_url(arch)
    dstack_shim_binary_path = get_dstack_shim_binary_path(bin_path)
    dstack_working_dir = get_dstack_working_dir(base_path)
    return [