---
title: "Supporting ARM and NVIDIA GH200 on Lambda"
date: 2025-05-12
description: "TBA"
slug: gh200-on-lambda
image: https://dstack.ai/static-assets/static-assets/images/dstack-arm--gh200-lambda-min.png
categories:
  - ARM
  - Cloud fleets
  - SSH fleets
---

# Supporting ARM and NVIDIA GH200 on Lambda

The latest update to `dstack` introduces support for NVIDIA GH200 instances on [Lambda](../../docs/concepts/backends.md#lambda)
and enables ARM-powered hosts, including GH200 and GB200, with [SSH fleets](../../docs/concepts/fleets.md#ssh).

<img src="https://dstack.ai/static-assets/static-assets/images/dstack-arm--gh200-lambda-min.png" width="630"/>

<!-- more -->

## ARM support

Previously, `dstack` supported only the x86 architecture, both with cloud providers and in on-prem clusters. With the latest update, it’s now possible to use both cloud and SSH fleets with ARM-based CPUs. To request ARM CPUs in a run or fleet configuration, specify the `arm` architecture in the `resources.cpu` property:

```yaml
resources:
  cpu: arm:4.. # 4 or more ARM cores
```

If the hosts in an SSH fleet have ARM CPUs, `dstack` automatically detects both the ARM-based CPUs and ARM-based GPU Superchips such as GH200, and enables their use.
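
As an illustrative sketch, an SSH fleet configuration for ARM-powered hosts might look like the following (the host address, user, and key path are placeholders for your own values):

```yaml
type: fleet
name: arm-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    # Placeholder address of an ARM or GH200-powered host
    - 192.168.100.10
```

No architecture needs to be declared here; detection happens when `dstack` connects to the hosts.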

To see available offers with ARM CPUs, pass `--cpu arm` to the `dstack offer` command.

## About GH200

NVIDIA Grace is NVIDIA’s first data center CPU, built on the ARM architecture specifically for AI workloads. The NVIDIA GH200 Superchip brings together a 72-core NVIDIA Grace CPU and an NVIDIA H100 GPU, connected by the high-bandwidth, memory-coherent NVIDIA NVLink-C2C interconnect.

| CPU           | GPU  | CPU Memory               | GPU Memory         | NVLink-C2C |
| ------------- | ---- | ------------------------ | ------------------ | ---------- |
| Grace 72-core | H100 | 480GB LPDDR5X at 512GB/s | 96GB HBM3 at 4TB/s | 900GB/s    |

The GH200 Superchip’s NVLink-C2C interconnect delivers 900GB/s of bidirectional bandwidth (450GB/s in each direction), enabling KV cache offloading to CPU memory. While prefill can leverage CPU memory for optimizations like prefix caching, generation benefits from the GH200’s higher GPU memory bandwidth.

## GH200 on Lambda

[Lambda :material-arrow-top-right-thin:{ .external }](https://cloud.lambda.ai/sign-up?_gl=1*1qovk06*_gcl_au*MTg2MDc3OTAyOS4xNzQyOTA3Nzc0LjE3NDkwNTYzNTYuMTc0NTQxOTE2MS4xNzQ1NDE5MTYw*_ga*MTE2NDM5MzI0My4xNzQyOTA3Nzc0*_ga_43EZT1FM6Q*czE3NDY3MTczOTYkbzM0JGcxJHQxNzQ2NzE4MDU2JGo1NyRsMCRoMTU0Mzg1NTU1OQ..){:target="_blank"} provides secure, user-friendly, reliable, and affordable cloud GPUs. Since late last year, Lambda has offered on-demand GH200 instances through its public cloud, currently at a promotional price of $1.49 per hour until June 30, 2025.

With the latest `dstack` update, it’s now possible to use these instances with your Lambda account, whether you’re running a dev environment, task, or service:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
name: my-env
image: nvidia/cuda:12.8.1-base-ubuntu20.04
ide: vscode

resources:
  gpu: GH200:1
```

</div>

> Note that you must use an ARM-based Docker image.
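
Tasks and services work the same way. As a sketch, a service serving an LLM on a GH200 instance could look like this (the image and model names are illustrative assumptions, not tested recommendations; any ARM-compatible image with your serving framework works):

```yaml
type: service
name: llm-gh200

# Assumption: the image must provide an ARM (aarch64) build
image: vllm/vllm-openai:latest
commands:
  - vllm serve meta-llama/Llama-3.1-8B-Instruct
port: 8000

resources:
  gpu: GH200:1
```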

To determine whether Lambda has GH200 on-demand instances available, run `dstack apply`:

<div class="termy">

```shell
$ dstack apply -f .dstack.yml

 #  BACKEND             RESOURCES                           INSTANCE TYPE  PRICE
 1  lambda (us-east-3)  cpu=arm:64 mem=464GB GH200:96GB:1   gpu_1x_gh200   $1.49
```

</div>

!!! info "Retry policy"
    If GH200 instances are not currently available, you can specify the [retry policy](../../docs/concepts/dev-environments.md#retry-policy) in your run configuration so that `dstack` runs the configuration once the GPU becomes available.
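
For example, a dev environment could be configured to keep waiting for capacity (a sketch; the `duration` value is an arbitrary choice):

```yaml
type: dev-environment
name: my-env
ide: vscode

resources:
  gpu: GH200:1

retry:
  # Keep retrying for up to 6 hours while no capacity is available
  on_events: [no-capacity]
  duration: 6h
```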

> If you already have GH200- or GB200-powered hosts provisioned via Lambda, another cloud provider, or on-prem, you can now use them with [SSH fleets](../../docs/concepts/fleets.md#ssh).

!!! info "What's next?"
    1. Sign up with [Lambda :material-arrow-top-right-thin:{ .external }](https://cloud.lambda.ai/sign-up?_gl=1*1qovk06*_gcl_au*MTg2MDc3OTAyOS4xNzQyOTA3Nzc0LjE3NDkwNTYzNTYuMTc0NTQxOTE2MS4xNzQ1NDE5MTYw*_ga*MTE2NDM5MzI0My4xNzQyOTA3Nzc0*_ga_43EZT1FM6Q*czE3NDY3MTczOTYkbzM0JGcxJHQxNzQ2NzE4MDU2JGo1NyRsMCRoMTU0Mzg1NTU1OQ..){:target="_blank"}
    2. Set up the [Lambda](../../docs/concepts/backends.md#lambda) backend
    3. Follow [Quickstart](../../docs/quickstart.md)
    4. Check [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md), and [fleets](../../docs/concepts/fleets.md)