Commit 1f13a57 (parent 7e5a965): [Blog] Supporting ARM and NVIDIA GH200 on Lambda

1 file changed: docs/blog/posts/gh200-on-lambda.md (+89, −0)
---
title: "Supporting ARM and NVIDIA GH200 on Lambda"
date: 2025-05-12
description: "TBA"
slug: gh200-on-lambda
image: https://dstack.ai/static-assets/static-assets/images/dstack-arm--gh200-lambda-min.png
categories:
  - ARM
  - Cloud fleets
  - SSH fleets
---
# Supporting ARM and NVIDIA GH200 on Lambda

The latest update to `dstack` introduces support for NVIDIA GH200 instances on [Lambda](../../docs/concepts/backends.md#lambda) and enables ARM-powered hosts, including GH200 and GB200, with [SSH fleets](../../docs/concepts/fleets.md#ssh).
<img src="https://dstack.ai/static-assets/static-assets/images/dstack-arm--gh200-lambda-min.png" width="630"/>

<!-- more -->
## ARM support

Previously, `dstack` supported only the x86 architecture, both with cloud providers and on-prem clusters. With the latest update, cloud and SSH fleets can use ARM-based CPUs too. To request ARM CPUs in a run or fleet configuration, specify the `arm` architecture in the `resources.cpu` property:
```yaml
resources:
  cpu: arm:4.. # 4 or more ARM cores
```
31+
If the hosts in an SSH fleet have ARM CPUs, `dstack` will automatically detect both ARM-based CPUs as well as ARM-based GPU Superchips such as GH200 and enable their use.
32+
33+
To see available offers with ARM CPUs, pass `--cpu arm` to the `dstack offer` command.
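For SSH fleets, no architecture needs to be declared; `dstack` inspects the hosts. A minimal fleet configuration sketch (the host address, user, and key path are placeholders):

```yaml
type: fleet
name: my-gh200-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - "192.168.1.10"  # GH200 host; dstack detects the ARM CPU and GPU automatically
```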
## About GH200

NVIDIA Grace is the first NVIDIA data center CPU, built on top of ARM specifically for AI workloads. The NVIDIA GH200 Superchip brings together a 72-core NVIDIA Grace CPU with an NVIDIA H100 GPU, connected with a high-bandwidth, memory-coherent NVIDIA NVLink-C2C interconnect.
| CPU           | GPU  | CPU Memory               | GPU Memory         | NVLink-C2C |
| ------------- | ---- | ------------------------ | ------------------ | ---------- |
| Grace 72-core | H100 | 480GB LPDDR5X at 512GB/s | 96GB HBM3 at 4TB/s | 900GB/s    |
The GH200 Superchip's NVLink-C2C interconnect delivers 900 GB/s of total bidirectional bandwidth (450 GB/s per direction), which makes KV cache offloading to CPU memory practical. While prefill can leverage CPU memory for optimizations like prefix caching, generation benefits from the GPU's much higher HBM3 memory bandwidth.
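To put the bandwidth in perspective, here is a back-of-the-envelope estimate of how long moving a KV cache over NVLink-C2C would take. The model dimensions are illustrative assumptions, not taken from the post:

```python
# Back-of-the-envelope KV cache transfer time over NVLink-C2C.
# Model dimensions below are illustrative, not tied to any specific model.
layers = 32
kv_heads = 8
head_dim = 128
seq_len = 32_768
bytes_per_elem = 2  # fp16/bf16

# K and V caches: 2 tensors of shape [layers, seq_len, kv_heads, head_dim]
kv_cache_bytes = 2 * layers * seq_len * kv_heads * head_dim * bytes_per_elem
kv_cache_gb = kv_cache_bytes / 1e9

link_gb_per_s = 450  # NVLink-C2C, one direction
transfer_ms = kv_cache_bytes / (link_gb_per_s * 1e9) * 1e3

print(f"KV cache: {kv_cache_gb:.1f} GB, transfer: {transfer_ms:.1f} ms")
# → KV cache: 4.3 GB, transfer: 9.5 ms
```

Even a multi-gigabyte cache moves in about ten milliseconds, which is why offloading to the Grace CPU's 480GB of LPDDR5X is attractive for prefix caching.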
## GH200 on Lambda

[Lambda :material-arrow-top-right-thin:{ .external }](https://cloud.lambda.ai/sign-up?_gl=1*1qovk06*_gcl_au*MTg2MDc3OTAyOS4xNzQyOTA3Nzc0LjE3NDkwNTYzNTYuMTc0NTQxOTE2MS4xNzQ1NDE5MTYw*_ga*MTE2NDM5MzI0My4xNzQyOTA3Nzc0*_ga_43EZT1FM6Q*czE3NDY3MTczOTYkbzM0JGcxJHQxNzQ2NzE4MDU2JGo1NyRsMCRoMTU0Mzg1NTU1OQ..){:target="_blank"} provides secure, user-friendly, reliable, and affordable cloud GPUs. Since late last year, Lambda has offered on-demand GH200 instances through its public cloud, currently at a promotional price of $1.49 per hour through June 30, 2025.
With the latest `dstack` update, it's now possible to use these instances with your Lambda account, whether you're running a dev environment, task, or service:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
name: my-env

image: nvidia/cuda:12.8.1-base-ubuntu20.04
ide: vscode

resources:
  gpu: GH200:1
```

</div>
> Note, you have to use an ARM-based Docker image.
66+
67+
To determine whether Lambda has GH200 on-demand instances available, run `dstack apply`:
<div class="termy">

```shell
$ dstack apply -f .dstack.yml

 #  BACKEND             RESOURCES                          INSTANCE TYPE  PRICE
 1  lambda (us-east-3)  cpu=arm:64 mem=464GB GH200:96GB:1  gpu_1x_gh200   $1.49
```

</div>
!!! info "Retry policy"
    If GH200s are not available at the moment, you can specify the [retry policy](../../docs/concepts/dev-environments.md#retry-policy) in your run configuration so that `dstack` runs it once the GPU becomes available.
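For instance, a run configuration could keep retrying while capacity is unavailable. A sketch (the exact `retry` fields reflect the dstack docs at the time of writing; treat the values as assumptions):

```yaml
type: dev-environment
name: my-env

resources:
  gpu: GH200:1

retry:
  # Keep resubmitting for up to a day while no capacity is available
  on_events: [no-capacity]
  duration: 1d
```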
> If you have GH200- or GB200-powered hosts already provisioned via Lambda, another cloud provider, or on-prem, you can now use them with [SSH fleets](../../docs/concepts/fleets.md#ssh).
!!! info "What's next?"
    1. Sign up with [Lambda :material-arrow-top-right-thin:{ .external }](https://cloud.lambda.ai/sign-up?_gl=1*1qovk06*_gcl_au*MTg2MDc3OTAyOS4xNzQyOTA3Nzc0LjE3NDkwNTYzNTYuMTc0NTQxOTE2MS4xNzQ1NDE5MTYw*_ga*MTE2NDM5MzI0My4xNzQyOTA3Nzc0*_ga_43EZT1FM6Q*czE3NDY3MTczOTYkbzM0JGcxJHQxNzQ2NzE4MDU2JGo1NyRsMCRoMTU0Mzg1NTU1OQ..){:target="_blank"}
    2. Set up the [Lambda](../../docs/concepts/backends.md#lambda) backend
    3. Follow [Quickstart](../../docs/quickstart.md)
    4. Check [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md), and [fleets](../../docs/concepts/fleets.md)