Commit 488104a

[Docs] Added Tenstorrent example (#2596)
1 parent 2916120 commit 488104a

9 files changed: +257 −6 lines changed

docs/docs/concepts/fleets.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -260,6 +260,10 @@ Define a fleet configuration as a YAML file in your project directory. The file
 2. Hosts with Intel Gaudi accelerators should be pre-installed with [Gaudi software and drivers](https://docs.habana.ai/en/latest/Installation_Guide/Driver_Installation.html#driver-installation).
    This should include the drivers, `hl-smi`, and Habana Container Runtime.

+=== "Tenstorrent"
+
+    2. Hosts with Tenstorrent accelerators should be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation).
+       This should include the drivers, `tt-smi`, and HugePages.
 3. The user specified should have passwordless `sudo` access.

 To create or update the fleet, pass the fleet configuration to [`dstack apply`](../reference/cli/dstack/apply.md):
```
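The HugePages prerequisite from the Tenstorrent note above can be sanity-checked on a host before adding it to a fleet. A minimal sketch, assuming a Linux host (the Tenstorrent installer normally configures HugePages; the reserved counts are host-specific):

```shell
# Show HugePages status on the host; non-empty output means the kernel
# exposes HugePages counters (actual counts vary per host).
grep -E 'HugePages_(Total|Free)' /proc/meminfo
```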

docs/docs/index.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -7,7 +7,7 @@ for AI workloads both in the cloud and on-prem, speeding up the development, tra
 #### Accelerators

-`dstack` supports `NVIDIA`, `AMD`, `Google TPU`, and `Intel Gaudi` accelerators out of the box.
+`dstack` supports `NVIDIA`, `AMD`, `TPU`, `Intel Gaudi`, and `Tenstorrent` accelerators out of the box.

 ## How does it work?
```

docs/examples.md

Lines changed: 14 additions & 4 deletions

```diff
@@ -101,6 +101,17 @@ hide:
   </p>
 </a>

+<a href="/examples/accelerators/tpu"
+   class="feature-cell sky">
+  <h3>
+    TPU
+  </h3>
+
+  <p>
+    Deploy and fine-tune LLMs on TPU
+  </p>
+</a>
+
 <a href="/examples/accelerators/intel"
    class="feature-cell sky">
   <h3>
@@ -112,15 +123,14 @@ hide:
   </p>
 </a>

-
-<a href="/examples/accelerators/tpu"
+<a href="/examples/accelerators/tenstorrent"
    class="feature-cell sky">
   <h3>
-    TPU
+    Tenstorrent
   </h3>

   <p>
-    Deploy and fine-tune LLMs on TPU
+    Deploy and fine-tune LLMs on Tenstorrent
   </p>
 </a>
 </div>
```

docs/examples/accelerators/tenstorrent/index.md

Whitespace-only changes.
Lines changed: 9 additions & 0 deletions

```yaml
type: dev-environment
name: cursor

image: dstackai/tt-smi:latest

ide: cursor

resources:
  gpu: n150:1
```
Lines changed: 193 additions & 0 deletions

# Tenstorrent

`dstack` supports running dev environments, tasks, and services on Tenstorrent
[Wormhole :material-arrow-top-right-thin:{ .external }](https://tenstorrent.com/en/hardware/wormhole){:target="_blank"} accelerators via SSH fleets.

??? info "SSH fleets"
    <div editor-title="examples/accelerators/tenstorrent/fleet.dstack.yml">

    ```yaml
    type: fleet
    name: wormhole-fleet

    ssh_config:
      user: root
      identity_file: ~/.ssh/id_rsa
      # Configure any number of hosts with n150 or n300 PCIe boards
      hosts:
        - 192.168.2.108
    ```

    </div>

    > Hosts should be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation).
    > This should include the drivers, `tt-smi`, and HugePages.

    To apply the fleet configuration, run:

    <div class="termy">

    ```bash
    $ dstack apply -f examples/accelerators/tenstorrent/fleet.dstack.yml

     FLEET           RESOURCES                              PRICE  STATUS  CREATED
     wormhole-fleet  cpu=12 mem=32GB disk=243GB n150:12GB   $0     idle    18 sec ago
    ```

    </div>

    For more details on fleet configuration, refer to [SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh).
## Services

Here's an example of a service that deploys
[`Llama-3.2-1B-Instruct` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct){:target="_blank"}
using the [Tenstorrent Inference Server :material-arrow-top-right-thin:{ .external }](https://github.com/tenstorrent/tt-inference-server){:target="_blank"}.

<div editor-title="examples/accelerators/tenstorrent/tt-inference-server.dstack.yml">

```yaml
type: service
name: tt-inference-server

env:
  - HF_TOKEN
  - HF_MODEL_REPO_ID=meta-llama/Llama-3.2-1B-Instruct
image: ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-20.04-amd64:0.0.4-v0.56.0-rc47-e2e0002ac7dc
commands:
  - |
    . ${PYTHON_ENV_DIR}/bin/activate
    pip install "huggingface_hub[cli]"
    export LLAMA_DIR="/data/models--$(echo "$HF_MODEL_REPO_ID" | sed 's/\//--/g')/"
    huggingface-cli download $HF_MODEL_REPO_ID --local-dir $LLAMA_DIR
    python /home/container_app_user/app/src/run_vllm_api_server.py
port: 7000

model: meta-llama/Llama-3.2-1B-Instruct

# Cache the downloaded model
volumes:
  - /mnt/data/tt-inference-server/data:/data

resources:
  gpu: n150:1
```

</div>
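In the `commands` block above, `LLAMA_DIR` reproduces the Hugging Face cache directory layout by swapping `/` for `--` in the repo ID. A minimal Python sketch of the same transformation (the `hf_model_dir` helper name is illustrative, not part of the config):

```python
def hf_model_dir(repo_id: str, base: str = "/data") -> str:
    # Mirrors: export LLAMA_DIR="/data/models--$(echo "$HF_MODEL_REPO_ID" | sed 's/\//--/g')/"
    return f"{base}/models--{repo_id.replace('/', '--')}/"

print(hf_model_dir("meta-llama/Llama-3.2-1B-Instruct"))
# → /data/models--meta-llama--Llama-3.2-1B-Instruct/
```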
Go ahead and run the configuration using `dstack apply`:

<div class="termy">

```bash
$ dstack apply -f examples/accelerators/tenstorrent/tt-inference-server.dstack.yml
```

</div>
Once the service is up, it will be available via the service endpoint
at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

<div class="termy">

```shell
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -X POST \
    -H 'Authorization: Bearer &lt;dstack token&gt;' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "meta-llama/Llama-3.2-1B-Instruct",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "stream": true,
      "max_tokens": 512
    }'
```

</div>
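The same request can also be composed with Python's standard library. A hedged, non-streaming sketch of the `curl` call above (the server URL, project name, and token are placeholders to substitute for your deployment):

```python
import json
import urllib.request

# Placeholders: substitute your dstack server URL, project name, and token.
SERVER_URL = "http://127.0.0.1:3000"
PROJECT = "main"
TOKEN = "<dstack token>"

def chat_request(prompt: str) -> urllib.request.Request:
    """Build the same chat-completions request as the curl example."""
    url = f"{SERVER_URL}/proxy/models/{PROJECT}/chat/completions"
    payload = {
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )

req = chat_request("What is Deep Learning?")
# urllib.request.urlopen(req) would send it; skipped here since it needs a live server.
```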
Additionally, the model is available via `dstack`'s control plane UI:

![](https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-tenstorrent-model-ui.png?raw=true){ width=800 }

When a [gateway](https://dstack.ai/docs/concepts/gateways) is configured, the service endpoint
is available at `https://<run name>.<gateway domain>/`.

> Services support many options, including authentication, auto-scaling policies, etc. To learn more, refer to [Services](https://dstack.ai/docs/concepts/services).
## Tasks

Below is a task that simply runs `tt-smi -s`. Tasks can be used for training, fine-tuning, batch inference, or anything else.

<div editor-title="examples/accelerators/tenstorrent/tt-smi.dstack.yml">

```yaml
type: task
# The name is optional; if not specified, it's generated randomly
name: tt-smi

env:
  - HF_TOKEN

# (Required) Use any image with TT drivers
image: dstackai/tt-smi:latest

# Use any commands
commands:
  - tt-smi -s

# Specify the number of accelerators, model, etc.
resources:
  gpu: n150:1

# Uncomment if you want to run on a cluster of nodes
#nodes: 2
```

</div>

> Tasks support many options, including multi-node configuration, max duration, etc. To learn more, refer to [Tasks](https://dstack.ai/docs/concepts/tasks).
## Dev environments

Below is an example of a dev environment configuration. It can be used to provision a dev environment that can be accessed via your desktop IDE.

<div editor-title="examples/accelerators/tenstorrent/.dstack.yml">

```yaml
type: dev-environment
# The name is optional; if not specified, it's generated randomly
name: cursor

# (Optional) List required env variables
env:
  - HF_TOKEN

image: dstackai/tt-smi:latest

# Can be `vscode` or `cursor`
ide: cursor

resources:
  gpu: n150:1
```

</div>

If you run it via `dstack apply`, it will output the URL to access it via your desktop IDE.

![](https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-tenstorrent-cursor.png?raw=true){ width=800 }

> Dev environments support many options, including inactivity and max duration, IDE configuration, etc. To learn more, refer to [Dev environments](https://dstack.ai/docs/concepts/dev-environments).

??? info "Feedback"
    Found a bug, or want to request a feature? File it in the [issue tracker :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues){:target="_blank"},
    or share via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}.
Lines changed: 24 additions & 0 deletions

```yaml
type: service
name: tt-inference-server

env:
  - HF_TOKEN
  - HF_MODEL_REPO_ID=meta-llama/Llama-3.2-1B-Instruct
image: ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-20.04-amd64:0.0.4-v0.56.0-rc47-e2e0002ac7dc
commands:
  - |
    . ${PYTHON_ENV_DIR}/bin/activate
    pip install "huggingface_hub[cli]"
    export LLAMA_DIR="/data/models--$(echo "$HF_MODEL_REPO_ID" | sed 's/\//--/g')/"
    huggingface-cli download $HF_MODEL_REPO_ID --local-dir $LLAMA_DIR
    python /home/container_app_user/app/src/run_vllm_api_server.py
port: 7000

model: meta-llama/Llama-3.2-1B-Instruct

# Cache the downloaded model
volumes:
  - /mnt/data/tt-inference-server/data:/data

resources:
  gpu: n150:1
```
Lines changed: 10 additions & 0 deletions

```yaml
type: task
name: tt-smi

image: dstackai/tt-smi:latest

commands:
  - tt-smi -s

resources:
  gpu: n150:1
```

mkdocs.yml

Lines changed: 2 additions & 1 deletion

```diff
@@ -256,8 +256,9 @@ nav:
       - Llama: examples/llms/llama/index.md
   - Accelerators:
       - AMD: examples/accelerators/amd/index.md
-      - Intel Gaudi: examples/accelerators/intel/index.md
       - TPU: examples/accelerators/tpu/index.md
+      - Intel Gaudi: examples/accelerators/intel/index.md
+      - Tenstorrent: examples/accelerators/tenstorrent/index.md
   - Misc:
       - Docker Compose: examples/misc/docker-compose/index.md
       - NCCL Tests: examples/misc/nccl-tests/index.md
```