4 changes: 4 additions & 0 deletions docs/docs/concepts/fleets.md
@@ -260,6 +260,10 @@ Define a fleet configuration as a YAML file in your project directory. The file
2. Hosts with Intel Gaudi accelerators should be pre-installed with [Gaudi software and drivers](https://docs.habana.ai/en/latest/Installation_Guide/Driver_Installation.html#driver-installation).
This should include the drivers, `hl-smi`, and Habana Container Runtime.

=== "Tenstorrent"
2. Hosts with Tenstorrent accelerators should be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation).
This should include the drivers, `tt-smi`, and HugePages.

3. The user specified should have passwordless `sudo` access.

To create or update the fleet, pass the fleet configuration to [`dstack apply`](../reference/cli/dstack/apply.md):
2 changes: 1 addition & 1 deletion docs/docs/index.md
@@ -7,7 +7,7 @@ for AI workloads both in the cloud and on-prem, speeding up the development, tra

#### Accelerators

`dstack` supports `NVIDIA`, `AMD`, `Google TPU`, and `Intel Gaudi` accelerators out of the box.
`dstack` supports `NVIDIA`, `AMD`, `TPU`, `Intel Gaudi`, and `Tenstorrent` accelerators out of the box.

## How does it work?

18 changes: 14 additions & 4 deletions docs/examples.md
@@ -101,6 +101,17 @@ hide:
</p>
</a>

<a href="/examples/accelerators/tpu"
class="feature-cell sky">
<h3>
TPU
</h3>

<p>
Deploy and fine-tune LLMs on TPU
</p>
</a>

<a href="/examples/accelerators/intel"
class="feature-cell sky">
<h3>
@@ -112,15 +123,14 @@
</p>
</a>


<a href="/examples/accelerators/tpu"
<a href="/examples/accelerators/tenstorrent"
class="feature-cell sky">
<h3>
TPU
Tenstorrent
</h3>

<p>
Deploy and fine-tune LLMs on TPU
Deploy and fine-tune LLMs on Tenstorrent
</p>
</a>
</div>
Empty file.
9 changes: 9 additions & 0 deletions examples/accelerators/tenstorrent/.dstack.yml
@@ -0,0 +1,9 @@
type: dev-environment
name: cursor

image: dstackai/tt-smi:latest

ide: cursor

resources:
gpu: n150:1
193 changes: 193 additions & 0 deletions examples/accelerators/tenstorrent/README.md
@@ -0,0 +1,193 @@
# Tenstorrent

`dstack` supports running dev environments, tasks, and services on Tenstorrent
[Wormhole :material-arrow-top-right-thin:{ .external }](https://tenstorrent.com/en/hardware/wormhole){:target="_blank"} accelerators via SSH fleets.


??? info "SSH fleets"
<div editor-title="examples/accelerators/tenstorrent/fleet.dstack.yml">

```yaml
type: fleet
name: wormhole-fleet

ssh_config:
user: root
identity_file: ~/.ssh/id_rsa
  # Configure any number of hosts with n150 or n300 PCIe boards
hosts:
- 192.168.2.108
```

</div>

> Hosts should be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation).
This should include the drivers, `tt-smi`, and HugePages.

To apply the fleet configuration, run:

<div class="termy">

```bash
$ dstack apply -f examples/accelerators/tenstorrent/fleet.dstack.yml

FLEET RESOURCES PRICE STATUS CREATED
 wormhole-fleet cpu=12 mem=32GB disk=243GB n150:12GB $0 idle 18 sec ago
```

</div>

For more details on fleet configuration, refer to [SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh).
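
Before registering hosts in the fleet, it can help to sanity-check each one over SSH. The snippet below is a rough sketch of such a check; it assumes `tt-smi` is on `PATH` and that HugePages are exposed via the standard `/proc/meminfo` interface:

```shell
# Verify the Tenstorrent system tools are installed (tt-smi ships with them)
if command -v tt-smi >/dev/null 2>&1; then
  tt-smi -s
else
  echo "tt-smi not found: install the Tenstorrent system tools first"
fi

# HugePages must be configured; a zero or missing total means they are not set up
grep HugePages_Total /proc/meminfo
```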

## Services

Here's an example of a service that deploys
[`Llama-3.2-1B-Instruct` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/meta-llama/Llama-3.2-1B){:target="_blank"}
using [Tenstorrent Inference Service :material-arrow-top-right-thin:{ .external }](https://github.com/tenstorrent/tt-inference-server){:target="_blank"}.

<div editor-title="examples/accelerators/tenstorrent/tt-inference-server.dstack.yml">

```yaml
type: service
name: tt-inference-server

env:
- HF_TOKEN
- HF_MODEL_REPO_ID=meta-llama/Llama-3.2-1B-Instruct
image: ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-20.04-amd64:0.0.4-v0.56.0-rc47-e2e0002ac7dc
commands:
- |
. ${PYTHON_ENV_DIR}/bin/activate
pip install "huggingface_hub[cli]"
export LLAMA_DIR="/data/models--$(echo "$HF_MODEL_REPO_ID" | sed 's/\//--/g')/"
huggingface-cli download $HF_MODEL_REPO_ID --local-dir $LLAMA_DIR
python /home/container_app_user/app/src/run_vllm_api_server.py
port: 7000

model: meta-llama/Llama-3.2-1B-Instruct

# Cache downloaded model
volumes:
- /mnt/data/tt-inference-server/data:/data

resources:
gpu: n150:1
```

</div>

Go ahead and run the configuration using `dstack apply`:

<div class="termy">

```bash
$ dstack apply -f examples/accelerators/tenstorrent/tt-inference-server.dstack.yml
```
</div>

Once the service is up, it will be available via the service endpoint
at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

<div class="termy">

```shell
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
-X POST \
    -H 'Authorization: Bearer <dstack token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Llama-3.2-1B-Instruct",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is Deep Learning?"
}
],
"stream": true,
"max_tokens": 512
}'
```

</div>

Additionally, the model is available via `dstack`'s control plane UI:

![](https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-tenstorrent-model-ui.png?raw=true){ width=800 }

When a [gateway](https://dstack.ai/docs/concepts/gateways) is configured, the service endpoint
is available at `https://<run name>.<gateway domain>/`.

> Services support many options, including authentication, auto-scaling policies, etc. To learn more, refer to [Services](https://dstack.ai/docs/concepts/services).
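
As an illustration, auto-scaling could be enabled by extending the service configuration above with a replica range and a scaling policy (a sketch; check the Services docs for the exact option names and values):

```yaml
# Hypothetical additions to the tt-inference-server service above
replicas: 1..2    # let dstack scale between one and two replicas
scaling:
  metric: rps     # scale based on requests per second per replica
  target: 10
```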

## Tasks

Below is a task that simply runs `tt-smi -s`. Tasks can be used for training, fine-tuning, batch inference, or anything else.

<div editor-title="examples/accelerators/tenstorrent/tt-smi.dstack.yml">

```yaml
type: task
# The name is optional; if not specified, it's generated randomly
name: tt-smi

env:
- HF_TOKEN

# (Required) Use any image with TT drivers
image: dstackai/tt-smi:latest

# Use any commands
commands:
- tt-smi -s

# Specify the number of accelerators, model, etc.
resources:
gpu: n150:1

# Uncomment if you want to run on a cluster of nodes
#nodes: 2
```

</div>

> Tasks support many options, including multi-node configuration, max duration, etc. To learn more, refer to [Tasks](https://dstack.ai/docs/concepts/tasks).
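
For instance, the task above could be extended into a time-limited, multi-node run (a sketch; option names are taken from the Tasks docs and should be verified there):

```yaml
# Hypothetical additions to the tt-smi task above
nodes: 2            # run the task on a two-node cluster
max_duration: 1h    # abort the run if it exceeds one hour
```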

## Dev environments

Below is an example of a dev environment configuration. It can be used to provision a dev environment that can be accessed via your desktop IDE.

<div editor-title="examples/accelerators/tenstorrent/.dstack.yml">

```yaml
type: dev-environment
# The name is optional; if not specified, it's generated randomly
name: cursor

# (Optional) List required env variables
env:
- HF_TOKEN

image: dstackai/tt-smi:latest

# Can be `vscode` or `cursor`
ide: cursor

resources:
gpu: n150:1
```

</div>

If you run it via `dstack apply`, it will output the URL to access it via your desktop IDE.

![](https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-tenstorrent-cursor.png?raw=true){ width=800 }

> Dev environments support many options, including inactivity and max duration, IDE configuration, etc. To learn more, refer to [Dev environments](https://dstack.ai/docs/concepts/dev-environments).
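
For example, the dev environment above could be made to shut down automatically when unused (a sketch; verify the option names against the Dev environments docs):

```yaml
# Hypothetical additions to the cursor dev environment above
inactivity_duration: 2h   # stop after two hours without an IDE connection
max_duration: 8h          # hard cap on total runtime
```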

??? info "Feedback"
Found a bug, or want to request a feature? File it in the [issue tracker :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues){:target="_blank"},
or share via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}.
24 changes: 24 additions & 0 deletions examples/accelerators/tenstorrent/tt-inference-server.dstack.yml
@@ -0,0 +1,24 @@
type: service
name: tt-inference-server

env:
- HF_TOKEN
- HF_MODEL_REPO_ID=meta-llama/Llama-3.2-1B-Instruct
image: ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-20.04-amd64:0.0.4-v0.56.0-rc47-e2e0002ac7dc
commands:
- |
. ${PYTHON_ENV_DIR}/bin/activate
pip install "huggingface_hub[cli]"
export LLAMA_DIR="/data/models--$(echo "$HF_MODEL_REPO_ID" | sed 's/\//--/g')/"
huggingface-cli download $HF_MODEL_REPO_ID --local-dir $LLAMA_DIR
python /home/container_app_user/app/src/run_vllm_api_server.py
port: 7000

model: meta-llama/Llama-3.2-1B-Instruct

# Cache downloaded model
volumes:
- /mnt/data/tt-inference-server/data:/data

resources:
gpu: n150:1
10 changes: 10 additions & 0 deletions examples/accelerators/tenstorrent/tt-smi.dstack.yml
@@ -0,0 +1,10 @@
type: task
name: tt-smi

image: dstackai/tt-smi:latest

commands:
- tt-smi -s

resources:
gpu: n150:1
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -256,8 +256,9 @@ nav:
- Llama: examples/llms/llama/index.md
- Accelerators:
- AMD: examples/accelerators/amd/index.md
- Intel Gaudi: examples/accelerators/intel/index.md
- TPU: examples/accelerators/tpu/index.md
- Intel Gaudi: examples/accelerators/intel/index.md
- Tenstorrent: examples/accelerators/tenstorrent/index.md
- Misc:
- Docker Compose: examples/misc/docker-compose/index.md
- NCCL Tests: examples/misc/nccl-tests/index.md