# Tenstorrent

`dstack` supports running dev environments, tasks, and services on Tenstorrent
[Wormhole :material-arrow-top-right-thin:{ .external }](https://tenstorrent.com/en/hardware/wormhole){:target="_blank"} accelerators via SSH fleets.

??? info "SSH fleets"
    <div editor-title="examples/accelerators/tenstorrent/fleet.dstack.yml">

    ```yaml
    type: fleet
    name: wormhole-fleet

    ssh_config:
      user: root
      identity_file: ~/.ssh/id_rsa
      # Configure any number of hosts with n150 or n300 PCIe boards
      hosts:
        - 192.168.2.108
    ```

    </div>

    > Hosts should be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation),
    > including the drivers, `tt-smi`, and HugePages.

    To apply the fleet configuration, run:

    <div class="termy">

    ```bash
    $ dstack apply -f examples/accelerators/tenstorrent/fleet.dstack.yml

     FLEET           RESOURCES                             PRICE  STATUS  CREATED
     wormhole-fleet  cpu=12 mem=32GB disk=243GB n150:12GB  $0     idle    18 sec ago
    ```

    </div>

    For more details on fleet configuration, refer to [SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh).

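Before applying the fleet configuration, it can be worth sanity-checking that HugePages are actually allocated on each host. Below is a minimal illustrative sketch (not part of `dstack`) that parses the `HugePages_Total` field from `/proc/meminfo` contents:

```python
# Illustrative helper (not part of dstack): parse HugePages_Total from
# the contents of /proc/meminfo to verify HugePages are configured.


def hugepages_total(meminfo_text: str) -> int:
    """Return the HugePages_Total value found in /proc/meminfo text, or 0."""
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_Total:"):
            return int(line.split(":", 1)[1].strip())
    return 0


# Sample /proc/meminfo excerpt for demonstration
sample = (
    "MemTotal:       32767212 kB\n"
    "HugePages_Total:       1\n"
    "HugePages_Free:        1\n"
)
print(hugepages_total(sample))
# → 1
```

On a real host you would call it on `open("/proc/meminfo").read()` and expect a non-zero value.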
## Services

Here's an example of a service that deploys
[`Llama-3.2-1B-Instruct` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/meta-llama/Llama-3.2-1B){:target="_blank"}
using the [Tenstorrent Inference Server :material-arrow-top-right-thin:{ .external }](https://github.com/tenstorrent/tt-inference-server){:target="_blank"}.

<div editor-title="examples/accelerators/tenstorrent/tt-inference-server.dstack.yml">

```yaml
type: service
name: tt-inference-server

env:
  - HF_TOKEN
  - HF_MODEL_REPO_ID=meta-llama/Llama-3.2-1B-Instruct
image: ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-20.04-amd64:0.0.4-v0.56.0-rc47-e2e0002ac7dc
commands:
  - |
    . ${PYTHON_ENV_DIR}/bin/activate
    pip install "huggingface_hub[cli]"
    export LLAMA_DIR="/data/models--$(echo "$HF_MODEL_REPO_ID" | sed 's/\//--/g')/"
    huggingface-cli download $HF_MODEL_REPO_ID --local-dir $LLAMA_DIR
    python /home/container_app_user/app/src/run_vllm_api_server.py
port: 7000

model: meta-llama/Llama-3.2-1B-Instruct

# Cache the downloaded model
volumes:
  - /mnt/data/tt-inference-server/data:/data

resources:
  gpu: n150:1
```

</div>
|
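The `LLAMA_DIR` export above maps the Hugging Face repo ID to a local cache-style directory by replacing `/` with `--`. The same transformation, sketched in Python for clarity (the helper name is illustrative):

```python
# Sketch of the LLAMA_DIR path construction from the service commands:
# slashes in the repo ID are replaced with "--", mirroring the sed call.


def llama_dir(repo_id: str, root: str = "/data") -> str:
    return f"{root}/models--{repo_id.replace('/', '--')}/"


print(llama_dir("meta-llama/Llama-3.2-1B-Instruct"))
# → /data/models--meta-llama--Llama-3.2-1B-Instruct/
```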
Go ahead and run the configuration via `dstack apply`:

<div class="termy">

```bash
$ dstack apply -f examples/accelerators/tenstorrent/tt-inference-server.dstack.yml
```

</div>

Once the service is up, it will be available via the service endpoint
at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

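The endpoint path above can be assembled from its parts like this (a hypothetical helper for illustration; the function name is an assumption, not part of `dstack`):

```python
# Illustrative only: build the dstack service proxy URL from its parts.


def service_url(server_url: str, project: str, run: str) -> str:
    return f"{server_url.rstrip('/')}/proxy/services/{project}/{run}/"


print(service_url("http://127.0.0.1:3000", "main", "tt-inference-server"))
# → http://127.0.0.1:3000/proxy/services/main/tt-inference-server/
```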
| 91 | +<div class="termy"> |
| 92 | + |
| 93 | +```shell |
| 94 | +$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \ |
| 95 | + -X POST \ |
| 96 | + -H 'Authorization: Bearer <dstack token>' \ |
| 97 | + -H 'Content-Type: application/json' \ |
| 98 | + -d '{ |
| 99 | + "model": "meta-llama/Llama-3.2-1B-Instruct", |
| 100 | + "messages": [ |
| 101 | + { |
| 102 | + "role": "system", |
| 103 | + "content": "You are a helpful assistant." |
| 104 | + }, |
| 105 | + { |
| 106 | + "role": "user", |
| 107 | + "content": "What is Deep Learning?" |
| 108 | + } |
| 109 | + ], |
| 110 | + "stream": true, |
| 111 | + "max_tokens": 512 |
| 112 | + }' |
| 113 | +``` |
| 114 | + |
| 115 | +</div> |
| 116 | + |
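With `"stream": true`, the response arrives as OpenAI-style server-sent events: one `data: {...}` line per chunk, terminated by `data: [DONE]`. A minimal sketch of collecting the streamed text, assuming this standard framing (the helper and sample lines are illustrative):

```python
import json

# Sketch: extract the streamed text from OpenAI-style SSE lines, as
# returned when "stream": true is set in the chat completions request.


def collect_stream(lines):
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)


# Hypothetical sample chunks for demonstration
sample = [
    'data: {"choices": [{"delta": {"content": "Deep"}}]}',
    'data: {"choices": [{"delta": {"content": " learning"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))
# → Deep learning
```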
Additionally, the model is available via `dstack`'s control plane UI.

When a [gateway](https://dstack.ai/docs/concepts/gateways) is configured, the service endpoint
is available at `https://<run name>.<gateway domain>/`.

> Services support many options, including authentication, auto-scaling policies, etc. To learn more, refer to [Services](https://dstack.ai/docs/concepts/services).

## Tasks

Below is a task that simply runs `tt-smi -s`. Tasks can be used for training, fine-tuning, batch inference, or anything else.

<div editor-title="examples/accelerators/tenstorrent/tt-smi.dstack.yml">

```yaml
type: task
# The name is optional; if not specified, it's generated randomly
name: tt-smi

env:
  - HF_TOKEN

# (Required) Use any image with TT drivers
image: dstackai/tt-smi:latest

# Use any commands
commands:
  - tt-smi -s

# Specify the number of accelerators, model, etc.
resources:
  gpu: n150:1

# Uncomment to run on a cluster of nodes
#nodes: 2
```

</div>

> Tasks support many options, including multi-node configuration, max duration, etc. To learn more, refer to [Tasks](https://dstack.ai/docs/concepts/tasks).

## Dev environments

Below is an example of a dev environment configuration. It can be used to provision a dev environment that can be accessed via your desktop IDE.

<div editor-title="examples/accelerators/tenstorrent/.dstack.yml">

```yaml
type: dev-environment
# The name is optional; if not specified, it's generated randomly
name: cursor

# (Optional) List required env variables
env:
  - HF_TOKEN

image: dstackai/tt-smi:latest

# Can be `vscode` or `cursor`
ide: cursor

resources:
  gpu: n150:1
```

</div>

If you run it via `dstack apply`, it will output the URL to access it via your desktop IDE.

> Dev environments support many options, including inactivity and max duration, IDE configuration, etc. To learn more, refer to [Dev environments](https://dstack.ai/docs/concepts/dev-environments).

| 191 | +??? info "Feedback" |
| 192 | + Found a bug, or want to request a feature? File it in the [issue tracker :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues){:target="_blank"}, |
| 193 | + or share via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}. |