diff --git a/docs/docs/concepts/fleets.md b/docs/docs/concepts/fleets.md
index b59848ab7b..5c24f73c5c 100644
--- a/docs/docs/concepts/fleets.md
+++ b/docs/docs/concepts/fleets.md
@@ -260,6 +260,10 @@ Define a fleet configuration as a YAML file in your project directory. The file
         2. Hosts with Intel Gaudi accelerators should be pre-installed with [Gaudi software and drivers](https://docs.habana.ai/en/latest/Installation_Guide/Driver_Installation.html#driver-installation).
            This should include the drivers, `hl-smi`, and Habana Container Runtime.
 
+    === "Tenstorrent"
+        2. Hosts with Tenstorrent accelerators should be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation).
+           This should include the drivers, `tt-smi`, and HugePages.
+
 3. The user specified should have passwordless `sudo` access.
 
 To create or update the fleet, pass the fleet configuration to [`dstack apply`](../reference/cli/dstack/apply.md):
diff --git a/docs/docs/index.md b/docs/docs/index.md
index fdcb1d383c..cf94916936 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -7,7 +7,7 @@ for AI workloads both in the cloud and on-prem, speeding up the development, tra
 
 #### Accelerators
 
-`dstack` supports `NVIDIA`, `AMD`, `Google TPU`, and `Intel Gaudi` accelerators out of the box.
+`dstack` supports `NVIDIA`, `AMD`, `TPU`, `Intel Gaudi`, and `Tenstorrent` accelerators out of the box.
 
 ## How does it work?
 
diff --git a/docs/examples.md b/docs/examples.md
index 88c1440978..ff6122a998 100644
--- a/docs/examples.md
+++ b/docs/examples.md
@@ -101,6 +101,17 @@ hide:

+    TPU
+
+    Deploy and fine-tune LLMs on TPU
@@ -112,15 +123,14 @@ hide:

-    TPU
+    Tenstorrent
 
-    Deploy and fine-tune LLMs on TPU
+    Deploy and fine-tune LLMs on Tenstorrent

diff --git a/docs/examples/accelerators/tenstorrent/index.md b/docs/examples/accelerators/tenstorrent/index.md
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/examples/accelerators/tenstorrent/.dstack.yml b/examples/accelerators/tenstorrent/.dstack.yml
new file mode 100644
index 0000000000..6e3319a001
--- /dev/null
+++ b/examples/accelerators/tenstorrent/.dstack.yml
@@ -0,0 +1,9 @@
+type: dev-environment
+name: cursor
+
+image: dstackai/tt-smi:latest
+
+ide: cursor
+
+resources:
+  gpu: n150:1
diff --git a/examples/accelerators/tenstorrent/README.md b/examples/accelerators/tenstorrent/README.md
new file mode 100644
index 0000000000..557bac7df0
--- /dev/null
+++ b/examples/accelerators/tenstorrent/README.md
@@ -0,0 +1,193 @@
+# Tenstorrent
+
+`dstack` supports running dev environments, tasks, and services on Tenstorrent
+[Wormhole :material-arrow-top-right-thin:{ .external }](https://tenstorrent.com/en/hardware/wormhole){:target="_blank"} accelerators via SSH fleets.
+
+??? info "SSH fleets"
+
+    ```yaml
+    type: fleet
+    name: wormhole-fleet
+
+    ssh_config:
+      user: root
+      identity_file: ~/.ssh/id_rsa
+      # Configure any number of hosts with n150 or n300 PCIe boards
+      hosts:
+        - 192.168.2.108
+    ```
+
+    > Hosts should be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation).
+    > This should include the drivers, `tt-smi`, and HugePages.
+
+    To apply the fleet configuration, run:
+
+    ```bash
+    $ dstack apply -f examples/accelerators/tenstorrent/fleet.dstack.yml
+
+     FLEET           RESOURCES                              PRICE  STATUS  CREATED
+     wormhole-fleet  cpu=12 mem=32GB disk=243GB n150:12GB   $0     idle    18 sec ago
+    ```
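+
+    To sanity-check a host before adding it to the fleet, you can verify the driver and HugePages setup over SSH. A minimal check, assuming the host address from the configuration above:
+
+    ```bash
+    $ ssh root@192.168.2.108 'tt-smi -s && grep HugePages /proc/meminfo'
+    ```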
+
+    For more details on fleet configuration, refer to [SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh).
+
+## Services
+
+Here's an example of a service that deploys
+[`Llama-3.2-1B-Instruct` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct){:target="_blank"}
+using the [Tenstorrent Inference Server :material-arrow-top-right-thin:{ .external }](https://github.com/tenstorrent/tt-inference-server){:target="_blank"}.
+
+```yaml
+type: service
+name: tt-inference-server
+
+env:
+  - HF_TOKEN
+  - HF_MODEL_REPO_ID=meta-llama/Llama-3.2-1B-Instruct
+image: ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-20.04-amd64:0.0.4-v0.56.0-rc47-e2e0002ac7dc
+commands:
+  - |
+    . ${PYTHON_ENV_DIR}/bin/activate
+    pip install "huggingface_hub[cli]"
+    export LLAMA_DIR="/data/models--$(echo "$HF_MODEL_REPO_ID" | sed 's/\//--/g')/"
+    huggingface-cli download $HF_MODEL_REPO_ID --local-dir $LLAMA_DIR
+    python /home/container_app_user/app/src/run_vllm_api_server.py
+port: 7000
+
+model: meta-llama/Llama-3.2-1B-Instruct
+
+# Cache downloaded model
+volumes:
+  - /mnt/data/tt-inference-server/data:/data
+
+resources:
+  gpu: n150:1
+```
+
+Go ahead and run the configuration using `dstack apply`:
+
+```bash
+$ dstack apply -f examples/accelerators/tenstorrent/tt-inference-server.dstack.yml
+```
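+
+`HF_TOKEN` is listed in `env` without a value, so `dstack apply` reads it from your shell environment. Export it first (the value below is a placeholder):
+
+```bash
+$ export HF_TOKEN=<your Hugging Face token>
+```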
+
+Once the service is up, it will be available via the service endpoint
+at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
+
+```shell
+$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+    -X POST \
+    -H 'Authorization: Bearer <dstack token>' \
+    -H 'Content-Type: application/json' \
+    -d '{
+      "model": "meta-llama/Llama-3.2-1B-Instruct",
+      "messages": [
+        {
+          "role": "system",
+          "content": "You are a helpful assistant."
+        },
+        {
+          "role": "user",
+          "content": "What is Deep Learning?"
+        }
+      ],
+      "stream": true,
+      "max_tokens": 512
+    }'
+```
+
+Additionally, the model is available via `dstack`'s control plane UI:
+
+![](https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-tenstorrent-model-ui.png?raw=true){ width=800 }
+
+When a [gateway](https://dstack.ai/docs/concepts/gateways) is configured, the service endpoint
+is available at `https://<run name>.<gateway domain>/`.
+
+> Services support many options, including authentication, auto-scaling policies, etc.
+> To learn more, refer to [Services](https://dstack.ai/docs/concepts/services).
+
+## Tasks
+
+Below is a task that simply runs `tt-smi -s`. Tasks can be used for training, fine-tuning,
+batch inference, or anything else.
+
+```yaml
+type: task
+# The name is optional; if not specified, it's generated randomly
+name: tt-smi
+
+env:
+  - HF_TOKEN
+
+# (Required) Use any image with TT drivers
+image: dstackai/tt-smi:latest
+
+# Use any commands
+commands:
+  - tt-smi -s
+
+# Specify the number of accelerators, model, etc.
+resources:
+  gpu: n150:1
+
+# Uncomment if you want to run on a cluster of nodes
+#nodes: 2
+```
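+
+To run the task, pass the configuration to `dstack apply`:
+
+```bash
+$ dstack apply -f examples/accelerators/tenstorrent/tt-smi.dstack.yml
+```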
+
+> Tasks support many options, including multi-node configuration, max duration, etc.
+> To learn more, refer to [Tasks](https://dstack.ai/docs/concepts/tasks).
+
+## Dev environments
+
+Below is an example of a dev environment configuration. It can be used to provision a dev
+environment that can be accessed via your desktop IDE.
+
+```yaml
+type: dev-environment
+# The name is optional; if not specified, it's generated randomly
+name: cursor
+
+# (Optional) List required env variables
+env:
+  - HF_TOKEN
+
+image: dstackai/tt-smi:latest
+
+# Can be `vscode` or `cursor`
+ide: cursor
+
+resources:
+  gpu: n150:1
+```
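+
+To provision the dev environment, pass the configuration to `dstack apply`:
+
+```bash
+$ dstack apply -f examples/accelerators/tenstorrent/.dstack.yml
+```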
+
+Once submitted, `dstack apply` will output the URL to access the dev environment via your desktop IDE.
+
+![](https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-tenstorrent-cursor.png?raw=true){ width=800 }
+
+> Dev environments support many options, including inactivity and max duration, IDE configuration, etc.
+> To learn more, refer to [Dev environments](https://dstack.ai/docs/concepts/dev-environments).
+
+??? info "Feedback"
+    Found a bug, or want to request a feature? File it in the [issue tracker :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues){:target="_blank"},
+    or share via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}.
diff --git a/examples/accelerators/tenstorrent/tt-inference-server.dstack.yml b/examples/accelerators/tenstorrent/tt-inference-server.dstack.yml
new file mode 100644
index 0000000000..6f1815ead1
--- /dev/null
+++ b/examples/accelerators/tenstorrent/tt-inference-server.dstack.yml
@@ -0,0 +1,24 @@
+type: service
+name: tt-inference-server
+
+env:
+  - HF_TOKEN
+  - HF_MODEL_REPO_ID=meta-llama/Llama-3.2-1B-Instruct
+image: ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-20.04-amd64:0.0.4-v0.56.0-rc47-e2e0002ac7dc
+commands:
+  - |
+    . ${PYTHON_ENV_DIR}/bin/activate
+    pip install "huggingface_hub[cli]"
+    export LLAMA_DIR="/data/models--$(echo "$HF_MODEL_REPO_ID" | sed 's/\//--/g')/"
+    huggingface-cli download $HF_MODEL_REPO_ID --local-dir $LLAMA_DIR
+    python /home/container_app_user/app/src/run_vllm_api_server.py
+port: 7000
+
+model: meta-llama/Llama-3.2-1B-Instruct
+
+# Cache downloaded model
+volumes:
+  - /mnt/data/tt-inference-server/data:/data
+
+resources:
+  gpu: n150:1
diff --git a/examples/accelerators/tenstorrent/tt-smi.dstack.yml b/examples/accelerators/tenstorrent/tt-smi.dstack.yml
new file mode 100644
index 0000000000..b9478cb166
--- /dev/null
+++ b/examples/accelerators/tenstorrent/tt-smi.dstack.yml
@@ -0,0 +1,10 @@
+type: task
+name: tt-smi
+
+image: dstackai/tt-smi:latest
+
+commands:
+  - tt-smi -s
+
+resources:
+  gpu: n150:1
diff --git a/mkdocs.yml b/mkdocs.yml
index f7f8464613..b86176de0d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -256,8 +256,9 @@ nav:
       - Llama: examples/llms/llama/index.md
     - Accelerators:
       - AMD: examples/accelerators/amd/index.md
-      - Intel Gaudi: examples/accelerators/intel/index.md
       - TPU: examples/accelerators/tpu/index.md
+      - Intel Gaudi: examples/accelerators/intel/index.md
+      - Tenstorrent: examples/accelerators/tenstorrent/index.md
     - Misc:
       - Docker Compose: examples/misc/docker-compose/index.md
       - NCCL Tests: examples/misc/nccl-tests/index.md