---
title: NVIDIA NIM
description: This example shows how to deploy DeepSeek-R1-Distill-Llama-8B to any cloud or on-premises environment using NVIDIA NIM and dstack.
---

# NVIDIA NIM

This example shows how to deploy DeepSeek-R1-Distill-Llama-8B using NVIDIA NIM and `dstack`.

??? info "Prerequisites"
    Once `dstack` is installed, clone the repo with examples.

    <div class="termy">

    ```shell
    $ git clone https://github.com/dstackai/dstack
    $ cd dstack
    ```

    </div>

## Deployment

Here's an example of a service that deploys DeepSeek-R1-Distill-Llama-8B using NIM.

```yaml
type: service
name: serve-distill-deepseek

image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
env:
  - NGC_API_KEY
  - NIM_MAX_MODEL_LEN=4096
registry_auth:
  username: $oauthtoken
  password: ${{ env.NGC_API_KEY }}
port: 8000
# Register the model
model: deepseek-ai/deepseek-r1-distill-llama-8b

# Uncomment to leverage spot instances
#spot_policy: auto

# Cache downloaded models
volumes:
  - instance_path: /root/.cache/nim
    path: /opt/nim/.cache
    optional: true

resources:
  gpu: A100:40GB
  # Uncomment if using multiple GPUs
  #shm_size: 16GB
```

## Running a configuration

To run a configuration, use the `dstack apply` command. The configuration reads `NGC_API_KEY` from the environment, so export it before running the command.

<div class="termy">

```shell
$ export NGC_API_KEY=...
$ dstack apply -f examples/inference/nim/.dstack.yml

 #  BACKEND  REGION    RESOURCES                  SPOT  PRICE
 1  vultr    ewr       6xCPU, 60GB, 1xA100 (40GB) no    $1.199
 2  vultr    ewr       6xCPU, 60GB, 1xA100 (40GB) no    $1.199
 3  vultr    nrt       6xCPU, 60GB, 1xA100 (40GB) no    $1.199

Submit the run serve-distill-deepseek? [y/n]: y

Provisioning...
---> 100%
```

</div>
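
If you launch the run from a script, a small pre-flight check can catch a missing key before anything is provisioned. The sketch below is illustrative only: the helper names (`build_apply_command`, `check_ngc_key`, `apply`) are hypothetical and not part of `dstack`. It assumes the `dstack` CLI is on your `PATH` and simply wraps the command shown above.

```python
import os
import subprocess

# Path of the configuration used in this example.
CONFIG_PATH = "examples/inference/nim/.dstack.yml"


def build_apply_command(config_path=CONFIG_PATH):
    """Return the argument list for the `dstack apply -f <config>` command."""
    return ["dstack", "apply", "-f", config_path]


def check_ngc_key(environ):
    """Raise if NGC_API_KEY is missing or empty in the given environment mapping."""
    if not environ.get("NGC_API_KEY"):
        raise RuntimeError("NGC_API_KEY is not set; export it before running dstack apply")


def apply(environ=None):
    """Hypothetical wrapper: validate the environment, then invoke the CLI."""
    environ = dict(os.environ) if environ is None else environ
    check_ngc_key(environ)
    # The child process inherits `environ`, so the CLI can read NGC_API_KEY.
    return subprocess.run(build_apply_command(), env=environ, check=True)
```

Calling `apply()` runs the same command as above and still prompts for confirmation in the terminal.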

If no gateway is created, the model will be available via the OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`.

<div class="termy">

```shell
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "deepseek-ai/deepseek-r1-distill-llama-8b",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "max_tokens": 128
    }'
```

</div>
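
The same request can be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `send` are hypothetical. It assumes the endpoint accepts the OpenAI chat completions request format and that the `model` field matches the name registered in the service configuration.

```python
import json
import urllib.request


def build_chat_request(base_url, token, model, prompt, max_tokens=128):
    """Build (but do not send) a POST request to the chat completions endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For the local server above, you would call `build_chat_request("http://127.0.0.1:3000/proxy/models/main", "<dstack token>", "deepseek-ai/deepseek-r1-distill-llama-8b", "What is Deep Learning?")` and pass the result to `send`. In the OpenAI response format, the reply text is expected under `choices[0]["message"]["content"]`.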

When a gateway is configured, the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
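
To keep client code independent of the setup, the two URL patterns can be captured in one place. This is a sketch only: `openai_base_url` is a hypothetical helper, and it mirrors nothing beyond the two patterns described in this section.

```python
def openai_base_url(server_url, project, gateway_domain=None):
    """Return the OpenAI-compatible base URL for a dstack project.

    With a gateway domain, use the gateway pattern; otherwise fall back
    to the dstack server proxy pattern.
    """
    if gateway_domain:
        return f"https://gateway.{gateway_domain}/"
    return f"{server_url.rstrip('/')}/proxy/models/{project}/"
```

For example, `openai_base_url("http://127.0.0.1:3000", "main")` yields the proxy URL used in the `curl` request above, while passing `gateway_domain="example.com"` (a placeholder domain) yields `https://gateway.example.com/`.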

## Source code

The source code for this example can be found in `examples/inference/nim`.

## What's next?

1. Check services
2. Browse the DeepSeek AI NIM
