---
title: NVIDIA NIM
description: "This example shows how to deploy DeepSeek-R1-Distill-Llama-8B to any cloud or on-premises environment using NVIDIA NIM and dstack."
---

# NVIDIA NIM

This example shows how to deploy DeepSeek-R1-Distill-Llama-8B using NVIDIA NIM and `dstack`.

??? info "Prerequisites"
    Once `dstack` is installed, clone the repo with examples.

    <div class="termy">

    ```shell
    $ git clone https://github.com/dstackai/dstack
    $ cd dstack
    ```

    </div>

## Deployment

Here's an example of a service that deploys DeepSeek-R1-Distill-Llama-8B using NIM.

```yaml
type: service
name: serve-distill-deepseek

image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
env:
  - NGC_API_KEY
  - NIM_MAX_MODEL_LEN=4096
registry_auth:
  username: $oauthtoken
  password: ${{ env.NGC_API_KEY }}
port: 8000
# Register the model
model: deepseek-ai/deepseek-r1-distill-llama-8b

# Uncomment to leverage spot instances
#spot_policy: auto

# Cache downloaded models
volumes:
  - instance_path: /root/.cache/nim
    path: /opt/nim/.cache
    optional: true

resources:
  gpu: A100:40GB
  # Uncomment if using multiple GPUs
  #shm_size: 16GB
```

### Running a configuration

To run a configuration, use the `dstack apply` command.

```shell
$ NGC_API_KEY=...
$ dstack apply -f examples/inference/nim/.dstack.yml

 #  BACKEND  REGION    RESOURCES                  SPOT  PRICE
 1  vultr    ewr       6xCPU, 60GB, 1xA100 (40GB) no    $1.199
 2  vultr    ewr       6xCPU, 60GB, 1xA100 (40GB) no    $1.199
 3  vultr    nrt       6xCPU, 60GB, 1xA100 (40GB) no    $1.199

Submit the run serve-distill-deepseek? [y/n]: y

Provisioning...
---> 100%
```
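The same two steps can also be scripted. The sketch below is one possible way to do it in Python, assuming the `dstack` CLI is on `PATH` and a dstack server is reachable. The helper names (`build_apply_command`, `build_env`, `apply`) are illustrative and not part of dstack. It places `NGC_API_KEY` in the environment of the child process so that the `env` entry in the configuration can pick it up.

```python
import os
import subprocess

# Path used in the `dstack apply` command above.
CONFIG_PATH = "examples/inference/nim/.dstack.yml"


def build_apply_command(config_path: str = CONFIG_PATH) -> list[str]:
    """Return the argv for `dstack apply -f <config_path>`."""
    return ["dstack", "apply", "-f", config_path]


def build_env(ngc_api_key: str) -> dict[str, str]:
    """Copy the current environment and add NGC_API_KEY for the child process."""
    env = dict(os.environ)
    env["NGC_API_KEY"] = ngc_api_key
    return env


def apply(ngc_api_key: str) -> int:
    """Run `dstack apply` interactively and return its exit code."""
    # Not called on import: this needs the dstack CLI and a running dstack server.
    completed = subprocess.run(build_apply_command(), env=build_env(ngc_api_key))
    return completed.returncode
```

Calling `apply("<your NGC API key>")` still shows the offer table and the confirmation prompt, since the command inherits the terminal.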

If no gateway is created, the model will be available via the OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`.

```shell
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "deepseek-ai/deepseek-r1-distill-llama-8b",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "max_tokens": 128
    }'
```
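The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server URL and project name from the `curl` call (`http://127.0.0.1:3000` and `main`), a placeholder token, and the standard OpenAI chat completions response shape. The helper names `build_chat_request` and `ask` are illustrative. The model name is the one registered under `model` in the service configuration.

```python
import json
import urllib.request

# <dstack server URL>/proxy/models/<project name>
BASE_URL = "http://127.0.0.1:3000/proxy/models/main"
# Value of `model` in the service configuration.
MODEL = "deepseek-ai/deepseek-r1-distill-llama-8b"


def build_chat_request(token: str, question: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(token: str, question: str) -> str:
    """Send the request and return the assistant's reply text."""
    # Requires the service to be running; assumes the OpenAI response layout.
    with urllib.request.urlopen(build_chat_request(token, question)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `ask("<dstack token>", "What is Deep Learning?")` mirrors the `curl` call once the service is up.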

When a gateway is configured, the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
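The two cases differ only in the base URL. As an illustration, the hypothetical helper below (not part of dstack) picks the base URL from the two patterns described in this section. The server URL, project name, and gateway domain passed to it are placeholders you would replace with your own.

```python
from typing import Optional


def model_base_url(server_url: str, project: str, gateway_domain: Optional[str] = None) -> str:
    """Return the OpenAI-compatible base URL for a dstack project.

    With a gateway:    https://gateway.<gateway domain>
    Without a gateway: <dstack server URL>/proxy/models/<project name>
    """
    if gateway_domain:
        return f"https://gateway.{gateway_domain}"
    # Strip a trailing slash so the joined URL has no double slash.
    return f"{server_url.rstrip('/')}/proxy/models/{project}"
```

A client can then append `/chat/completions` to the returned value in either setup.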

## Source code

The source code for this example can be found in `examples/inference/nim`.

## What's next?

  1. Check services
  2. Browse the DeepSeek AI NIM