
Commit ebbef3c

Bihan Rana authored
Update NIM example with DeepSeek-R1-Distill-Llama-8b (#2454)

Resolve review comments. Update NIM deepseek-r1-distill-llama-8b to latest.
Co-authored-by: Bihan Rana <bihan@Bihans-MacBook-Pro.local>

1 parent ead8c8a · commit ebbef3c

File tree

3 files changed: +23 −24 lines changed


docs/examples.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -38,7 +38,7 @@ hide:
     NIM
   </h3>
   <p>
-    Deploy Llama 3.1 with NIM
+    Deploy DeepSeek R1 Distill Llama 8B with NIM
   </p>
 </a>
 <a href="/examples/deployment/sglang"
```
Lines changed: 5 additions & 5 deletions

```diff
@@ -1,7 +1,7 @@
 type: service
-name: qwen-nim
+name: serve-distill-deepseek
 
-image: nvcr.io/nim/qwen/qwen-2.5-7b-instruct:latest
+image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
 env:
   - NGC_API_KEY
   - NIM_MAX_MODEL_LEN=4096
@@ -10,7 +10,7 @@ registry_auth:
   password: ${{ env.NGC_API_KEY }}
 port: 8000
 # Register the model
-model: qwen/qwen-2.5-7b-instruct
+model: deepseek-ai/deepseek-r1-distill-llama-8b
 
 # Uncomment to leverage spot instances
 #spot_policy: auto
@@ -22,6 +22,6 @@ volumes:
     optional: true
 
 resources:
-  gpu: 24GB
+  gpu: A100:40GB
 # Uncomment if using multiple GPUs
-shm_size: 16GB
+#shm_size: 16GB
```
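Once a service like the one configured above is running, it exposes an OpenAI-compatible chat-completions endpoint for the registered model. As a hedged sketch (the gateway URL and auth token mentioned in the comments are placeholders and not part of this commit), the request body for the newly registered model name could be assembled like this:

```python
import json

# Model name as registered in the updated configuration above.
MODEL = "deepseek-ai/deepseek-r1-distill-llama-8b"

def build_chat_request(prompt: str, max_tokens: int = 256) -> str:
    """Return the JSON body one would POST to the service's
    /v1/chat/completions endpoint (gateway URL is a placeholder)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_chat_request("What is 2 + 2?")
print(json.loads(body)["model"])  # deepseek-ai/deepseek-r1-distill-llama-8b
```

The payload shape follows the standard OpenAI chat-completions schema; only the `model` field needs to match the `model` key from the configuration.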

examples/deployment/nim/README.md

Lines changed: 17 additions & 18 deletions

````diff
@@ -1,11 +1,11 @@
 ---
 title: NVIDIA NIM
-description: "This example shows how to deploy Llama 3.1 to any cloud or on-premises environment using NVIDIA NIM and dstack."
+description: "This example shows how to deploy DeepSeek-R1-Distill-Llama-8B to any cloud or on-premises environment using NVIDIA NIM and dstack."
 ---
 
 # NVIDIA NIM
 
-This example shows how to deploy LLama 3.1 using [NVIDIA NIM :material-arrow-top-right-thin:{ .external }](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html){:target="_blank"} and `dstack`.
+This example shows how to deploy DeepSeek-R1-Distill-Llama-8B using [NVIDIA NIM :material-arrow-top-right-thin:{ .external }](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html){:target="_blank"} and `dstack`.
 
 ??? info "Prerequisites"
     Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
@@ -22,15 +22,15 @@ This example shows how to deploy LLama 3.1 using [NVIDIA NIM :material-arrow-top
 
 ## Deployment
 
-Here's an example of a service that deploys Llama 3.1 8B using vLLM.
+Here's an example of a service that deploys DeepSeek-R1-Distill-Llama-8B using NIM.
 
 <div editor-title="examples/deployment/nim/.dstack.yml">
 
 ```yaml
 type: service
-name: llama31
+name: serve-distill-deepseek
 
-image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
+image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
 env:
   - NGC_API_KEY
   - NIM_MAX_MODEL_LEN=4096
@@ -39,19 +39,21 @@ registry_auth:
   password: ${{ env.NGC_API_KEY }}
 port: 8000
 # Register the model
-model: meta/llama-3.1-8b-instruct
+model: deepseek-ai/deepseek-r1-distill-llama-8b
 
 # Uncomment to leverage spot instances
 #spot_policy: auto
 
 # Cache downloaded models
 volumes:
-  - /root/.cache/nim:/opt/nim/.cache
+  - instance_path: /root/.cache/nim
+    path: /opt/nim/.cache
+    optional: true
 
 resources:
-  gpu: 24GB
+  gpu: A100:40GB
 # Uncomment if using multiple GPUs
-#shm_size: 24GB
+#shm_size: 16GB
 ```
 </div>
 
@@ -65,12 +67,12 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
 $ NGC_API_KEY=...
 $ dstack apply -f examples/deployment/nim/.dstack.yml
 
-# BACKEND  REGION           RESOURCES                 SPOT  PRICE
-1  gcp     asia-northeast3  4xCPU, 16GB, 1xL4 (24GB)  yes   $0.17
-2  gcp     asia-east1       4xCPU, 16GB, 1xL4 (24GB)  yes   $0.21
-3  gcp     asia-northeast3  8xCPU, 32GB, 1xL4 (24GB)  yes   $0.21
+# BACKEND  REGION  RESOURCES                   SPOT  PRICE
+1  vultr   ewr     6xCPU, 60GB, 1xA100 (40GB)  no    $1.199
+2  vultr   ewr     6xCPU, 60GB, 1xA100 (40GB)  no    $1.199
+3  vultr   nrt     6xCPU, 60GB, 1xA100 (40GB)  no    $1.199
 
-Submit the run llama3-nim-task? [y/n]: y
+Submit the run serve-distill-deepseek? [y/n]: y
 
 Provisioning...
 ---> 100%
@@ -116,7 +118,4 @@ The source-code of this example can be found in
 ## What's next?
 
 1. Check [services](https://dstack.ai/docs/services)
-2. Browse the [Llama 3.1](https://dstack.ai/examples/llms/llama31/), [TGI](https://dstack.ai/examples/deployment/tgi/),
-   and [vLLM](https://dstack.ai/examples/deployment/vllm/) examples
-3. See also [AMD](https://dstack.ai/examples/accelerators/amd/) and
-   [TPU](https://dstack.ai/examples/accelerators/tpu/)
+2. Browse the [DeepSeek AI NIM](https://build.nvidia.com/deepseek-ai)
````
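The updated offers in the README diff all price a 1xA100 (40GB) at $1.199/hr on-demand. As a quick, hypothetical back-of-the-envelope check (the helper below is illustrative only and not part of this commit), keeping one replica up continuously works out to roughly $863 per 30-day month:

```python
# Hypothetical cost estimate based on the $1.199/hr A100 (40GB)
# offers shown in the README diff; not part of the commit itself.
HOURLY_USD = 1.199

def monthly_cost(hours_per_day: float = 24, days: int = 30) -> float:
    """Cost in USD of keeping one replica up for the given duration."""
    return round(HOURLY_USD * hours_per_day * days, 2)

print(monthly_cost())  # 863.28
```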
