# Integration with Dynamo
⚠️ Experimental Feature: ChReK is currently in beta/preview. The ChReK DaemonSet runs in privileged mode to perform CRIU operations. See Limitations for details.
Checkpointing captures the complete state of a running worker pod (including GPU memory) and saves it to storage. New pods can restore from this checkpoint instead of performing a full cold start.
| Startup Type | Time | What Happens |
|---|---|---|
| Cold Start | ~1 min | Download model, load to GPU, initialize engine |
| Warm Start (checkpoint) | < 10 sec | Restore from checkpoint tar |
- Dynamo Platform (v0.4.0+) installed on a Kubernetes cluster with GPU nodes
- ChReK Helm chart installed (separate from platform)
- RWX PVC storage (PVC is currently the only supported backend)
First, install the ChReK Helm chart in each namespace where you need checkpointing:
# Install ChReK infrastructure
helm install chrek nvidia/chrek \
  --namespace my-team \
  --create-namespace \
  --set storage.pvc.size=100Gi

This creates:
- A PVC for checkpoint storage (chrek-pvc)
- A DaemonSet for CRIU operations (chrek-agent)
Update your Helm values to point to the ChReK infrastructure:
# values.yaml
dynamo-operator:
  checkpoint:
    enabled: true
    storage:
      type: pvc  # Only PVC is currently supported (S3/OCI planned)
      pvc:
        pvcName: "chrek-pvc"  # Must match ChReK chart
        basePath: "/checkpoints"
        signalHostPath: "/var/lib/chrek/signals"  # Must match ChReK chart

Add checkpoint configuration to your worker service. Both vLLM and SGLang are supported; use the appropriate backendFramework, command, and CLI flags.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-llm
spec:
  services:
    worker:
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/dynamo-vllm-placeholder:latest
          command: ["python3"]
          args:
            - "-m"
            - "dynamo.vllm"
            - "--model"
            - "meta-llama/Llama-3-8B"
            - "--max-model-len"
            - "4096"
            - "--gpu-memory-utilization"
            - "0.90"
          env:
            # Required for cross-node checkpoint/restore
            - name: GLOO_SOCKET_IFNAME
              value: "lo"
            - name: NCCL_SOCKET_IFNAME
              value: "lo"
          resources:
            limits:
              nvidia.com/gpu: "1"
      checkpoint:
        enabled: true
        mode: auto
        identity:
          model: "meta-llama/Llama-3-8B"
          backendFramework: "vllm"
          tensorParallelSize: 1
          dtype: "bfloat16"
          maxModelLen: 4096

apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-sglang-llm
spec:
  services:
    worker:
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/dynamo-sglang-placeholder:latest
          command: ["python3"]
          args:
            - "-m"
            - "dynamo.sglang"
            - "--model"
            - "meta-llama/Llama-3-8B"
            - "--mem-fraction-static"
            - "0.90"
          env:
            # Required for cross-node checkpoint/restore
            - name: GLOO_SOCKET_IFNAME
              value: "lo"
            - name: NCCL_SOCKET_IFNAME
              value: "lo"
          resources:
            limits:
              nvidia.com/gpu: "1"
      checkpoint:
        enabled: true
        mode: auto
        identity:
          model: "meta-llama/Llama-3-8B"
          backendFramework: "sglang"
          tensorParallelSize: 1
          dtype: "bfloat16"
          maxModelLen: 4096

Key differences between backends:
| Setting | vLLM | SGLang |
|---|---|---|
| Module | dynamo.vllm | dynamo.sglang |
| Max context (optional) | --max-model-len | --context-length |
| GPU memory | --gpu-memory-utilization | --mem-fraction-static |
| Placeholder image | dynamo-vllm-placeholder | dynamo-sglang-placeholder |
| Identity backendFramework | "vllm" | "sglang" |
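For scripting or manifest templating, the per-backend differences in the table can be captured in a small lookup. The dictionary and helper below are illustrative only; they are not part of any Dynamo API:

```python
# Backend-specific module names and CLI flags, per the comparison table above.
# Illustrative helper for templating DGD manifests; not a Dynamo API.
BACKEND_SETTINGS = {
    "vllm": {
        "module": "dynamo.vllm",
        "max_context_flag": "--max-model-len",
        "gpu_memory_flag": "--gpu-memory-utilization",
        "image": "dynamo-vllm-placeholder",
    },
    "sglang": {
        "module": "dynamo.sglang",
        "max_context_flag": "--context-length",
        "gpu_memory_flag": "--mem-fraction-static",
        "image": "dynamo-sglang-placeholder",
    },
}

def worker_args(backend, model, max_len, gpu_mem):
    """Build an args list for a DGD mainContainer from the table above."""
    s = BACKEND_SETTINGS[backend]
    return ["-m", s["module"], "--model", model,
            s["max_context_flag"], str(max_len),
            s["gpu_memory_flag"], str(gpu_mem)]
```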
Note: Do not set DYN_READY_FOR_CHECKPOINT_FILE or DYN_CHECKPOINT_READY_FILE in the DGD worker env vars. These are injected automatically by the operator's checkpoint controller into checkpoint job pods only. Setting them on worker pods causes all workers to enter checkpoint mode instead of cold-starting normally.
kubectl apply -f my-llm.yaml -n dynamo-system

On first deployment:
- A checkpoint job runs to create the checkpoint
- Worker pods start with cold start (checkpoint not ready yet)
- Once checkpoint is ready, new pods (scale-up, restarts) restore from checkpoint
The operator automatically creates a DynamoCheckpoint CR if one doesn't exist:
checkpoint:
  enabled: true
  mode: auto
  identity:
    model: "meta-llama/Llama-3-8B"
    backendFramework: "vllm"  # or "sglang"
    tensorParallelSize: 1
    dtype: "bfloat16"
    maxModelLen: 4096

Reference an existing DynamoCheckpoint CR by its 16-character hash using checkpointRef:
checkpoint:
  enabled: true
  checkpointRef: "e5962d34ba272638"  # 16-char hash of DynamoCheckpoint CR

This is useful when:
- You want to pre-warm checkpoints before creating DGDs
- You want explicit control over which checkpoint to use

Flow:
1. Create a DynamoCheckpoint CR (see the DynamoCheckpoint CRD section)
2. Wait for it to become Ready
3. Reference it in your DGD using checkpointRef with the hash
# Check checkpoint status (using 16-char hash name)
kubectl get dynamocheckpoint e5962d34ba272638 -n dynamo-system

NAME               MODEL                   BACKEND   PHASE   HASH               AGE
e5962d34ba272638   meta-llama/Llama-3-8B   vllm      Ready   e5962d34ba272638   5m

# Now create DGD referencing it
kubectl apply -f my-dgd.yaml

Checkpoints are uniquely identified by a 16-character SHA256 hash (64 bits) of the configuration that affects runtime state:
| Field | Required | Affects Hash | Example |
|---|---|---|---|
| model | ✓ | ✓ | meta-llama/Llama-3-8B |
| framework | ✓ | ✓ | sglang, trtllm, vllm |
| dynamoVersion | | ✓ | 0.9.0, 1.0.0 |
| tensorParallelSize | | ✓ | 1, 2, 4, 8 (default: 1) |
| pipelineParallelSize | | ✓ | 1, 2 (default: 1) |
| dtype | | ✓ | float16, bfloat16, fp8 |
| maxModelLen | | ✓ | 4096, 8192 |
| extraParameters | | ✓ | Custom key-value pairs |

Not included in hash (don't invalidate the checkpoint):
- replicas
- nodeSelector, affinity, tolerations
- resources (requests/limits)
- Logging/observability config
Example with all fields:
checkpoint:
  enabled: true
  mode: auto
  identity:
    model: "meta-llama/Llama-3-8B"
    backendFramework: "vllm"
    dynamoVersion: "0.9.0"
    tensorParallelSize: 1
    pipelineParallelSize: 1
    dtype: "bfloat16"
    maxModelLen: 8192
    extraParameters:
      enableChunkedPrefill: "true"
      quantization: "awq"

Checkpoint Naming: The DynamoCheckpoint CR is automatically named using the 16-character identity hash (e.g., e5962d34ba272638).
Checkpoint Sharing: Multiple DGDs with the same identity automatically share the same checkpoint.
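The naming and sharing scheme can be sketched in a few lines. Note the caveat: ChReK's actual canonical serialization of the identity is not specified here, so the JSON encoding below is an assumption; use it to reason about the scheme, not to predict real hash values.

```python
import hashlib
import json

def identity_hash(identity):
    """Illustrative 16-hex-character (64-bit) identity hash.

    Assumption: the identity is canonicalized by JSON-encoding with
    sorted keys. The real operator's serialization may differ.
    """
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

a = identity_hash({"model": "meta-llama/Llama-3-8B", "framework": "vllm",
                   "tensorParallelSize": 1, "dtype": "bfloat16"})
b = identity_hash({"dtype": "bfloat16", "framework": "vllm",
                   "model": "meta-llama/Llama-3-8B", "tensorParallelSize": 1})
assert a == b        # same identity, any field order -> same checkpoint name
assert len(a) == 16  # 16 hex chars = 64 bits of the SHA256 digest
```

Because the name is a pure function of the identity, two DGDs with identical identities compute the same CR name and therefore reuse one checkpoint.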
The DynamoCheckpoint (shortname: dckpt) is a Kubernetes Custom Resource that manages checkpoint lifecycle.
When to create a DynamoCheckpoint directly:
- Pre-warming: Create checkpoints before deploying DGDs for instant startup
- Explicit control: Manage checkpoint lifecycle independently from DGDs
Note: With the new hash-based naming, checkpoint names are automatically generated (16-character hash). The operator handles checkpoint discovery and reuse automatically in auto mode.
Create a checkpoint:
apiVersion: nvidia.com/v1alpha1
kind: DynamoCheckpoint
metadata:
  name: e5962d34ba272638  # Use the computed 16-char hash
spec:
  identity:
    model: "meta-llama/Llama-3-8B"
    backendFramework: "vllm"
    tensorParallelSize: 1
    dtype: "bfloat16"
  job:
    activeDeadlineSeconds: 3600
    podTemplateSpec:
      spec:
        containers:
          - name: main
            image: nvcr.io/nvidia/ai-dynamo/dynamo-vllm:latest
            command: ["python3", "-m", "dynamo.vllm"]
            args: ["--model", "meta-llama/Llama-3-8B"]
            resources:
              limits:
                nvidia.com/gpu: "1"
            env:
              - name: HF_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: hf-token-secret
                    key: HF_TOKEN

Note: You can compute the hash yourself, or use auto mode to let the operator create it.
Check status:
# List all checkpoints
kubectl get dynamocheckpoint -n dynamo-system

# Or use shortname
kubectl get dckpt -n dynamo-system

NAME               MODEL                    BACKEND   PHASE      HASH               AGE
e5962d34ba272638   meta-llama/Llama-3-8B    vllm      Ready      e5962d34ba272638   5m
a7b4f89c12de3456   meta-llama/Llama-3-70B   vllm      Creating   a7b4f89c12de3456   2m

Phases:
| Phase | Description |
|---|---|
| Pending | CR created, waiting for job to start |
| Creating | Checkpoint job is running |
| Ready | Checkpoint available for use |
| Failed | Checkpoint creation failed |
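Read as a lifecycle, the phases progress Pending → Creating → Ready/Failed. A tiny sketch, useful when scripting against `kubectl get dckpt` output (this is a reading of the table above, not operator code):

```python
# Allowed phase transitions for a DynamoCheckpoint, per the table above.
# Illustrative only; the operator's internal state machine is not public.
TRANSITIONS = {
    "Pending": {"Creating"},          # waiting for the checkpoint job
    "Creating": {"Ready", "Failed"},  # job running; succeeds or fails
    "Ready": set(),                   # terminal: checkpoint usable
    "Failed": set(),                  # terminal: inspect the job logs
}

def is_terminal(phase):
    """True once polling can stop (Ready or Failed)."""
    return not TRANSITIONS[phase]
```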
Detailed status:
kubectl describe dckpt e5962d34ba272638 -n dynamo-system

Status:
  Phase:        Ready
  IdentityHash: e5962d34ba272638
  Location:     /checkpoints/e5962d34ba272638
  StorageType:  pvc
  CreatedAt:    2026-01-29T10:05:00Z

Reference from DGD:
Once the checkpoint is Ready, you can reference it by hash:
spec:
  services:
    VllmWorker:
      checkpoint:
        enabled: true
        checkpointRef: "e5962d34ba272638"  # 16-char hash

Or use auto mode and the operator will find/create it automatically.
- vLLM and SGLang backends only: TensorRT-LLM support is planned.
- LLM workers only: Checkpoint/restore supports LLM decode and prefill workers. Specialized workers (multimodal, embedding, diffusion) are not supported.
- Single-GPU only: Multi-GPU configurations are not yet supported (planned).
- Network state: Active TCP connections are closed during restore (handled with the tcp-close CRIU option).
- Storage: Only the PVC backend is currently implemented (S3/OCI planned).
- Security: ChReK runs as a privileged DaemonSet, which is required to run CRIU.
- Check the checkpoint job:
  kubectl get jobs -l nvidia.com/chrek-is-checkpoint-source=true -n dynamo-system
  kubectl logs job/checkpoint-<name> -n dynamo-system
- Check the DaemonSet:
  kubectl logs daemonset/chrek-agent -n dynamo-system
- Verify storage access:
  kubectl exec -it <checkpoint-agent-pod> -- ls -la /checkpoints
- Check pod logs:
  kubectl logs <worker-pod> -n dynamo-system
- Verify the checkpoint file exists:
  # For PVC
  kubectl exec -it <any-pod-with-pvc> -- ls -la /checkpoints/
- Check environment variables:
  kubectl exec <worker-pod> -- env | grep DYN_CHECKPOINT
Pods fall back to cold start if:
- Checkpoint file doesn't exist yet (still being created)
- Checkpoint file is corrupted
- CRIU restore fails
Check logs for "Falling back to cold start" message.
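The fallback behavior can be sketched as a simple decision function. Here `restore_from` is a hypothetical stand-in for the CRIU restore step, not a real ChReK call:

```python
import os

def restore_from(path):
    """Hypothetical stand-in for the CRIU restore step: fails if the
    checkpoint directory is missing (or, in reality, corrupted)."""
    if not os.path.isdir(path):
        raise FileNotFoundError(path)

def choose_start(checkpoint_dir):
    """Mirror the documented fallback: restore when possible, else cold start."""
    try:
        restore_from(checkpoint_dir)  # raises if missing or unusable
        return "warm-start"
    except Exception:
        print("Falling back to cold start")
        return "cold-start"
```

For example, `choose_start("/checkpoints/<hash>")` returns "cold-start" while the checkpoint job is still running, and "warm-start" once the directory exists and restores cleanly.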
| Variable | Description |
|---|---|
| DYN_CHECKPOINT_STORAGE_TYPE | Backend: pvc, s3, oci (s3 and oci are currently no-ops) |
| DYN_CHECKPOINT_LOCATION | Full checkpoint location (checkpoint jobs) |
| DYN_CHECKPOINT_PATH | Base checkpoint directory (restore pods, PVC) |
| DYN_CHECKPOINT_HASH | Identity hash |
| DYN_READY_FOR_CHECKPOINT_FILE | Ready-for-checkpoint file path (checkpoint jobs) |
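As an illustration of how a restore pod might combine these variables on PVC storage: the `<basePath>/<hash>` layout matches the Location shown by `kubectl describe dckpt`, but treating it as the contract is an assumption here.

```python
def resolve_checkpoint_dir(env):
    """Join DYN_CHECKPOINT_PATH and DYN_CHECKPOINT_HASH for PVC storage.

    Illustrative only. `env` is a dict (e.g. dict(os.environ)); returns
    None when no usable PVC checkpoint location can be derived.
    """
    if env.get("DYN_CHECKPOINT_STORAGE_TYPE", "pvc") != "pvc":
        return None  # s3/oci are currently no-ops
    base = env.get("DYN_CHECKPOINT_PATH")
    ckpt_hash = env.get("DYN_CHECKPOINT_HASH")
    if base and ckpt_hash:
        return f"{base}/{ckpt_hash}"
    return None

env = {"DYN_CHECKPOINT_STORAGE_TYPE": "pvc",
       "DYN_CHECKPOINT_PATH": "/checkpoints",
       "DYN_CHECKPOINT_HASH": "e5962d34ba272638"}
assert resolve_checkpoint_dir(env) == "/checkpoints/e5962d34ba272638"
```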
Create a checkpoint and use it in a DGD:
# 1. Create the DynamoCheckpoint CR
apiVersion: nvidia.com/v1alpha1
kind: DynamoCheckpoint
metadata:
  name: e5962d34ba272638  # 16-char hash (computed from identity)
  namespace: dynamo-system
spec:
  identity:
    model: "meta-llama/Meta-Llama-3-8B-Instruct"
    backendFramework: "vllm"
    tensorParallelSize: 1
    dtype: "bfloat16"
  job:
    activeDeadlineSeconds: 3600
    backoffLimit: 3
    podTemplateSpec:
      spec:
        containers:
          - name: main
            image: nvcr.io/nvidia/ai-dynamo/dynamo-vllm-placeholder:latest
            command: ["python3"]
            args:
              - "-m"
              - "dynamo.vllm"
              - "--model"
              - "meta-llama/Meta-Llama-3-8B-Instruct"
              - "--max-model-len"
              - "4096"
              - "--gpu-memory-utilization"
              - "0.90"
            env:
              - name: HF_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: hf-token-secret
                    key: HF_TOKEN
              - name: GLOO_SOCKET_IFNAME
                value: "lo"
              - name: NCCL_SOCKET_IFNAME
                value: "lo"
            resources:
              limits:
                nvidia.com/gpu: "1"
        restartPolicy: Never
---
# 2. Wait for Ready: kubectl get dckpt e5962d34ba272638 -n dynamo-system -w
---
# 3. Reference the checkpoint in your DGD
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-llm
  namespace: dynamo-system
spec:
  services:
    worker:
      replicas: 2
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/dynamo-vllm-placeholder:latest
          command: ["python3"]
          args:
            - "-m"
            - "dynamo.vllm"
            - "--model"
            - "meta-llama/Meta-Llama-3-8B-Instruct"
            - "--max-model-len"
            - "4096"
            - "--gpu-memory-utilization"
            - "0.90"
          env:
            - name: GLOO_SOCKET_IFNAME
              value: "lo"
            - name: NCCL_SOCKET_IFNAME
              value: "lo"
          resources:
            limits:
              nvidia.com/gpu: "1"
      checkpoint:
        enabled: true
        checkpointRef: "e5962d34ba272638"  # Reference by hash

- ChReK Overview - ChReK architecture and use cases
- ChReK Helm Chart README - Chart configuration
- Installation Guide - Platform installation
- API Reference - Complete CRD specifications