SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
Resource Types:
- DynamoCheckpoint
- DynamoComponentDeployment
- DynamoGraphDeployment
- DynamoGraphDeploymentRequest
- DynamoGraphDeploymentScalingAdapter
- DynamoModel
Deprecated: This type is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This type will be removed in a future API version.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| enabled boolean | Deprecated: This field is ignored. | | |
| minReplicas integer | Deprecated: This field is ignored. | | |
| maxReplicas integer | Deprecated: This field is ignored. | | |
| behavior HorizontalPodAutoscalerBehavior | Deprecated: This field is ignored. | | |
| metrics MetricSpec array | Deprecated: This field is ignored. | | |
Underlying type: string
CheckpointMode defines how checkpoint creation is handled
Validation:
- Enum: [Auto Manual]
Appears in:
| Field | Description |
|---|---|
| Auto | CheckpointModeAuto means the DGD controller will automatically create a Checkpoint CR |
| Manual | CheckpointModeManual means the user must create the Checkpoint CR themselves |
Underlying type: string
ComponentKind represents the type of underlying Kubernetes resource.
Validation:
- Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
Appears in:
| Field | Description |
|---|---|
| PodClique | ComponentKindPodClique represents a PodClique resource. |
| PodCliqueScalingGroup | ComponentKindPodCliqueScalingGroup represents a PodCliqueScalingGroup resource. |
| Deployment | ComponentKindDeployment represents a Deployment resource. |
| LeaderWorkerSet | ComponentKindLeaderWorkerSet represents a LeaderWorkerSet resource. |
ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| name string | Name of the ConfigMap containing the desired data. | | Required: {} |
| key string | Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". | disagg.yaml | |
Underlying type: string
DGDRState represents the lifecycle state of a DynamoGraphDeploymentRequest.
Validation:
- Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed]
Appears in:
| Field | Description |
|---|---|
| Initializing | |
| Pending | |
| Profiling | |
| Deploying | |
| Ready | |
| DeploymentDeleted | |
| Failed | |
Underlying type: string
DGDState represents the lifecycle state of a DynamoGraphDeployment.
Validation:
- Enum: [initializing pending successful failed]
Appears in:
| Field | Description |
|---|---|
| initializing | |
| pending | |
| successful | |
| failed | |
DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| name string | Name is the desired name for the created DynamoGraphDeployment. If not specified, defaults to the DGDR name. | | Optional: {} |
| namespace string | Namespace is the desired namespace for the created DynamoGraphDeployment. If not specified, defaults to the DGDR namespace. | | Optional: {} |
| labels object (keys:string, values:string) | Labels are additional labels to add to the DynamoGraphDeployment metadata. These are merged with auto-generated labels from the profiling process. | | Optional: {} |
| annotations object (keys:string, values:string) | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. | | Optional: {} |
| workersImage string | WorkersImage specifies the container image to use for DynamoGraphDeployment worker components. This image is used for both temporary DGDs created during online profiling and the final DGD. If omitted, the image from the base config file (e.g., disagg.yaml) is used. Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0" | | Optional: {} |
DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| name string | Name is the name of the created DynamoGraphDeployment. | | |
| namespace string | Namespace is the namespace of the created DynamoGraphDeployment. | | |
| state DGDState | State is the current state of the DynamoGraphDeployment. This value is mirrored from the DGD's status.state field. | initializing | Enum: [initializing pending successful failed] |
| created boolean | Created indicates whether the DGD has been successfully created. Used to prevent recreation if the DGD is manually deleted by users. | | |
DynamoCheckpoint is the Schema for the dynamocheckpoints API. It represents a container checkpoint that can be used to restore pods to a warm state.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | nvidia.com/v1alpha1 | | |
| kind string | DynamoCheckpoint | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec DynamoCheckpointSpec | | | |
| status DynamoCheckpointStatus | | | |
DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| model string | Model is the model identifier (e.g., "meta-llama/Llama-3-70B"). | | Required: {} |
| backendFramework string | BackendFramework is the runtime framework (vllm, sglang, trtllm). | | Enum: [vllm sglang trtllm] Required: {} |
| dynamoVersion string | DynamoVersion is the Dynamo platform version (optional). If not specified, the version is not included in the identity hash, which ensures checkpoint compatibility across Dynamo releases. | | Optional: {} |
| tensorParallelSize integer | TensorParallelSize is the tensor parallel configuration. | 1 | Minimum: 1 Optional: {} |
| pipelineParallelSize integer | PipelineParallelSize is the pipeline parallel configuration. | 1 | Minimum: 1 Optional: {} |
| dtype string | Dtype is the data type (fp16, bf16, fp8, etc.). | | Optional: {} |
| maxModelLen integer | MaxModelLen is the maximum sequence length. | | Minimum: 1 Optional: {} |
| extraParameters object (keys:string, values:string) | ExtraParameters are additional parameters that affect the checkpoint hash. Use for any framework-specific or custom parameters not covered above. | | Optional: {} |
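The exact hashing scheme the operator uses is not documented in this section; as an illustration only, a canonical-serialize-then-hash approach over the identity fields might look like the sketch below. The function name and serialization choices are assumptions, but it demonstrates the stated property that an unset optional field (such as dynamoVersion) does not affect the hash.

```python
import hashlib
import json

def identity_hash(identity: dict) -> str:
    """Illustrative only; not the operator's actual algorithm.
    Serializes the identity fields in a canonical order, dropping
    unset (None) optional fields so they don't affect the hash."""
    canonical = json.dumps(
        {k: v for k, v in sorted(identity.items()) if v is not None},
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

a = {"model": "meta-llama/Llama-3-70B", "backendFramework": "vllm",
     "tensorParallelSize": 4, "dynamoVersion": None}
b = {"model": "meta-llama/Llama-3-70B", "backendFramework": "vllm",
     "tensorParallelSize": 4}
# Identities that differ only in an unset optional field are equivalent.
assert identity_hash(a) == identity_hash(b)
```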
DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| podTemplateSpec PodTemplateSpec | PodTemplateSpec allows customizing the checkpoint Job pod. This should include the container that runs the workload to be checkpointed. | | Required: {} |
| activeDeadlineSeconds integer | ActiveDeadlineSeconds specifies the maximum time the Job can run. | 3600 | Optional: {} |
| backoffLimit integer | BackoffLimit specifies the number of retries before marking the Job failed. | 3 | Optional: {} |
| ttlSecondsAfterFinished integer | TTLSecondsAfterFinished specifies how long to keep the Job after completion. | 300 | Optional: {} |
Underlying type: string
DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle
Validation:
- Enum: [Pending Creating Ready Failed]
Appears in:
| Field | Description |
|---|---|
| Pending | DynamoCheckpointPhasePending indicates the checkpoint CR has been created but the Job has not started |
| Creating | DynamoCheckpointPhaseCreating indicates the checkpoint Job is running |
| Ready | DynamoCheckpointPhaseReady indicates the checkpoint tar file is available on the PVC |
| Failed | DynamoCheckpointPhaseFailed indicates the checkpoint creation failed |
DynamoCheckpointSpec defines the desired state of DynamoCheckpoint
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| identity DynamoCheckpointIdentity | Identity defines the inputs that determine checkpoint equivalence. | | Required: {} |
| job DynamoCheckpointJobConfig | Job defines the configuration for the checkpoint creation Job. | | Required: {} |
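Putting the identity and job halves together, a DynamoCheckpoint manifest can be sketched as a plain dict (field names follow the tables above; the container name, image, and metadata name are hypothetical placeholders):

```python
# Sketch of a DynamoCheckpoint manifest; image and names are hypothetical.
checkpoint = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoCheckpoint",
    "metadata": {"name": "llama3-70b-tp4"},
    "spec": {
        "identity": {
            "model": "meta-llama/Llama-3-70B",
            "backendFramework": "vllm",
            "tensorParallelSize": 4,
        },
        "job": {
            "activeDeadlineSeconds": 3600,  # matches the documented default
            "backoffLimit": 3,              # matches the documented default
            "podTemplateSpec": {
                "spec": {
                    "containers": [{
                        "name": "worker",  # hypothetical workload container
                        "image": "example.com/vllm-runtime:latest",
                    }]
                }
            },
        },
    },
}
assert checkpoint["spec"]["identity"]["backendFramework"] in ("vllm", "sglang", "trtllm")
```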
DynamoCheckpointStatus defines the observed state of DynamoCheckpoint
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| phase DynamoCheckpointPhase | Phase represents the current phase of the checkpoint lifecycle. | | Enum: [Pending Creating Ready Failed] Optional: {} |
| identityHash string | IdentityHash is the computed hash of the checkpoint identity. This hash is used to identify equivalent checkpoints. | | Optional: {} |
| location string | Location is the full URI/path to the checkpoint in the storage backend. For PVC: same as TarPath (e.g., /checkpoints/{hash}.tar). For S3: s3://bucket/prefix/{hash}.tar. For OCI: oci://registry/repo:{hash}. | | Optional: {} |
| storageType DynamoCheckpointStorageType | StorageType indicates the storage backend type used for this checkpoint. | | Enum: [pvc s3 oci] Optional: {} |
| jobName string | JobName is the name of the checkpoint creation Job. | | Optional: {} |
| createdAt Time | CreatedAt is the timestamp when the checkpoint tar was created. | | Optional: {} |
| message string | Message provides additional information about the current state. | | Optional: {} |
| conditions Condition array | Conditions represent the latest available observations of the checkpoint's state. | | Optional: {} |
Underlying type: string
DynamoCheckpointStorageType defines the supported storage backends for checkpoints
Validation:
- Enum: [pvc s3 oci]
Appears in:
DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | nvidia.com/v1alpha1 | | |
| kind string | DynamoComponentDeployment | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec DynamoComponentDeploymentSpec | Spec defines the desired state for this Dynamo component deployment. | | |
DynamoComponentDeploymentSharedSpec contains the component configuration shared between DynamoComponentDeploymentSpec and the entries of DynamoGraphDeploymentSpec.services.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| annotations object (keys:string, values:string) | Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable). | | |
| labels object (keys:string, values:string) | Labels to add to generated Kubernetes resources for this component. | | |
| serviceName string | The name of the component. | | |
| componentType string | ComponentType indicates the role of this component (for example, "main"). | | |
| subComponentType string | SubComponentType indicates the sub-role of this component (for example, "prefill"). | | |
| dynamoNamespace string | DynamoNamespace is deprecated and will be removed in a future version. The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component. | | Optional: {} |
| globalDynamoNamespace boolean | GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace. | | |
| resources Resources | Resource requests and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources. | | |
| autoscaling Autoscaling | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version. | | |
| envs EnvVar array | Envs defines additional environment variables to inject into the component containers. | | |
| envFromSecret string | EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers. | | |
| volumeMounts VolumeMount array | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. | | |
| ingress IngressSpec | Ingress config to expose the component outside the cluster (or through a service mesh). | | |
| modelRef ModelReference | ModelRef references a model that this component serves. When specified, a headless service will be created for endpoint discovery. | | Optional: {} |
| sharedMemory SharedMemorySpec | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). | | |
| extraPodMetadata ExtraPodMetadata | ExtraPodMetadata adds labels/annotations to the created Pods. | | Optional: {} |
| extraPodSpec ExtraPodSpec | ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec, and also contains a MainContainer (standard Kubernetes Container) field that allows overriding the main container configuration. | | Optional: {} |
| livenessProbe Probe | LivenessProbe to detect and restart unhealthy containers. | | |
| readinessProbe Probe | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| replicas integer | Replicas is the desired number of Pods for this component. When scalingAdapter is enabled, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 |
| multinode MultinodeSpec | Multinode is the configuration for multinode components. | | |
| scalingAdapter ScalingAdapter | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. When enabled, replicas are managed via DGDSA and external autoscalers can scale the service using the Scale subresource. When disabled, replicas can be modified directly. | | Optional: {} |
| eppConfig EPPConfig | EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components. Only applicable when ComponentType is "epp". | | Optional: {} |
| frontendSidecar FrontendSidecarSpec | FrontendSidecar configures an auto-generated frontend sidecar container. When specified, the operator injects a fully configured frontend container with all standard Dynamo environment variables, health probes, and ports. This eliminates the need to manually specify these in extraPodSpec.containers. (GAIE) | | Optional: {} |
| checkpoint ServiceCheckpointConfig | Checkpoint configures container checkpointing for this service. When enabled, pods can be restored from a checkpoint file for faster cold start. | | Optional: {} |
DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| backendFramework string | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). | | Enum: [sglang vllm trtllm] |
| annotations object (keys:string, values:string) | Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable). | | |
| labels object (keys:string, values:string) | Labels to add to generated Kubernetes resources for this component. | | |
| serviceName string | The name of the component. | | |
| componentType string | ComponentType indicates the role of this component (for example, "main"). | | |
| subComponentType string | SubComponentType indicates the sub-role of this component (for example, "prefill"). | | |
| dynamoNamespace string | DynamoNamespace is deprecated and will be removed in a future version. The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component. | | Optional: {} |
| globalDynamoNamespace boolean | GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace. | | |
| resources Resources | Resource requests and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources. | | |
| autoscaling Autoscaling | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version. | | |
| envs EnvVar array | Envs defines additional environment variables to inject into the component containers. | | |
| envFromSecret string | EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers. | | |
| volumeMounts VolumeMount array | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. | | |
| ingress IngressSpec | Ingress config to expose the component outside the cluster (or through a service mesh). | | |
| modelRef ModelReference | ModelRef references a model that this component serves. When specified, a headless service will be created for endpoint discovery. | | Optional: {} |
| sharedMemory SharedMemorySpec | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). | | |
| extraPodMetadata ExtraPodMetadata | ExtraPodMetadata adds labels/annotations to the created Pods. | | Optional: {} |
| extraPodSpec ExtraPodSpec | ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec, and also contains a MainContainer (standard Kubernetes Container) field that allows overriding the main container configuration. | | Optional: {} |
| livenessProbe Probe | LivenessProbe to detect and restart unhealthy containers. | | |
| readinessProbe Probe | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| replicas integer | Replicas is the desired number of Pods for this component. When scalingAdapter is enabled, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 |
| multinode MultinodeSpec | Multinode is the configuration for multinode components. | | |
| scalingAdapter ScalingAdapter | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. When enabled, replicas are managed via DGDSA and external autoscalers can scale the service using the Scale subresource. When disabled, replicas can be modified directly. | | Optional: {} |
| eppConfig EPPConfig | EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components. Only applicable when ComponentType is "epp". | | Optional: {} |
| frontendSidecar FrontendSidecarSpec | FrontendSidecar configures an auto-generated frontend sidecar container. When specified, the operator injects a fully configured frontend container with all standard Dynamo environment variables, health probes, and ports. This eliminates the need to manually specify these in extraPodSpec.containers. (GAIE) | | Optional: {} |
| checkpoint ServiceCheckpointConfig | Checkpoint configures container checkpointing for this service. When enabled, pods can be restored from a checkpoint file for faster cold start. | | Optional: {} |
DynamoGraphDeployment is the Schema for the dynamographdeployments API.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | nvidia.com/v1alpha1 | | |
| kind string | DynamoGraphDeployment | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec DynamoGraphDeploymentSpec | Spec defines the desired state for this graph deployment. | | |
| status DynamoGraphDeploymentStatus | Status reflects the current observed state of this graph deployment. | | |
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.
Lifecycle:
- Initializing → Pending: Validates spec and prepares for profiling
- Pending → Profiling: Creates and runs profiling job (online or AIC)
- Profiling → Ready/Deploying: Generates DGD spec after profiling completes
- Deploying → Ready: When autoApply=true, monitors DGD until Ready
- Ready: Terminal state when DGD is operational or spec is available
- DeploymentDeleted: Terminal state when auto-created DGD is manually deleted
The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.
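The lifecycle above can be transcribed as a small transition table; a minimal sketch (states and edges taken from the list above — entry points into the Failed state are not enumerated in this document, so it is omitted here):

```python
# Allowed DGDR state transitions, transcribed from the lifecycle list above.
TRANSITIONS = {
    "Initializing": {"Pending"},
    "Pending": {"Profiling"},
    "Profiling": {"Ready", "Deploying"},
    "Deploying": {"Ready"},
    "Ready": set(),              # terminal
    "DeploymentDeleted": set(),  # terminal
}

def can_transition(src: str, dst: str) -> bool:
    """Check whether a direct transition src -> dst appears in the table."""
    return dst in TRANSITIONS.get(src, set())

assert can_transition("Pending", "Profiling")
assert not can_transition("Ready", "Pending")  # terminal states have no exits
```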
DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated. Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest. v1alpha1 will be removed in a future release.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | nvidia.com/v1alpha1 | | |
| kind string | DynamoGraphDeploymentRequest | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec DynamoGraphDeploymentRequestSpec | Spec defines the desired state for this deployment request. | | |
| status DynamoGraphDeploymentRequestStatus | Status reflects the current observed state of this deployment request. | | |
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| model string | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b"). This is a high-level identifier for easy reference in kubectl output and logs. The controller automatically sets this value in profilingConfig.config.deployment.model. | | Required: {} |
| backend string | Backend specifies the inference backend for profiling. The controller automatically sets this value in profilingConfig.config.engine.backend. Profiling runs on real GPUs or via AIC simulation to collect performance data. | | Enum: [auto vllm sglang trtllm] Required: {} |
| useMocker boolean | UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of a real backend deployment. When true, the deployment uses simulated engines that don't require GPUs, using the profiling data to simulate realistic timing behavior. Mocker is available in all backend images and is useful for large-scale experiments. Profiling still runs against the real backend (specified above) to collect performance data. | false | |
| profilingConfig ProfilingConfigSpec | ProfilingConfig provides the complete configuration for the profiling job. GPU discovery is automatically attempted to detect GPU resources from Kubernetes cluster nodes. If the operator has node read permissions (cluster-wide or explicitly granted), the discovered GPU configuration is used as defaults when hardware configuration is not manually specified (minNumGpusPerEngine, maxNumGpusPerEngine, numGpusPerNode). User-specified values always take precedence over auto-discovered values. If GPU discovery fails (e.g., namespace-restricted operator without node permissions), manual hardware config is required. This configuration is passed directly to the profiler; the structure matches the profile_sla config format exactly (see ProfilingConfigSpec for the schema). Note: deployment.model and engine.backend are automatically set from the high-level model and backend fields and should not be specified in this config. | | Required: {} |
| enableGpuDiscovery boolean | EnableGPUDiscovery controls whether the operator attempts to discover GPU hardware from cluster nodes. Deprecated: this field will be removed in v1beta1. GPU discovery is now always attempted automatically, so setting this field has no effect; the operator will always try to discover GPU hardware when node read permissions are available. If discovery is unavailable (e.g., namespace-scoped operator without permissions), manual hardware configuration is required regardless of this setting. | true | Optional: {} |
| autoApply boolean | AutoApply indicates whether to automatically create a DynamoGraphDeployment after profiling completes. If false, only the spec is generated and stored in status; users can then manually create a DGD using the generated spec. | false | |
| deploymentOverrides DeploymentOverridesSpec | DeploymentOverrides allows customizing metadata for the auto-created DGD. Only applicable when AutoApply is true. | | Optional: {} |
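A minimal DGDR can be sketched as a dict using the fields above. The metadata name, override name, and profilingConfig contents are hypothetical placeholders — the real profiling schema lives in ProfilingConfigSpec, and deployment.model / engine.backend must not be set inside it:

```python
# Minimal DynamoGraphDeploymentRequest sketch; names are hypothetical and
# profilingConfig.config is a placeholder for the profile_sla schema.
dgdr = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoGraphDeploymentRequest",
    "metadata": {"name": "qwen3-sla"},
    "spec": {
        "model": "Qwen/Qwen3-0.6B",
        "backend": "vllm",
        "autoApply": True,  # controller creates and monitors the DGD
        "deploymentOverrides": {"name": "qwen3-dgd"},
        "profilingConfig": {"config": {}},  # see ProfilingConfigSpec
    },
}
assert dgdr["spec"]["backend"] in ("auto", "vllm", "sglang", "trtllm")
```

With autoApply left false, the controller only stores the generated spec in status.generatedDeployment for manual use.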
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| state DGDRState | State is a high-level textual status of the deployment request lifecycle. | Initializing | Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed] |
| backend string | Backend is extracted from profilingConfig.config.engine.backend for display purposes. This field is populated by the controller and shown in kubectl output. | | Optional: {} |
| observedGeneration integer | ObservedGeneration reflects the generation of the most recently observed spec. Used to detect spec changes and enforce immutability after profiling starts. | | |
| conditions Condition array | Conditions contains the latest observed conditions of the deployment request. Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady. Conditions are merged by type on patch updates. | | |
| profilingResults string | ProfilingResults contains a reference to the ConfigMap holding profiling data. Format: "configmap/<name>" | | Optional: {} |
| generatedDeployment RawExtension | GeneratedDeployment contains the full generated DynamoGraphDeployment specification, including metadata, based on profiling results. Users can extract this to create a DGD manually, or it is used automatically when autoApply is true. Stored as RawExtension to preserve all fields, including metadata. For mocker backends, this contains the mocker DGD spec. | | EmbeddedResource: {} Optional: {} |
| deployment DeploymentStatus | Deployment tracks the auto-created DGD when AutoApply is true. Contains the name, namespace, state, and creation status of the managed DGD. | | Optional: {} |
DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.
The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD's service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | nvidia.com/v1alpha1 | | |
| kind string | DynamoGraphDeploymentScalingAdapter | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec DynamoGraphDeploymentScalingAdapterSpec | | | |
| status DynamoGraphDeploymentScalingAdapterStatus | | | |
DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| replicas integer | Replicas is the desired number of replicas for the target service. This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users. | | Minimum: 0 Required: {} |
| dgdRef DynamoGraphDeploymentServiceRef | DGDRef references the DynamoGraphDeployment and the specific service to scale. | | Required: {} |
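Because the adapter implements the scale subresource, an external autoscaler points its scale target at the adapter rather than at the DGD itself. A sketch of the pair (all names hypothetical):

```python
# Adapter scaling the "decode" service of DGD "my-graph"; names hypothetical.
adapter = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoGraphDeploymentScalingAdapter",
    "metadata": {"name": "my-graph-decode"},
    "spec": {
        "replicas": 2,
        "dgdRef": {"name": "my-graph", "serviceName": "decode"},
    },
}

# An HPA (or KEDA ScaledObject) would reference the adapter in its
# scaleTargetRef — never the DGD directly — so only the adapter
# controller writes the DGD's service replicas.
scale_target_ref = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoGraphDeploymentScalingAdapter",
    "name": adapter["metadata"]["name"],
}
assert adapter["spec"]["replicas"] >= 0  # Minimum: 0
```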
DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| replicas integer | Replicas is the current number of replicas for the target service. This is synced from the DGD's service replicas and is required for the scale subresource. | | Optional: {} |
| selector string | Selector is a label selector string for the pods managed by this adapter. Required for HPA compatibility via the scale subresource. | | Optional: {} |
| lastScaleTime Time | LastScaleTime is the last time the adapter scaled the target service. | | Optional: {} |
DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| name string | Name of the DynamoGraphDeployment. | | MinLength: 1 Required: {} |
| serviceName string | ServiceName is the key name of the service within the DGD's spec.services map to scale. | | MinLength: 1 Required: {} |
DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| pvcs PVC array | PVCs defines a list of persistent volume claims that can be referenced by components. Each PVC must have a unique name that can be referenced in component specifications. | | MaxItems: 100 Optional: {} |
| services object (keys:string, values:DynamoComponentDeploymentSharedSpec) | Services are the services to deploy as part of this deployment. | | MaxProperties: 25 Optional: {} |
| envs EnvVar array | Envs are environment variables applied to all services in the deployment unless overridden by service-specific configuration. | | Optional: {} |
| backendFramework string | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). | | Enum: [sglang vllm trtllm] |
| restart Restart | Restart specifies the restart policy for the graph deployment. | | Optional: {} |
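A minimal DGD can be sketched as a dict: each entry under spec.services is a DynamoComponentDeploymentSharedSpec keyed by service name. The service names, componentType value, and resources shape below are hypothetical illustrations, not prescribed values:

```python
# Minimal DynamoGraphDeployment sketch; service names, componentType,
# and the resources shape are hypothetical.
dgd = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoGraphDeployment",
    "metadata": {"name": "my-graph"},
    "spec": {
        "backendFramework": "vllm",
        "services": {
            "Frontend": {"componentType": "main", "replicas": 1},
            "decode": {"replicas": 2},
        },
    },
}
# The services map is capped at 25 entries (MaxProperties: 25).
assert len(dgd["spec"]["services"]) <= 25
```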
DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| observedGeneration integer | ObservedGeneration is the most recent generation observed by the controller. | | Optional: {} |
| state DGDState | State is a high-level textual status of the graph deployment lifecycle. | initializing | Enum: [initializing pending successful failed] |
| conditions Condition array | Conditions contains the latest observed conditions of the graph deployment. The slice is merged by type on patch updates. | | |
| services object (keys:string, values:ServiceReplicaStatus) | Services contains per-service replica status information. The map key is the service name from spec.services. | | Optional: {} |
| restart RestartStatus | Restart contains the status of the restart of the graph deployment. | | Optional: {} |
| checkpoints object (keys:string, values:ServiceCheckpointStatus) | Checkpoints contains per-service checkpoint status information. The map key is the service name from spec.services. | | Optional: {} |
| rollingUpdate RollingUpdateStatus | RollingUpdate tracks the progress of operator-managed rolling updates. Currently only supported for single-node, non-Grove deployments (DCD/Deployment). | | Optional: {} |
DynamoModel is the Schema for the dynamomodels API.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | nvidia.com/v1alpha1 | | |
| kind string | DynamoModel | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec DynamoModelSpec | | | |
| status DynamoModelStatus | | | |
DynamoModelSpec defines the desired state of DynamoModel
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| modelName string | ModelName is the full model identifier (e.g., "meta-llama/Llama-3.3-70B-Instruct-lora"). | | Required: {} |
| baseModelName string | BaseModelName is the base model identifier that matches the service label. This is used to discover endpoints via headless services. | | Required: {} |
| modelType string | ModelType specifies the type of model (e.g., "base", "lora", "adapter"). | base | Enum: [base lora adapter] Optional: {} |
| source ModelSource | Source specifies the model source location (only applicable for the lora model type). | | Optional: {} |
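A DynamoModel for a LoRA adapter can be sketched as follows; the metadata name is hypothetical, and the source field is omitted because the ModelSource schema is not detailed in this section:

```python
# DynamoModel sketch for a LoRA adapter; metadata name is hypothetical.
model = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoModel",
    "metadata": {"name": "llama3-lora"},
    "spec": {
        "modelName": "meta-llama/Llama-3.3-70B-Instruct-lora",
        # baseModelName must match the service label so endpoints can be
        # discovered via headless services.
        "baseModelName": "meta-llama/Llama-3.3-70B-Instruct",
        "modelType": "lora",
    },
}
assert model["spec"]["modelType"] in ("base", "lora", "adapter")
```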
DynamoModelStatus defines the observed state of DynamoModel
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| endpoints EndpointInfo array | Endpoints is the current list of all endpoints for this model. | | Optional: {} |
| readyEndpoints integer | ReadyEndpoints is the count of endpoints that are ready. | | |
| totalEndpoints integer | TotalEndpoints is the total count of endpoints. | | |
| conditions Condition array | Conditions represents the latest available observations of the model's state. | | Optional: {} |
EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components. EPP is responsible for intelligent endpoint selection and KV-aware routing.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
configMapRef ConfigMapKeySelector |
ConfigMapRef references a user-provided ConfigMap containing EPP configuration. The ConfigMap should contain EndpointPickerConfig YAML. Mutually exclusive with Config. |
Optional: {} |
|
config EndpointPickerConfig |
Config allows specifying EPP EndpointPickerConfig directly as a structured object. The operator will marshal this to YAML and create a ConfigMap automatically. Mutually exclusive with ConfigMapRef. One of ConfigMapRef or Config must be specified (no default configuration). Uses the upstream type from github.com/kubernetes-sigs/gateway-api-inference-extension |
Type: object Optional: {} |
EndpointInfo represents a single endpoint (pod) serving the model
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
address string |
Address is the full address of the endpoint (e.g., "http://10.0.1.5:9090") | ||
podName string |
PodName is the name of the pod serving this endpoint | Optional: {} |
|
ready boolean |
Ready indicates whether the endpoint is ready to serve traffic. For LoRA models: true if the POST /loras request succeeded with a 2xx status code. For base models: always false (no probing is performed). |
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
annotations object (keys:string, values:string) |
|||
labels object (keys:string, values:string) |
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
mainContainer Container |
FrontendSidecarSpec configures the auto-generated frontend sidecar container. The operator uses these fields together with built-in frontend defaults (command, probes, ports, and Dynamo env vars) to produce a fully configured sidecar container.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
image string |
Image is the container image for the frontend sidecar. | Required: {} |
|
args string array |
Args overrides the default frontend arguments. When specified, these replace the default ["-m", "dynamo.frontend"] entirely. For example, ["-m", "dynamo.frontend", "--router-mode", "direct"] for GAIE deployments. |
Optional: {} |
|
envFromSecret string |
EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the frontend sidecar container. |
Optional: {} |
|
envs EnvVar array |
Envs defines additional environment variables for the frontend sidecar. These are merged with (and can override) the auto-generated Dynamo env vars. |
Optional: {} |
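A sketch of a frontend sidecar configuration (the `frontendSidecar` field name, Secret name, and environment variable are assumptions for illustration; the image and args values come from the field descriptions above):

```yaml
frontendSidecar:                      # assumed field name within the service spec
  image: nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0
  args: ["-m", "dynamo.frontend", "--router-mode", "direct"]  # replaces the defaults entirely
  envFromSecret: frontend-env-secret  # hypothetical Secret name
  envs:
    - name: DYN_LOG_LEVEL             # hypothetical variable; merged with generated env vars
      value: debug
```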
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
enabled boolean |
Enabled exposes the component through an ingress or virtual service when true. | ||
host string |
Host is the base host name to route external traffic to this component. | ||
useVirtualService boolean |
UseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress. | ||
virtualServiceGateway string |
VirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to. | ||
hostPrefix string |
HostPrefix is an optional prefix added before the host. | ||
annotations object (keys:string, values:string) |
Annotations to set on the generated Ingress/VirtualService resources. | ||
labels object (keys:string, values:string) |
Labels to set on the generated Ingress/VirtualService resources. | ||
tls IngressTLSSpec |
TLS holds the TLS configuration used by the Ingress/VirtualService. | ||
hostSuffix string |
HostSuffix is an optional suffix appended after the host. | ||
ingressControllerClassName string |
IngressControllerClassName selects the ingress controller class (e.g., "nginx"). |
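An illustrative ingress fragment using a standard Ingress (the `ingress` field name, host, and Secret name are hypothetical):

```yaml
ingress:                              # assumed field name within the component spec
  enabled: true
  host: example.com                   # hypothetical base host
  hostPrefix: api-                    # prepended before the host
  ingressControllerClassName: nginx
  tls:
    secretName: inference-tls         # hypothetical Secret holding cert and key
```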
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
secretName string |
SecretName is the name of a Kubernetes Secret containing the TLS certificate and key. |
ModelReference identifies a model served by this component
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
name string |
Name is the base model identifier (e.g., "llama-3-70b-instruct-v1") | Required: {} |
|
revision string |
Revision is the model revision/version (optional) | Optional: {} |
ModelSource defines the source location of a model
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
uri string |
URI is the model source URI. Supported formats: S3 (s3://bucket/path/to/model) and HuggingFace (hf://org/model@revision_sha). |
Required: {} |
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
nodeCount integer |
NodeCount indicates the number of nodes to deploy for multinode components. The total number of GPUs is NodeCount * the GPU limit. Must be greater than 1. |
2 | Minimum: 2 |
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
create boolean |
Create indicates to create a new PVC | ||
name string |
Name is the name of the PVC | Required: {} |
|
storageClass string |
StorageClass to be used for PVC creation. Required when create is true. | ||
size Quantity |
Size of the volume in Gi, used during PVC creation. Required when create is true. | ||
volumeAccessMode PersistentVolumeAccessMode |
VolumeAccessMode is the volume access mode of the PVC. Required when create is true. |
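A single PVC entry illustrating the creation fields (the PVC name and storage class are hypothetical; placement within the deployment spec is assumed):

```yaml
create: true
name: model-weights-pvc     # hypothetical PVC name
storageClass: standard      # required when create is true
size: 100Gi                 # required when create is true
volumeAccessMode: ReadWriteMany  # required when create is true
```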
ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See dynamo/profiler/utils/profiler_argparse.py for the complete schema.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
config JSON |
Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler. The profiler will validate the configuration and report any errors. |
Optional: {} Type: object |
|
configMapRef ConfigMapKeySelector |
ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment base config file (disagg.yaml). This is separate from the profiling config above. The path to this config will be set as engine.config in the profiling config. |
Optional: {} |
|
profilerImage string |
ProfilerImage specifies the container image to use for profiling jobs. This image contains the profiler code and dependencies needed for SLA-based profiling. Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0" |
Required: {} |
|
outputPVC string |
OutputPVC is an optional PersistentVolumeClaim name for storing profiling output. If specified, all profiling artifacts (logs, plots, configs, raw data) will be written to this PVC instead of an ephemeral emptyDir volume. This allows users to access complete profiling results after the job completes by mounting the PVC. The PVC must exist in the same namespace as the DGDR. If not specified, profiling uses emptyDir and only essential data is saved to ConfigMaps. Note: ConfigMaps are still created regardless of this setting for planner integration. |
Optional: {} |
|
resources ResourceRequirements |
Resources specifies the compute resource requirements for the profiling job container. If not specified, no resource requests or limits are set. |
Optional: {} |
|
tolerations Toleration array |
Tolerations allows the profiling job to be scheduled on nodes with matching taints. For example, to schedule on GPU nodes, add a toleration for the nvidia.com/gpu taint. |
Optional: {} |
|
nodeSelector object (keys:string, values:string) |
NodeSelector is a selector which must match a node's labels for the profiling pod to be scheduled on that node. For example, to schedule on ARM64 nodes, use {"kubernetes.io/arch": "arm64"}. |
Optional: {} |
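A sketch of a profiling configuration (the ConfigMap name, PVC name, and the keys under `config` are hypothetical — `config` is arbitrary JSON/YAML validated by the profiler, so consult profiler_argparse.py for the real schema; the profiler image is the example from the field description):

```yaml
profilerImage: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
configMapRef:
  name: disagg-base-config  # hypothetical ConfigMap holding disagg.yaml
  key: disagg.yaml
outputPVC: profiling-output # optional; must already exist in the DGDR namespace
config:                     # passed directly to the profiler, which validates it
  sweep:                    # hypothetical profiler option
    max_num_gpus_per_engine: 8
nodeSelector:
  kubernetes.io/arch: amd64
```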
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
cpu string |
CPU specifies the CPU resource request/limit (e.g., "1000m", "2") | ||
memory string |
Memory specifies the memory resource request/limit (e.g., "4Gi", "8Gi") | ||
gpu string |
GPU indicates the number of GPUs to request. In a multinode deployment, the total number of GPUs is NumberOfNodes * GPU. |
||
gpuType string |
GPUType can specify a custom GPU type, e.g. "gpu.intel.com/xe". By default, if not specified, the GPU type is "nvidia.com/gpu". |
||
custom object (keys:string, values:string) |
Custom specifies additional custom resource requests/limits |
Resources defines requested and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
requests ResourceItem |
Requests specifies the minimum resources required by the component | ||
limits ResourceItem |
Limits specifies the maximum resources allowed for the component | ||
claims ResourceClaim array |
Claims specifies resource claims for dynamic resource allocation |
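An illustrative resources fragment for a component (quantities are hypothetical; the custom resource key is an assumption to show the `custom` map):

```yaml
resources:
  requests:
    cpu: "8"
    memory: 32Gi
    gpu: "1"
  limits:
    cpu: "16"
    memory: 64Gi
    gpu: "1"
    gpuType: nvidia.com/gpu   # the default when not specified
    custom:
      rdma/ib: "1"            # hypothetical custom resource
```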
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
id string |
ID is an arbitrary string that triggers a restart when changed. Any modification to this value will initiate a restart of the graph deployment according to the strategy. |
MinLength: 1 Required: {} |
|
strategy RestartStrategy |
Strategy specifies the restart strategy for the graph deployment. | Optional: {} |
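A sketch of a restart trigger (the `restart` field name within the DGD spec and the service names in `order` are assumptions; changing `id` to any new string initiates a restart):

```yaml
restart:                         # assumed field name in the DGD spec
  id: "2025-06-01T12-00-00Z"     # any non-empty string; change it to trigger a restart
  strategy:
    type: Sequential             # or Parallel
    order: ["frontend", "decode-worker"]  # hypothetical service names
```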
Underlying type: string
Appears in:
| Field | Description |
|---|---|
Pending |
|
Restarting |
|
Completed |
|
Failed |
|
Superseded |
RestartStatus contains the restart status of the graph deployment.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
observedID string |
ObservedID is the restart ID that has been observed and is being processed. Matches the Restart.ID field in the spec. |
||
phase RestartPhase |
Phase is the phase of the restart. | ||
inProgress string array |
InProgress contains the names of the services that are currently being restarted. | Optional: {} |
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
type RestartStrategyType |
Type specifies the restart strategy type. | Sequential | Enum: [Sequential Parallel] |
order string array |
Order specifies the order in which the services should be restarted. | Optional: {} |
Underlying type: string
Appears in:
| Field | Description |
|---|---|
Sequential |
|
Parallel |
Underlying type: string
RollingUpdatePhase represents the current phase of a rolling update.
Validation:
- Enum: [Pending InProgress Completed Failed ]
Appears in:
| Field | Description |
|---|---|
Pending |
|
InProgress |
|
Completed |
|
| `` |
RollingUpdateStatus tracks the progress of a rolling update.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
phase RollingUpdatePhase |
Phase indicates the current phase of the rolling update. | Enum: [Pending InProgress Completed Failed ] Optional: {} |
|
startTime Time |
StartTime is when the rolling update began. | Optional: {} |
|
endTime Time |
EndTime is when the rolling update completed (successfully or failed). | Optional: {} |
|
updatedServices string array |
UpdatedServices is the list of services that have completed the rolling update. A service is considered updated when its new replicas are all ready and old replicas are fully scaled down. Only services of componentType Worker (or Prefill/Decode) are considered. |
Optional: {} |
ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled, the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
enabled boolean |
Enabled indicates whether the ScalingAdapter should be enabled for this service. When true, a DGDSA is created and owns the replicas field. When false (default), no DGDSA is created and replicas can be modified directly in the DGD. |
false | Optional: {} |
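A minimal fragment enabling the scaling adapter for a service (the `scalingAdapter` field name within the service spec is assumed):

```yaml
scalingAdapter:      # assumed field name within the service spec
  enabled: true      # a DGDSA is created and owns the replicas field;
                     # HPA/KEDA/Planner can then scale via the Scale subresource
```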
ServiceCheckpointConfig configures checkpointing for a DGD service
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
enabled boolean |
Enabled indicates whether checkpointing is enabled for this service | false | Optional: {} |
mode CheckpointMode |
Mode defines how checkpoint creation is handled. Auto: the DGD controller creates the Checkpoint CR automatically. Manual: the user must create the Checkpoint CR. |
Auto | Enum: [Auto Manual] Optional: {} |
checkpointRef string |
CheckpointRef references an existing Checkpoint CR to use. If specified, Identity is ignored and this checkpoint is used directly. |
Optional: {} |
|
identity DynamoCheckpointIdentity |
Identity defines the checkpoint identity for hash computation. Used when Mode is Auto or when looking up existing checkpoints. Required when checkpointRef is not specified. |
Optional: {} |
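A sketch of the Manual mode with an existing checkpoint (the `checkpoint` field name within the service spec and the Checkpoint CR name are assumptions; the Auto variant would instead supply `identity`, whose schema is defined by DynamoCheckpointIdentity):

```yaml
checkpoint:                         # assumed field name within the service spec
  enabled: true
  mode: Manual                      # user creates the Checkpoint CR themselves
  checkpointRef: my-checkpoint      # hypothetical existing Checkpoint CR name
```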
ServiceCheckpointStatus contains checkpoint information for a single service.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
checkpointName string |
CheckpointName is the name of the associated Checkpoint CR | Optional: {} |
|
identityHash string |
IdentityHash is the computed hash of the checkpoint identity | Optional: {} |
|
ready boolean |
Ready indicates if the checkpoint is ready for use | Optional: {} |
ServiceReplicaStatus contains replica information for a single service.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
componentKind ComponentKind |
ComponentKind is the underlying resource kind (e.g., "PodClique", "PodCliqueScalingGroup", "Deployment", "LeaderWorkerSet"). | Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet] |
|
componentName string |
ComponentName is the name of the primary underlying resource. DEPRECATED: Use ComponentNames instead. This field will be removed in a future release. During rolling updates, this reflects the new (target) component name. |
||
componentNames string array |
ComponentNames is the list of underlying resource names for this service. During normal operation, this contains a single name. During rolling updates, this contains both old and new component names. |
Optional: {} |
|
replicas integer |
Replicas is the total number of non-terminated replicas. Required for all component kinds. |
Minimum: 0 |
|
updatedReplicas integer |
UpdatedReplicas is the number of replicas at the current/desired revision. Required for all component kinds. |
Minimum: 0 |
|
readyReplicas integer |
ReadyReplicas is the number of ready replicas. Populated for PodClique, Deployment, and LeaderWorkerSet. Not available for PodCliqueScalingGroup. When nil, the field is omitted from the API response. |
Minimum: 0 Optional: {} |
|
availableReplicas integer |
AvailableReplicas is the number of available replicas. For Deployment: replicas ready for >= minReadySeconds. For PodCliqueScalingGroup: replicas where all constituent PodCliques have >= MinAvailable ready pods. Not available for PodClique or LeaderWorkerSet. When nil, the field is omitted from the API response. |
Minimum: 0 Optional: {} |
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
disabled boolean |
|||
size Quantity |
VolumeMount references a PVC defined at the top level for volumes to be mounted by the component
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
name string |
Name references a PVC name defined in the top-level PVCs map | Required: {} |
|
mountPoint string |
MountPoint specifies where to mount the volume. If useAsCompilationCache is true and mountPoint is not specified, a backend-specific default will be used. |
||
useAsCompilationCache boolean |
UseAsCompilationCache indicates this volume should be used as a compilation cache. When true, backend-specific environment variables will be set and default mount points may be used. |
false |
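An illustrative volume-mount fragment for a component (the `volumeMounts` field name and PVC names are assumptions; each `name` must reference a PVC defined in the top-level PVCs map):

```yaml
volumeMounts:                     # assumed field name within the component spec
  - name: compile-cache           # references a PVC in the top-level PVCs map
    useAsCompilationCache: true   # backend-specific env vars and default mountPoint apply
  - name: model-weights
    mountPoint: /models
```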
Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
Underlying type: string
BackendType specifies the inference backend.
Validation:
- Enum: [auto sglang trtllm vllm]
Appears in:
| Field | Description |
|---|---|
auto |
|
sglang |
|
trtllm |
|
vllm |
Underlying type: string
DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
Validation:
- Enum: [Pending Profiling Ready Deploying Deployed Failed]
Appears in:
| Field | Description |
|---|---|
Pending |
|
Profiling |
|
Ready |
|
Deploying |
|
Deployed |
|
Failed |
DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
replicas integer |
Replicas is the desired number of replicas. | Optional: {} |
|
availableReplicas integer |
AvailableReplicas is the number of replicas that are available and ready. | Optional: {} |
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It provides a simplified, SLA-driven interface for deploying inference models on Dynamo. Users specify a model and optional performance targets; the controller handles profiling, configuration selection, and deployment.
Lifecycle:
- Pending: Spec validated, preparing for profiling
- Profiling: Profiling job is running to discover optimal configurations
- Ready: Profiling complete, generated DGD spec available in status
- Deploying: DGD is being created and rolled out (when autoApply=true)
- Deployed: DGD is running and healthy
- Failed: An unrecoverable error occurred
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
nvidia.com/v1beta1 |
||
kind string |
DynamoGraphDeploymentRequest |
||
metadata ObjectMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
spec DynamoGraphDeploymentRequestSpec |
Spec defines the desired state for this deployment request. | ||
status DynamoGraphDeploymentRequestStatus |
Status reflects the current observed state of this deployment request. |
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. Only the Model field is required; all other fields are optional and have sensible defaults.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
model string |
Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b"). Can be a HuggingFace ID or a private model name. |
MinLength: 1 Required: {} |
|
backend BackendType |
Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] Optional: {} |
image string |
Image is the container image reference for the profiling job (frontend image). Example: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0". |
Optional: {} |
|
modelCache ModelCacheSpec |
ModelCache provides optional PVC configuration for pre-downloaded model weights. When provided, weights are loaded from the PVC instead of downloading from HuggingFace. |
Optional: {} |
|
hardware HardwareSpec |
Hardware describes the hardware resources available for profiling and deployment. Typically auto-filled by the operator from cluster discovery. |
Optional: {} |
|
workload WorkloadSpec |
Workload defines the expected workload characteristics for SLA-based profiling. | Optional: {} |
|
sla SLASpec |
SLA defines service-level agreement targets that drive profiling optimization. | Optional: {} |
|
overrides OverridesSpec |
Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. | Optional: {} |
|
features FeaturesSpec |
Features controls optional Dynamo platform features in the generated deployment. | Optional: {} |
|
searchStrategy SearchStrategy |
SearchStrategy controls the profiling search depth. "rapid" performs a fast sweep; "thorough" explores more configurations. |
rapid | Enum: [rapid thorough] Optional: {} |
autoApply boolean |
AutoApply indicates whether to automatically create a DynamoGraphDeployment after profiling completes. If false, the generated spec is stored in status for manual review and application. |
true | Optional: {} |
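An illustrative DynamoGraphDeploymentRequest combining the fields above (the metadata name and SLA/workload numbers are hypothetical values; only `model` is required):

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen-sla-request      # hypothetical name
spec:
  model: Qwen/Qwen3-0.6B      # HuggingFace ID or private model name
  backend: vllm               # one of: auto, sglang, trtllm, vllm
  sla:
    ttft: 200                 # hypothetical TTFT target, ms
    itl: 10                   # hypothetical ITL target, ms
  workload:
    isl: 4000                 # defaults shown
    osl: 1000
  searchStrategy: rapid       # or thorough
  autoApply: true             # create the DGD automatically after profiling
```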
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
phase DGDRPhase |
Phase is the high-level lifecycle phase of the deployment request. | Enum: [Pending Profiling Ready Deploying Deployed Failed] Optional: {} |
|
profilingPhase ProfilingPhase |
ProfilingPhase indicates the current sub-phase of the profiling pipeline. Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. |
Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] Optional: {} |
|
dgdName string |
DGDName is the name of the generated or created DynamoGraphDeployment. | Optional: {} |
|
profilingJobName string |
ProfilingJobName is the name of the Kubernetes Job running the profiler. | Optional: {} |
|
conditions Condition array |
Conditions contains the latest observed conditions of the deployment request. Standard condition types include: Succeeded, Validation, Profiling, SpecGenerated, DeploymentReady. |
Optional: {} |
|
profilingResults ProfilingResultsStatus |
ProfilingResults contains the output of the profiling process including Pareto-optimal configurations and the selected deployment configuration. |
Optional: {} |
|
deploymentInfo DeploymentInfoStatus |
DeploymentInfo tracks the state of the deployed DynamoGraphDeployment. Populated when a DGD has been created (either via autoApply or manually). |
Optional: {} |
|
observedGeneration integer |
ObservedGeneration is the most recent generation observed by the controller. | Optional: {} |
FeaturesSpec controls optional Dynamo platform features in the generated deployment.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
planner RawExtension |
Planner is the raw SLA planner configuration passed to the planner service. Its schema is defined by dynamo.planner.utils.planner_config.PlannerConfig. Go treats this as opaque bytes; the Planner service validates it at startup. The presence of this field (non-null) enables the planner in the generated DGD. |
Type: object Optional: {} |
|
mocker MockerSpec |
Mocker configures the simulated (mocker) backend for testing without GPUs. | Optional: {} |
HardwareSpec describes the hardware resources available for profiling and deployment. These fields are typically auto-filled by the operator from cluster discovery.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
gpuSku string |
GPUSKU is the GPU SKU identifier (e.g., "H100_SXM", "A100_80GB"). | Optional: {} |
|
vramMb float |
VRAMMB is the VRAM per GPU in MiB. | Optional: {} |
|
totalGpus integer |
TotalGPUs is the total number of GPUs available in the cluster. | Optional: {} |
|
numGpusPerNode integer |
NumGPUsPerNode is the number of GPUs per node. | Optional: {} |
MockerSpec configures the simulated (mocker) backend.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
enabled boolean |
Enabled indicates whether to deploy mocker workers instead of real inference workers. Useful for large-scale testing without GPUs. |
Optional: {} |
ModelCacheSpec references a PVC containing pre-downloaded model weights.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
pvcName string |
PVCName is the name of the PersistentVolumeClaim containing model weights. The PVC must exist in the same namespace as the DGDR. |
Optional: {} |
|
pvcModelPath string |
PVCModelPath is the path to the model checkpoint directory within the PVC (e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). |
Optional: {} |
|
pvcMountPath string |
PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: {} |
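A model-cache fragment using pre-downloaded weights (the PVC name is hypothetical; the model path reuses an example from the field description):

```yaml
modelCache:
  pvcName: model-cache-pvc             # hypothetical PVC in the DGDR namespace
  pvcModelPath: models/Llama-3.1-405B-FP8
  pvcMountPath: /opt/model-cache       # the default
```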
Underlying type: string
OptimizationType specifies the profiling optimization strategy.
Validation:
- Enum: [latency throughput]
Appears in:
| Field | Description |
|---|---|
latency |
|
throughput |
OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
profilingJob JobSpec |
ProfilingJob allows overriding the profiling Job specification. Fields set here are merged into the controller-generated Job spec. |
Optional: {} |
|
dgd RawExtension |
DGD allows providing a full or partial nvidia.com/v1alpha1 DynamoGraphDeployment to use as the base for the generated deployment. Fields from profiling results are merged on top. Use this to override backend worker images. The field is stored as a raw embedded resource rather than a typed *v1alpha1.DynamoGraphDeployment to avoid a circular import: v1alpha1 already imports v1beta1 as the conversion hub and Go does not allow import cycles. The EmbeddedResource marker tells the API server to validate that the value is a well-formed Kubernetes object (has apiVersion/kind), but does not enforce that it is specifically a DynamoGraphDeployment. Full type validation (correct apiVersion, kind, and field schema) is performed by the controller during reconciliation. |
EmbeddedResource: {} Optional: {} |
ParetoConfig represents a single Pareto-optimal deployment configuration discovered during profiling.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
config RawExtension |
Config is the full deployment configuration for this Pareto point. | Type: object |
Underlying type: string
ProfilingPhase represents a sub-phase within the profiling pipeline. When the DGDR Phase is "Profiling", this value indicates which step of the profiling pipeline is currently executing.
Validation:
- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
Appears in:
| Field | Description |
|---|---|
Initializing |
Profiler is loading the DGD template, detecting GPU hardware, and resolving the model architecture from HuggingFace. |
SweepingPrefill |
Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts for prefill, measuring TTFT at each configuration. |
SweepingDecode |
Sweeping parallelization strategies and concurrency levels for decode, measuring ITL at each configuration. |
SelectingConfig |
Filtering results against SLA targets and selecting the most cost-efficient configuration that meets TTFT/ITL requirements. |
BuildingCurves |
Building detailed interpolation curves (ISL→TTFT for prefill, KV-usage×context-length→ITL for decode) using the selected configs. |
GeneratingDGD |
Packaging profiling data into a ConfigMap and generating the final DGD YAML with planner integration. |
Done |
Profiling pipeline finished successfully. |
ProfilingResultsStatus contains the output of the profiling process.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
pareto ParetoConfig array |
Pareto is the list of Pareto-optimal deployment configurations discovered during profiling. Each entry represents a different cost/performance trade-off. |
Optional: {} |
|
selectedConfig RawExtension |
SelectedConfig is the recommended configuration chosen by the profiler based on the SLA targets. This is the configuration used for deployment when autoApply is true. |
Type: object Optional: {} |
SLASpec defines the service-level agreement targets for profiling optimization. Exactly one mode should be active: ttft+itl (default), e2eLatency, or optimizationType.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
optimizationType OptimizationType |
OptimizationType controls the profiling optimization strategy. Use when explicit SLA targets (ttft+itl or e2eLatency) are not known. |
Enum: [latency throughput] Optional: {} |
|
ttft float |
TTFT is the Time To First Token target in milliseconds. | Optional: {} |
|
itl float |
ITL is the Inter-Token Latency target in milliseconds. | Optional: {} |
|
e2eLatency float |
E2ELatency is the target end-to-end request latency in milliseconds. Alternative to specifying TTFT + ITL. |
Optional: {} |
Underlying type: string
SearchStrategy controls the profiling search depth.
Validation:
- Enum: [rapid thorough]
Appears in:
| Field | Description |
|---|---|
rapid |
|
thorough |
WorkloadSpec defines the workload characteristics for SLA-based profiling.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
isl integer |
ISL is the Input Sequence Length (number of tokens). | 4000 | Optional: {} |
osl integer |
OSL is the Output Sequence Length (number of tokens). | 1000 | Optional: {} |
concurrency float |
Concurrency is the target concurrency level. Required (or RequestRate) when the planner is disabled. |
Optional: {} |
|
requestRate float |
RequestRate is the target request rate (req/s). Required (or Concurrency) when the planner is disabled. |
Optional: {} |
Underlying type: string
CertProvisionMode controls how webhook TLS certificates are managed.
Appears in:
| Field | Description |
|---|---|
auto |
CertProvisionModeAuto uses the built-in cert-controller to generate and rotate certificates. |
manual |
CertProvisionModeManual expects certificates to be provided externally (e.g., cert-manager, admin). |
CheckpointConfiguration holds checkpoint/restore settings.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
enabled boolean |
Enabled indicates if checkpoint functionality is enabled | ||
readyForCheckpointFilePath string |
ReadyForCheckpointFilePath signals model readiness for checkpoint jobs | /tmp/ready-for-checkpoint | |
storage CheckpointStorageConfiguration |
Storage holds storage backend configuration |
CheckpointOCIConfig holds OCI registry storage configuration.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
uri string |
URI is the OCI URI (oci://registry/repository) | ||
credentialsSecretRef string |
CredentialsSecretRef is the name of the docker config secret |
CheckpointPVCConfig holds PVC storage configuration.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
pvcName string |
PVCName is the name of the PVC | chrek-pvc | |
basePath string |
BasePath is the base directory within the PVC | /checkpoints |
CheckpointS3Config holds S3 storage configuration.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
uri string |
URI is the S3 URI (s3://[endpoint/]bucket/prefix) | ||
credentialsSecretRef string |
CredentialsSecretRef is the name of the credentials secret |
CheckpointStorageConfiguration holds storage backend configuration for checkpoints.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
type string |
Type is the storage backend type: pvc, s3, or oci | pvc | |
pvc CheckpointPVCConfig |
PVC configuration (used when Type=pvc) | ||
s3 CheckpointS3Config |
S3 configuration (used when Type=s3) | ||
oci CheckpointOCIConfig |
OCI configuration (used when Type=oci) |
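An illustrative checkpoint storage fragment selecting the S3 backend (the bucket, prefix, and Secret name are hypothetical; placement under the `storage` field of CheckpointConfiguration follows the tables above):

```yaml
storage:
  type: s3                            # pvc (default), s3, or oci
  s3:
    uri: s3://my-bucket/checkpoints   # hypothetical bucket/prefix
    credentialsSecretRef: s3-credentials  # hypothetical credentials Secret
```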
Underlying type: string
DiscoveryBackend is the type for the discovery backend.
Appears in:
| Field | Description |
|---|---|
kubernetes |
DiscoveryBackendKubernetes is the Kubernetes discovery backend |
etcd |
DiscoveryBackendEtcd is the etcd discovery backend |
DiscoveryConfiguration holds discovery backend settings.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
backend DiscoveryBackend |
Backend is the discovery backend: "kubernetes" or "etcd" | kubernetes |
GPUConfiguration holds GPU discovery settings.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
discoveryEnabled boolean |
DiscoveryEnabled indicates whether GPU discovery is enabled | true |
GroveConfiguration holds Grove orchestrator settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` _boolean_ | Enabled overrides auto-detection; `nil` = auto-detect | | |
| `terminationDelay` _Duration_ | TerminationDelay configures the termination delay for Grove PodCliqueSets | `15m` | |
InfrastructureConfiguration holds service mesh and backend addresses.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `natsAddress` _string_ | NATSAddress is the address of the NATS server | | |
| `etcdAddress` _string_ | ETCDAddress is the address of the etcd server | | |
| `modelExpressURL` _string_ | ModelExpressURL is the URL of the Model Express server to inject into all pods | | |
| `prometheusEndpoint` _string_ | PrometheusEndpoint is the URL of the Prometheus endpoint to use for metrics | | |
IngressConfiguration holds ingress settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `virtualServiceGateway` _string_ | VirtualServiceGateway is the name of the Istio virtual service gateway | | |
| `controllerClassName` _string_ | ControllerClassName is the ingress controller class name | | |
| `controllerTLSSecretName` _string_ | ControllerTLSSecretName is the TLS secret for the ingress controller | | |
| `hostSuffix` _string_ | HostSuffix is the suffix for ingress hostnames | | |
KaiSchedulerConfiguration holds Kai-scheduler settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` _boolean_ | Enabled overrides auto-detection; `nil` = auto-detect | | |
LWSConfiguration holds LWS orchestrator settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` _boolean_ | Enabled overrides auto-detection; `nil` = auto-detect | | |
LeaderElectionConfiguration holds leader election settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` _boolean_ | Enabled enables leader election for the controller manager | `false` | |
| `id` _string_ | ID is the leader election resource identity | | |
| `namespace` _string_ | Namespace is the namespace for the leader election resource | | |
LoggingConfiguration holds logging settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `level` _string_ | Level is the log level (e.g., `info`, `debug`) | `info` | |
| `format` _string_ | Format is the log format (e.g., `json`, `text`) | `json` | |
MPIConfiguration holds MPI SSH secret settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `sshSecretName` _string_ | SSHSecretName is the name of the secret containing the SSH key for MPI | | |
| `sshSecretNamespace` _string_ | SSHSecretNamespace is the namespace where the MPI SSH secret is located | | |
MetricsServer extends Server with a secure serving option.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `bindAddress` _string_ | BindAddress is the address the server binds to | | |
| `port` _integer_ | Port is the port the server listens on | | |
| `secure` _boolean_ | Secure enables secure serving for the metrics endpoint | | |
NamespaceConfiguration determines the operator namespace mode.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `restricted` _string_ | Restricted is the namespace to restrict to. Empty = cluster-wide mode. | | |
| `scope` _NamespaceScopeConfiguration_ | Scope holds namespace scope lease settings (namespace-restricted mode only) | | |
NamespaceScopeConfiguration holds lease settings for namespace-restricted mode.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `leaseDuration` _Duration_ | LeaseDuration is the duration of the namespace scope marker lease before expiration | `30s` | |
| `leaseRenewInterval` _Duration_ | LeaseRenewInterval is the interval for renewing the namespace scope marker lease | `10s` | |
OperatorConfiguration is the Schema for the operator configuration.

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `operator.config.dynamo.nvidia.com/v1alpha1` | | |
| `kind` _string_ | `OperatorConfiguration` | | |
| `server` _ServerConfiguration_ | Server configuration (metrics, health probes, webhooks) | | |
| `leaderElection` _LeaderElectionConfiguration_ | Leader election configuration | | |
| `namespace` _NamespaceConfiguration_ | Namespace configuration (restricted vs cluster-wide) | | |
| `orchestrators` _OrchestratorConfiguration_ | Orchestrator configuration with optional overrides | | |
| `infrastructure` _InfrastructureConfiguration_ | Service mesh and infrastructure addresses | | |
| `ingress` _IngressConfiguration_ | Ingress configuration | | |
| `rbac` _RBACConfiguration_ | RBAC configuration for cross-namespace resource management (cluster-wide mode) | | |
| `mpi` _MPIConfiguration_ | MPI SSH secret configuration | | |
| `checkpoint` _CheckpointConfiguration_ | Checkpoint/restore configuration | | |
| `discovery` _DiscoveryConfiguration_ | Discovery backend configuration | | |
| `gpu` _GPUConfiguration_ | GPU discovery configuration | | |
| `logging` _LoggingConfiguration_ | Logging configuration | | |
| `security` _SecurityConfiguration_ | HTTP/2 and TLS settings | | |
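As an illustrative sketch (not a complete or authoritative manifest), a minimal `OperatorConfiguration` combining several of the sections documented above might look like the following; all addresses and values are placeholders:

```yaml
apiVersion: operator.config.dynamo.nvidia.com/v1alpha1
kind: OperatorConfiguration
leaderElection:
  enabled: true
namespace:
  restricted: ""                                    # empty = cluster-wide mode
infrastructure:
  natsAddress: nats://nats.dynamo-system.svc:4222   # placeholder address
  etcdAddress: http://etcd.dynamo-system.svc:2379   # placeholder address
discovery:
  backend: kubernetes
logging:
  level: info
  format: json
```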
OrchestratorConfiguration holds orchestrator override settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `grove` _GroveConfiguration_ | Grove orchestrator configuration | | |
| `lws` _LWSConfiguration_ | LWS orchestrator configuration | | |
| `kaiScheduler` _KaiSchedulerConfiguration_ | KaiScheduler configuration | | |
RBACConfiguration holds RBAC settings for cluster-wide mode.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `plannerClusterRoleName` _string_ | PlannerClusterRoleName is the ClusterRole for the planner | | |
| `dgdrProfilingClusterRoleName` _string_ | DGDRProfilingClusterRoleName is the ClusterRole for DGDR profiling jobs | | |
| `eppClusterRoleName` _string_ | EPPClusterRoleName is the ClusterRole for EPP | | |
SecurityConfiguration holds HTTP/2 and TLS settings.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `enableHTTP2` _boolean_ | EnableHTTP2 enables HTTP/2 for the metrics and webhook servers | `false` | |
Server holds a bind address and port.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `bindAddress` _string_ | BindAddress is the address the server binds to | | |
| `port` _integer_ | Port is the port the server listens on | | |
ServerConfiguration holds server bind addresses and ports.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `metrics` _MetricsServer_ | Metrics server configuration | `{ bindAddress: 127.0.0.1, port: 8080 }` | |
| `healthProbe` _Server_ | Health probe server configuration | `{ bindAddress: 0.0.0.0, port: 8081 }` | |
| `webhook` _WebhookServer_ | Webhook server configuration | `{ certDir: /tmp/k8s-webhook-server/serving-certs, host: 0.0.0.0, port: 9443 }` | |
WebhookServer extends Server with a host and certificate directory.

Appears in:

| Field | Description | Default | Validation |
|---|---|---|---|
| `bindAddress` _string_ | BindAddress is the address the server binds to | | |
| `port` _integer_ | Port is the port the server listens on | | |
| `host` _string_ | Host is the address the webhook server binds to | | |
| `certDir` _string_ | CertDir is the directory containing TLS certificates | | |
| `certProvisionMode` _CertProvisionMode_ | CertProvisionMode controls certificate management: `auto` (built-in cert-controller) or `manual` (external) | `auto` | |
| `secretName` _string_ | SecretName is the name of the Kubernetes Secret holding webhook TLS certificates | `webhook-server-cert` | |
| `serviceName` _string_ | ServiceName is the name of the Kubernetes Service fronting the webhook server. Used to generate certificate SANs. Set by the Helm chart. | | |
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:

- **Health Probes**: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
- **Security Context**: All components receive `fsGroup: 1000` by default to ensure proper file permissions for mounted volumes. This can be overridden via the `extraPodSpec.securityContext` field.
- **Shared Memory**: All components receive an 8Gi shared memory volume mounted at `/dev/shm` by default (can be disabled or resized via the `sharedMemory` field).
- **Environment Variables**: Components automatically receive environment variables such as `DYN_NAMESPACE`, `DYN_PARENT_DGD_K8S_NAME`, `DYNAMO_PORT`, and backend-specific variables.
- **Pod Configuration**: Default `terminationGracePeriodSeconds` of 60 seconds and `restartPolicy: Always`.
- **Autoscaling**: When enabled without explicit metrics, defaults to CPU-based autoscaling with an 80% target utilization.
- **Backend-Specific Behavior**: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (vLLM, SGLang, or TensorRT-LLM).
All components receive the following pod-level defaults unless overridden:

- `terminationGracePeriodSeconds`: `60` seconds
- `restartPolicy`: `Always`
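If a longer grace period is needed (for example, to let in-flight requests drain), the pod-level default can be overridden through `extraPodSpec`. This is a sketch: it assumes `extraPodSpec` accepts standard Kubernetes PodSpec fields (as it does for `securityContext` later in this document), and `YourWorker` is a placeholder service name:

```yaml
services:
  YourWorker:
    extraPodSpec:
      terminationGracePeriodSeconds: 120   # overrides the 60-second default
```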
The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

- `fsGroup: 1000` sets the group ownership of mounted volumes and any files created in those volumes.

This default ensures that non-root containers can write to mounted volumes (such as model caches or persistent storage) without permission issues. The `fsGroup` setting is particularly important for:
- Model downloads and caching
- Compilation cache directories
- Persistent volume claims (PVCs)
- SSH key generation in multinode deployments
To override the default security context, specify your own `securityContext` in the `extraPodSpec` of your component:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000        # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```

**Important**: When you provide any `securityContext` object in `extraPodSpec`, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting `runAsNonRoot` or setting it to `false`).
In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values so that OpenShift's admission controllers can assign them dynamically:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        # Omit fsGroup to let OpenShift assign it based on SCC
        # OpenShift will inject the appropriate UID range
```

Alternatively, if you want to keep the default `fsGroup: 1000` behavior and are certain your cluster allows it, you don't need to specify anything; the operator defaults will work.
Shared memory is enabled by default for all components:

- **Enabled**: `true` (unless explicitly disabled via `sharedMemory.disabled`)
- **Size**: `8Gi`
- **Mount Path**: `/dev/shm`
- **Volume Type**: `emptyDir` with `memory` medium

To disable shared memory or customize the size, use the `sharedMemory` field in your component specification.
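For example, a component spec might resize the shared memory volume like this (a sketch: the `disabled` key comes from `sharedMemory.disabled` above, while the `size` key name is an assumption and may differ in your operator version):

```yaml
services:
  YourWorker:
    sharedMemory:
      disabled: false
      size: 16Gi     # assumed key name; enlarges the default 8Gi /dev/shm volume
```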
The operator applies different default health probes based on the component type.
Frontend components receive the following probe configurations:
**Liveness Probe:**

- Type: HTTP GET
- Path: `/health`
- Port: `http` (8000)
- Initial Delay: 60 seconds
- Period: 60 seconds
- Timeout: 30 seconds
- Failure Threshold: 10

**Readiness Probe:**

- Type: Exec command
- Command: `curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""`
- Initial Delay: 60 seconds
- Period: 60 seconds
- Timeout: 30 seconds
- Failure Threshold: 10
Worker components receive the following probe configurations:
**Liveness Probe:**

- Type: HTTP GET
- Path: `/live`
- Port: `system` (9090)
- Period: 5 seconds
- Timeout: 30 seconds
- Failure Threshold: 1

**Readiness Probe:**

- Type: HTTP GET
- Path: `/health`
- Port: `system` (9090)
- Period: 10 seconds
- Timeout: 30 seconds
- Failure Threshold: 60

**Startup Probe:**

- Type: HTTP GET
- Path: `/live`
- Port: `system` (9090)
- Period: 10 seconds
- Timeout: 5 seconds
- Failure Threshold: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)
:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the `failureThreshold` to allow more time for model loading. Calculate the required threshold from your expected startup time: `failureThreshold = expected_startup_seconds / period`. Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::
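For instance, to budget 4 hours of startup time (1440 × 10s), the startup probe can be overridden via `extraPodSpec.mainContainer` (a hedged sketch using standard Kubernetes probe fields; `YourWorker` is a placeholder service name):

```yaml
services:
  YourWorker:
    extraPodSpec:
      mainContainer:
        startupProbe:
          httpGet:
            path: /live
            port: 9090
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 1440   # 1440 x 10s = 4 hours
```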
For multinode deployments, the operator modifies probes based on the backend framework and node role:

**vLLM**

The operator automatically selects between two deployment modes based on the parallelism configuration:

Tensor/Pipeline Parallel Mode (when `world_size > GPUs_per_node`):

- Uses Ray for distributed execution (`--distributed-executor-backend ray`)
- Leader nodes: start the Ray head and run vLLM; all probes remain active
- Worker nodes: run Ray agents only; all probes (liveness, readiness, startup) are removed

Data Parallel Mode (when `world_size × data_parallel_size > GPUs_per_node`):

- Leader nodes: all probes remain active
- Worker nodes: all probes (liveness, readiness, startup) are removed

**SGLang**

- Leader nodes: all probes remain unchanged
- Worker nodes: all probes (liveness, readiness, startup) are removed

**TensorRT-LLM**

- Worker nodes:
  - Liveness and startup probes are removed
  - The readiness probe is replaced with a TCP socket check on the SSH port (2222):
    - Initial Delay: 20 seconds
    - Period: 20 seconds
    - Timeout: 5 seconds
    - Failure Threshold: 10
The operator automatically injects environment variables into component containers based on component type, backend framework, and operator configuration. User-provided `envs` values always take precedence over operator defaults.

These environment variables are injected into every component container regardless of type.

| Variable | Purpose | Default | Type | Source |
|---|---|---|---|---|
| `DYN_NAMESPACE` | Dynamo service namespace used for service discovery and routing | Derived from DGD spec | string | Downward API annotation on checkpoint-restored pods |
| `DYN_COMPONENT` | Identifies the component type for runtime behavior | One of: `frontend`, `worker`, `prefill`, `decode`, `planner`, `epp` | string | Set from component spec |
| `DYN_PARENT_DGD_K8S_NAME` | Kubernetes name of the parent DynamoGraphDeployment resource | — | string | Set from DGD metadata |
| `DYN_PARENT_DGD_K8S_NAMESPACE` | Kubernetes namespace of the parent DynamoGraphDeployment resource | — | string | Set from DGD metadata |
| `POD_NAME` | Current pod name | — | string | Downward API (`metadata.name`) |
| `POD_NAMESPACE` | Current pod namespace | — | string | Downward API (`metadata.namespace`) |
| `POD_UID` | Current pod UID | — | string | Downward API (`metadata.uid`) |
| `DYN_DISCOVERY_BACKEND` | Service discovery backend for inter-component communication | `kubernetes` | string | Options: `kubernetes`, `etcd` |
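Because user-provided `envs` take precedence, any of these defaults can be overridden per component. For example (a sketch assuming `envs` follows the standard Kubernetes `name`/`value` shape):

```yaml
services:
  YourWorker:
    envs:
      - name: DYN_SYSTEM_PORT   # overrides the operator default of 9090
        value: "9091"
```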
These are injected into all components when the corresponding infrastructure service is configured in the operator's `OperatorConfiguration`.

| Variable | Purpose | Default | Type | Condition |
|---|---|---|---|---|
| `NATS_SERVER` | NATS messaging server address | — | string | Set when `infrastructure.natsAddress` is configured |
| `ETCD_ENDPOINTS` | etcd endpoint addresses for distributed state | — | string | Set when `infrastructure.etcdAddress` is configured |
| `MODEL_EXPRESS_URL` | Model Express service URL for model management | — | string | Set when `infrastructure.modelExpressURL` is configured |
| `PROMETHEUS_ENDPOINT` | Prometheus endpoint for metrics collection | — | string | Set when `infrastructure.prometheusEndpoint` is configured |
| Variable | Purpose | Default | Type |
|---|---|---|---|
| `DYNAMO_PORT` | HTTP port the frontend listens on | `8000` | int |
| `DYN_HTTP_PORT` | HTTP port for the frontend service (alias) | `8000` | int |
| `DYN_NAMESPACE_PREFIX` | Namespace prefix used for frontend request routing | Same as `DYN_NAMESPACE` | string |
| Variable | Purpose | Default | Type |
|---|---|---|---|
| `DYN_SYSTEM_ENABLED` | Enables the system HTTP server for health checks and metrics | `true` | string (boolean) |
| `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Endpoints whose health status is used for readiness | `["generate"]` | string (JSON array) |
| `DYN_SYSTEM_PORT` | Port for the system HTTP server (health, metrics) | `9090` | int |
| `DYN_HEALTH_CHECK_ENABLED` | Disables the legacy health check mechanism in favor of the system server | `false` | string (boolean) |
| `NIXL_TELEMETRY_ENABLE` | Enables or disables NIXL telemetry collection | `n` | string |
| `NIXL_TELEMETRY_EXPORTER` | Telemetry exporter format for NIXL metrics | `prometheus` | string |
| `NIXL_TELEMETRY_PROMETHEUS_PORT` | Port for the NIXL Prometheus metrics endpoint | `19090` | int |
| `DYN_NAMESPACE_WORKER_SUFFIX` | Hash suffix appended to the worker namespace for rolling updates | — | string |
| Variable | Purpose | Default | Type |
|---|---|---|---|
| `PLANNER_PROMETHEUS_PORT` | Port for the planner's Prometheus metrics endpoint | `9085` | int |
| Variable | Purpose | Default | Type |
|---|---|---|---|
| `USE_STREAMING` | Enables streaming mode for inference request proxying | `true` | string (boolean) |
| `RUST_LOG` | Rust log level and filter configuration | `debug,dynamo_llm::kv_router=trace` | string |
| Variable | Purpose | Default | Type | Condition |
|---|---|---|---|---|
| `VLLM_CACHE_ROOT` | Directory for vLLM compilation cache artifacts | — | string | Set when a volume mount has `useAsCompilationCache: true` |
| `VLLM_NIXL_SIDE_CHANNEL_HOST` | Host IP for the NIXL side channel in multiprocessing mode | Pod IP | string | Multinode `mp` backend only (Downward API: `status.podIP`) |
| Variable | Purpose | Default | Type | Condition |
|---|---|---|---|---|
| `OMPI_MCA_orte_keep_fqdn_hostnames` | Instructs OpenMPI to preserve FQDN hostnames for inter-node communication | `1` | string | Multinode deployments only |
These environment variables are injected when checkpoint/restore is enabled for a component.

| Variable | Purpose | Default | Type | Condition |
|---|---|---|---|---|
| `DYN_CHECKPOINT_PATH` | Base directory where checkpoint data is stored | From operator checkpoint config `storage.pvc.basePath` | string | PVC storage type |
| `DYN_CHECKPOINT_LOCATION` | Full checkpoint URI (for non-PVC backends) | — | string | S3 or OCI storage type |
| `DYN_CHECKPOINT_HASH` | Identity hash that uniquely identifies the checkpoint | — | string | Always set when checkpoint is enabled |
| `SKIP_WAIT_FOR_CHECKPOINT` | Skips the checkpoint readiness polling loop; checks once and proceeds | — | string | Set on restored and DGD pods |
The following component types automatically receive dedicated service accounts:

- Planner: `planner-serviceaccount`
- EPP: `epp-serviceaccount`
The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:

- Scans all Kubernetes secrets of type `kubernetes.io/dockerconfigjson` in the component's namespace
- Extracts the docker registry server URLs from each secret's authentication configuration
- Matches the container image's registry host against the discovered registry URLs
- Automatically injects matching secrets as `imagePullSecrets` in the pod specification
This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
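For reference, discovery only considers secrets of type `kubernetes.io/dockerconfigjson`, such as the following (the name, namespace, and base64 payload are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-registry-pull-secret    # placeholder name
  namespace: my-namespace          # must be the component's namespace
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded docker config JSON>
```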
To disable automatic image pull secret discovery for a specific component, add the following annotation:
annotations:
nvidia.com/disable-image-pull-secret-discovery: "true"When autoscaling is enabled but no metrics are specified, the operator applies:
- Default Metric: CPU utilization
- Target Average Utilization:
80%
Default container ports are configured based on component type:

**Frontend:**

- Port: 8000, Protocol: TCP, Name: `http`

**Worker:**

- Port: 9090 (system), Protocol: TCP, Name: `system`
- Port: 19090 (NIXL), Protocol: TCP, Name: `nixl`

**Planner:**

- Port: 9085, Protocol: TCP, Name: `metrics`

**EPP:**

- Port: 9002 (gRPC), Protocol: TCP, Name: `grpc`
- Port: 9003 (gRPC health), Protocol: TCP, Name: `grpc-health`
- Port: 9090 (metrics), Protocol: TCP, Name: `metrics`

Multinode deployments additionally use:

- Ray Head Port: 6379 (for Ray cluster coordination in multinode TP/PP deployments)
- Data Parallel RPC Port: 13445 (for data parallel multinode deployments)
- Distribution Init Port: 29500 (for multinode deployments)
- SSH Port: 2222 (for multinode MPI communication)
- OpenMPI Environment: `OMPI_MCA_orte_keep_fqdn_hostnames=1`
For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:

- Health Probes, Security Context & Pod Specifications: `internal/dynamo/graph.go` contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- Component-Specific Defaults:
  - `internal/dynamo/component_common.go`: base container and pod spec shared by all component types
  - `internal/dynamo/component_frontend.go`
  - `internal/dynamo/component_worker.go`
  - `internal/dynamo/component_planner.go`
  - `internal/dynamo/component_epp.go`
- Image Pull Secrets: `internal/secrets/docker.go` implements the docker secret indexer and automatic discovery
- Backend-Specific Behavior:
- Checkpoint / Restore: `internal/checkpoint/dgd_integration.go` handles checkpoint env var injection and volume setup
- Constants & Annotations: `internal/consts/consts.go` defines annotation keys and other constants
- All of these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
- User-specified probes (via the `livenessProbe`, `readinessProbe`, or `startupProbe` fields) take precedence over operator defaults
- For security context, if you provide any `securityContext` in `extraPodSpec`, no defaults will be injected, giving you full control
- For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
- The `extraPodSpec.mainContainer` field can be used to override probe configurations set by the operator