GPU capacity and allocation in status

TODO: implement after the time-slicing feature is added.

kubectl get gpu shows the total number of GPUs, how many are in use, and across which namespaces.


Before
NAME   READY   REASON   DRIVER VERSION   NODES READY   AGE
gpu    True    Ready    590              1             5m

After (with time-slicing, shipped together)
NAME   READY   REASON   DRIVER VERSION   NODES READY   TOTAL GPUs   ALLOCATED   AGE
gpu    True    Ready    590              3             12           8           5m


TOTAL GPUs
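Read from the node label nvidia.com/gpu.count, which NVIDIA's GPU Feature Discovery sets automatically once the driver is installed, and summed across all GPU nodes. We don't set this, we only read it. When time-slicing is active, NVIDIA advertises virtual GPUs instead of physical ones, so this number is expected to reflect the sharing configuration already. No user action needed. To verify before implementing: NVIDIA's time-slicing documentation shows nvidia.com/gpu.count staying at the physical count, with the multiplier in nvidia.com/gpu.replicas and the shared total in the node's nvidia.com/gpu capacity. If that holds for our setup, the total should come from node capacity rather than from the label alone.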
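A minimal sketch of the summation, assuming nodes are available as plain dicts in the shape that kubectl get nodes -o json returns. The function name total_gpus and the sample node names are hypothetical and only illustrate the idea; the real controller would use its own client types.

```python
GPU_COUNT_LABEL = "nvidia.com/gpu.count"


def total_gpus(nodes):
    """Sum the nvidia.com/gpu.count label over every node that carries it."""
    total = 0
    for node in nodes:
        labels = node.get("metadata", {}).get("labels") or {}
        raw = labels.get(GPU_COUNT_LABEL)
        if raw is None:
            continue  # node has no GPU label, so it is not a GPU node
        try:
            total += int(raw)
        except ValueError:
            continue  # ignore a malformed label instead of failing the reconcile
    return total


# Hypothetical cluster: three GPU nodes with 4 GPUs each and one CPU-only node.
example_nodes = [
    {"metadata": {"name": "gpu-node-1", "labels": {GPU_COUNT_LABEL: "4"}}},
    {"metadata": {"name": "gpu-node-2", "labels": {GPU_COUNT_LABEL: "4"}}},
    {"metadata": {"name": "gpu-node-3", "labels": {GPU_COUNT_LABEL: "4"}}},
    {"metadata": {"name": "cpu-node-1", "labels": {}}},
]

print(total_gpus(example_nodes))
```

With the sample data this gives 12, matching the TOTAL GPUs column in the After output.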

ALLOCATED
Computed by listing all running pods across all namespaces and summing the nvidia.com/gpu resource requests of their containers. A pod requesting nvidia.com/gpu: 2 contributes 2 to the count. We don't set this either; we derive it from the cluster state on every reconcile. It reflects current demand, not capacity.
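The derivation could look roughly like this, assuming pods are plain dicts in the shape that kubectl get pods -A -o json returns. The function name allocated_gpus and the sample pods are hypothetical, and the sketch only looks at regular containers (init containers are left out for simplicity).

```python
GPU_RESOURCE = "nvidia.com/gpu"


def allocated_gpus(pods):
    """Sum nvidia.com/gpu requests over the containers of all Running pods."""
    total = 0
    for pod in pods:
        if pod.get("status", {}).get("phase") != "Running":
            continue  # only running pods count toward ALLOCATED
        for container in pod.get("spec", {}).get("containers", []):
            requests = (container.get("resources") or {}).get("requests") or {}
            # GPU quantities are whole numbers, serialized as strings like "2".
            total += int(requests.get(GPU_RESOURCE, 0))
    return total


# Hypothetical pods in two namespaces, plus one Pending pod that is ignored.
example_pods = [
    {
        "metadata": {"namespace": "team-a", "name": "train-0"},
        "spec": {"containers": [{"resources": {"requests": {GPU_RESOURCE: "2"}}}]},
        "status": {"phase": "Running"},
    },
    {
        "metadata": {"namespace": "team-b", "name": "infer-0"},
        "spec": {
            "containers": [
                {"resources": {"requests": {GPU_RESOURCE: "4"}}},
                {"resources": {"requests": {GPU_RESOURCE: "2"}}},
                {"resources": {}},  # sidecar without a GPU request
            ]
        },
        "status": {"phase": "Running"},
    },
    {
        "metadata": {"namespace": "team-b", "name": "queued-0"},
        "spec": {"containers": [{"resources": {"requests": {GPU_RESOURCE: "4"}}}]},
        "status": {"phase": "Pending"},
    },
]

print(allocated_gpus(example_pods))
```

With the sample data this gives 8, matching the ALLOCATED column in the After output. The Pending pod is skipped because it holds no GPU yet.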


GPU capacity and allocation in status #69
