Commit 80bd7d4

akolbin and claude authored
Add GPU cost overview dashboards (DataDog#23455)
* Add GPU cost overview dashboards

  Adds 5 dashboards to the GPU integration for monitoring GPU compute spend and utilization across cloud providers and Kubernetes.

  Dashboards:
  - gpu_cost_overview: cross-cloud totals, spend by team/env, fleet utilization
  - aws_gpu_cost_overview: AWS-specific with Capacity Block tracking
  - azure_gpu_cost_overview: Azure GPU VM families (NC, NCv3, ND series)
  - gcp_gpu_cost_overview: GCP GPU SKUs with On-Demand vs Committed coverage
  - k8s_gpu_cost_overview: cluster/namespace allocation, idle cost attribution

  Cost queries span cloud_cost amortized metrics plus unblended for AWS Capacity Blocks (which are not captured in amortized). Utilization widgets join GPU telemetry (gpu.sm_active, gpu.device.total) with cost data for unit-economics views.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Refine GPU utilization metrics and dashboard polish

  - Switch unhealthy KPIs to GPU Idle % (gr_engine_active) since gpu.device.unhealthy is non-functional
  - Add Healthy GPU Rate KPI on k8s using kubernetes_state.node.gpu_allocatable / gpu_capacity
  - Add Spend on Idle GPUs KPI to per-cloud and overview dashboards (excludes AWS Capacity Blocks since their cost is upfront and unrelated to engine activity)
  - Standardize utilization terminology: "Average GPU Utilization %" and "GPU Idle %" across dashboards
  - Drop redundant cloud provider prefix from per-cloud widget titles and remove the "GPU Spend" group wrapper to match k8s dashboard layout

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add GPU Monitoring setup CTA banner to per-cloud and k8s dashboards

  Mirrors the existing banner from the overview dashboard so users on any cloud-specific or Kubernetes view see the link to enable GPU Monitoring, which populates utilization-driven widgets.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
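
The "GPU Idle %", "Spend on Idle GPUs", and "Healthy GPU Rate" KPIs named in the commit message can be read as simple ratios. The sketch below is illustrative only and is not taken from the dashboard JSON. It assumes that gr_engine_active is a 0..1 activity ratio, that idle spend is the non-Capacity-Block cost scaled by the idle fraction, and that the function names and the `capacity_block` key are hypothetical.

```python
from typing import Mapping, Sequence


def gpu_idle_fraction(gr_engine_active: float) -> float:
    """Idle share of GPU time, ASSUMING gr_engine_active is a 0..1 ratio."""
    clamped = min(max(gr_engine_active, 0.0), 1.0)
    return 1.0 - clamped


def spend_on_idle_gpus(
    cost_by_purchase_type: Mapping[str, float],
    gr_engine_active: float,
    excluded: Sequence[str] = ("capacity_block",),  # hypothetical key name
) -> float:
    """Cost attributed to idle GPUs, skipping upfront Capacity Block spend.

    ASSUMPTION: idle spend = (cost outside excluded purchase types) * idle fraction.
    """
    billable = sum(
        cost for kind, cost in cost_by_purchase_type.items() if kind not in excluded
    )
    return billable * gpu_idle_fraction(gr_engine_active)


def healthy_gpu_rate(gpu_allocatable: float, gpu_capacity: float) -> float:
    """Allocatable GPUs divided by GPU capacity; 0.0 when capacity is zero."""
    return gpu_allocatable / gpu_capacity if gpu_capacity else 0.0
```

For example, with 100.0 of on-demand spend, 50.0 of Capacity Block spend, and an average gr_engine_active of 0.75, the idle spend under these assumptions is 25.0, since the Capacity Block cost is left out of the calculation.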
1 parent 2f31885 commit 80bd7d4

6 files changed

Lines changed: 4987 additions & 1 deletion
