Skip to content

Kyma Telemetry auto-wiring #70

@vrdc-sap

Description

@vrdc-sap

When Kyma Telemetry module is present on the cluster, automatically create a MetricPipeline CR that scrapes DCGM metrics from the gpu-operator namespace and forwards them to the user's existing metrics backend.

NVIDIA DCGM Exporter already ships GPU metrics (utilization, memory, temperature) inside the gpu-operator namespace. Today they go nowhere. With this feature, enabling the GPU module is enough - metrics appear in the user's dashboards automatically with zero additional configuration.

How

  1. On every reconcile, check if MetricPipeline CRD exists on the cluster
  2. If Telemetry is present, create a MetricPipeline pointing at gpu-operator namespace, forwarding to Kyma's internal OTel Collector
  3. Delete the pipeline when GPU module is deleted

Open question

Confirm the internal Telemetry OTel Collector service endpoint - this is what we point the pipeline output at so metrics flow to whatever backend the user already configured in Telemetry.

Acceptance criteria

  • Telemetry not installed -> no error, no pipeline, silent skip
  • Telemetry installed -> MetricPipeline created automatically on GPU module install
  • GPU module deleted -> MetricPipeline cleaned up
  • Telemetry installed after GPU module -> next reconcile picks it up
  • GPU metrics visible in user's backend with no manual steps

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions