Add Ingero - eBPF GPU causal observability for production ML #773

Open: dml37 wants to merge 1 commit into EthicalML:master from dml37:add-ingero

dml37 commented May 17, 2026

Adds Ingero to Evaluation and Monitoring.

Ingero is an open-source (Apache 2.0 + GPL-2.0) eBPF agent and MCP server purpose-built for production ML observability. It traces the causal chain from Linux kernel events through CUDA API calls to Python source lines, so that a stall in forward() can be attributed to a specific cause, for example cudaMalloc spiking under host-side CPU contention.
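
To illustrate the kind of cross-layer attribution described above, here is a minimal Python sketch. It is not Ingero's code, API, or output format: the `Event` record, the layer names, the `attribute_stall` helper, the `model.py:42` source location, and all timestamps are hypothetical, chosen only for illustration. It links a Python-level stall to CUDA calls that overlap it in time, and then to kernel events that overlap those calls. Temporal overlap alone is only a heuristic for causality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical trace record; not Ingero's actual event schema."""
    layer: str      # "python", "cuda", or "kernel"
    name: str
    start_us: int   # start timestamp in microseconds
    end_us: int     # end timestamp in microseconds


def overlaps(a: Event, b: Event) -> bool:
    """True if the two events share any part of their time intervals."""
    return a.start_us < b.end_us and b.start_us < a.end_us


def attribute_stall(stall: Event, events: list[Event]) -> list[tuple[str, str, str]]:
    """Return (python, cuda, kernel) name triples linked by time overlap."""
    cuda_calls = [e for e in events if e.layer == "cuda" and overlaps(e, stall)]
    kernel_events = [e for e in events if e.layer == "kernel"]
    chains = []
    for call in cuda_calls:
        for kev in kernel_events:
            if overlaps(kev, call):
                chains.append((stall.name, call.name, kev.name))
    return chains


# Made-up sample data: a slow cudaMalloc coinciding with CPU contention.
events = [
    Event("cuda", "cudaMalloc", 1_000, 9_000),
    Event("cuda", "cudaLaunchKernel", 9_100, 9_200),
    Event("kernel", "sched_switch (CPU contention)", 2_000, 8_000),
    Event("kernel", "page_fault", 20_000, 20_100),
]
stall = Event("python", "model.py:42 forward()", 900, 9_500)

chains = attribute_stall(stall, events)
for chain in chains:
    print(" <- ".join(chain))
```

With this sample data, only the `cudaMalloc` call overlaps a kernel event, so the sketch reports a single chain from `forward()` through `cudaMalloc` to the scheduler event.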

  • Single binary with <2% overhead and zero code changes; runs as a DaemonSet on Kubernetes.
  • Used to debug PyTorch / vLLM training and inference stalls.
  • Already featured in awesome-ebpf, awesome-observability, awesome-opentelemetry, awesome-sre-tools, Awesome-GPU, awesome-mcp-servers, and others.

Repo: https://github.com/ingero-io/ingero
