add OTEL-compatible tracing + make targets + scripts #139
djs55 wants to merge 19 commits into containerd:main
Conversation
Pull request overview
Adds OpenTelemetry (OTel) tracing across the VM/container startup path, including relaying VM-originated spans back to the host, plus developer tooling (Jaeger + Mermaid Gantt) and an integration test to visualize traces.
Changes:
- Introduce a VM-side TTRPC traces service + channel-backed OTel span exporter to stream spans out of the VM.
- Instrument key host/shim + VM code paths with spans and propagate context over TTRPC.
- Add tracing integration test, Jaeger/Makefile helpers, Mermaid conversion script, and tracing documentation.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| plugins/vminit/tracing/plugin.go | Registers VM tracing plugin and installs OTel tracer provider/exporter. |
| internal/vminit/tracing/service.go | TTRPC streaming service that forwards exported spans to the host. |
| internal/vminit/tracing/exporter.go | Channel-backed OTel SpanExporter implementation for VM span buffering. |
| internal/vminit/task/service.go | Adds VM-side spans around Create/Start and networking wait/connect. |
| internal/vminit/runc/container.go | Adds VM-side spans around container creation and mount setup. |
| internal/vminit/process/init.go | Adds VM-side spans around crun create/start operations. |
| internal/vminit/ctrnetworking/ctrnetworking.go | Adds VM-side spans per-network setup within errgroup goroutines. |
| internal/vm/libkrun/instance.go | Adds host-side spans for VMStart/WaitForTTRPC + TTRPC client interceptor. |
| internal/shim/task/tracing.go | Implements host-side relay of VM span stream to an OTLP HTTP exporter. |
| internal/shim/task/service.go | Adds shim spans and starts VM trace relay during Create. |
| integration/tracing_test.go | Build-tagged integration test that exports spans to Jaeger and relays VM spans. |
| hack/trace-to-mermaid.py | Fetches latest Jaeger trace and renders it as a Mermaid Gantt chart. |
| docs/tracing.md | Documentation for running tracing workflow with Jaeger + Mermaid. |
| cmd/vminitd/main.go | Ensures tracing plugin is loaded by vminitd. |
| api/proto/nerdbox/services/traces/v1/traces.proto | New Traces service + Span/KeyValue messages for VM→host span streaming. |
| api/services/traces/v1/traces.pb.go | Generated protobuf types for traces service. |
| api/services/traces/v1/traces_ttrpc.pb.go | Generated TTRPC bindings for traces service. |
| api/services/traces/v1/doc.go | Package doc for traces API. |
| Makefile | Adds test-tracing, Jaeger lifecycle, UI open, and Gantt generation targets. |
| go.mod | Adds OTel SDK + OTLP HTTP exporter dependencies. |
| go.sum | Records new module checksums for added dependencies. |
Comments suppressed due to low confidence (1)
internal/vm/libkrun/instance.go:337
- This wait loop has multiple `return` paths before the explicit `ttrpcWaitSpan.End()` call, so the WaitForTTRPC span may never be ended/exported. Use a `defer ttrpcWaitSpan.End()` right after starting the span, or restructure to end the span before each early return.
_, ttrpcWaitSpan := tracer.Start(ctx, "libkrun.WaitForTTRPC")
startedAt := time.Now()
for {
select {
case err := <-errC:
if err != nil {
return fmt.Errorf("failure running vm: %w", err)
}
case <-ctx.Done():
return ctx.Err()
case <-time.After(time.Millisecond):
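A minimal sketch of the deferred-end pattern the comment suggests, using a stand-in `span` type rather than the real OTel span (the actual fix would be `defer ttrpcWaitSpan.End()` right after `tracer.Start`):

```go
package main

import (
	"errors"
	"fmt"
)

// span is a stand-in for an OTel span; only End matters for this sketch.
type span struct{ ended bool }

func (s *span) End() { s.ended = true }

// waitForTTRPC mimics the wait loop above: several early returns, but the
// deferred End guarantees the span is closed on every path. The span type
// and function shape are illustrative, not the real nerdbox code.
func waitForTTRPC(fail bool) (*span, error) {
	s := &span{}
	defer s.End() // runs on every return path, early or not
	if fail {
		return s, errors.New("failure running vm")
	}
	return s, nil
}

func main() {
	s, err := waitForTTRPC(true)
	fmt.Println(s.ended, err != nil) // span ended even on the error path
	s, err = waitForTTRPC(false)
	fmt.Println(s.ended, err != nil)
}
```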
djs55 force-pushed from a7fd67c to 830e59d, then to ca1697e, then to f06a31f.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
internal/vm/libkrun/instance.go:341
- Spans started for VMStart/WaitForTTRPC are only ended on the success path. If Start() returns early (errC error, ctx.Done(), or vmStartTimeout), these spans never end, which produces incomplete traces and can leak span state. Consider deferring span.End() immediately after each tracer.Start (or using a single parent span with defer) so all exit paths close the spans.
tracer := otel.Tracer("nerdbox")
_, krunStartSpan := tracer.Start(ctx, "libkrun.VMStart")
errC := make(chan error)
go func() {
defer close(errC)
if err := v.vmc.Start(); err != nil {
errC <- err
}
}()
v.shutdownCallbacks = []func(context.Context) error{
func(context.Context) error {
cerr := v.vmc.Shutdown()
select {
case err := <-errC:
if err != nil {
return fmt.Errorf("failure running vm: %w", err)
}
default:
}
return cerr
},
}
var conn net.Conn
// Initial TTRPC ping deadline. On Windows, the vsock listen-mode proxy
// has more overhead (host→Unix socket→vsock→guest→vsock→Unix socket→host)
// so we start with a longer deadline.
d := 2 * time.Millisecond
if runtime.GOOS == "windows" {
d = 500 * time.Millisecond
}
_, ttrpcWaitSpan := tracer.Start(ctx, "libkrun.WaitForTTRPC")
startedAt := time.Now()
for {
select {
case err := <-errC:
if err != nil {
return fmt.Errorf("failure running vm: %w", err)
}
case <-ctx.Done():
return ctx.Err()
case <-time.After(time.Millisecond):
}
if time.Since(startedAt) > vmStartTimeout {
log.G(ctx).WithField("timeout", vmStartTimeout).Warn("Timeout while waiting for VM to start")
return fmt.Errorf("VM did not start within %s", vmStartTimeout)
djs55 force-pushed from f06a31f to b45ef30, then to d5c5c51.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.
djs55 force-pushed from d5c5c51 to f70b6f4.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.
djs55 force-pushed from 7087d9a to 415b643, then to 378cb97.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
internal/vm/libkrun/instance.go:304
- `errC` is unbuffered and the VM start goroutine does a blocking `errC <- err`. If `Start` returns early (e.g., `ctx.Done()` or timeout) before receiving from `errC`, that goroutine can deadlock forever on send (and never reach `close(errC)`), leaking the goroutine and potentially hanging shutdown logic that expects the channel to close. Make `errC` buffered (size 1) and/or make the send non-blocking so errors can't block when the caller has already returned.
errC := make(chan error)
go func() {
defer close(errC)
if err := v.vmc.Start(); err != nil {
errC <- err
}
}()
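A sketch of the buffered-channel fix: with capacity 1, the goroutine's send cannot block even if the caller has already returned and nobody receives. The `startVM` wrapper and error text are illustrative stand-ins, not the real code:

```go
package main

import (
	"errors"
	"fmt"
)

// startVM sketches the suggested fix: errC is buffered (size 1), so the
// goroutine's send below completes even when the caller abandoned the wait.
func startVM(fail bool) <-chan error {
	errC := make(chan error, 1) // buffered: send never blocks
	go func() {
		defer close(errC) // always reached; the goroutine cannot leak on send
		if fail {
			errC <- errors.New("vm start failed")
		}
	}()
	return errC
}

func main() {
	if err := <-startVM(true); err != nil {
		fmt.Println("got:", err)
	}
	if err := <-startVM(false); err == nil {
		fmt.Println("clean start, channel closed")
	}
}
```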
djs55 force-pushed from 378cb97 to e82db3d.
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated 16 comments.
djs55 force-pushed from ffb3c23 to 99e7a9f, then from 82c8526 to 70e56d8.
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
internal/vminit/task/service.go:314
- In `Start`, `ctx` is overwritten with the context returned from `vmTracer.Start` for `waitForCtrNetConnect`, but after `waitNetSpan.End()` the function continues using that derived `ctx` for subsequent operations (getContainer, logs, etc.). This again nests later work under an already-ended span. Consider using a scoped variable (e.g., `waitCtx`) for `waitForCtrNetConnect` and then continuing with `startCtx` (or the original `ctx`) for the rest of the method.
startCtx := ctx // preserve parent context for starting sibling spans
ctx, waitNetSpan := vmTracer.Start(startCtx, "waitForCtrNetConnect")
if err := s.waitForCtrNetConnect(ctx, r.ID); err != nil {
waitNetSpan.End()
return nil, errgrpc.ToGRPC(err)
}
waitNetSpan.End()
container, err := s.getContainer(r.ID)
if err != nil {
return nil, err
My main concern with this one is binary size. Luckily it seems the initrd size increase is reasonable. However, the shim doubles in size. If we were to keep the exporter only in the runtime, would we need to use a similar tracing API in the shim to pull them up to the runtime? This is the analysis I am looking at
djs55 force-pushed from 70e56d8 to f754e9f.
Yeah, I agree the binary size is bad. OTel always bothers me: every time I use it, all I want is to create a tree of named spans over HTTP, and yet the required dependencies are huge. I'm probably only using 0.1% of its capabilities. I did a quick experiment to reduce the dependencies: rather than using OTel, I've added a minimal span-sending implementation which only requires JSON (I think I need JSON to remain wire-compatible with Jaeger etc.). I noticed that net/http pulled in a bunch of crypto, so I dropped that too, on the assumption that everything is localhost HTTP. Comparing binary sizes to main (Mac/arm64):
Let me know what you think. I pushed these patches on top of the previous ones, so the branch is a bit of a mess and would need more tidying 😅
Add OpenTelemetry spans to the host-side VM startup path (libkrun.VMStart, libkrun.WaitForTTRPC) with developer tooling: Jaeger Makefile targets, a Mermaid Gantt chart generator, tracing integration test, and docs. Signed-off-by: David Scott <dave@recoil.org>
Add a ttrpc Traces streaming service inside the VM that exports OTel spans over vsock to the host shim. The shim relays them to the OTLP collector with clock-skew correction. Includes VM-side spans for container create, start, mount setup, and networking, plus the proto definitions and generated code. Signed-off-by: David Scott <dave@recoil.org>
Register a host-side OTel tracing plugin in the shim that configures a TracerProvider with an OTLP HTTP exporter, enabling end-to-end trace context propagation across the shim's ttrpc calls. Signed-off-by: David Scott <dave@recoil.org>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Replace the OTel SDK-based Exporter with the minimal internal tracing.Collector, removing the last OTel SDK dependency from the vminit tracing service. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Replace all go.opentelemetry.io/* imports with the internal tracing package. The test now uses tracing.Init, tracing.Start, and tracing.ForwardTraces directly instead of manually wiring up OTel exporters, providers, and propagators. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
djs55 force-pushed from f754e9f to 76b8f4e.
Signed-off-by: David Scott <dave@recoil.org>
Pull request overview
Copilot reviewed 27 out of 28 changed files in this pull request and generated 7 comments.
func spanToOTLPJSON(s *Span) otlpSpan {
	return otlpSpan{
		TraceID:           s.TraceID.String(),
		SpanID:            s.SpanID.String(),
		ParentSpanID:      s.ParentSpanID.String(),
		Name:              s.Name,
		Kind:              1, // SPAN_KIND_INTERNAL
		StartTimeUnixNano: strconv.FormatInt(s.StartTime.UnixNano(), 10),
		EndTimeUnixNano:   strconv.FormatInt(s.EndTime.UnixNano(), 10),
		Status:            otlpStatus{Code: 1}, // STATUS_CODE_OK
	}
spanToOTLPJSON always sets ParentSpanID to a non-empty string. For root spans this becomes an all-zero parent span ID, which is invalid in OTLP and can confuse trace reconstruction. Consider omitting parentSpanId when ParentSpanID is the zero value (set it to the empty string so the omitempty tag works).
span := otlpSpan{
	TraceID:           hex.EncodeToString(s.TraceID),
	SpanID:            hex.EncodeToString(s.SpanID),
	ParentSpanID:      hex.EncodeToString(s.ParentSpanID),
	Name:              s.Name,
	Kind:              int(s.Kind),
	StartTimeUnixNano: strconv.FormatInt(startNano, 10),
	EndTimeUnixNano:   strconv.FormatInt(endNano, 10),
	Status: otlpStatus{
exportVMSpan always encodes and sets ParentSpanID. For root spans this becomes an all-zero parent span ID, which is invalid in OTLP and can break parent/child relationships in Jaeger. Consider leaving ParentSpanID empty when the incoming ParentSpanID is empty/all-zeros so the omitempty tag on otlpSpan.ParentSpanID can drop it.
		addr += ":80"
	}
}
return &OTLPEndpoint{addr: addr, path: "/v1/traces"}
ParseOTLPEndpoint ignores any path component in OTEL_EXPORTER_OTLP_ENDPOINT and always forces path: "/v1/traces". This breaks valid endpoints like http://collector:4318/v1/traces (you'll POST to /v1/traces on the host but can't honor custom paths/reverse proxies). Consider using u.Path when non-empty (and defaulting to /v1/traces only when the path is empty).
Suggested change:
-	return &OTLPEndpoint{addr: addr, path: "/v1/traces"}
+	path := "/v1/traces"
+	if u.Path != "" {
+		path = u.Path
+	}
+	return &OTLPEndpoint{addr: addr, path: path}
u, err := url.Parse(endpoint)
if err != nil || u.Host == "" {
	// Treat as host:port directly.
	return &OTLPEndpoint{addr: endpoint, path: "/v1/traces"}
}
addr := u.Host
if u.Port() == "" {
	if u.Scheme == "https" {
		addr += ":443"
	} else {
		addr += ":80"
	}
}
return &OTLPEndpoint{addr: addr, path: "/v1/traces"}
ParseOTLPEndpoint treats https://... as port 443, but postOTLP sends a plaintext HTTP/1.1 request over raw TCP (no TLS). This will fail against real HTTPS collectors and can be a security footgun if someone expects encryption. Consider either (a) explicitly rejecting https endpoints with a clear error/log and documenting that only plain HTTP is supported, or (b) implementing TLS for HTTPS endpoints.
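A sketch of option (a), explicit rejection. `parseEndpoint` is a hypothetical variant of `ParseOTLPEndpoint` that errors on `https://` since the exporter only speaks plaintext HTTP over raw TCP; names and defaults mirror the excerpt above:

```go
package main

import (
	"fmt"
	"net/url"
)

// parseEndpoint rejects https:// explicitly rather than silently sending
// plaintext to port 443. This is a sketch, not the real nerdbox function.
func parseEndpoint(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil || u.Host == "" {
		return endpoint, nil // treat as host:port directly
	}
	if u.Scheme == "https" {
		return "", fmt.Errorf("https endpoints are not supported (plaintext HTTP only): %s", endpoint)
	}
	addr := u.Host
	if u.Port() == "" {
		addr += ":80" // default plaintext HTTP port
	}
	return addr, nil
}

func main() {
	_, err := parseEndpoint("https://collector:4318")
	fmt.Println("https rejected:", err != nil)
	addr, _ := parseEndpoint("http://collector")
	fmt.Println(addr)
}
```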
// Shutdown flushes remaining spans and stops the background goroutine.
func (f *Flusher) Shutdown(ctx context.Context) error {
	close(f.done)
	return f.flush(ctx)
}
Flusher.Shutdown closes f.done unconditionally. If the shutdown callback is invoked more than once (or Shutdown is called defensively), this will panic with "close of closed channel". Consider making shutdown idempotent (e.g., sync.Once around the close, or a non-blocking select on f.done).
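A sketch of the `sync.Once` approach: a second (defensive or callback-driven) call to Shutdown becomes a no-op instead of panicking. Field names follow the excerpt; `flush` and the background goroutine are elided:

```go
package main

import (
	"fmt"
	"sync"
)

// Flusher sketch: sync.Once makes Shutdown idempotent, so calling it more
// than once no longer panics with "close of closed channel".
type Flusher struct {
	done     chan struct{}
	stopOnce sync.Once
}

func (f *Flusher) Shutdown() {
	f.stopOnce.Do(func() { close(f.done) })
}

func main() {
	f := &Flusher{done: make(chan struct{})}
	f.Shutdown()
	f.Shutdown() // no-op instead of a panic
	fmt.Println("shutdown is idempotent")
}
```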
// Read just the status line to check for errors.
var resp [128]byte
n, _ := conn.Read(resp[:])
if n < 12 {
	return fmt.Errorf("OTLP response too short")
}
// "HTTP/1.1 200" — status code starts at byte 9
if resp[9] >= '4' {
	return fmt.Errorf("OTLP export failed: %s", string(resp[:n]))
}
return nil
postOTLP checks resp[9] >= '4' to detect errors, which treats any 3xx response as success and also assumes the status code is always at byte 9. Consider parsing the HTTP status line more robustly and treating non-2xx responses as errors (at minimum, require resp[9] == '2').
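A sketch of the more robust check: parse the status line properly and accept only 2xx. The function shape is illustrative; the real code would apply this to the bytes read from `conn`:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// statusOK parses an HTTP/1.x status line (e.g. "HTTP/1.1 200 OK") and
// accepts only 2xx codes, instead of peeking at a fixed byte offset.
func statusOK(line string) error {
	parts := strings.SplitN(strings.TrimSpace(line), " ", 3)
	if len(parts) < 2 {
		return fmt.Errorf("malformed status line: %q", line)
	}
	code, err := strconv.Atoi(parts[1])
	if err != nil {
		return fmt.Errorf("malformed status code in: %q", line)
	}
	if code < 200 || code > 299 {
		return fmt.Errorf("OTLP export failed: %s", line)
	}
	return nil
}

func main() {
	fmt.Println(statusOK("HTTP/1.1 200 OK") == nil)
	fmt.Println(statusOK("HTTP/1.1 308 Permanent Redirect") != nil) // 3xx is not success
	fmt.Println(statusOK("HTTP/1.1 500 Internal Server Error") != nil)
}
```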
Nerdbox supports distributed tracing across the full container startup path,
including spans that originate inside the VM. Traces are exported via
OTLP/HTTP (JSON) to any compatible collector (e.g., Jaeger).
The docs say traces are exported via OTLP/HTTP to any compatible collector, but the current implementation sends a plaintext HTTP/1.1 request over raw TCP and does not support HTTPS/TLS endpoints. Consider documenting this limitation explicitly (or adding TLS support) so users don't assume https://... endpoints will work.
Suggested change:
-OTLP/HTTP (JSON) to any compatible collector (e.g., Jaeger).
+OTLP/HTTP (JSON) to compatible collectors over plaintext `http://`
+endpoints (for example, a local Jaeger setup). HTTPS/TLS (`https://`)
+endpoints are not currently supported.
Help us understand (and later optimise) VM/container start time.
If you run this:
It produces this:
The biggest thing on the critical path is the VM start + initial connection, which needs breaking down further (future investigation).
We can also share Mermaid diagrams which can be rendered on GitHub by running:
which produces this:
gantt
  dateFormat x
  axisFormat %s.%L s
  title Trace for nerdbox-vm
  section VM.NewInstance+Start
  nerdbox/VM.NewInstance+Start :VM_NewInstance+Start_9f6fda, 0, 83
  . nerdbox/libkrun.VMStart :libkrun_VMStart_90885a, 5, 82
  . nerdbox/libkrun.WaitForTTRPC :libkrun_WaitForTTRPC_10af7c, 5, 82
  . nerdbox/TTRPC.Task.Create :TTRPC_Task_Create_5e832c, 83, 85
  . . nerdbox/containerd.task.v3.Task/Create :containerd_task_v3_Task_Create_3a4175, 84, 85
  . . . containerd.task.v3.Task/Create :containerd_task_v3_Task_Create_61f5b3, 83, 85
  . . . . task.Create :task_Create_a0b789, 84, 85
  . . . . . runc.NewContainer :runc_NewContainer_f4a190, 84, 85
  . . . . . . crun.create :crun_create_beaf4f, 84, 85
  . nerdbox/TTRPC.Task.Start :TTRPC_Task_Start_c8de57, 85, 87
  . . nerdbox/containerd.task.v3.Task/Start :containerd_task_v3_Task_Start_f0139b, 85, 87
  . . . containerd.task.v3.Task/Start :containerd_task_v3_Task_Start_1d5297, 85, 87
  . . . . task.Start :task_Start_84961b, 85, 87
  . . . . . . container.Start :container_Start_5372d6, 85, 87
  . . . . . . . crun.start :crun_start_ce160c, 85, 87

The first patch adds the infra and tracing for the host part. This gets most of the benefit IMHO.
The second patch extends the tracing to the VM: this is a bit complicated because I didn't want to assume there was a network, so the spans are relayed back to the host and then forwarded to the collector.
The clock handling is interesting: the VM's RTC only has 1-second granularity, so to correct for the skew we send the host timestamp to the VM before tracing starts.