feat: Propagate process context to profiles#1343
Conversation
67515d9 to
a6990cf
Compare
a6990cf to
1e92222
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e92222697
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| } | ||
|
|
||
| trace.Resource = procMeta.ProcessContextInfo.Resource |
There was a problem hiding this comment.
Propagate process context extra attributes
When a publisher sets ProcessContext.extra_attributes, readPayload now stores them in ProcessContextInfo.ExtraAttributes, but this assignment is the only process-context data copied onto the trace. Since neither EbpfTrace nor TraceEventMeta carries ExtraAttributes, those attributes are dropped before pdata.Generate, so processes that rely on the optional process-context profile attributes never see them in exported profiles.
Useful? React with 👍 / 👎.
1e92222 to
c75eb3a
Compare
c75eb3a to
e10b4b3
Compare
91218c3 to
0b708cf
Compare
christos68k
left a comment
There was a problem hiding this comment.
Did one pass, will go through it again
| // old process context is discarded and a rebuild is forced so new env vars | ||
| // take effect even when context mapping is present. | ||
| func readProcessContext( | ||
| mappingAddr uint64, pid libpf.PID, rm remotememory.RemoteMemory, |
There was a problem hiding this comment.
Let's move this function to processcontext.go, since it's more coupled with that logic
|
|
||
| TEST_INTEGRATION_BINARY_DIRS := tracer processmanager/ebpf support interpreter/golabels/integrationtests | ||
|
|
||
| processctx-execs: |
There was a problem hiding this comment.
Shouldn't we make this a requirement of integration-test-binaries ?
| pprof_stable_cgo_pie: | ||
| CGO_ENABLED=1 go test -C ./interpreter/golabels/integrationtests/pprof -c -ldflags '-extldflags "-static"' -trimpath -buildmode=pie -tags $(GO_TAGS),withcgo,integration -o ./../$@ | ||
|
|
||
| integration-test-binaries: generate ebpf pprof-execs |
There was a problem hiding this comment.
| integration-test-binaries: generate ebpf pprof-execs processctx-execs |
| return nil | ||
| } | ||
|
|
||
| meta := pr.GetProcessMeta(process.MetaConfig{IncludeEnvVars: pm.includeEnvVars}) |
There was a problem hiding this comment.
The eBPF code will first check for the presence of an entry in pid_page_to_mapping_info before it sends a trace to userspace.
Removing this logic from this method and postponing process meta collection until after mappings have been processed, increases the race window from UpdatePidPageMappingInfo to process meta collection. Any traces that the kernel sends in that race window, will have no process metadata whatsoever.
One option is to keep this logic here so that basic process metadata is populated immediately and process context added after mappings have been processed.
Another option is to significantly refactor the code so that pm.ebpf.UpdatePidPageMappingInfo isn't called until after process metadata including process context have been collected.
| // resourceAttrKey is the environment variable name OpenTelemetry Resource information will be read from. | ||
| resourceAttrKey = "OTEL_RESOURCE_ATTRIBUTES" | ||
|
|
||
| // svcNameKey is the environment variable name that Service Name information will be read from. | ||
| svcNameKey = "OTEL_SERVICE_NAME" |
There was a problem hiding this comment.
Doesn't seem like we're collecting these anywhere?
| if errors.Is(err, processcontext.ErrNoUpdate) { | ||
| return oldProcessContextInfo | ||
| // readProcessContext reads the process context from a context mapping | ||
| // (if any) and merges env-var-derived attributes in. Returns (info, true) to |
There was a problem hiding this comment.
We should clarify what env-var-derived attributes are here. We can just mention the two specific env variables that we're merging (OTEL_RESOURCE_ATTRIBUTES and OTEL_SERVICE_NAME), or if we move this function into processcontext.go, can refer to svcNameKey and resourceAttrKey.
| errors.Is(err, processcontext.ErrConcurrentUpdate): | ||
| // Note that if processMetaUpdated is true, the caller will discard the previous process context and therefore | ||
| // returning true or false makes no difference. Returning true in this case makes the intent clearer though. | ||
| return processcontext.Info{}, processMetaUpdated |
There was a problem hiding this comment.
Shouldn't we merge the OTel env variables here?
| return processcontext.Info{}, processMetaUpdated | |
| return processcontext.WithMergedEnvVars(processcontext.Info{}, envVars) |
| default: | ||
| log.Warnf("Failed to read ProcessContext for PID %d: %v", pid, err) | ||
| // Fail to read process context, publish a new empty process context. | ||
| return processcontext.Info{}, true |
There was a problem hiding this comment.
Same here:
| return processcontext.Info{}, true | |
| return processcontext.WithMergedEnvVars(processcontext.Info{}, envVars) |
Summary
Propagate OTel ProcessContext resource attributes (per OTEP #4719) to the OTLP profile output so profiles can be correlated with the rest of the telemetry from the same service instance.
Changes
processcontext:
Info.Resourceis now a*pcommon.Resource(was*resourcepb.Resource), so resource attributes flow directly into the OTLP reporter without re-conversion.convertAnyValuehandles allAnyValuevariants includingnested arrays/maps.
WithMergedEnvVars(info, envVars)enriches anInfowithOTEL_SERVICE_NAMEandOTEL_RESOURCE_ATTRIBUTES. Per the OTel resource SDK spec: keys and values are percent-encoded; on any decoding error the entireOTEL_RESOURCE_ATTRIBUTESvalue is discarded; duplicate keys within it resolve last-writer-wins;OTEL_SERVICE_NAMEoverridesservice.nameif both are set; values from the proto resource win over env-var-derived ones on collision.ResourceToContextKey(resource)returns a stable key from the(service.namespace, service.name, service.instance.id)triplet — described as globally unique by the semantic conventions. Used inResourceKeyfor trace deduplication grouping.Infois documented as immutable after publication; the embedded*pcommon.Resourceis shared by pointer across the process-manager writer, the tracer event loop, and the reporter, and must be treated as read-only by all holders.processmanager:
SynchronizeProcessdefers theprocesscontext.Readuntil afterGetProcessMeta, so env vars are available to merge in the same sync.readProcessContextreturns(Info, publish bool):falsepreserves the previously-published context on transientErrNoUpdate/ErrConcurrentUpdate,truepublishes a new context.Behaviour notes
OTEL_CTXmapping are unchanged:Info.Resourcestays nil,ContextKey == NullString, and bucketing falls back to the existing PID/container/executable keys.OTEL_SERVICE_NAME/OTEL_RESOURCE_ATTRIBUTESbut doesn't publish aProcessContextmapping still gets a Resource derived from those env vars.setResourceAttributescopies the process context attributes onto the emitted ResourceProfiles. eBPF-derived process.pid, container.id, and process.executable.* are added on top. APMServiceName, when set, takes precedence over the process context supplied service.name. This override is transitional and APMServiceName should be removed once ProcessContext is the canonical source of service identity.