feat(otelplugin): add OpenTelemetry usage plugin with W3C Baggage propagation#3761
feat(otelplugin): add OpenTelemetry usage plugin with W3C Baggage propagation#3761quanyeomans wants to merge 1 commit into
Conversation
…pagation
Adds internal/otelplugin/ as an opt-in `usage.Plugin` that:
- Parses W3C Baggage on inbound requests (Gin middleware)
- Allowlist-filters the baggage before downstream propagation
- Emits OTLP usage events with GenAI semconv + cost attributes
- Propagates baggage on outbound LLM provider requests
- Mirrors the redisqueue toggle pattern for runtime enable/disable
Off by default — set telemetry.otlp.enabled: true to turn it on. No
existing behaviour changes.
The plugin mirrors the redisqueue pattern: init()-time registration with
coreusage.RegisterPlugin, atomic.Bool feature flag (Enabled /
SetEnabled), config snapshot under atomic.Value for lock-free hot-path
reads.
Why
---
- Native OpenTelemetry observability: any OTel-compatible backend
(Honeycomb, Tempo, Jaeger, Datadog, Grafana Cloud, Azure Monitor,
AWS X-Ray, …) becomes a one-config-change away from CLIProxyAPI
telemetry, instead of each operator writing a backend-specific shim
against the management API.
- Standards-aligned cross-service identity propagation per the W3C
Baggage spec. Trusted-boundary safe default (propagation: off);
operators opt in to propagate or allowlist.
- Early adoption of the OTel GenAI semantic conventions — LLM-routing
proxies are a category the OTel community has identified as needing
reference implementations.
- Per-request cost attribution: optional pricing-table block produces
cost.usd per span.
Files
-----
internal/otelplugin/
config.go — Config schema + SetConfig + Enabled/SetEnabled
baggage.go — W3C Baggage parser, formatter, allowlist filter,
context plumbing (WithBaggage, BaggageFromContext)
middleware.go — Gin middleware + outbound propagation helpers
plugin.go — usage.Plugin implementation + tracer wiring
status.go — OTel codes aliases
*_test.go — 35 unit tests covering parser, middleware,
attribute mapping, cost calc, disabled-plugin no-op
Spec + design context:
https://github.com/three-cubes/tc-agent-zone/blob/main/docs/upstream-asks/cli-proxy-api-otel-baggage.md
|
This pull request targeted The base branch has been automatically changed to |
There was a problem hiding this comment.
Code Review
This pull request introduces the otelplugin package, which exports OpenTelemetry spans for upstream LLM calls and supports W3C Baggage propagation and cost tracking. Feedback from the review highlights several critical improvements: correcting baggage encoding/decoding by using path-based escaping instead of query-based escaping to properly handle + and spaces; optimizing the hot path in tracerFor using atomic.Value to avoid global mutex contention; ensuring deterministic pricing prefix matches by sorting and caching pricing keys; and supporting multiple Baggage headers in the Gin middleware for full spec compliance.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| value, err := url.QueryUnescape(rawValue) | ||
| if err != nil { | ||
| value = rawValue | ||
| } |
There was a problem hiding this comment.
Using url.QueryUnescape will decode + characters as spaces. In W3C Baggage, + is a valid character (often appearing in base64-encoded values) and should not be converted to spaces. Use url.PathUnescape instead to correctly preserve + characters.
| value, err := url.QueryUnescape(rawValue) | |
| if err != nil { | |
| value = rawValue | |
| } | |
| value, err := url.PathUnescape(rawValue) | |
| if err != nil { | |
| value = rawValue | |
| } |
| if strings.TrimSpace(k) == "" || strings.TrimSpace(v) == "" { | ||
| continue | ||
| } | ||
| parts = append(parts, k+"="+url.QueryEscape(v)) |
There was a problem hiding this comment.
| var ( | ||
| tracerMu sync.Mutex | ||
| tracerActive trace.Tracer | ||
| tracerEndpt string | ||
| tracerProto string | ||
| ) | ||
|
|
||
| // tracerFor lazily constructs the global tracer + exporter on first use. The | ||
| // guard mutex serialises construction; subsequent reads are lock-free because | ||
| // trace.Tracer is itself a goroutine-safe handle. | ||
| func tracerFor(cfg *Config) trace.Tracer { | ||
| tracerMu.Lock() | ||
| defer tracerMu.Unlock() | ||
| if tracerActive != nil && | ||
| tracerEndpt == cfg.OTLP.Endpoint && | ||
| tracerProto == cfg.OTLP.Protocol { | ||
| return tracerActive | ||
| } | ||
| tracer, shutdown, err := buildTracer(cfg) | ||
| if err != nil { | ||
| return nil | ||
| } | ||
| sharedExporter.reset(shutdown) | ||
| tracerActive = tracer | ||
| tracerEndpt = cfg.OTLP.Endpoint | ||
| tracerProto = cfg.OTLP.Protocol | ||
| return tracerActive | ||
| } |
There was a problem hiding this comment.
The current implementation of tracerFor acquires a global mutex (tracerMu) on every single request on the hot path. This introduces a significant performance bottleneck under high concurrent load, defeating the goal of a lock-free hot path. We can optimize this by storing the active tracer state in an atomic.Value and only acquiring the mutex on the slow path when the configuration actually changes.
type tracerState struct {
tracer trace.Tracer
endpoint string
protocol string
}
var (
tracerMu sync.Mutex
tracerValue atomic.Value // stores *tracerState
)
// tracerFor lazily constructs the global tracer + exporter on first use.
// It uses an atomic fast-path to avoid mutex contention on the hot path.
func tracerFor(cfg *Config) trace.Tracer {
if state, ok := tracerValue.Load().(*tracerState); ok && state != nil {
if state.endpoint == cfg.OTLP.Endpoint && state.protocol == cfg.OTLP.Protocol {
return state.tracer
}
}
tracerMu.Lock()
defer tracerMu.Unlock()
// Double-check under lock
if state, ok := tracerValue.Load().(*tracerState); ok && state != nil {
if state.endpoint == cfg.OTLP.Endpoint && state.protocol == cfg.OTLP.Protocol {
return state.tracer
}
}
tracer, shutdown, err := buildTracer(cfg)
if err != nil {
return nil
}
sharedExporter.reset(shutdown)
tracerValue.Store(&tracerState{
tracer: tracer,
endpoint: cfg.OTLP.Endpoint,
protocol: cfg.OTLP.Protocol,
})
return tracer
}| func costAttributes(record coreusage.Record, cfg *Config) []attribute.KeyValue { | ||
| pricing, ok := cfg.Cost.Pricing[record.Model] | ||
| if !ok { | ||
| // Try a prefix match — operators frequently use one entry per model | ||
| // family (e.g. "claude-sonnet") rather than every variant suffix. | ||
| for model, p := range cfg.Cost.Pricing { | ||
| if strings.HasPrefix(record.Model, model) { | ||
| pricing = p | ||
| ok = true | ||
| break | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Go map iteration order is randomized. When performing a prefix match over cfg.Cost.Pricing, if multiple configured keys match the model prefix (e.g., gpt-4 and gpt-4o both prefix-match gpt-4o-mini), the matched pricing will be non-deterministic and vary across requests. To ensure deterministic and correct behavior (longest prefix match first), we should sort the pricing keys by length in descending order. To avoid performance overhead on the hot path, we can cache the sorted keys and only rebuild them when the configuration pointer changes.
var (
pricingKeysMu sync.Mutex
lastConfig *Config
sortedPricing []string
)
func getSortedPricingKeys(cfg *Config) []string {
pricingKeysMu.Lock()
defer pricingKeysMu.Unlock()
if cfg == lastConfig {
return sortedPricing
}
lastConfig = cfg
sortedPricing = make([]string, 0, len(cfg.Cost.Pricing))
for k := range cfg.Cost.Pricing {
sortedPricing = append(sortedPricing, k)
}
sort.Slice(sortedPricing, func(i, j int) bool {
return len(sortedPricing[i]) > len(sortedPricing[j])
})
return sortedPricing
}
func costAttributes(record coreusage.Record, cfg *Config) []attribute.KeyValue {
pricing, ok := cfg.Cost.Pricing[record.Model]
if !ok {
// Try a deterministic prefix match sorted by length descending (longest match first)
for _, model := range getSortedPricingKeys(cfg) {
if strings.HasPrefix(record.Model, model) {
pricing = cfg.Cost.Pricing[model]
ok = true
break
}
}
}| import ( | ||
| "context" | ||
| "strings" | ||
| "sync" | ||
| "time" |
| func Middleware() gin.HandlerFunc { | ||
| return func(c *gin.Context) { | ||
| header := c.GetHeader(HeaderBaggage) | ||
| if header == "" { | ||
| c.Next() | ||
| return | ||
| } |
There was a problem hiding this comment.
The W3C Baggage specification allows multiple Baggage headers to be sent in a single request. Using c.GetHeader only retrieves the first header value, ignoring any subsequent ones. To be fully spec-compliant, retrieve all values of the Baggage header and join them with commas.
func Middleware() gin.HandlerFunc {
return func(c *gin.Context) {
headers := c.Request.Header[HeaderBaggage]
if len(headers) == 0 {
c.Next()
return
}
header := strings.Join(headers, ",")
b := ParseBaggageHeader(header)There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48564723a9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| func init() { | ||
| coreusage.RegisterPlugin(NewPlugin()) |
There was a problem hiding this comment.
Wire the package into server startup
This registration only runs if internal/otelplugin is imported, but I searched the repo outside internal/otelplugin and found no production import, no call to SetConfig, and no middleware registration. As a result, setting the documented telemetry.otlp.enabled: true block is ignored and the default server never registers this usage plugin or emits OTLP spans.
Useful? React with 👍 / 👎.
| for model, p := range cfg.Cost.Pricing { | ||
| if strings.HasPrefix(record.Model, model) { | ||
| pricing = p | ||
| ok = true | ||
| break |
There was a problem hiding this comment.
Prefer the most-specific pricing prefix
When operators configure both a broad family prefix and a more specific model prefix, e.g. claude and claude-sonnet, this map iteration can match either entry for claude-sonnet-* because Go map order is randomized. That makes cost.usd nondeterministic and can report the wrong price; choose the longest matching prefix instead of breaking on the first match.
Useful? React with 👍 / 👎.
Motivation
This PR adds an opt-in OpenTelemetry usage plugin for CLIProxyAPI so any OTel-compatible backend (Honeycomb, Tempo, Jaeger, Datadog, Grafana Cloud, Azure Monitor, AWS X-Ray, …) becomes a one-config-change away from CLIProxyAPI telemetry — instead of each operator writing a backend-specific shim against the management API.
It also wires standards-aligned cross-service identity propagation per the W3C Baggage spec so caller-supplied identity (correlation ids, tenant ids, agent ids, …) lands on each upstream LLM span without per-request plumbing in the proxy core.
Design context lives in tc-agent-zone:
What the patch adds
New package at
internal/otelplugin/:config.goSetConfig,Enabled/SetEnabled); mirrors theredisqueuetoggle patternbaggage.goWithBaggage,BaggageFromContext)middleware.goMiddleware()) + outbound propagation helpers (PropagateBaggage,ApplyOutbound)plugin.gousage.Pluginimplementation; tracer + exporter wiring; GenAI semconv + cost attribute constructionstatus.gocodesaliases (keeps the import surface inplugin.gosmall)README.md*_test.goPlus
go.mod/go.sumentries for:go.opentelemetry.io/otelv1.44.0 (+sdk/trace,exporters/otlp/otlptrace/otlptracehttp,semconv/v1.26.0)golang.org/x/sysbumped from v0.38.0 (indirect) to v0.45.0 viago mod tidyto satisfy OTel transitive constraints — aligns with upstream'scrypto v0.45.0/net v0.47.0cadence.Design properties
telemetry.otlp.enabled: falseis the safe default; no behaviour change for existing operators.engine.Use(otelplugin.Middleware())+otelplugin.SetConfig(...)); no other proxy internals modified.init()likeinternal/redisqueue/plugin.go. Feature flag viaatomic.Boollikeredisqueue.usageStatisticsEnabled. Config snapshot underatomic.Valuefor lock-free hot-path reads. Maintenance burden tracks an existing pattern rather than introducing a new mechanism.gen_ai.system,gen_ai.request.model,gen_ai.usage.*) where they exist; falls back to localcost.*namespace where they don't.off/propagate/allowlist);offis the safe default (matches today's behaviour — no header forwarding). Trusted-boundary safe by default; operators opt in to propagate or allowlist.cost.usdper span.Test coverage
internal/otelplugin/:baggage_test.go) — canonical multi-key, URL decode/encode, malformed input handling, empty/nil safety, allowlist normalisationmiddleware_test.go) — header-present roundtrip, missing-header noop, malformed-header noop, outbound propagation policies, off mode safetyplugin_test.go) — GenAI semconv attribute mapping, cost-table lookup with prefix fallback, optional-field omission when zero, include-usage/include-cost toggles, disabled-plugin no-opAll 35 pass on upstream HEAD
07c607b7(2026-06-08).Backward compatibility
telemetry.otlp.enableddefaults to false. Without enabling, the plugin is registered but every entry point short-circuits — zero hot-path cost.internal/redisqueue/plugin.go. Operators familiar with the redisqueue enable/disable flow get the same shape here.go.mod/go.sumchange in existing files.Verification
Happy to iterate on naming, config layout, or any other aspect — this is offered as a reference implementation, not a take-it-or-leave-it patch. Feedback welcome.