Commit 40f29d1

Yun-Kim and claude authored and committed
feat(llmobs): sampling decisions, rates, and propagation (#9030)
* feat(llmobs): add sampling decisions, rates, and propagation

  Compute a keep/drop decision + rate once on the root LLMObs span and inherit it across spans and services (via x-datadog-tags). Spans are always shipped; the decision is recorded in the event _dd block so the backend can honor it. Mirrors dd-trace-py.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(llmobs): address sampling PR review comments

  - Match dd-trace-py: only the true trace root makes a fresh sampling decision; a span under a propagated LLMObs parent inherits whatever was propagated (possibly none) instead of starting a divergent one.
  - Reuse the shared formatKnuthRate (hoisted to sampler.js) instead of a bespoke llmobs formatRate.
  - Rename handleLLMObsParentIdInjection -> handleLLMObsInjection.
  - Declare llmobs.sampleRate in the public index.d.ts (fixes config-names lint) and tag sample_rate on enablement telemetry.
  - Drop redundant comments flagged in review.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(llmobs): regenerate config types after merging master

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(llmobs): address follow-up sampling review comments

  - Rebuild the sampler lazily from the live config rate so a disable/re-enable (or any rate change) is reflected, instead of caching one sampler at construction.
  - Tweak the injection comment wording.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(llmobs): export formatKnuthRate from priority_sampler.js

  Keep the formatter in priority_sampler.js (export it) instead of hoisting to sampler.js, per review.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(llmobs): build the sampler once in the tagger constructor

  The sample rate is set at init and not mutable at runtime, so the per-span lazy rebuild was unnecessary work on a hot path. Build it once at construction.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Update packages/dd-trace/src/llmobs/tagger.js

* refactor(llmobs): address sampling review comments and make sampler config-reactive

  - Mirror `llmobs.sampleRate` in the v5 type declarations (index.d.v5.ts), per the repo type policy requiring non-v6-only public APIs in both declaration surfaces (PR review).
  - Register DD_LLMOBS_SAMPLE_RATE as implementation "A" in supported-configurations.json (PR review); generated types are unchanged.
  - Build the LLMObs sampler lazily and rebuild it when config.llmobs.sampleRate changes, instead of fixing it for the tagger's lifetime, so a future remote config update to the sample rate takes effect without re-instantiation.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llmobs): match DD_LLMOBS_SAMPLE_RATE default to config registry (1.0)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(llmobs): address remaining sampling review comments

  - Drop the redundant #samplerRate field; use sampler.rate() for the rebuild check.
  - Move formatKnuthRate to the shared src/util.js so the LLMObs tagger no longer imports priority_sampler just for a string formatter.
  - Reword the sampleRate JSDoc in index.d.ts and index.d.v5.ts ("honored at ingestion time").
  - Test rates that exercise the formatter: 1/3 (6-decimal cap) and 0.25 (trailing-zero stripping).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 1103ebb commit 40f29d1

15 files changed

Lines changed: 340 additions & 21 deletions

index.d.ts

Lines changed: 9 additions & 0 deletions
@@ -4033,6 +4033,15 @@ declare namespace tracer {
    * Programmatic configuration takes precedence over the environment variables listed above.
    */
   agentlessEnabled?: boolean,
+
+  /**
+   * The proportion of LLM Observability traces to sample, between `0` and `1` (inclusive).
+   * The decision is computed once per trace, propagated across services, and recorded on every
+   * span; spans are always sent and the decision is honored at ingestion time. Defaults to `1`.
+   * @env DD_LLMOBS_SAMPLE_RATE
+   * Programmatic configuration takes precedence over the environment variables listed above.
+   */
+  sampleRate?: number,
 }

 /** @hidden */
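
As a usage illustration for the new `llmobs.sampleRate` option, here is a minimal sketch. `buildTracerOptions` is a hypothetical helper, not part of dd-trace: it only range-checks the rate against the `0` to `1` bounds stated in the JSDoc and builds an options object. How dd-trace itself treats out-of-range values is not shown in this diff.

```javascript
// Hypothetical helper (not part of dd-trace): validates an LLM Observability
// sample rate and returns an options object shaped like the `llmobs` block
// declared in index.d.ts.
function buildTracerOptions (sampleRate = 1) {
  const valid = typeof sampleRate === 'number' && sampleRate >= 0 && sampleRate <= 1
  if (!valid) {
    throw new RangeError(`llmobs.sampleRate must be between 0 and 1, got ${sampleRate}`)
  }
  return { llmobs: { sampleRate } }
}

// Ask for roughly a quarter of LLM Observability traces to be kept.
const options = buildTracerOptions(0.25)

// In an application with dd-trace installed, the options would be passed to init:
// const tracer = require('dd-trace').init(options)
```

The same rate can instead be supplied through `DD_LLMOBS_SAMPLE_RATE`; per the JSDoc, the programmatic value wins when both are set.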

index.d.v5.ts

Lines changed: 9 additions & 0 deletions
@@ -4234,6 +4234,15 @@ declare namespace tracer {
    * Programmatic configuration takes precedence over the environment variables listed above.
    */
   agentlessEnabled?: boolean,
+
+  /**
+   * The proportion of LLM Observability traces to sample, between `0` and `1` (inclusive).
+   * The decision is computed once per trace, propagated across services, and recorded on every
+   * span; spans are always sent and the decision is honored at ingestion time. Defaults to `1`.
+   * @env DD_LLMOBS_SAMPLE_RATE
+   * Programmatic configuration takes precedence over the environment variables listed above.
+   */
+  sampleRate?: number,
 }
 /** @hidden */
 type spanKind = 'agent' | 'workflow' | 'task' | 'tool' | 'retrieval' | 'embedding' | 'llm'

packages/dd-trace/src/config/generated-config-types.d.ts

Lines changed: 2 additions & 0 deletions
@@ -477,6 +477,7 @@ export interface GeneratedConfig {
     agentlessEnabled: boolean | undefined;
     enabled: boolean;
     mlApp: string | undefined;
+    sampleRate: number;
   };
   logInjection: boolean;
   logLevel: "debug" | "info" | "warn" | "error";
@@ -724,6 +725,7 @@ export interface GeneratedEnvVarConfig {
   DD_LLMOBS_AGENTLESS_ENABLED: boolean | undefined;
   DD_LLMOBS_ENABLED: boolean;
   DD_LLMOBS_ML_APP: string | undefined;
+  DD_LLMOBS_SAMPLE_RATE: number;
   DD_LOG_LEVEL: "debug" | "info" | "warn" | "error";
   DD_LOGS_INJECTION: boolean;
   DD_LOGS_OTEL_ENABLED: boolean;

packages/dd-trace/src/config/supported-configurations.json

Lines changed: 10 additions & 0 deletions
@@ -1238,6 +1238,16 @@
       "default": null
     }
   ],
+  "DD_LLMOBS_SAMPLE_RATE": [
+    {
+      "implementation": "A",
+      "type": "decimal",
+      "configurationNames": [
+        "llmobs.sampleRate"
+      ],
+      "default": "1.0"
+    }
+  ],
   "DD_LOGS_INJECTION": [
     {
       "implementation": "B",

packages/dd-trace/src/llmobs/constants/tags.js

Lines changed: 6 additions & 0 deletions
@@ -14,6 +14,12 @@ module.exports = {
   PROPAGATED_PARENT_ID_KEY: '_dd.p.llmobs_parent_id',
   PROPAGATED_ML_APP_KEY: '_dd.p.llmobs_ml_app',
   PARENT_ID_KEY: '_ml_obs.llmobs_parent_id',
+  PROPAGATED_SAMPLE_RATE_KEY: '_dd.p.llmobs_sr',
+  PROPAGATED_SAMPLING_DECISION_KEY: '_dd.p.llmobs_sd',
+  SAMPLE_RATE: '_ml_obs.sample_rate',
+  SAMPLING_DECISION: '_ml_obs.sampling_decision',
+  SAMPLING_DECISION_SAMPLED: '1',
+  SAMPLING_DECISION_DROPPED: '0',
   TAGS: '_ml_obs.tags',
   NAME: '_ml_obs.name',
   TRACE_ID: '_ml_obs.trace_id',
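
To show how the two new propagated keys could be read back on the receiving side, here is a rough sketch of splitting an `x-datadog-tags` value into a key/value map. `parseDatadogTags` is a hypothetical helper written for this example, not dd-trace's own extraction code (which this diff does not show), and the header value is made up.

```javascript
const PROPAGATED_SAMPLE_RATE_KEY = '_dd.p.llmobs_sr'
const PROPAGATED_SAMPLING_DECISION_KEY = '_dd.p.llmobs_sd'

// Hypothetical parser: splits "k1=v1,k2=v2" into an object, skipping
// entries that have no key or no "=" separator.
function parseDatadogTags (header) {
  const tags = {}
  for (const pair of header.split(',')) {
    const idx = pair.indexOf('=')
    if (idx <= 0) continue
    tags[pair.slice(0, idx)] = pair.slice(idx + 1)
  }
  return tags
}

// Example header value (illustrative only).
const traceTags = parseDatadogTags('_dd.p.dm=-1,_dd.p.llmobs_sr=0.25,_dd.p.llmobs_sd=1')
const propagatedRate = traceTags[PROPAGATED_SAMPLE_RATE_KEY]
const propagatedDecision = traceTags[PROPAGATED_SAMPLING_DECISION_KEY]
```

Both values stay strings, which matches the `'1'` / `'0'` string constants defined for the sampling decision.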

packages/dd-trace/src/llmobs/index.js

Lines changed: 16 additions & 5 deletions
@@ -9,6 +9,10 @@ const {
   ML_APP,
   PROPAGATED_ML_APP_KEY,
   PROPAGATED_PARENT_ID_KEY,
+  SAMPLE_RATE,
+  SAMPLING_DECISION,
+  PROPAGATED_SAMPLE_RATE_KEY,
+  PROPAGATED_SAMPLING_DECISION_KEY,
 } = require('./constants/tags')
 const { storage } = require('./storage')
 const telemetry = require('./telemetry')
@@ -68,7 +72,7 @@ function enable (config) {
   spanFinishCh.subscribe(handleSpanProcess)

   // distributed tracing for llmobs
-  injectCh.subscribe(handleLLMObsParentIdInjection)
+  injectCh.subscribe(handleLLMObsInjection)

   setAgentStrategy(config, useAgentless => {
     if (useAgentless && !(config.apiKey && config.site)) {
@@ -92,7 +96,7 @@ function disable () {
   if (evalMetricAppendCh.hasSubscribers) evalMetricAppendCh.unsubscribe(handleEvalMetricAppend)
   if (flushCh.hasSubscribers) flushCh.unsubscribe(handleFlush)
   if (spanFinishCh.hasSubscribers) spanFinishCh.unsubscribe(handleSpanProcess)
-  if (injectCh.hasSubscribers) injectCh.unsubscribe(handleLLMObsParentIdInjection)
+  if (injectCh.hasSubscribers) injectCh.unsubscribe(handleLLMObsInjection)
   if (registerUserSpanProcessorCh.hasSubscribers) registerUserSpanProcessorCh.unsubscribe(handleRegisterProcessor)

   spanWriter?.destroy()
@@ -106,8 +110,8 @@ function disable () {
 }

 // since LLMObs traces can extend between services and be the same trace,
-// we need to propagate the parent id and mlApp.
-function handleLLMObsParentIdInjection ({ carrier }) {
+// we need to propagate the parent id, mlApp, and sampling rate/decision.
+function handleLLMObsInjection ({ carrier }) {
   // Respect the standard propagator's gate: when trace tag propagation is
   // disabled, don't write `x-datadog-tags` for LLMObs either.
   if (globalTracerConfig.DD_TRACE_X_DATADOG_TAGS_MAX_LENGTH === 0) return
@@ -122,14 +126,21 @@ function handleLLMObsParentIdInjection ({ carrier }) {
     parentContext?._trace?.tags?.[PROPAGATED_ML_APP_KEY] ||
     globalTracerConfig.llmobs.mlApp

-  if (!parentId && !mlApp) return
+  const sampleRate =
+    mlObsSpanTags?.[SAMPLE_RATE] ?? parentContext?._trace?.tags?.[PROPAGATED_SAMPLE_RATE_KEY]
+  const samplingDecision =
+    mlObsSpanTags?.[SAMPLING_DECISION] ?? parentContext?._trace?.tags?.[PROPAGATED_SAMPLING_DECISION_KEY]
+
+  if (!parentId && !mlApp && samplingDecision == null) return

   // `_injectTags` only writes `x-datadog-tags` when the trace has `_dd.p.*`
   // tags, so it may be undefined here — coalesce before appending.
   const existing = carrier['x-datadog-tags']
   let tags = existing || ''
   if (parentId) tags += `${tags ? ',' : ''}${PROPAGATED_PARENT_ID_KEY}=${parentId}`
   if (mlApp) tags += `${tags ? ',' : ''}${PROPAGATED_ML_APP_KEY}=${mlApp}`
+  if (sampleRate != null) tags += `${tags ? ',' : ''}${PROPAGATED_SAMPLE_RATE_KEY}=${sampleRate}`
+  if (samplingDecision != null) tags += `${tags ? ',' : ''}${PROPAGATED_SAMPLING_DECISION_KEY}=${samplingDecision}`
   if (tags !== existing) carrier['x-datadog-tags'] = tags
 }
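
The tag-appending tail of `handleLLMObsInjection` can be tried in isolation. The sketch below is a simplified stand-in: `appendLLMObsTags` is a hypothetical function that takes the already-resolved values as arguments rather than reading them from the active span and parent context, and the carrier contents are illustrative.

```javascript
const PROPAGATED_PARENT_ID_KEY = '_dd.p.llmobs_parent_id'
const PROPAGATED_ML_APP_KEY = '_dd.p.llmobs_ml_app'
const PROPAGATED_SAMPLE_RATE_KEY = '_dd.p.llmobs_sr'
const PROPAGATED_SAMPLING_DECISION_KEY = '_dd.p.llmobs_sd'

// Hypothetical stand-in for the end of handleLLMObsInjection: appends the
// LLMObs tags to whatever `x-datadog-tags` value the carrier already holds.
function appendLLMObsTags (carrier, { parentId, mlApp, sampleRate, samplingDecision } = {}) {
  if (!parentId && !mlApp && samplingDecision == null) return

  const existing = carrier['x-datadog-tags']
  let tags = existing || ''
  // Prefix a comma only when something is already in the list.
  const append = (key, value) => { tags += `${tags ? ',' : ''}${key}=${value}` }

  if (parentId) append(PROPAGATED_PARENT_ID_KEY, parentId)
  if (mlApp) append(PROPAGATED_ML_APP_KEY, mlApp)
  // `!= null` rather than truthiness, mirroring the diff.
  if (sampleRate != null) append(PROPAGATED_SAMPLE_RATE_KEY, sampleRate)
  if (samplingDecision != null) append(PROPAGATED_SAMPLING_DECISION_KEY, samplingDecision)

  if (tags !== existing) carrier['x-datadog-tags'] = tags
}

const carrier = { 'x-datadog-tags': '_dd.p.dm=-1' }
appendLLMObsTags(carrier, { parentId: '123', mlApp: 'my-app', sampleRate: '0.25', samplingDecision: '1' })
```

The `!= null` checks mean the decision is propagated whenever it is set, including a dropped (`'0'`) one.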

packages/dd-trace/src/llmobs/span_processor.js

Lines changed: 4 additions & 0 deletions
@@ -33,6 +33,8 @@ const {
   ROUTING_API_KEY,
   ROUTING_SITE,
   LLMOBS_SUBMITTED_TAG_KEY,
+  SAMPLE_RATE,
+  SAMPLING_DECISION,
 } = require('./constants/tags')
 const { UNSERIALIZABLE_VALUE_TEXT } = require('./constants/text')
 const telemetry = require('./telemetry')
@@ -248,6 +250,8 @@ class LLMObsSpanProcessor {
       _dd: {
         span_id: span.context().toSpanId(),
         trace_id: span.context().toTraceId(true),
+        sample_rate: mlObsTags[SAMPLE_RATE],
+        sampling_decision: mlObsTags[SAMPLING_DECISION],
       },
     }

packages/dd-trace/src/llmobs/tagger.js

Lines changed: 55 additions & 0 deletions
@@ -1,6 +1,8 @@
 'use strict'

 const log = require('../log')
+const Sampler = require('../sampler')
+const { formatKnuthRate } = require('../util')
 const {
   MODEL_NAME,
   MODEL_PROVIDER,
@@ -41,6 +43,12 @@ const {
   ROUTING_SITE,
   PROMPT_TRACKING_INSTRUMENTATION_METHOD,
   INSTRUMENTATION_METHOD_ANNOTATED,
+  SAMPLE_RATE,
+  SAMPLING_DECISION,
+  SAMPLING_DECISION_SAMPLED,
+  SAMPLING_DECISION_DROPPED,
+  PROPAGATED_SAMPLE_RATE_KEY,
+  PROPAGATED_SAMPLING_DECISION_KEY,
 } = require('./constants/tags')
 const { storage } = require('./storage')
 const { findGenAIAncestorSpanId, validateCostTags, writeBridgeTags, validateToolDefinitions } = require('./util')
@@ -53,12 +61,31 @@ class LLMObsTagger {
   /** @type {import('../config/config-base')} */
   #config

+  /** @type {import('../sampler') | null} */
+  #sampler = null
+
   constructor (config, softFail = false) {
     this.#config = config

     this.softFail = softFail
   }

+  /**
+   * The sampler reads its rate from `config.llmobs.sampleRate`, which can change
+   * at runtime (e.g. via remote config). Rebuild the sampler whenever the rate
+   * changes so decisions reflect the current config, while reusing the existing
+   * sampler when it hasn't.
+   *
+   * @returns {import('../sampler')}
+   */
+  #getSampler () {
+    const rate = this.#config.llmobs?.sampleRate ?? 1
+    if (this.#sampler === null || rate !== this.#sampler.rate()) {
+      this.#sampler = new Sampler(rate)
+    }
+    return this.#sampler
+  }
+
   static get tagMap () {
     return registry
   }
@@ -124,6 +151,8 @@ class LLMObsTagger {
       ROOT_PARENT_ID
     this._setTag(span, PARENT_ID_KEY, parentId)

+    this.#tagSamplingDecision(span, parent)
+
     // apply annotation context
     const annotationContext = storage.getStore()?.annotationContext

@@ -153,6 +182,32 @@ class LLMObsTagger {
     }
   }

+  #tagSamplingDecision (span, parent) {
+    const traceTags = span.context()._trace.tags
+    const parentTags = registry.get(parent)
+
+    let sampleRate, samplingDecision
+    if (parentTags) {
+      // Local LLMObs parent: inherit its decision.
+      sampleRate = parentTags[SAMPLE_RATE]
+      samplingDecision = parentTags[SAMPLING_DECISION]
+    } else if (traceTags[PROPAGATED_PARENT_ID_KEY]) {
+      // Distributed LLMObs parent: inherit whatever was propagated. This may be
+      // absent if the upstream service predates sampling propagation, in which
+      // case we make no decision here rather than starting a divergent one.
+      sampleRate = traceTags[PROPAGATED_SAMPLE_RATE_KEY]
+      samplingDecision = traceTags[PROPAGATED_SAMPLING_DECISION_KEY]
+    } else {
+      // Root span: make the trace's one sampling decision.
+      const sampler = this.#getSampler()
+      sampleRate = formatKnuthRate(sampler.rate())
+      samplingDecision = sampler.isSampled(span) ? SAMPLING_DECISION_SAMPLED : SAMPLING_DECISION_DROPPED
+    }
+
+    if (sampleRate != null) this._setTag(span, SAMPLE_RATE, sampleRate)
+    if (samplingDecision != null) this._setTag(span, SAMPLING_DECISION, samplingDecision)
+  }
+
   // TODO: similarly for the following `tag` methods,
   // how can we transition from a span weakmap to core API functionality
   tagLLMIO (span, inputData, outputData) {
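
The three-way inheritance in `#tagSamplingDecision` and the lazy rebuild in `#getSampler` can be exercised outside the tracer. In this sketch, `StubSampler`, `makeSamplerGetter`, and `resolveSampling` are hypothetical names. `StubSampler` stands in for dd-trace's `Sampler` and uses `Math.random()` instead of the real trace-id based decision, and `String()` replaces `formatKnuthRate` for brevity.

```javascript
// Hypothetical stand-in for dd-trace's Sampler; only rate() and isSampled()
// are mimicked, and the decision here is random rather than trace-id based.
class StubSampler {
  constructor (rate) { this._rate = rate }
  rate () { return this._rate }
  isSampled () { return Math.random() < this._rate }
}

// Mirrors #getSampler: reuse the sampler until the configured rate changes.
function makeSamplerGetter (config) {
  let sampler = null
  return () => {
    const rate = config.llmobs?.sampleRate ?? 1
    if (sampler === null || rate !== sampler.rate()) sampler = new StubSampler(rate)
    return sampler
  }
}

// Mirrors the branches of #tagSamplingDecision.
function resolveSampling ({ parentTags, traceTags = {}, sampler }) {
  if (parentTags) {
    // Local LLMObs parent: inherit its decision.
    return {
      sampleRate: parentTags['_ml_obs.sample_rate'],
      samplingDecision: parentTags['_ml_obs.sampling_decision']
    }
  }
  if (traceTags['_dd.p.llmobs_parent_id']) {
    // Distributed LLMObs parent: inherit whatever was propagated, possibly nothing.
    return {
      sampleRate: traceTags['_dd.p.llmobs_sr'],
      samplingDecision: traceTags['_dd.p.llmobs_sd']
    }
  }
  // Root span: make the one decision for the trace.
  return {
    sampleRate: String(sampler.rate()), // simplification of formatKnuthRate
    samplingDecision: sampler.isSampled() ? '1' : '0'
  }
}

const config = { llmobs: { sampleRate: 1 } }
const getSampler = makeSamplerGetter(config)
const rootDecision = resolveSampling({ sampler: getSampler() })
const childDecision = resolveSampling({
  parentTags: {
    '_ml_obs.sample_rate': rootDecision.sampleRate,
    '_ml_obs.sampling_decision': rootDecision.samplingDecision
  },
  sampler: getSampler()
})
```

A span whose upstream service propagated a parent id but no sampling tags ends up with both values undefined, so neither tag is set, which is the "no divergent decision" behaviour described in the diff comment.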

packages/dd-trace/src/llmobs/telemetry.js

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ function recordLLMObsEnabled (startTime, config, value = 1) {
     site: config.site,
     auto: Number(autoEnabled),
     ml_app: config.llmobs.mlApp,
+    sample_rate: config.llmobs.sampleRate,
   }
   llmobsMetrics.count('product_enabled', tags).inc(value)
   llmobsMetrics.distribution('init_time', tags).track(initTimeMs)

packages/dd-trace/src/priority_sampler.js

Lines changed: 1 addition & 13 deletions
@@ -19,6 +19,7 @@ const RateLimiter = require('./rate_limiter')
 const Sampler = require('./sampler')
 const { setSamplingRules } = require('./startup-log')
 const SamplingRule = require('./sampling_rule')
+const { formatKnuthRate } = require('./util')

 const {
   SAMPLING_MECHANISM_DEFAULT,
@@ -36,19 +37,6 @@ const {

 const DEFAULT_KEY = 'service:,env:'

-/**
- * Formats a sampling rate as a string with up to 6 decimal digits and no trailing zeros.
- *
- * @param {number} rate
- */
-function formatKnuthRate (rate) {
-  const string = Number(rate).toFixed(6)
-  for (let i = string.length - 1; i > 0; i--) {
-    if (string[i] === '0') continue
-    return string.slice(0, i + (string[i] === '.' ? 0 : 1))
-  }
-}
-
 const defaultSampler = new Sampler(AUTO_KEEP)

 /**
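
For reference, the formatter removed from this file can be run standalone. The function body below is copied from the deleted lines of this hunk. The sample inputs are the rates the commit message says the tests exercise, plus the default rate of `1`. That the copy in `src/util.js` is identical is an assumption, since that file's diff is not shown here.

```javascript
// Copied from the lines removed in priority_sampler.js: formats a rate with
// up to 6 decimal digits and no trailing zeros.
function formatKnuthRate (rate) {
  const string = Number(rate).toFixed(6)
  for (let i = string.length - 1; i > 0; i--) {
    if (string[i] === '0') continue
    // Drop the decimal point too when every fractional digit was zero.
    return string.slice(0, i + (string[i] === '.' ? 0 : 1))
  }
}

const third = formatKnuthRate(1 / 3) // '0.333333' (capped at 6 decimals)
const quarter = formatKnuthRate(0.25) // '0.25' (trailing zeros stripped)
const keepAll = formatKnuthRate(1) // '1' (decimal point dropped)
```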
