[Repro] Standalone LLMObs sampling drop on top of #10989 #11249
Closed
FouadWahabi wants to merge 10 commits into master
Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
Fixes #10051.

When DD_APM_TRACING_ENABLED=false is set, APM tracing should be disabled while allowing other products like LLM Observability to function. Previously, setting DD_APM_TRACING_ENABLED=false would inadvertently disable LLMObs or allow APM traces to leak through when no other products were enabled.

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
When APM tracing is disabled but both LLMObs and ASM are enabled, the previous code returned LlmObsStandaloneSampler which never sent the 1 APM trace/minute required by ASM for billing and service catalog. Introduce LlmObsAndAsmStandaloneSampler that keeps all LLMObs and ASM traces while rate-limiting plain APM traces to 1 per minute. Also clarify the log message for the ASM-only standalone case. Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
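The keep-or-limit decision described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class names `OnePerMinuteLimiter` and `LlmObsAndAsmKeepSketch` are hypothetical, the ASM bit value `0x02` is an assumption, and only the LLMOBS bit `0x20` comes from the PR text. The clock is injected so the one-minute window can be tested without waiting.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical limiter: admits at most one event per 60-second window. */
final class OnePerMinuteLimiter {
  private static final long WINDOW_NANOS = 60_000_000_000L;
  private final LongSupplier nanoClock;
  private final AtomicLong lastAdmittedNanos;

  OnePerMinuteLimiter(LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
    // Start one full window in the past so the very first event is admitted.
    this.lastAdmittedNanos = new AtomicLong(nanoClock.getAsLong() - WINDOW_NANOS);
  }

  boolean tryAcquire() {
    long now = nanoClock.getAsLong();
    long last = lastAdmittedNanos.get();
    // Compare-and-set so only one of several racing threads wins the window.
    return now - last >= WINDOW_NANOS && lastAdmittedNanos.compareAndSet(last, now);
  }
}

/** Hypothetical sketch of "keep product traces, rate-limit plain APM traces". */
final class LlmObsAndAsmKeepSketch {
  static final int ASM_BIT = 0x02;    // assumed value, not stated in the PR
  static final int LLMOBS_BIT = 0x20; // value quoted in the PR description

  private final OnePerMinuteLimiter apmLimiter;

  LlmObsAndAsmKeepSketch(OnePerMinuteLimiter apmLimiter) {
    this.apmLimiter = apmLimiter;
  }

  /** Returns true when the trace should be kept. */
  boolean shouldKeep(int traceSourceBits) {
    if ((traceSourceBits & (ASM_BIT | LLMOBS_BIT)) != 0) {
      return true; // product traces never consume the APM budget
    }
    return apmLimiter.tryAcquire(); // plain APM: at most one per minute
  }
}
```

In production-like use the limiter would be built with `System::nanoTime`; a fake clock is only needed in tests.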
Replace the three separate sampler classes (AsmStandaloneSampler, LlmObsStandaloneSampler, LlmObsAndAsmStandaloneSampler) with a single StandaloneSampler that iterates over a list of active StandaloneProduct entries, making it trivially extensible to future products. Also add ProductTraceSource.isAnyStandaloneProductMarked() to simplify the TraceCollector force-keep bypass check. Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
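As a rough illustration of the list-driven design this commit describes, the sketch below iterates over an ordered list of active products and reports the first one whose bit is set. The names `ProductSketch` and `ProductListSamplerSketch` are hypothetical stand-ins for `StandaloneProduct` and `StandaloneSampler`, whose real signatures are not shown in this PR page; the ASM bit value is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical product descriptor; bit values other than LLMOBS are assumed. */
enum ProductSketch {
  LLMOBS(0x20),
  ASM(0x02);

  final int bit;

  ProductSketch(int bit) {
    this.bit = bit;
  }

  boolean isMarked(int traceSourceBits) {
    return (traceSourceBits & bit) != 0;
  }
}

/** Hypothetical sampler core: one loop instead of one class per product combination. */
final class ProductListSamplerSketch {
  private final List<ProductSketch> activeProducts;

  ProductListSamplerSketch(List<ProductSketch> activeProducts) {
    // Defensive copy; list order decides which product wins when several bits are set.
    this.activeProducts = Collections.unmodifiableList(new ArrayList<>(activeProducts));
  }

  /** First active product whose bit is set, in list order. */
  Optional<ProductSketch> firstMarked(int traceSourceBits) {
    for (ProductSketch product : activeProducts) {
      if (product.isMarked(traceSourceBits)) {
        return Optional.of(product);
      }
    }
    return Optional.empty();
  }

  /** Analogue of the "is any standalone product marked" check. */
  boolean isAnyMarked(int traceSourceBits) {
    return firstMarked(traceSourceBits).isPresent();
  }
}
```

Supporting a further product in this sketch means adding one enum constant and including it in the list, which is the extensibility argument the commit message makes.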
- Replace individual SamplerTest cases with an @Unroll matrix that covers all product-flag combinations and asserts activeProducts contents directly; add a package-private getActiveProducts() getter to StandaloneSampler to enable this.
- Add a StandaloneSamplerTest case for spans with both LLMOBS and ASM bits set simultaneously, verifying that LLMOBS wins via list ordering.
- Add a TraceCollectorTest exercising the full span-finish → CoreTracer write path to verify that setSamplingPriorityIfNecessary skips the sampler when APM is disabled, a standalone product flag is set, and priority is already non-UNSET.

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
When an LLMObs span is created without an active surrounding APM AgentScope, DDLLMObsSpan calls AgentTracer.get().getTraceSegment() which reads activeSpan() — but LLMObsContext.attach does not activate an AgentScope, so the segment is null and the LLMOBS trace-source bit is never set on the local root. StandaloneSampler then drops the trace because it only checks the root's traceSource bitfield. Add a controller endpoint that creates the LLMObs span on a fresh thread (no inherited APM scope) and a smoke test that asserts the standalone LLMObs trace is kept and carries the LLMOBS bit (0x20) in _dd.p.ts. The test fails on top of the current PR and will pass once the bit is propagated regardless of active scope.
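The "fresh thread, no inherited scope" condition can be illustrated with a plain `ThreadLocal`. This is only an analogy under the assumption that the active scope is tracked per thread; `FreshThreadScopeSketch` and its methods are hypothetical and do not use the tracer's API.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a per-thread "active span" registry. */
final class FreshThreadScopeSketch {
  private static final ThreadLocal<String> ACTIVE_SPAN = new ThreadLocal<>();

  static void activate(String spanName) {
    ACTIVE_SPAN.set(spanName);
  }

  static String active() {
    return ACTIVE_SPAN.get();
  }

  /** Reads the active span from a newly started thread and returns what it saw. */
  static String activeOnFreshThread() {
    AtomicReference<String> seen = new AtomicReference<>("not-run");
    Thread worker = new Thread(() -> seen.set(ACTIVE_SPAN.get()));
    worker.start();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for worker", e);
    }
    return seen.get();
  }

  public static void main(String[] args) {
    activate("servlet.request");
    System.out.println("handler thread sees: " + active());
    // A plain ThreadLocal is not inherited, so the worker sees no active span.
    System.out.println("fresh thread sees: " + activeOnFreshThread());
  }
}
```

A request handler thread would see its own span, while the worker thread sees `null`, which mirrors why a segment looked up through the active span is missing on the new endpoint's thread.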
Required by AbstractApmTracingDisabledSmokeTest. Without this override, the smoke test sources fail to compile.
CodeNarc IfStatementBraces rule requires braces even for single statements.
Benchmarks

Startup

Summary: Found 1 performance improvement and 0 performance regressions. Performance is the same for 59 metrics; 11 metrics are unstable.

Startup time, global overhead (candidate=1.62.0-SNAPSHOT~f4dbbf9945, baseline=1.62.0-SNAPSHOT~e6cac64dfd):

| App | Section | Agent [baseline] | Agent [candidate] | Total [baseline] | Total [candidate] |
| --- | --- | --- | --- | --- | --- |
| insecure-bank | tracing | 1.064 s | 1.065 s | 8.841 s | 8.842 s |
| insecure-bank | iast | 1.244 s | 1.239 s | 9.497 s | 9.489 s |
| petclinic | tracing | 1.063 s | 1.07 s | 11.002 s | 11.029 s |
| petclinic | appsec | 1.266 s | 1.266 s | 11.022 s | 11.072 s |
| petclinic | iast | 1.247 s | 1.254 s | 11.319 s | 11.313 s |
| petclinic | profiling | 1.198 s | 1.192 s | 10.958 s | 11.033 s |

[Gantt charts: insecure-bank and petclinic startup break down per module (crashtracking, BytebuddyAgent, AgentMeter, GlobalTracer, AppSec, IAST, Debugger, Remote Config, Telemetry, Flare Poller, ProfilingAgent, Profiling), baseline vs. candidate]
Load

Summary: Found 2 performance improvements and 1 performance regression. Performance is the same for 18 metrics; 15 metrics are unstable.

Request duration [CI 0.99], mean with interval bounds in ms (candidate=1.62.0-SNAPSHOT~f4dbbf9945, baseline=1.62.0-SNAPSHOT~e6cac64dfd):

| App | Variant | Baseline | Candidate |
| --- | --- | --- | --- |
| insecure-bank | no_agent | 1.321 (1.309, 1.333) | 1.25 (1.239, 1.261) |
| insecure-bank | iast | 3.424 (3.378, 3.470) | 3.304 (3.259, 3.349) |
| insecure-bank | iast_FULL | 6.008 (5.947, 6.068) | 6.01 (5.950, 6.070) |
| insecure-bank | iast_GLOBAL | 3.645 (3.584, 3.707) | 3.753 (3.687, 3.818) |
| insecure-bank | profiling | 2.152 (2.131, 2.174) | 2.164 (2.145, 2.183) |
| insecure-bank | tracing | 1.862 (1.847, 1.877) | 1.953 (1.934, 1.972) |
| petclinic | no_agent | 19.012 (18.822, 19.202) | 18.037 (17.855, 18.219) |
| petclinic | appsec | 18.773 (18.585, 18.960) | 18.792 (18.604, 18.981) |
| petclinic | code_origins | 18.01 (17.833, 18.186) | 17.779 (17.603, 17.955) |
| petclinic | iast | 17.928 (17.753, 18.103) | 18.904 (18.711, 19.097) |
| petclinic | profiling | 18.503 (18.322, 18.684) | 18.355 (18.172, 18.537) |
| petclinic | tracing | 18.157 (17.980, 18.335) | 17.725 (17.552, 17.898) |
Dacapo

Summary: Found 0 performance improvements and 0 performance regressions. Performance is the same for 11 metrics; 1 metric is unstable.

Execution time [CI 0.99] (candidate=1.62.0-SNAPSHOT~f4dbbf9945, baseline=1.62.0-SNAPSHOT~e6cac64dfd):

| Benchmark | Variant | Baseline | Candidate |
| --- | --- | --- | --- |
| biojava | no_agent | 14.83 s | 15.549 s |
| biojava | appsec | 15.151 s | 15.095 s |
| biojava | iast | 18.167 s | 18.526 s |
| biojava | iast_GLOBAL | 17.859 s | 18.081 s |
| biojava | profiling | 14.756 s | 15.514 s |
| biojava | tracing | 15.128 s | 14.751 s |
| tomcat | no_agent | 1.487 ms (1.476, 1.499) | 1.488 ms (1.476, 1.499) |
| tomcat | appsec | 3.805 ms (3.583, 4.026) | 3.84 ms (3.616, 4.064) |
| tomcat | iast | 2.274 ms (2.204, 2.343) | 2.272 ms (2.202, 2.341) |
| tomcat | iast_GLOBAL | 2.315 ms (2.246, 2.385) | 2.321 ms (2.251, 2.391) |
| tomcat | profiling | 2.089 ms (2.035, 2.144) | 2.091 ms (2.037, 2.145) |
| tomcat | tracing | 2.072 ms (2.019, 2.125) | 2.074 ms (2.021, 2.127) |
What

Adds a failing smoke test on top of #10989 demonstrating that standalone LLMObs spans (no surrounding APM scope) are silently dropped by the new `StandaloneSampler`.

Why

In `DDLLMObsSpan` (`dd-java-agent/agent-llmobs/.../DDLLMObsSpan.java`), the trace segment comes from `getTraceSegment()`, which reads `activeSpan()` (`CoreTracer.java:1505-1515`). But `LLMObsContext.attach(span.context())` does not activate an `AgentScope`; it writes to its own `ContextKey`. So when LLMObs is used without a surrounding APM scope (CLI, batch, worker, fresh thread, before any HTTP handler), `activeSpan()` is `null`, the segment is `null`, and the LLMOBS bit is never propagated to the local root. `StandaloneSampler` only checks the root's traceSource bitfield, so the trace is dropped.

The existing `LlmObsApmDisabledSmokeTest` does not catch this because every LLMObs call runs inside a Spring servlet handler, so the request span is the active scope and is the local root for the LLMObs trace.

How this repro works

- Adds a controller endpoint `/rest-api/llmobs/standalone` that runs `LLMObs.startLLMSpan(...)` on a fresh thread (no inherited APM scope).
- Adds a smoke test asserting that the resulting trace carries the LLMOBS bit (`0x20` → `"20"`) in `_dd.p.ts` on the root and has priority `SAMPLER_KEEP`.

On top of #10989 the test fails because the trace receives `SAMPLER_DROP` and `_dd.p.ts` is missing.

Suggested fix

Use the LLMObs span's own context (already a `TraceSegment` via `DDSpanContext`) instead of the active-scope segment. `DDSpanContext.setTagTop` already routes through `getRootSpanContextOrThis()` (`DDSpanContext.java:1363`), so it correctly marks the local root regardless of which scope is active.

Test plan

- `./gradlew :dd-smoke-tests:apm-tracing-disabled:test --tests "*standalone LLMObs*"` against this branch: expected to fail.
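To make the difference between the two marking paths concrete, here is a small model of the idea behind the suggested fix. It is a sketch under assumptions, not the tracer's code: `SpanContextSketch`, `markTraceSource`, and `markViaActiveSegment` are hypothetical names, and the two-digit hex rendering of the bitfield is assumed from the `0x20` → `"20"` example in the description.

```java
/** Hypothetical span context: each context knows its parent, the root holds the bits. */
final class SpanContextSketch {
  static final int LLMOBS_BIT = 0x20; // value quoted in the PR description

  private final SpanContextSketch parent;
  private int traceSourceBits;

  SpanContextSketch(SpanContextSketch parent) {
    this.parent = parent;
  }

  /** Walks up to the local root, or returns this context if it has no parent. */
  SpanContextSketch rootOrThis() {
    SpanContextSketch current = this;
    while (current.parent != null) {
      current = current.parent;
    }
    return current;
  }

  /** Marks the local root through the span's own context; no active scope needed. */
  void markTraceSource(int bit) {
    rootOrThis().traceSourceBits |= bit;
  }

  /** Renders the root's bitfield as hex; zero-padding to two digits is an assumption. */
  String traceSourceTag() {
    return String.format("%02x", rootOrThis().traceSourceBits);
  }

  /** Models the failing path: a missing active segment silently skips the marking. */
  static void markViaActiveSegment(SpanContextSketch activeOrNull, int bit) {
    if (activeOrNull != null) {
      activeOrNull.markTraceSource(bit);
    }
  }
}
```

With a `null` active segment the root's tag stays `"00"` in this model, whereas calling `markTraceSource(LLMOBS_BIT)` on the child context yields `"20"` on the root, matching the value the smoke test expects.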