Provide optimized writers for OpenTelemetry's "trace.proto" wire protocol #11120

Conversation
Benchmarks

Startup
Summary: Found 0 performance improvements and 0 performance regressions. Performance is the same for 63 metrics, 8 unstable metrics.

Startup time reports for insecure-bank:
[Chart: insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~397618fe1c, baseline=1.62.0-SNAPSHOT~d5d2097cb9]
[Chart: insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~397618fe1c, baseline=1.62.0-SNAPSHOT~d5d2097cb9]
Startup time reports for petclinic:
[Chart: petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~397618fe1c, baseline=1.62.0-SNAPSHOT~d5d2097cb9]
[Chart: petclinic - break down per module: candidate=1.62.0-SNAPSHOT~397618fe1c, baseline=1.62.0-SNAPSHOT~d5d2097cb9]
Load
Summary: Found 1 performance improvement and 0 performance regressions. Performance is the same for 19 metrics, 16 unstable metrics.

Request duration reports for petclinic:
[Chart: petclinic - request duration [CI 0.99]: candidate=1.62.0-SNAPSHOT~397618fe1c, baseline=1.62.0-SNAPSHOT~d5d2097cb9]
Request duration reports for insecure-bank:
[Chart: insecure-bank - request duration [CI 0.99]: candidate=1.62.0-SNAPSHOT~397618fe1c, baseline=1.62.0-SNAPSHOT~d5d2097cb9]
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions. Performance is the same for 12 metrics, 0 unstable metrics.

Execution time for tomcat:
[Chart: tomcat - execution time [CI 0.99]: candidate=1.62.0-SNAPSHOT~397618fe1c, baseline=1.62.0-SNAPSHOT~d5d2097cb9]
Execution time for biojava:
[Chart: biojava - execution time [CI 0.99]: candidate=1.62.0-SNAPSHOT~397618fe1c, baseline=1.62.0-SNAPSHOT~d5d2097cb9]
Force-pushed 583dc0c to 4adb56e
```java
/**
 * Collects trace spans and marshalls them into a chunked payload.
 *
 * <p>This payload is only valid for the calling thread until the next collection.
 */
@Override
public OtlpPayload collectSpans(List<DDSpan> spans) {
```
Is `List<DDSpan> spans` expected to contain spans from a single trace? If so, each `collectSpans` call produces a full `TracesData` envelope with resource and scope wrappers per trace. That doesn't seem optimal, and it differs from the Datadog/msgpack implementation. Unless the expectation is that the eventual `OtlpWriter` will accumulate completed traces and call this once per flush cycle with a combined span list (although that can't be right based on the `MetaWriter`, which expects just a single trace at a time).
Very good point - on reflection I'll change this to add a flush method so we can accumulate trace chunks over multiple calls.
OK, I've updated the collector API so it has two methods:

- `addTrace(spans)`, which adds a trace to the collector
- `collectTraces()`, which marshals the collected spans into a payload
This should allow its use as a replacement PayloadDispatcher, which means we can re-use more of the existing remote writer code.
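A minimal sketch of that two-method shape (class and method bodies here are illustrative stand-ins, not the actual dd-trace-java types; strings stand in for spans and the return value stands in for the marshalled payload):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch of the accumulate-then-flush collector API described above. */
class TraceCollectorSketch {
    private final List<List<String>> pendingTraces = new ArrayList<>();

    /** Adds one completed trace (its spans) to the collector. */
    void addTrace(List<String> spans) {
        pendingTraces.add(spans);
    }

    /** Marshals everything accumulated since the last call; here we just count spans. */
    int collectTraces() {
        int spanCount = 0;
        for (List<String> trace : pendingTraces) {
            spanCount += trace.size();
        }
        pendingTraces.clear(); // the next addTrace starts a new payload
        return spanCount;
    }
}
```

This is what lets the collector slot in as a replacement `PayloadDispatcher`: the writer keeps calling `addTrace` per finished trace and only pays the marshalling cost once per flush.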
…send them as first-class links (likewise turn off legacy baggage injection)
Force-pushed ab2ef0b to 7cdfed7
dougqh left a comment:
Claude caught a couple of issues:
- NPE and ClassCastException

Since I'm off next week, I'm not going to "request changes". I'll just trust those get fixed and let someone else do the final review.

Also, I added one key performance suggestion around the use of forEach.

And here are a couple more that Claude reported, which I'll leave to your discretion...
1. `Config.get().getServiceName()` on every span — OtlpTraceProto.java:137

   if (!Config.get().getServiceName().equalsIgnoreCase(span.getServiceName())) {

   Cache the default service name (ideally as a UTF8BytesString for cheap equality). This runs for every span in every payload.

2. recordMessage allocates a fresh ByteBuffer + backing array per chunk — OtlpCommonProto.java:126-140

   Every span, every link, every scope prefix gets its own heap allocation. Precisely-sized allocations are nice, but the total allocation count scales with the chunk count. If profiling shows GC pressure, a small reusable scratch arena that hands out slices (or an OtlpPayload that owns a large backing buffer with offset/length pairs) would eliminate most of this. The trade-off is lifetime complexity, so it's only worth it if measurements show it matters.
Yes, sadly this is the nature of heavily nested protobuf messages (the protobuf manual says to avoid too much nesting). It means that before we can write out a span we need to know its exact message size. And because the size field is written out as a varint, whose width depends on the value, you can't simply reserve a fixed-size slot and backfill it. You could process traces twice - once to size everything, and again to write it out - but the book-keeping needed for that gets complicated, and you're doubling the CPU time doing two passes. Initial benchmarking showed we're allocating less than OTel with the current approach, mainly because we re-use the same buffer for doing the initial writes before recording each message slice. But I might look into pooling of slices to reduce churn.
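To make the sizing problem concrete: protobuf length-delimited fields are prefixed by their byte size encoded as a varint, and the varint itself grows by one byte per 7 bits of value, so even the prefix's width is unknown until the body is sized. A generic encoder sketch (not the PR's code):

```java
import java.io.ByteArrayOutputStream;

/** Illustrates protobuf's unsigned varint encoding: 7 bits per byte, low bits first. */
class VarintSketch {
    static byte[] varint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            // Emit the low 7 bits with the continuation bit set.
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // final byte has the continuation bit clear
        return out.toByteArray();
    }
}
```

A 120-byte span body needs a 1-byte prefix, while a 300-byte one needs 2 bytes, which is why every enclosing message's size shifts as its children are sized.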
…takes an extra context object
Force-pushed 97b5fc9 to e77fe7e
What Does This Do
Uses a single temporary buffer as in #10983 to prepare message chunks at different nesting levels (resource / scope / span)
First we chunk all nested messages, i.e. span-links, for a given span. Once the span is complete, we add the first part of the span message and its chunked links to the scoped chunks. Once the scope is complete, we add the first part of the scoped-spans message and all its chunks (span messages and their links) to the payload. Once all the span data has been chunked, we add the enclosing resource-spans message to the start of the payload.
Multiple traces can be added to the collector before collecting them into a payload. Note that this payload is only valid for the calling thread until the next collection. Adding traces after collection automatically starts a new payload.
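The bottom-up ordering described above can be sketched with strings standing in for byte slices (a toy illustration of the chunk assembly order, not the PR's actual data structures):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of bottom-up chunk assembly: links, then span, then scope, then resource. */
class ChunkOrderSketch {
    static List<String> buildPayload() {
        List<String> spanChunks = new ArrayList<>();
        spanChunks.add("link");               // 1. nested messages (span-links) chunked first
        spanChunks.add(0, "span-header");     // 2. span message prefix goes before its links
        List<String> scopeChunks = new ArrayList<>(spanChunks);
        scopeChunks.add(0, "scope-header");   // 3. scoped-spans prefix goes before its spans
        List<String> payload = new ArrayList<>(scopeChunks);
        payload.add(0, "resource-header");    // 4. enclosing resource message added last, at the front
        return payload;
    }
}
```

Even though the resource envelope is written last, it ends up first in the payload, matching the wire order `resource → scope → span → links`.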
Motivation
Avoids the need to use the full protobuf library while keeping intermediate array creation to a minimum.
Additional Notes
OtlpTraceProtoTest was created with the help of Claude.

Contributor Checklist

- Assign the type: and (comp: or inst:) labels in addition to any other useful labels
- Avoid close, fix, or any linking keywords when referencing an issue; use solves instead, and assign the PR milestone to the issue
- Jira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR level. For more information, see this doc.