Reducing memory allocation from client-side stats#10705
Reducing memory allocation from client-side stats#10705gh-worker-dd-mergequeue-cf854d[bot] merged 8 commits intomasterfrom
Conversation
Introduced DDCache-s around UTF8BytesString construction UTF8BytesString are advantageous for serialization, but that only applies to the key instance that is actually serialized Most key instances here are being created to do a look-up into the map. so the UTF8BytesString creation was extra work. This change is intended as a quick fix. I think there's still a lot of room for improvement by restructuring the code further, but that is left for a future PR. As is this changer reduces the impact on application throughput by 5-20% depending on the heap size.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 61 metrics, 10 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~4098c5d13e, baseline=1.61.0-SNAPSHOT~92d7ae699c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.068 s) : 0, 1068378
Total [baseline] (11.09 s) : 0, 11090408
Agent [candidate] (1.065 s) : 0, 1065068
Total [candidate] (11.043 s) : 0, 11043042
section appsec
Agent [baseline] (1.247 s) : 0, 1247380
Total [baseline] (11.137 s) : 0, 11137446
Agent [candidate] (1.249 s) : 0, 1249124
Total [candidate] (11.159 s) : 0, 11158836
section iast
Agent [baseline] (1.227 s) : 0, 1226988
Total [baseline] (11.326 s) : 0, 11326187
Agent [candidate] (1.237 s) : 0, 1236577
Total [candidate] (11.416 s) : 0, 11415590
section profiling
Agent [baseline] (1.183 s) : 0, 1183285
Total [baseline] (10.95 s) : 0, 10950418
Agent [candidate] (1.182 s) : 0, 1181968
Total [candidate] (11.004 s) : 0, 11004273
gantt
title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~4098c5d13e, baseline=1.61.0-SNAPSHOT~92d7ae699c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.21 ms) : 0, 1210
crashtracking [candidate] (1.221 ms) : 0, 1221
BytebuddyAgent [baseline] (632.319 ms) : 0, 632319
BytebuddyAgent [candidate] (631.648 ms) : 0, 631648
AgentMeter [baseline] (29.379 ms) : 0, 29379
AgentMeter [candidate] (29.366 ms) : 0, 29366
GlobalTracer [baseline] (258.967 ms) : 0, 258967
GlobalTracer [candidate] (258.459 ms) : 0, 258459
AppSec [baseline] (31.859 ms) : 0, 31859
AppSec [candidate] (31.819 ms) : 0, 31819
Debugger [baseline] (59.839 ms) : 0, 59839
Debugger [candidate] (59.807 ms) : 0, 59807
Remote Config [baseline] (595.538 µs) : 0, 596
Remote Config [candidate] (592.628 µs) : 0, 593
Telemetry [baseline] (8.709 ms) : 0, 8709
Telemetry [candidate] (8.677 ms) : 0, 8677
Flare Poller [baseline] (9.344 ms) : 0, 9344
Flare Poller [candidate] (7.198 ms) : 0, 7198
section appsec
crashtracking [baseline] (1.21 ms) : 0, 1210
crashtracking [candidate] (1.197 ms) : 0, 1197
BytebuddyAgent [baseline] (658.031 ms) : 0, 658031
BytebuddyAgent [candidate] (659.853 ms) : 0, 659853
AgentMeter [baseline] (12.075 ms) : 0, 12075
AgentMeter [candidate] (12.019 ms) : 0, 12019
GlobalTracer [baseline] (258.948 ms) : 0, 258948
GlobalTracer [candidate] (258.905 ms) : 0, 258905
AppSec [baseline] (177.714 ms) : 0, 177714
AppSec [candidate] (177.84 ms) : 0, 177840
Debugger [baseline] (65.943 ms) : 0, 65943
Debugger [candidate] (65.684 ms) : 0, 65684
Remote Config [baseline] (567.68 µs) : 0, 568
Remote Config [candidate] (571.107 µs) : 0, 571
Telemetry [baseline] (9.009 ms) : 0, 9009
Telemetry [candidate] (9.081 ms) : 0, 9081
Flare Poller [baseline] (3.571 ms) : 0, 3571
Flare Poller [candidate] (3.591 ms) : 0, 3591
IAST [baseline] (23.999 ms) : 0, 23999
IAST [candidate] (24.022 ms) : 0, 24022
section iast
crashtracking [baseline] (1.186 ms) : 0, 1186
crashtracking [candidate] (1.227 ms) : 0, 1227
BytebuddyAgent [baseline] (795.824 ms) : 0, 795824
BytebuddyAgent [candidate] (802.859 ms) : 0, 802859
AgentMeter [baseline] (11.375 ms) : 0, 11375
AgentMeter [candidate] (11.632 ms) : 0, 11632
GlobalTracer [baseline] (247.138 ms) : 0, 247138
GlobalTracer [candidate] (249.049 ms) : 0, 249049
AppSec [baseline] (26.476 ms) : 0, 26476
AppSec [candidate] (26.625 ms) : 0, 26625
Debugger [baseline] (64.663 ms) : 0, 64663
Debugger [candidate] (64.004 ms) : 0, 64004
Remote Config [baseline] (538.021 µs) : 0, 538
Remote Config [candidate] (553.931 µs) : 0, 554
Telemetry [baseline] (14.15 ms) : 0, 14150
Telemetry [candidate] (14.557 ms) : 0, 14557
Flare Poller [baseline] (4.307 ms) : 0, 4307
Flare Poller [candidate] (4.571 ms) : 0, 4571
IAST [baseline] (25.139 ms) : 0, 25139
IAST [candidate] (25.331 ms) : 0, 25331
section profiling
crashtracking [baseline] (1.179 ms) : 0, 1179
crashtracking [candidate] (1.179 ms) : 0, 1179
BytebuddyAgent [baseline] (683.451 ms) : 0, 683451
BytebuddyAgent [candidate] (682.516 ms) : 0, 682516
AgentMeter [baseline] (8.609 ms) : 0, 8609
AgentMeter [candidate] (8.609 ms) : 0, 8609
GlobalTracer [baseline] (215.802 ms) : 0, 215802
GlobalTracer [candidate] (215.777 ms) : 0, 215777
AppSec [baseline] (32.064 ms) : 0, 32064
AppSec [candidate] (31.98 ms) : 0, 31980
Debugger [baseline] (64.592 ms) : 0, 64592
Debugger [candidate] (61.369 ms) : 0, 61369
Remote Config [baseline] (579.375 µs) : 0, 579
Remote Config [candidate] (581.988 µs) : 0, 582
Telemetry [baseline] (8.977 ms) : 0, 8977
Telemetry [candidate] (11.37 ms) : 0, 11370
Flare Poller [baseline] (3.442 ms) : 0, 3442
Flare Poller [candidate] (4.26 ms) : 0, 4260
ProfilingAgent [baseline] (93.748 ms) : 0, 93748
ProfilingAgent [candidate] (93.452 ms) : 0, 93452
Profiling [baseline] (94.306 ms) : 0, 94306
Profiling [candidate] (94.011 ms) : 0, 94011
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~4098c5d13e, baseline=1.61.0-SNAPSHOT~92d7ae699c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.061 s) : 0, 1061113
Total [baseline] (8.835 s) : 0, 8834857
Agent [candidate] (1.057 s) : 0, 1057104
Total [candidate] (8.839 s) : 0, 8839384
section iast
Agent [baseline] (1.226 s) : 0, 1226073
Total [baseline] (9.558 s) : 0, 9558086
Agent [candidate] (1.234 s) : 0, 1233754
Total [candidate] (9.621 s) : 0, 9620848
gantt
title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~4098c5d13e, baseline=1.61.0-SNAPSHOT~92d7ae699c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.196 ms) : 0, 1196
crashtracking [candidate] (1.208 ms) : 0, 1208
BytebuddyAgent [baseline] (628.772 ms) : 0, 628772
BytebuddyAgent [candidate] (628.239 ms) : 0, 628239
AgentMeter [baseline] (29.118 ms) : 0, 29118
AgentMeter [candidate] (29.109 ms) : 0, 29109
GlobalTracer [baseline] (257.368 ms) : 0, 257368
GlobalTracer [candidate] (257.285 ms) : 0, 257285
AppSec [baseline] (31.585 ms) : 0, 31585
AppSec [candidate] (31.519 ms) : 0, 31519
Debugger [baseline] (58.833 ms) : 0, 58833
Debugger [candidate] (58.77 ms) : 0, 58770
Remote Config [baseline] (587.007 µs) : 0, 587
Remote Config [candidate] (588.375 µs) : 0, 588
Telemetry [baseline] (8.661 ms) : 0, 8661
Telemetry [candidate] (8.607 ms) : 0, 8607
Flare Poller [baseline] (8.842 ms) : 0, 8842
Flare Poller [candidate] (5.688 ms) : 0, 5688
section iast
crashtracking [baseline] (1.194 ms) : 0, 1194
crashtracking [candidate] (1.221 ms) : 0, 1221
BytebuddyAgent [baseline] (795.473 ms) : 0, 795473
BytebuddyAgent [candidate] (800.805 ms) : 0, 800805
AgentMeter [baseline] (11.328 ms) : 0, 11328
AgentMeter [candidate] (11.577 ms) : 0, 11577
GlobalTracer [baseline] (247.567 ms) : 0, 247567
GlobalTracer [candidate] (248.606 ms) : 0, 248606
AppSec [baseline] (26.32 ms) : 0, 26320
AppSec [candidate] (26.513 ms) : 0, 26513
Debugger [baseline] (62.824 ms) : 0, 62824
Debugger [candidate] (63.021 ms) : 0, 63021
Remote Config [baseline] (544.43 µs) : 0, 544
Remote Config [candidate] (542.615 µs) : 0, 543
Telemetry [baseline] (15.334 ms) : 0, 15334
Telemetry [candidate] (15.395 ms) : 0, 15395
Flare Poller [baseline] (4.388 ms) : 0, 4388
Flare Poller [candidate] (4.622 ms) : 0, 4622
IAST [baseline] (25.056 ms) : 0, 25056
IAST [candidate] (25.277 ms) : 0, 25277
LoadParameters
See matching parameters
SummaryFound 2 performance improvements and 2 performance regressions! Performance is the same for 15 metrics, 17 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~4098c5d13e, baseline=1.61.0-SNAPSHOT~92d7ae699c
dateFormat X
axisFormat %s
section baseline
no_agent (1.185 ms) : 1173, 1197
. : milestone, 1185,
iast (3.267 ms) : 3221, 3312
. : milestone, 3267,
iast_FULL (5.792 ms) : 5734, 5850
. : milestone, 5792,
iast_GLOBAL (3.526 ms) : 3463, 3588
. : milestone, 3526,
profiling (2.091 ms) : 2070, 2112
. : milestone, 2091,
tracing (1.805 ms) : 1790, 1820
. : milestone, 1805,
section candidate
no_agent (1.219 ms) : 1205, 1233
. : milestone, 1219,
iast (3.324 ms) : 3284, 3364
. : milestone, 3324,
iast_FULL (5.887 ms) : 5829, 5946
. : milestone, 5887,
iast_GLOBAL (3.53 ms) : 3474, 3586
. : milestone, 3530,
profiling (2.005 ms) : 1985, 2025
. : milestone, 2005,
tracing (1.845 ms) : 1827, 1862
. : milestone, 1845,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~4098c5d13e, baseline=1.61.0-SNAPSHOT~92d7ae699c
dateFormat X
axisFormat %s
section baseline
no_agent (18.181 ms) : 17991, 18372
. : milestone, 18181,
appsec (18.244 ms) : 18057, 18431
. : milestone, 18244,
code_origins (17.815 ms) : 17637, 17992
. : milestone, 17815,
iast (17.652 ms) : 17477, 17827
. : milestone, 17652,
profiling (19.427 ms) : 19232, 19623
. : milestone, 19427,
tracing (17.817 ms) : 17641, 17994
. : milestone, 17817,
section candidate
no_agent (18.234 ms) : 18048, 18421
. : milestone, 18234,
appsec (18.702 ms) : 18516, 18887
. : milestone, 18702,
code_origins (18.914 ms) : 18723, 19106
. : milestone, 18914,
iast (17.78 ms) : 17603, 17958
. : milestone, 17780,
profiling (18.443 ms) : 18261, 18624
. : milestone, 18443,
tracing (17.874 ms) : 17698, 18050
. : milestone, 17874,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~4098c5d13e, baseline=1.61.0-SNAPSHOT~92d7ae699c
dateFormat X
axisFormat %s
section baseline
no_agent (1.476 ms) : 1464, 1487
. : milestone, 1476,
appsec (3.804 ms) : 3581, 4026
. : milestone, 3804,
iast (2.264 ms) : 2194, 2333
. : milestone, 2264,
iast_GLOBAL (2.311 ms) : 2241, 2381
. : milestone, 2311,
profiling (2.11 ms) : 2053, 2168
. : milestone, 2110,
tracing (2.087 ms) : 2032, 2141
. : milestone, 2087,
section candidate
no_agent (1.477 ms) : 1465, 1488
. : milestone, 1477,
appsec (3.793 ms) : 3573, 4014
. : milestone, 3793,
iast (2.27 ms) : 2200, 2340
. : milestone, 2270,
iast_GLOBAL (2.301 ms) : 2231, 2371
. : milestone, 2301,
profiling (2.118 ms) : 2060, 2175
. : milestone, 2118,
tracing (2.087 ms) : 2033, 2141
. : milestone, 2087,
Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~4098c5d13e, baseline=1.61.0-SNAPSHOT~92d7ae699c
dateFormat X
axisFormat %s
section baseline
no_agent (15.751 s) : 15751000, 15751000
. : milestone, 15751000,
appsec (14.998 s) : 14998000, 14998000
. : milestone, 14998000,
iast (18.048 s) : 18048000, 18048000
. : milestone, 18048000,
iast_GLOBAL (18.261 s) : 18261000, 18261000
. : milestone, 18261000,
profiling (14.926 s) : 14926000, 14926000
. : milestone, 14926000,
tracing (15.127 s) : 15127000, 15127000
. : milestone, 15127000,
section candidate
no_agent (14.954 s) : 14954000, 14954000
. : milestone, 14954000,
appsec (14.858 s) : 14858000, 14858000
. : milestone, 14858000,
iast (17.873 s) : 17873000, 17873000
. : milestone, 17873000,
iast_GLOBAL (17.549 s) : 17549000, 17549000
. : milestone, 17549000,
profiling (14.926 s) : 14926000, 14926000
. : milestone, 14926000,
tracing (15.24 s) : 15240000, 15240000
. : milestone, 15240000,
|
| public final class MetricKey { | ||
| static final DDCache<String, UTF8BytesString> RESOURCE_CACHE = DDCaches.newFixedSizeCache(32); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_SOURCE_CACHE = DDCaches.newFixedSizeCache(4); |
There was a problem hiding this comment.
This should be already filled with UTF8BytesString. It can take values of the instrumentation name (that's should also be UTF8BytesString) so I suggest removing caching for this specific one. 4, is also too small
There was a problem hiding this comment.
The lookup logic accounts for receiving a UTF8BytesString and in that case bypasses the cache entirely.
Also, I erred on the side of keeping the caches small. I just want to eliminate most of the allocation - not all of the allocation.
That said, I may have misunderstood what "serviceSource" is so maybe that one should be larger. If it is instrumentations, then maybe we should increase it to 16. (The thinking being that only a few instrumentations are typically active in a given system.)
There was a problem hiding this comment.
If it can only be a UTF8BytesString, we could also tighten the type in the constructor. Although in that case, the JIT would dead code eliminate using the cache anyway (except for the initial allocation).
| static final DDCache<String, UTF8BytesString> TYPE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> KIND_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> HTTP_METHOD_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> HTTP_ENDPOINT_CACHE = DDCaches.newFixedSizeCache(32); |
There was a problem hiding this comment.
Would it be possible to widen those a bit? Also, for the http endpoint, I'm wondering if we should just do that caching earlier to make other serialisation benefitting of this (i.e. in EndpointResolver)
There was a problem hiding this comment.
I erred on the side of making the caches small.
The thinking is any object reuse is an improvement over what we were doing, but consuming too much memory could be harmful.
| } | ||
|
|
||
| static UTF8BytesString utf8(DDCache<String, UTF8BytesString> cache, CharSequence charSeq) { | ||
| if ( charSeq == null ) { |
There was a problem hiding this comment.
There are things that are supposing to check nullity of this. Now if we replace with empty is not more the same semantic and this will break some code. I suggest to let the caller return the default value so it won't break the existing logic
There was a problem hiding this comment.
Nevermind - I see that I did accidentally change the semantics for method & endpoint. I'll fix that.
I kept the semantics that already existed. If you look at the prior code, null was replaced with EMPTY, so I did the same.
There was a problem hiding this comment.
well there are things that are OK to return empty but others needs to be literally null (i.e. service source, httpMethod, httpEndpoint)
There was a problem hiding this comment.
Yes, I realized after my initial reply. I'll double check all of them.
There was a problem hiding this comment.
Since the split of EMPTY vs null was about 50 / 50, I decided to just restore the null checks in the constructor.
- fix null handling for http method & endpoint - increase service source cache size
Since it about a 50/50 split between cases that use EMPTY vs null, when a null is passed I decided to just put the null handling back into the constructor. That makes the choice more explicit, and makes the PR easier to review / compare to the prior logic
| public final class MetricKey { | ||
| static final DDCache<String, UTF8BytesString> RESOURCE_CACHE = DDCaches.newFixedSizeCache(32); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_SOURCE_CACHE = |
There was a problem hiding this comment.
That field does not need to be cached, for two main reasons:
- It is already either a
UTF8BytesStringornull. In fact, it comes from thecomponent()method, which in helpers typically returns aUTF8BytesString. - Its cardinality matches that of the instrumentations, the size of the cache is too small.
The constructor calling UTF8BytesString.create() may look confusing at first glance, but in this case it does not allocate a new object — it simply returns the existing instance. It was kept for consistency with the usual pattern.
Unless there are strong counterarguments, I would suggest removing the cache for this field.
There was a problem hiding this comment.
I think the key part here is "typically returns a UTF8BytesString".
The static typing doesn't currently guarantee it.
And the caching code accounts for the case of receiving a UTF8BytesString by bypassing the cache. In the same manner as UTF8BytesString.create did previously.
When a UTF8BytesString is passed, the cache incurs no overhead other than the static overhead of the cache array itself. But in the case where something other than UTF8BytesString is passed, the cache can significantly reduce memory consumption.
I think that's important because we need to curtail the worst case outcomes.
As for the size, the cardinality doesn't need to match total instrumentations. That would be too large. The size just needs to accommodate the active span producing instrumentations. As I said, I erred on the small side, since I want to avoid the worst case of consuming a lot of static memory.
And assuming a UTF8BytesString is introducing a subtle form of coupling that could easily be compromised by someone later on.
amarziali
left a comment
There was a problem hiding this comment.
Thanks for that improvement. wrt service name source, as per our chat, we might simplify trying changing the signature of that constructor if possible and shortcut the cache if we ensure that every value is already an UTF8BytesString itself.
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
The expected merge time in
This PR is rejected because it was updated |
bric3
left a comment
There was a problem hiding this comment.
Thanks for the optimization.
I think the only nitpick is comment explaining the choice of the cache sizes.
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
|
/merge |
|
View all feedbacks in Devflow UI.
PR already in the queue with status queued |
What Does This Do
Introduces DDCache-s around UTF8BytesString construction
Motivation
UTF8BytesString are advantageous for serialization, but that only applies to the MetricKey instance that is actually serialized
Most MetricKey instances are only created to do a look-up into the map. so the UTF8BytesString creation was just extra work
This change reduces allocation by 6% and GC time by 7% in span creation stress test.
This change reduces impact on application throughput by 5-20% depending on heap size.
Additional Notes
This change is intended as a quick fix. I think there's still a lot of room for improvement by restructuring the code further, but that is left for a future PR.
Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.