**Execution-Time Benchmarks Report ⏱️**

Execution-time results for samples comparing this PR (8450) and master.

✅ No regressions detected - check the details below.

**Full Metrics Comparison**

- FakeDbCommand
- HttpMessageHandler

**Comparison explanation**

Execution-time benchmarks measure the whole time it takes to execute a program and are intended to measure one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

**Duration charts**

FakeDbCommand (.NET Framework 4.8)

```mermaid
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (73ms) : 70, 77
    master - mean (75ms) : 70, 79
    section Bailout
    This PR (8450) - mean (78ms) : 76, 80
    master - mean (77ms) : 75, 80
    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (1,086ms) : 1042, 1129
    master - mean (1,082ms) : 1029, 1135
```
FakeDbCommand (.NET Core 3.1)

```mermaid
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (115ms) : 111, 120
    master - mean (116ms) : 111, 121
    section Bailout
    This PR (8450) - mean (120ms) : 113, 127
    master - mean (116ms) : 113, 119
    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (777ms) : 749, 804
    master - mean (782ms) : 748, 816
```
FakeDbCommand (.NET 6)

```mermaid
gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (104ms) : 98, 110
    master - mean (102ms) : 97, 106
    section Bailout
    This PR (8450) - mean (102ms) : 100, 105
    master - mean (102ms) : 100, 105
    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (946ms) : 910, 982
    master - mean (941ms) : 903, 979
```
FakeDbCommand (.NET 8)

```mermaid
gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (104ms) : 98, 109
    master - mean (105ms) : 100, 110
    section Bailout
    This PR (8450) - mean (105ms) : 100, 111
    master - mean (102ms) : 98, 106
    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (822ms) : 788, 856
    master - mean (829ms) : 789, 870
```
HttpMessageHandler (.NET Framework 4.8)

```mermaid
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (206ms) : 196, 216
    master - mean (205ms) : 194, 217
    section Bailout
    This PR (8450) - mean (211ms) : 199, 223
    master - mean (210ms) : 201, 219
    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (1,224ms) : 1163, 1284
    master - mean (1,214ms) : 1156, 1273
```
HttpMessageHandler (.NET Core 3.1)

```mermaid
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (301ms) : 280, 322
    master - mean (296ms) : 281, 311
    section Bailout
    This PR (8450) - mean (298ms) : 283, 314
    master - mean (297ms) : 280, 315
    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (981ms) : 952, 1010
    master - mean (967ms) : 934, 999
```
HttpMessageHandler (.NET 6)

```mermaid
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (292ms) : 276, 307
    master - mean (289ms) : 275, 304
    section Bailout
    This PR (8450) - mean (291ms) : 278, 304
    master - mean (295ms) : 275, 314
    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (1,169ms) : 1125, 1214
    master - mean (1,165ms) : 1119, 1210
```
HttpMessageHandler (.NET 8)

```mermaid
gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (290ms) : 270, 311
    master - mean (291ms) : 268, 314
    section Bailout
    This PR (8450) - mean (292ms) : 271, 312
    master - mean (289ms) : 275, 304
    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (1,069ms) : 984, 1154
    master - mean (1,071ms) : 978, 1164
```
**Benchmarks**

Benchmark execution time: 2026-04-22 20:38:55. Comparing candidate commit 4082bfd in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 0 unstable metrics, 62 known flaky benchmarks, 25 flaky benchmarks without significant changes.
Force-pushed from b0dfa05 to ec5f36b.
```csharp
        return factory(key);
    }

    return cache.GetOrAdd(key, factory);
```
I think we could/should optimize this pattern. Currently:

- `cache.Count` is expensive - it takes a full lock on the dictionary internally
- `cache.TryGetValue` followed by `cache.GetOrAdd(key, factory)` is two lookups on failures

Given we don't need exactly `MaxEdgeTagCacheSize` items (and AFAICT, we never remove items), I think you could optimize this by moving the method call to the TagCache type directly, storing an additional count locally there, and using that to avoid the full lock. Roughly:
```csharp
private int _edgeCacheCount = 0;
private EdgeCache _cache; // hand-waving the generic issues

public string[] GetOrCreateEdgeTags<TKey>(TKey key, Func<TKey, string[]> factory)
    where TKey : notnull, IEquatable<TKey>
{
    if (_cache.TryGetValue(key, out var existing))
    {
        return existing;
    }

    if (Volatile.Read(ref _edgeCacheCount) >= MaxEdgeTagCacheSize)
    {
        // High-cardinality key space - bypass cache to prevent unbounded memory growth
        return factory(key);
    }

    Interlocked.Increment(ref _edgeCacheCount);
    return _cache.GetOrAdd(key, factory);
}
```

We still have the two lookups when the cache limit is exceeded, but we lose the expensive `Count` call at least.
```csharp
    return new PathwayContext(new PathwayHash(hash), pathwayStartNs, edgeStartNs);
}

#if NETCOREAPP3_1_OR_GREATER
```
After we merge #8476, we can open this up more broadly and make it the only implementation 🙂
**andrewlock** left a comment
Thanks for this! I think it looks like a good plan overall; there's just the conversion to `readonly record struct` to simplify things, and the question about whether we can optimize the failure cases to avoid calling the expensive `ConcurrentDictionary.Count` property.
Pushed fixes for all comments
**andrewlock** left a comment
LGTM, just one last cleanup we can do (reference equality is the default, so we don't need a custom comparer).

Just to sense-check the limits, we have:

- ~10 different cache key types currently
- each array probably being ballpark ~200 bytes (depends on dynamic data, so hard to say)
- we cache up to ~1000 distinct arrays

So this could raise the "static" memory usage by ~2 MB (10 × 200 × 1000 bytes) if I understand correctly. Given these paths are called many times, we expect a high hit ratio, and most of them are hot paths where this has a clear impact on throughput, so this looks like a great tradeoff overall to me 👍 Thanks!
```csharp
// Keyed by string[] identity (reference equality) — safe because TagCache holds strong
// references to the cached arrays (bounded by MaxEdgeTagCacheSize).
private readonly ConcurrentDictionary<string[], NodeHash> _nodeHashCache =
    new(NodeHashCacheKeyComparer.Instance);
```
The custom comparer is not required - object comparisons use reference equality by default
Suggested change:

```diff
  // Keyed by string[] identity (reference equality) — safe because TagCache holds strong
  // references to the cached arrays (bounded by MaxEdgeTagCacheSize).
  private readonly ConcurrentDictionary<string[], NodeHash> _nodeHashCache =
-     new(NodeHashCacheKeyComparer.Instance);
+     new();
```
```csharp
}

/// <summary>
/// Reference-equality comparer for string[] keys in <see cref="_nodeHashCache"/>.
/// Two string[] objects are considered equal only when they are the same instance,
/// which is always true for the cached arrays held by <see cref="TagCache{TKey, TValue}"/>.
/// </summary>
private sealed class NodeHashCacheKeyComparer : IEqualityComparer<string[]>
{
    internal static readonly NodeHashCacheKeyComparer Instance = new();

    public bool Equals(string[]? x, string[]? y) => ReferenceEquals(x, y);

    public int GetHashCode(string[] obj) => RuntimeHelpers.GetHashCode(obj);
}
```
This isn't necessary, equality uses reference equality by default
Suggested change:

```diff
  }
- /// <summary>
- /// Reference-equality comparer for string[] keys in <see cref="_nodeHashCache"/>.
- /// Two string[] objects are considered equal only when they are the same instance,
- /// which is always true for the cached arrays held by <see cref="TagCache{TKey, TValue}"/>.
- /// </summary>
- private sealed class NodeHashCacheKeyComparer : IEqualityComparer<string[]>
- {
-     internal static readonly NodeHashCacheKeyComparer Instance = new();
-     public bool Equals(string[]? x, string[]? y) => ReferenceEquals(x, y);
-     public int GetHashCode(string[] obj) => RuntimeHelpers.GetHashCode(obj);
- }
  }
```
(If you like, prove it to yourself with this! 😄)
```csharp
var random = new Random();
var dict = new ConcurrentDictionary<string[], int>();

var a = new string[] { "Hello", "World" };
var b = new string[] { "Hello", "World" };

Console.WriteLine(dict.GetOrAdd(a, key => random.Next())); // new entry for instance a
Console.WriteLine(dict.GetOrAdd(a, key => random.Next())); // same value: same reference
Console.WriteLine(dict.GetOrAdd(b, key => random.Next())); // new entry: b is a distinct instance
Console.WriteLine(dict.GetOrAdd(b, key => random.Next())); // same value as the line above
```
DSM Per-Message Overhead Optimizations
Summary of changes
- `EdgeTagCache<TKey>` and `BacklogTagCache<TKey>` - process-wide, per-type `ConcurrentDictionary` caches that intern edge-tag arrays and backlog-tag strings so they are only allocated once per unique key (topic/group/cluster combination).
- `NodeHashCacheEntry`/`NodeHashSnapshot` mechanism inside `DataStreamsManager` that memoizes the expensive `CalculateNodeHash` result per `(edgeTags[], nodeHashBase)` pair. Reads are lock-free via a volatile field; writes acquire a per-entry lock only on cache miss or base change.
- `PathwayContextEncoder.EncodeInto` and a `Span<byte>`-based `Decode` overload; `DataStreamsContextPropagator` uses `stackalloc` buffers on .NET Core 3.1+ to avoid intermediate `byte[]` heap allocations on every produce/consume.
- `DataStreamsAggregator` and `DataStreamsManager._nodeHashCache` now use reference-equality comparers backed by `RuntimeHelpers.GetHashCode`, which is safe because all keys are interned by the caches above.
- Replaced the `Thread.Sleep` polling loop in `DataStreamsWriter` with a `ManualResetEventSlim` that wakes immediately when the queue reaches 1,000 items or after a 500 ms timeout, eliminating unnecessary context switches.
- `readonly struct` cache keys (`ConsumeEdgeTagCacheKey`, `ProduceEdgeTagCacheKey`, `CommitBacklogTagCacheKey`, `ProduceBacklogTagCacheKey`) for Kafka; equivalent structs for AWS SQS/SNS/Kinesis, Azure Service Bus, IBM MQ, and RabbitMQ.
- The `Remove(TemporaryBase64PathwayContext)` header scan is now skipped when `KafkaCreateConsumerScopeEnabled=true` (the default), avoiding an O(n) scan on every message.
- `LastConsumePathway` guard removed: dropped the redundant `!= null` guard on the produce path that required an `AsyncLocal` read before the actual `AsyncLocal` read.

Reason for change
DSM instrumentation runs on the hot path of every instrumented message. Profiling revealed that the dominant allocations were:
- the `string[]` edge-tag array on every produce/consume call
- the `CalculateNodeHash` call (hashing over all edge tags) on every checkpoint
- `byte[]` arrays for pathway context Base64 encoding/decoding

These optimizations target p99 and throughput benchmarks for Kafka, SQS, SNS, RabbitMQ, IBM MQ, Azure Service Bus, and Kinesis instrumentation.
Implementation details
Caching strategy
`EdgeTagCache<TKey>` and `BacklogTagCache<TKey>` use the static-generic-class pattern (a `static class Foo<T>` with a static field) to give each integration its own dictionary instance without any runtime dispatch. The key type is a `readonly struct` implementing `IEquatable<TKey>`, which prevents boxing in `ConcurrentDictionary` lookups.

The caches are bounded at `MaxEdgeTagCacheSize = 1000` entries. Once that limit is reached, new keys are computed on the fly (no caching) to prevent unbounded memory growth from high-cardinality identifiers.
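A minimal sketch of the pattern described above (the type and constant names follow this description, but the counting logic and method shape are assumptions, not the actual implementation):

```csharp
// Sketch only: the static-generic-class pattern gives each closed generic type
// (EdgeTagCache<KafkaKey>, EdgeTagCache<SqsKey>, ...) its own dictionary field.
using System;
using System.Collections.Concurrent;
using System.Threading;

internal static class EdgeTagCache<TKey>
    where TKey : struct, IEquatable<TKey> // struct keys implementing IEquatable avoid boxing
{
    private const int MaxEdgeTagCacheSize = 1000;

    private static readonly ConcurrentDictionary<TKey, string[]> Cache = new();
    private static int _count; // approximate bound, avoids the expensive Count property

    public static string[] GetOrCreate(TKey key, Func<TKey, string[]> factory)
    {
        if (Cache.TryGetValue(key, out var cached))
        {
            return cached; // hot path: no allocation, no factory call
        }

        if (Volatile.Read(ref _count) >= MaxEdgeTagCacheSize)
        {
            // High-cardinality key space: compute on the fly instead of growing forever
            return factory(key);
        }

        Interlocked.Increment(ref _count);
        return Cache.GetOrAdd(key, factory);
    }
}
```

Repeated calls with the same key return the same interned array instance, which is what makes the reference-equality node-hash cache below safe.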
Node-hash caching

`_nodeHashCache` is keyed by `string[]` identity (not value equality) because the arrays themselves are interned by `EdgeTagCache<TKey>`. Each entry holds a volatile `NodeHashSnapshot` (`nodeHashBase` + `NodeHash`). On every checkpoint:
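Roughly, the snapshot mechanism might look like this (`NodeHashCacheEntry` and `NodeHashSnapshot` are names from this description; the hash type and method shape are guesses for illustration):

```csharp
// Sketch: lock-free volatile read on the hot path; a per-entry lock is taken
// only on first use or when the node-hash base changes.
using System;
using System.Threading;

internal sealed class NodeHashCacheEntry
{
    private sealed record NodeHashSnapshot(ulong NodeHashBase, ulong NodeHash);

    private readonly object _lock = new();
    private volatile NodeHashSnapshot? _snapshot;

    public ulong GetOrCalculate(ulong nodeHashBase, Func<ulong, ulong> calculate)
    {
        var snapshot = _snapshot; // volatile read, no lock
        if (snapshot is not null && snapshot.NodeHashBase == nodeHashBase)
        {
            return snapshot.NodeHash; // fast path: base unchanged, reuse memoized hash
        }

        lock (_lock) // slow path: miss, or the base changed
        {
            snapshot = _snapshot;
            if (snapshot is null || snapshot.NodeHashBase != nodeHashBase)
            {
                snapshot = new NodeHashSnapshot(nodeHashBase, calculate(nodeHashBase));
                _snapshot = snapshot; // publish the immutable snapshot
            }

            return snapshot.NodeHash;
        }
    }
}
```

Because the snapshot is immutable and published via a volatile field, readers never see a base/hash pair that is torn across two writes.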
Zero-allocation encode/decode

`PathwayContextEncoder.EncodeInto(PathwayContext, Span<byte>)` writes directly into a caller-supplied buffer. `DataStreamsContextPropagator` allocates `MaxEncodedSize` (26 bytes) and `MaxBase64EncodedSize` (36 bytes) buffers with `stackalloc` and uses `Base64.EncodeToUtf8`/`DecodeFromUtf8` in place. The only unavoidable allocation is the final `ToArray()` passed to `headers.Add`, because Kafka takes ownership of the byte array.

This path is guarded by `#if NETCOREAPP3_1_OR_GREATER`; .NET Framework falls back to the original heap-allocating path.
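The stackalloc + `Base64.EncodeToUtf8` shape can be sketched like this (the buffer sizes come from this description; `PathwayHeaderEncoding`/`ToBase64Header` are hypothetical names for illustration):

```csharp
// Sketch: encode a pathway context payload to Base64 without an intermediate
// heap byte[] - only the final header array handed to Kafka is allocated.
using System;
using System.Buffers.Text;

internal static class PathwayHeaderEncoding
{
    private const int MaxEncodedSize = 26;                                   // binary context bytes
    private const int MaxBase64EncodedSize = ((MaxEncodedSize + 2) / 3) * 4; // = 36

    public static byte[] ToBase64Header(ReadOnlySpan<byte> encodedContext)
    {
        Span<byte> base64 = stackalloc byte[MaxBase64EncodedSize];
        Base64.EncodeToUtf8(encodedContext, base64, out _, out var written);

        // The one unavoidable allocation: Kafka takes ownership of the header byte[].
        return base64[..written].ToArray();
    }
}
```

Note that 26 input bytes Base64-encode to ceil(26/3) × 4 = 36 output bytes, which is where the two constants relate.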
Drain signal

`DataStreamsWriter` previously slept 10 ms unconditionally between drain iterations, burning CPU and adding ~10 ms of latency per batch even under load. The new `ManualResetEventSlim` is signalled immediately when either queue exceeds `DrainThreshold` (1,000 items), capping worst-case latency at `DrainTimeoutMs` (500 ms) while eliminating idle wakeups.
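A simplified sketch of that wake-up logic (the constants follow this description; the real writer's queue and flush handling are more involved, and `DrainSignal` is a hypothetical name):

```csharp
// Sketch: replace a fixed Thread.Sleep(10) polling loop with an event that is
// set as soon as a full batch is queued, and otherwise times out after 500 ms.
using System;
using System.Threading;

internal sealed class DrainSignal
{
    private const int DrainThreshold = 1000;
    private const int DrainTimeoutMs = 500;

    private readonly ManualResetEventSlim _wake = new(initialState: false);
    private int _queued;

    public void OnItemEnqueued()
    {
        // Wake the writer immediately once a full batch is ready
        if (Interlocked.Increment(ref _queued) == DrainThreshold)
        {
            _wake.Set();
        }
    }

    public void WaitForNextDrain()
    {
        // Returns early when signalled; otherwise waits at most 500 ms
        _wake.Wait(DrainTimeoutMs);
        _wake.Reset();
        Interlocked.Exchange(ref _queued, 0);
    }
}
```

Compared with unconditional sleeping, the event-based wait removes idle wakeups entirely while keeping a hard upper bound on batch latency.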
Test coverage

- `DataStreamsManagerTests`: new unit tests verify that `GetOrCreateEdgeTags` and `GetOrCreateBacklogTags` return the same array/string reference on repeated calls with the same key, and distinct references for different keys. Tests cover Kafka produce/consume, RabbitMQ produce/consume, and generic key types.
- `PathwayContextEncoderTests`: existing encode/decode round-trip tests pass against the new `Span<byte>` overloads.

Other details
- The `MaxEdgeTagCacheSize` constant is `internal` to allow unit tests to verify the overflow/bypass behavior.
- .NET Framework code paths are unchanged - all `Span`-based optimizations are gated behind `#if NETCOREAPP3_1_OR_GREATER`.