
DSM overhead optimizations #8450

Open
kr-igor wants to merge 13 commits into master from kr-igor/dsm-overhead-optimizations

Conversation

Contributor

@kr-igor kr-igor commented Apr 13, 2026

DSM Per-Message Overhead Optimizations

Summary of changes

  • Edge-tag array caching: Introduced EdgeTagCache<TKey> and BacklogTagCache<TKey> — process-wide, per-type ConcurrentDictionary caches that intern edge-tag arrays and backlog-tag strings so they are only allocated once per unique key (topic/group/cluster combination).
  • Node-hash caching: Added a NodeHashCacheEntry/NodeHashSnapshot mechanism inside DataStreamsManager that memoizes the expensive CalculateNodeHash result per (edgeTags[], nodeHashBase) pair. Reads are lock-free via a volatile field; writes acquire a per-entry lock only on cache miss or base change.
  • Zero-allocation context encode/decode (.NET Core 3.1+): Added PathwayContextEncoder.EncodeInto and a Span<byte>-based Decode overload; DataStreamsContextPropagator uses stackalloc buffers on .NET Core 3.1+ to avoid intermediate byte[] heap allocations on every produce/consume.
  • Reference-equality dictionary comparers: DataStreamsAggregator and DataStreamsManager._nodeHashCache now use reference-equality comparers backed by RuntimeHelpers.GetHashCode, which is safe because all keys are interned by the caches above.
  • Drain-signal instead of sleep: Replaced the 10 ms Thread.Sleep polling loop in DataStreamsWriter with a ManualResetEventSlim that wakes immediately when the queue reaches 1 000 items or after a 500 ms timeout, eliminating unnecessary context switches.
  • Integration-specific cache-key structs: Added readonly struct cache keys (ConsumeEdgeTagCacheKey, ProduceEdgeTagCacheKey, CommitBacklogTagCacheKey, ProduceBacklogTagCacheKey) for Kafka; equivalent structs for AWS SQS/SNS/Kinesis, Azure Service Bus, IBM MQ, and RabbitMQ.
  • Minor hot-path fix (Kafka): The Remove(TemporaryBase64PathwayContext) header scan is now skipped when KafkaCreateConsumerScopeEnabled=true (the default), avoiding an O(n) scan on every message.
  • LastConsumePathway guard removed: Dropped the redundant != null guard on the produce path, which forced an extra AsyncLocal read immediately before the AsyncLocal read it was guarding.
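The reference-equality comparer mentioned above can be sketched roughly as follows (a minimal illustration with an assumed type name, not the PR's exact code):

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical sketch: compares keys (e.g. string[]) by reference identity.
// Safe only because every key is interned by the edge-tag caches, so
// value-equal keys are guaranteed to be the same object instance.
internal sealed class ReferenceEqualityComparer<T> : IEqualityComparer<T>
    where T : class
{
    public static readonly ReferenceEqualityComparer<T> Instance = new();

    public bool Equals(T? x, T? y) => ReferenceEquals(x, y);

    // RuntimeHelpers.GetHashCode ignores any overridden GetHashCode and
    // hashes object identity, matching the ReferenceEquals check above.
    public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
}
```

This turns every dictionary lookup on a string[] key into a pointer comparison instead of an O(n) element-wise array comparison.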

Reason for change

DSM instrumentation runs on the hot path of every instrumented message. Profiling revealed that the dominant allocations were:

  1. A new string[] edge-tag array on every produce/consume call.
  2. A CalculateNodeHash call (hashing over all edge tags) on every checkpoint.
  3. Intermediate byte[] arrays for pathway context Base64 encoding/decoding.
  4. Unnecessary CPU spin from a fixed 10 ms sleep between drain cycles.

These optimizations target p99 and throughput benchmarks for Kafka, SQS, SNS, RabbitMQ, IBM MQ, Azure Service Bus, and Kinesis instrumentation.

Implementation details

Caching strategy

EdgeTagCache<TKey> and BacklogTagCache<TKey> use the static-generic-class pattern (static class Foo<T> with a static field) to give each integration its own dictionary instance without any runtime dispatch. The key type is a readonly struct implementing IEquatable<TKey>, which prevents boxing in ConcurrentDictionary lookups.

The caches are bounded at MaxEdgeTagCacheSize = 1000 entries. Once that limit is reached, new keys are computed on the fly (no caching) to prevent unbounded memory growth from high-cardinality identifiers.
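The pattern above can be sketched as follows (type and constant names taken from the PR description; the method shape is assumed):

```csharp
using System;
using System.Collections.Concurrent;

// Minimal sketch of the static-generic-class caching pattern. Each closed
// generic EdgeTagCache<TKey> gets its own static dictionary at JIT time,
// so different integrations never share a cache or contend on one.
internal static class EdgeTagCache<TKey>
    where TKey : struct, IEquatable<TKey> // readonly struct key: no boxing in lookups
{
    private const int MaxEdgeTagCacheSize = 1000;

    private static readonly ConcurrentDictionary<TKey, string[]> Cache = new();

    public static string[] GetOrAdd(TKey key, Func<TKey, string[]> factory)
    {
        if (Cache.TryGetValue(key, out var tags))
        {
            return tags; // interned: the same array reference on every hit
        }

        // Bounded: once full, compute without caching so high-cardinality
        // identifiers cannot grow memory without limit.
        return Cache.Count >= MaxEdgeTagCacheSize
            ? factory(key)
            : Cache.GetOrAdd(key, factory);
    }
}
```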

Node-hash caching

_nodeHashCache is keyed by string[] identity (not value equality) because the arrays themselves are interned by EdgeTagCache<TKey>. Each entry holds a volatile NodeHashSnapshot (nodeHashBase + NodeHash). On every checkpoint:

  1. Look up the array reference — O(1) identity hash.
  2. Read the volatile snapshot — lock-free.
  3. If the base matches, return immediately.
  4. Otherwise, acquire the per-entry lock, double-check, compute, and publish a new snapshot.
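The four steps above amount to a double-checked snapshot publish, roughly (member names assumed; the hash function is passed in to keep the sketch self-contained):

```csharp
using System;

// Minimal sketch of the lock-free read / locked publish pattern.
internal sealed record NodeHashSnapshot(ulong NodeHashBase, ulong NodeHash);

internal sealed class NodeHashCacheEntry
{
    private readonly object _gate = new();
    private volatile NodeHashSnapshot? _snapshot;

    public ulong GetOrCompute(
        ulong nodeHashBase,
        string[] edgeTags,
        Func<ulong, string[], ulong> calculateNodeHash)
    {
        // Steps 1-3: a single volatile read; return immediately on a base match.
        var snap = _snapshot;
        if (snap is not null && snap.NodeHashBase == nodeHashBase)
        {
            return snap.NodeHash;
        }

        // Step 4: slow path only on cache miss or base change.
        lock (_gate)
        {
            snap = _snapshot;
            if (snap is not null && snap.NodeHashBase == nodeHashBase)
            {
                return snap.NodeHash; // another thread published first
            }

            var hash = calculateNodeHash(nodeHashBase, edgeTags); // expensive
            _snapshot = new NodeHashSnapshot(nodeHashBase, hash); // volatile publish
            return hash;
        }
    }
}
```

Publishing an immutable snapshot through a single volatile field is what makes the fast path safe without a lock: readers either see the old snapshot or the complete new one, never a torn pair.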

Zero-allocation encode/decode

PathwayContextEncoder.EncodeInto(PathwayContext, Span<byte>) writes directly into a caller-supplied buffer. DataStreamsContextPropagator stackallocs MaxEncodedSize (26 bytes) and MaxBase64EncodedSize (36 bytes) on the stack and uses Base64.EncodeToUtf8/DecodeFromUtf8 in-place. The only unavoidable allocation is the final ToArray() passed to headers.Add, because Kafka takes ownership of the byte array.

This path is guarded by #if NETCOREAPP3_1_OR_GREATER; .NET Framework falls back to the original heap-allocating path.
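The stackalloc + in-place Base64 pattern looks roughly like this (constants from the PR description; the method name and error handling are assumptions):

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

// Minimal sketch of the zero-intermediate-allocation encode path.
// .NET Framework keeps the original heap-allocating path behind #if.
internal static class PathwayContextEncoding
{
    private const int MaxEncodedSize = 26;       // raw encoded context bytes
    private const int MaxBase64EncodedSize = 36; // base64 of 26 bytes, with padding

    public static byte[] ToBase64Header(ReadOnlySpan<byte> encodedContext)
    {
        // Stack buffer: no intermediate byte[] on the heap.
        Span<byte> utf8Base64 = stackalloc byte[MaxBase64EncodedSize];
        var status = Base64.EncodeToUtf8(encodedContext, utf8Base64, out _, out var written);
        if (status != OperationStatus.Done)
        {
            throw new ArgumentException("context too large", nameof(encodedContext));
        }

        // The single unavoidable allocation: the array the message headers
        // take ownership of.
        return utf8Base64[..written].ToArray();
    }
}
```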

Drain signal

DataStreamsWriter previously slept 10 ms unconditionally between drain iterations, burning CPU and adding ~10 ms latency per batch even under load. The new ManualResetEventSlim is signalled immediately when either queue exceeds DrainThreshold (1 000 items), capping worst-case latency at DrainTimeoutMs (500 ms) while eliminating idle wakeups.
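A rough sketch of the signalling scheme (constant names from the PR; the surrounding writer loop and queue accounting are assumptions):

```csharp
using System.Threading;

// Minimal sketch of the drain signal replacing the 10 ms Thread.Sleep poll.
internal sealed class DrainSignal
{
    private const int DrainThreshold = 1000;
    private const int DrainTimeoutMs = 500;

    private readonly ManualResetEventSlim _wake = new(initialState: false);
    private int _queued;

    // Producer path: wake the writer as soon as a full batch is ready.
    public void OnItemEnqueued()
    {
        if (Interlocked.Increment(ref _queued) >= DrainThreshold)
        {
            _wake.Set();
        }
    }

    // Writer loop: blocks until signalled, or for at most DrainTimeoutMs,
    // so there are no fixed 10 ms wakeups while the queue is idle.
    public void WaitForNextDrain()
    {
        _wake.Wait(DrainTimeoutMs);
        _wake.Reset();
        Interlocked.Exchange(ref _queued, 0);
    }
}
```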

Test coverage

  • DataStreamsManagerTests: new unit tests verify that GetOrCreateEdgeTags and GetOrCreateBacklogTags return the same array/string reference on repeated calls with the same key, and distinct references for different keys. Tests cover Kafka produce/consume, RabbitMQ produce/consume, and generic key types.
  • PathwayContextEncoderTests: existing encode/decode round-trip tests pass against the new Span<byte> overloads.
  • All existing DSM tests continue to pass.

Other details

  • The MaxEdgeTagCacheSize constant is internal to allow unit tests to verify the overflow/bypass behavior.
  • No public API surface changes; all new types are internal.
  • .NET Framework code paths are unchanged — all Span-based optimizations are gated behind #if NETCOREAPP3_1_OR_GREATER.


dd-trace-dotnet-ci-bot Bot commented Apr 13, 2026

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (8450) and master.

✅ No regressions detected - check the details below

Full Metrics Comparison

FakeDbCommand

| Scenario | Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status |
| --- | --- | --- | --- | --- | --- |
| .NET Framework 4.8 - Baseline | duration | 74.35 ± (74.24 - 74.84) ms | 73.18 ± (73.21 - 73.62) ms | -1.6% | |
| .NET Framework 4.8 - Bailout | duration | 77.21 ± (77.08 - 77.49) ms | 78.05 ± (77.87 - 78.24) ms | +1.1% | ✅⬆️ |
| .NET Framework 4.8 - CallTarget+Inlining+NGEN | duration | 1079.83 ± (1078.14 - 1085.50) ms | 1081.48 ± (1082.51 - 1088.51) ms | +0.2% | ✅⬆️ |
| .NET Core 3.1 - Baseline | process.internal_duration_ms | 22.75 ± (22.69 - 22.80) ms | 22.53 ± (22.50 - 22.57) ms | -0.9% | |
| | process.time_to_main_ms | 86.09 ± (85.80 - 86.38) ms | 85.60 ± (85.40 - 85.79) ms | -0.6% | |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 10.86 ± (10.85 - 10.86) MB | 10.92 ± (10.92 - 10.93) MB | +0.6% | ✅⬆️ |
| | runtime.dotnet.threads.count | 12 ± (12 - 12) | 12 ± (12 - 12) | +0.0% | |
| .NET Core 3.1 - Bailout | process.internal_duration_ms | 22.56 ± (22.53 - 22.60) ms | 22.97 ± (22.91 - 23.03) ms | +1.8% | ✅⬆️ |
| | process.time_to_main_ms | 86.53 ± (86.29 - 86.78) ms | 89.34 ± (89.01 - 89.68) ms | +3.2% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 10.93 ± (10.92 - 10.93) MB | 10.96 ± (10.95 - 10.96) MB | +0.3% | ✅⬆️ |
| | runtime.dotnet.threads.count | 13 ± (13 - 13) | 13 ± (13 - 13) | +0.0% | |
| .NET Core 3.1 - CallTarget+Inlining+NGEN | process.internal_duration_ms | 212.21 ± (211.38 - 213.03) ms | 209.99 ± (209.10 - 210.87) ms | -1.0% | |
| | process.time_to_main_ms | 530.48 ± (529.13 - 531.83) ms | 530.35 ± (529.02 - 531.67) ms | -0.0% | |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 47.77 ± (47.74 - 47.80) MB | 48.01 ± (47.98 - 48.04) MB | +0.5% | ✅⬆️ |
| | runtime.dotnet.threads.count | 28 ± (28 - 28) | 28 ± (28 - 28) | +1.3% | ✅⬆️ |
| .NET 6 - Baseline | process.internal_duration_ms | 21.39 ± (21.34 - 21.44) ms | 21.59 ± (21.54 - 21.65) ms | +1.0% | ✅⬆️ |
| | process.time_to_main_ms | 73.99 ± (73.78 - 74.20) ms | 75.61 ± (75.35 - 75.86) ms | +2.2% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 10.61 ± (10.61 - 10.61) MB | 10.65 ± (10.64 - 10.65) MB | +0.3% | ✅⬆️ |
| | runtime.dotnet.threads.count | 10 ± (10 - 10) | 10 ± (10 - 10) | +0.0% | |
| .NET 6 - Bailout | process.internal_duration_ms | 21.15 ± (21.12 - 21.19) ms | 21.23 ± (21.20 - 21.27) ms | +0.4% | ✅⬆️ |
| | process.time_to_main_ms | 74.79 ± (74.62 - 74.96) ms | 74.46 ± (74.30 - 74.61) ms | -0.4% | |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 10.72 ± (10.72 - 10.72) MB | 10.76 ± (10.76 - 10.77) MB | +0.4% | ✅⬆️ |
| | runtime.dotnet.threads.count | 11 ± (11 - 11) | 11 ± (11 - 11) | +0.0% | |
| .NET 6 - CallTarget+Inlining+NGEN | process.internal_duration_ms | 380.63 ± (378.38 - 382.88) ms | 383.57 ± (381.69 - 385.45) ms | +0.8% | ✅⬆️ |
| | process.time_to_main_ms | 530.27 ± (528.93 - 531.61) ms | 533.99 ± (532.61 - 535.36) ms | +0.7% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 49.18 ± (49.16 - 49.21) MB | 49.34 ± (49.32 - 49.37) MB | +0.3% | ✅⬆️ |
| | runtime.dotnet.threads.count | 28 ± (28 - 28) | 28 ± (28 - 28) | +0.0% | ✅⬆️ |
| .NET 8 - Baseline | process.internal_duration_ms | 19.99 ± (19.93 - 20.06) ms | 19.87 ± (19.82 - 19.93) ms | -0.6% | |
| | process.time_to_main_ms | 76.48 ± (76.18 - 76.78) ms | 75.45 ± (75.13 - 75.77) ms | -1.3% | |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 7.67 ± (7.66 - 7.67) MB | 7.68 ± (7.67 - 7.68) MB | +0.1% | ✅⬆️ |
| | runtime.dotnet.threads.count | 10 ± (10 - 10) | 10 ± (10 - 10) | +0.0% | |
| .NET 8 - Bailout | process.internal_duration_ms | 19.55 ± (19.51 - 19.60) ms | 20.03 ± (19.97 - 20.09) ms | +2.4% | ✅⬆️ |
| | process.time_to_main_ms | 74.84 ± (74.65 - 75.03) ms | 76.80 ± (76.51 - 77.09) ms | +2.6% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 7.73 ± (7.72 - 7.73) MB | 7.73 ± (7.72 - 7.73) MB | -0.0% | |
| | runtime.dotnet.threads.count | 11 ± (11 - 11) | 11 ± (11 - 11) | +0.0% | |
| .NET 8 - CallTarget+Inlining+NGEN | process.internal_duration_ms | 306.14 ± (303.84 - 308.43) ms | 298.85 ± (296.63 - 301.07) ms | -2.4% | |
| | process.time_to_main_ms | 491.73 ± (490.45 - 493.02) ms | 491.98 ± (490.84 - 493.12) ms | +0.0% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| | runtime.dotnet.mem.committed | 36.57 ± (36.52 - 36.61) MB | 36.49 ± (36.46 - 36.51) MB | -0.2% | |
| | runtime.dotnet.threads.count | 27 ± (27 - 27) | 27 ± (27 - 27) | +0.1% | ✅⬆️ |

HttpMessageHandler

| Scenario | Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status |
| --- | --- | --- | --- | --- | --- |
| .NET Framework 4.8 - Baseline | duration | 204.44 ± (204.75 - 206.22) ms | 204.80 ± (205.04 - 206.43) ms | +0.2% | ✅⬆️ |
| .NET Framework 4.8 - Bailout | duration | 209.08 ± (209.17 - 210.46) ms | 210.08 ± (210.40 - 211.93) ms | +0.5% | ✅⬆️ |
| .NET Framework 4.8 - CallTarget+Inlining+NGEN | duration | 1210.85 ± (1210.43 - 1218.54) ms | 1219.33 ± (1219.31 - 1227.78) ms | +0.7% | ✅⬆️ |
| .NET Core 3.1 - Baseline | process.internal_duration_ms | 199.19 ± (198.54 - 199.84) ms | 201.81 ± (200.98 - 202.63) ms | +1.3% | ✅⬆️ |
| | process.time_to_main_ms | 86.47 ± (86.16 - 86.79) ms | 87.54 ± (87.12 - 87.97) ms | +1.2% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| | runtime.dotnet.mem.committed | 16.03 ± (16.01 - 16.05) MB | 15.95 ± (15.93 - 15.97) MB | -0.5% | |
| | runtime.dotnet.threads.count | 20 ± (20 - 20) | 20 ± (20 - 20) | -1.1% | |
| .NET Core 3.1 - Bailout | process.internal_duration_ms | 198.18 ± (197.55 - 198.81) ms | 199.73 ± (198.98 - 200.49) ms | +0.8% | ✅⬆️ |
| | process.time_to_main_ms | 87.67 ± (87.38 - 87.96) ms | 88.52 ± (88.14 - 88.90) ms | +1.0% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| | runtime.dotnet.mem.committed | 16.02 ± (16.00 - 16.04) MB | 15.99 ± (15.97 - 16.01) MB | -0.2% | |
| | runtime.dotnet.threads.count | 21 ± (21 - 21) | 21 ± (21 - 21) | +0.9% | ✅⬆️ |
| .NET Core 3.1 - CallTarget+Inlining+NGEN | process.internal_duration_ms | 393.95 ± (392.42 - 395.49) ms | 395.44 ± (393.65 - 397.23) ms | +0.4% | ✅⬆️ |
| | process.time_to_main_ms | 534.96 ± (533.63 - 536.29) ms | 543.85 ± (542.13 - 545.56) ms | +1.7% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| | runtime.dotnet.mem.committed | 58.06 ± (57.92 - 58.19) MB | 58.23 ± (58.07 - 58.38) MB | +0.3% | ✅⬆️ |
| | runtime.dotnet.threads.count | 30 ± (30 - 30) | 30 ± (30 - 30) | +0.0% | ✅⬆️ |
| .NET 6 - Baseline | process.internal_duration_ms | 204.07 ± (203.42 - 204.73) ms | 205.25 ± (204.54 - 205.96) ms | +0.6% | ✅⬆️ |
| | process.time_to_main_ms | 75.19 ± (74.95 - 75.42) ms | 76.07 ± (75.70 - 76.43) ms | +1.2% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| | runtime.dotnet.mem.committed | 16.23 ± (16.22 - 16.25) MB | 16.27 ± (16.25 - 16.28) MB | +0.2% | ✅⬆️ |
| | runtime.dotnet.threads.count | 19 ± (19 - 19) | 19 ± (19 - 19) | -0.2% | |
| .NET 6 - Bailout | process.internal_duration_ms | 207.22 ± (206.23 - 208.20) ms | 204.16 ± (203.53 - 204.79) ms | -1.5% | |
| | process.time_to_main_ms | 77.89 ± (77.43 - 78.36) ms | 77.13 ± (76.86 - 77.40) ms | -1.0% | |
| | runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| | runtime.dotnet.mem.committed | 16.21 ± (16.19 - 16.23) MB | 16.28 ± (16.26 - 16.30) MB | +0.4% | ✅⬆️ |
| | runtime.dotnet.threads.count | 20 ± (20 - 21) | 20 ± (20 - 20) | -1.4% | |
| .NET 6 - CallTarget+Inlining+NGEN | process.internal_duration_ms | 598.17 ± (595.44 - 600.91) ms | 596.39 ± (593.82 - 598.96) ms | -0.3% | |
| | process.time_to_main_ms | 539.07 ± (537.86 - 540.28) ms | 542.03 ± (540.73 - 543.33) ms | +0.5% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| | runtime.dotnet.mem.committed | 60.88 ± (60.78 - 60.98) MB | 61.03 ± (60.93 - 61.13) MB | +0.2% | ✅⬆️ |
| | runtime.dotnet.threads.count | 31 ± (31 - 31) | 31 ± (31 - 31) | -0.8% | |
| .NET 8 - Baseline | process.internal_duration_ms | 204.04 ± (203.20 - 204.89) ms | 202.59 ± (201.77 - 203.41) ms | -0.7% | |
| | process.time_to_main_ms | 74.31 ± (74.04 - 74.59) ms | 74.76 ± (74.41 - 75.11) ms | +0.6% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| | runtime.dotnet.mem.committed | 11.55 ± (11.54 - 11.57) MB | 11.57 ± (11.55 - 11.58) MB | +0.2% | ✅⬆️ |
| | runtime.dotnet.threads.count | 19 ± (19 - 19) | 19 ± (19 - 19) | -0.1% | |
| .NET 8 - Bailout | process.internal_duration_ms | 202.12 ± (201.46 - 202.77) ms | 202.52 ± (201.73 - 203.31) ms | +0.2% | ✅⬆️ |
| | process.time_to_main_ms | 75.67 ± (75.42 - 75.91) ms | 76.29 ± (75.99 - 76.60) ms | +0.8% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| | runtime.dotnet.mem.committed | 11.61 ± (11.60 - 11.62) MB | 11.65 ± (11.63 - 11.66) MB | +0.3% | ✅⬆️ |
| | runtime.dotnet.threads.count | 20 ± (20 - 20) | 20 ± (20 - 20) | +0.1% | ✅⬆️ |
| .NET 8 - CallTarget+Inlining+NGEN | process.internal_duration_ms | 538.18 ± (531.82 - 544.54) ms | 535.27 ± (529.16 - 541.39) ms | -0.5% | |
| | process.time_to_main_ms | 497.77 ± (496.58 - 498.97) ms | 501.64 ± (500.26 - 503.03) ms | +0.8% | ✅⬆️ |
| | runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| | runtime.dotnet.mem.committed | 50.45 ± (50.37 - 50.52) MB | 50.28 ± (50.20 - 50.35) MB | -0.3% | |
| | runtime.dotnet.threads.count | 30 ± (30 - 30) | 30 ± (30 - 30) | +0.1% | ✅⬆️ |
Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:

  • Welch's t-test with a significance level of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Duration charts
FakeDbCommand (.NET Framework 4.8)
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (73ms)  : 70, 77
    master - mean (75ms)  : 70, 79

    section Bailout
    This PR (8450) - mean (78ms)  : 76, 80
    master - mean (77ms)  : 75, 80

    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (1,086ms)  : 1042, 1129
    master - mean (1,082ms)  : 1029, 1135

FakeDbCommand (.NET Core 3.1)
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (115ms)  : 111, 120
    master - mean (116ms)  : 111, 121

    section Bailout
    This PR (8450) - mean (120ms)  : 113, 127
    master - mean (116ms)  : 113, 119

    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (777ms)  : 749, 804
    master - mean (782ms)  : 748, 816

FakeDbCommand (.NET 6)
gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (104ms)  : 98, 110
    master - mean (102ms)  : 97, 106

    section Bailout
    This PR (8450) - mean (102ms)  : 100, 105
    master - mean (102ms)  : 100, 105

    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (946ms)  : 910, 982
    master - mean (941ms)  : 903, 979

FakeDbCommand (.NET 8)
gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (104ms)  : 98, 109
    master - mean (105ms)  : 100, 110

    section Bailout
    This PR (8450) - mean (105ms)  : 100, 111
    master - mean (102ms)  : 98, 106

    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (822ms)  : 788, 856
    master - mean (829ms)  : 789, 870

HttpMessageHandler (.NET Framework 4.8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (206ms)  : 196, 216
    master - mean (205ms)  : 194, 217

    section Bailout
    This PR (8450) - mean (211ms)  : 199, 223
    master - mean (210ms)  : 201, 219

    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (1,224ms)  : 1163, 1284
    master - mean (1,214ms)  : 1156, 1273

HttpMessageHandler (.NET Core 3.1)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (301ms)  : 280, 322
    master - mean (296ms)  : 281, 311

    section Bailout
    This PR (8450) - mean (298ms)  : 283, 314
    master - mean (297ms)  : 280, 315

    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (981ms)  : 952, 1010
    master - mean (967ms)  : 934, 999

HttpMessageHandler (.NET 6)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (292ms)  : 276, 307
    master - mean (289ms)  : 275, 304

    section Bailout
    This PR (8450) - mean (291ms)  : 278, 304
    master - mean (295ms)  : 275, 314

    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (1,169ms)  : 1125, 1214
    master - mean (1,165ms)  : 1119, 1210

HttpMessageHandler (.NET 8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8450) - mean (290ms)  : 270, 311
    master - mean (291ms)  : 268, 314

    section Bailout
    This PR (8450) - mean (292ms)  : 271, 312
    master - mean (289ms)  : 275, 304

    section CallTarget+Inlining+NGEN
    This PR (8450) - mean (1,069ms)  : 984, 1154
    master - mean (1,071ms)  : 978, 1164



pr-commenter Bot commented Apr 14, 2026

Benchmarks

Benchmark execution time: 2026-04-22 20:38:55

Comparing candidate commit 4082bfd in PR branch kr-igor/dsm-overhead-optimizations with baseline commit 34902b0 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 0 unstable metrics, 62 known flaky benchmarks, 25 flaky benchmarks without significant changes.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because most changes are small:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
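The significance rule illustrated above reduces to a single predicate (an illustrative sketch, not the benchmarking platform's code; values are fractions, so 0.01 means 1%):

```csharp
internal static class SignificanceCheck
{
    // A change is significant only when the entire confidence interval
    // lies outside the ±threshold band around zero.
    public static bool IsSignificant(double ciLower, double ciUpper, double threshold)
        => ciLower > threshold || ciUpper < -threshold;
}
// For the diagram above: CI [1.3%, 3.1%] with a 1% threshold -> significant,
// because the lower bound 1.3% already exceeds the 1% threshold.
```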

Known flaky benchmarks

These benchmarks are marked as flaky and will not trigger a failure. Modify FLAKY_BENCHMARKS_REGEX to control which benchmarks are marked as flaky.

scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net6.0

  • 🟩 throughput [+6415.516op/s; +9709.640op/s] or [+5.392%; +8.161%]

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net472

  • 🟥 execution_time [+308.224ms; +310.877ms] or [+152.952%; +154.268%]

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

  • 🟥 execution_time [+382.729ms; +385.286ms] or [+302.379%; +304.400%]

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1

  • 🟥 execution_time [+400.366ms; +401.520ms] or [+354.309%; +355.330%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody net472

  • 🟥 allocated_mem [+1.308KB; +1.308KB] or [+27.529%; +27.541%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody net6.0

  • 🟥 allocated_mem [+471 bytes; +472 bytes] or [+9.977%; +9.987%]
  • 🟩 execution_time [-15.590ms; -11.407ms] or [-7.281%; -5.327%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody netcoreapp3.1

  • 🟥 allocated_mem [+1.272KB; +1.272KB] or [+27.502%; +27.510%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net472

  • 🟥 allocated_mem [+1.307KB; +1.307KB] or [+105.746%; +105.759%]
  • 🟥 throughput [-258788.703op/s; -254546.446op/s] or [-26.424%; -25.990%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net6.0

  • 🟥 allocated_mem [+471 bytes; +472 bytes] or [+38.558%; +38.566%]
  • 🟩 execution_time [-27.062ms; -22.196ms] or [-12.069%; -9.899%]
  • 🟥 throughput [-70496.047op/s; -47876.618op/s] or [-7.531%; -5.115%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody netcoreapp3.1

  • 🟥 allocated_mem [+1.272KB; +1.272KB] or [+105.292%; +105.304%]
  • 🟥 throughput [-132372.763op/s; -116285.250op/s] or [-19.019%; -16.708%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody net6.0

  • 🟩 throughput [+8504.347op/s; +11583.878op/s] or [+5.411%; +7.371%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0

  • 🟩 throughput [+349712.510op/s; +386362.031op/s] or [+11.661%; +12.883%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1

  • 🟩 execution_time [-19.437ms; -15.062ms] or [-8.960%; -6.943%]
  • 🟩 throughput [+135834.090op/s; +190225.651op/s] or [+5.392%; +7.551%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs net472

  • 🟥 execution_time [+300.060ms; +300.589ms] or [+149.930%; +150.194%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs net6.0

  • 🟥 execution_time [+299.933ms; +303.130ms] or [+151.257%; +152.869%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs netcoreapp3.1

  • 🟥 execution_time [+299.894ms; +302.406ms] or [+151.063%; +152.329%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net472

  • 🟥 execution_time [+297.285ms; +297.807ms] or [+146.015%; +146.271%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net6.0

  • 🟥 execution_time [+298.265ms; +301.252ms] or [+145.811%; +147.271%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs netcoreapp3.1

  • 🟥 execution_time [+301.405ms; +302.760ms] or [+150.642%; +151.319%]

scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmark net6.0

  • 🟥 throughput [-236.171op/s; -115.639op/s] or [-10.268%; -5.028%]

scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack net6.0

  • 🟥 execution_time [+28.235µs; +52.840µs] or [+9.014%; +16.869%]
  • 🟥 throughput [-476.826op/s; -273.845op/s] or [-14.864%; -8.537%]

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net472

  • 🟥 execution_time [+299.562ms; +300.248ms] or [+149.512%; +149.855%]

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0

  • unstable execution_time [+368.229ms; +403.553ms] or [+400.096%; +438.477%]
  • 🟩 throughput [+1038.936op/s; +1177.457op/s] or [+8.537%; +9.675%]

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest netcoreapp3.1

  • unstable execution_time [+261.047ms; +304.660ms] or [+198.211%; +231.326%]
  • 🟩 throughput [+649.797op/s; +870.338op/s] or [+6.290%; +8.425%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472

  • unstable execution_time [+243.071ms; +307.717ms] or [+111.762%; +141.485%]
  • 🟥 throughput [-530.937op/s; -459.353op/s] or [-48.108%; -41.622%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

  • unstable execution_time [+202.179ms; +335.387ms] or [+86.160%; +142.928%]
  • 🟥 throughput [-746.355op/s; -662.904op/s] or [-49.782%; -44.216%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1

  • 🟥 execution_time [+334.607ms; +341.165ms] or [+200.134%; +204.056%]
  • 🟥 throughput [-398.241op/s; -363.449op/s] or [-27.729%; -25.306%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool netcoreapp3.1

  • unstable throughput [+3.569op/s; +60.368op/s] or [+0.666%; +11.268%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net6.0

  • 🟩 execution_time [-176.605µs; -137.310µs] or [-8.946%; -6.956%]
  • 🟩 throughput [+39.991op/s; +50.401op/s] or [+7.895%; +9.950%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net472

  • 🟥 execution_time [+301.808ms; +303.119ms] or [+151.985%; +152.645%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net6.0

  • 🟥 execution_time [+301.823ms; +303.289ms] or [+151.244%; +151.979%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch netcoreapp3.1

  • 🟥 execution_time [+301.672ms; +304.920ms] or [+151.547%; +153.179%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync net472

  • 🟥 execution_time [+302.638ms; +303.859ms] or [+151.975%; +152.588%]
  • 🟩 throughput [+15387.380op/s; +17191.632op/s] or [+5.155%; +5.759%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync net6.0

  • 🟥 execution_time [+298.610ms; +300.553ms] or [+147.649%; +148.610%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync netcoreapp3.1

  • 🟥 execution_time [+304.477ms; +308.022ms] or [+154.322%; +156.119%]

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync net472

  • 🟥 execution_time [+301.589ms; +303.000ms] or [+151.370%; +152.079%]

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync net6.0

  • 🟥 execution_time [+297.601ms; +299.352ms] or [+148.327%; +149.199%]
  • 🟩 throughput [+44430.499op/s; +49707.712op/s] or [+8.822%; +9.870%]

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync netcoreapp3.1

  • 🟥 execution_time [+300.821ms; +303.216ms] or [+149.656%; +150.847%]

scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net6.0

  • 🟩 execution_time [-16.021ms; -11.510ms] or [-7.450%; -5.352%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net472

  • unstable execution_time [+5.200µs; +46.487µs] or [+1.284%; +11.483%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net6.0

  • 🟩 allocated_mem [-25.419KB; -25.398KB] or [-9.272%; -9.265%]
  • unstable execution_time [-52.289µs; -0.114µs] or [-10.335%; -0.023%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark netcoreapp3.1

  • 🟩 allocated_mem [-26.789KB; -26.772KB] or [-9.766%; -9.760%]
  • unstable execution_time [-14.666µs; +100.199µs] or [-2.542%; +17.364%]
  • unstable throughput [-80.671op/s; +125.316op/s] or [-4.609%; +7.159%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net6.0

  • 🟥 execution_time [+5.785µs; +9.566µs] or [+13.673%; +22.611%]
  • 🟥 throughput [-4485.702op/s; -2759.521op/s] or [-18.883%; -11.617%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark netcoreapp3.1

  • unstable execution_time [-14.053µs; -6.935µs] or [-21.803%; -10.760%]
  • 🟩 throughput [+1760.497op/s; +3251.399op/s] or [+10.801%; +19.948%]

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog net472

  • 🟥 execution_time [+302.334ms; +303.717ms] or [+152.816%; +153.515%]

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog net6.0

  • 🟥 execution_time [+303.150ms; +305.171ms] or [+154.302%; +155.331%]

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog netcoreapp3.1

  • 🟥 execution_time [+300.996ms; +303.703ms] or [+150.686%; +152.041%]

scenario:Benchmarks.Trace.RedisBenchmark.SendReceive net6.0

  • 🟩 throughput [+36163.352op/s; +40473.897op/s] or [+6.845%; +7.661%]

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog net472

  • 🟥 execution_time [+302.431ms; +304.290ms] or [+150.734%; +151.661%]

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog net6.0

  • 🟥 execution_time [+301.394ms; +302.353ms] or [+151.346%; +151.827%]

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog netcoreapp3.1

  • 🟥 execution_time [+303.076ms; +305.367ms] or [+153.700%; +154.863%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net472

  • 🟥 execution_time [+300.201ms; +300.910ms] or [+149.742%; +150.096%]
  • 🟩 throughput [+61159705.532op/s; +61429752.049op/s] or [+44.540%; +44.737%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net6.0

  • unstable execution_time [+345.819ms; +384.324ms] or [+430.087%; +477.976%]
  • 🟩 throughput [+1039.204op/s; +1233.252op/s] or [+8.034%; +9.534%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore netcoreapp3.1

  • 🟥 execution_time [+299.783ms; +300.711ms] or [+149.525%; +149.988%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope net6.0

  • 🟩 throughput [+90379.979op/s; +100025.660op/s] or [+8.438%; +9.339%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope netcoreapp3.1

  • 🟩 throughput [+60816.554op/s; +80111.002op/s] or [+7.039%; +9.273%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net6.0

  • 🟩 throughput [+88146.714op/s; +118898.786op/s] or [+6.823%; +9.203%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan netcoreapp3.1

  • 🟩 throughput [+90172.502op/s; +98764.280op/s] or [+8.956%; +9.809%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0

  • 🟩 throughput [+43919.825op/s; +52911.054op/s] or [+7.975%; +9.608%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes netcoreapp3.1

  • 🟩 throughput [+27138.070op/s; +36843.295op/s] or [+6.074%; +8.247%]

scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin net472

  • 🟥 throughput [-44092.977op/s; -37665.414op/s] or [-6.453%; -5.513%]

scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin net6.0

  • 🟩 throughput [+81502.056op/s; +98697.489op/s] or [+9.106%; +11.027%]

Known flaky benchmarks without significant changes:

  • scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net472
  • scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild netcoreapp3.1
  • scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody net472
  • scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody netcoreapp3.1
  • scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net472
  • scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmark net472
  • scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmark netcoreapp3.1
  • scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack net472
  • scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack netcoreapp3.1
  • scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net472
  • scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net6.0
  • scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice netcoreapp3.1
  • scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net472
  • scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net6.0
  • scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net472
  • scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice netcoreapp3.1
  • scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net472
  • scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog netcoreapp3.1
  • scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net472
  • scenario:Benchmarks.Trace.RedisBenchmark.SendReceive net472
  • scenario:Benchmarks.Trace.RedisBenchmark.SendReceive netcoreapp3.1
  • scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope net472
  • scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net472
  • scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net472
  • scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin netcoreapp3.1

@kr-igor kr-igor force-pushed the kr-igor/dsm-overhead-optimizations branch from b0dfa05 to ec5f36b Compare April 15, 2026 19:38
@kr-igor kr-igor marked this pull request as ready for review April 20, 2026 14:22
@kr-igor kr-igor requested review from a team as code owners April 20, 2026 14:22
Comment thread tracer/src/Datadog.Trace/DataStreamsMonitoring/DataStreamsManager.cs Outdated
return factory(key);
}

return cache.GetOrAdd(key, factory);
Member

@andrewlock andrewlock Apr 20, 2026

I think we could/should optimize this pattern. Currently:

  • cache.Count is expensive - it takes a full lock on the dictionary internally
  • cache.TryGetValue followed by cache.GetOrAdd(key, factory); is two lookups on failures

Given we don't need exactly 1000 items (and AFAICT, we never remove items), I think you could optimize this by moving the method call to the TagCache type directly, storing an additional count locally there, and using that as a rough MaxEdgeTagCacheSize bound to avoid the full lock:

private int _edgeCacheCount = 0;
private EdgeCache _cache; // hand waving the generic issues

public string[] GetOrCreateEdgeTags<TKey>(TKey key, Func<TKey, string[]> factory)
    where TKey : notnull, IEquatable<TKey>
{
    if (_cache.TryGetValue(key, out var existing))
    {
        return existing;
    }

    if (Volatile.Read(ref _edgeCacheCount) >= MaxEdgeTagCacheSize)
    {
        // High-cardinality key space — bypass cache to prevent unbounded memory growth
        return factory(key);
    }

    Interlocked.Increment(ref _edgeCacheCount);
    return _cache.GetOrAdd(key, factory);
}

We still have the two lookups on cache exceeded, but we lose the expensive Count call at least
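A self-contained sketch of that pattern (the class name, string key type, and MaxEdgeTagCacheSize constant here are placeholders for illustration, not the PR's actual types):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative bounded cache: a plain counter replaces ConcurrentDictionary.Count,
// which acquires all of the dictionary's internal locks on every call.
class BoundedTagCache
{
    private const int MaxEdgeTagCacheSize = 1000;
    private int _count;
    private readonly ConcurrentDictionary<string, string[]> _cache = new();

    public string[] GetOrCreate(string key, Func<string, string[]> factory)
    {
        // Fast path: one lock-free lookup on a hit
        if (_cache.TryGetValue(key, out var existing))
        {
            return existing;
        }

        // Approximate bound: racing threads may overshoot slightly, which is
        // acceptable because entries are never removed and the limit is soft
        if (Volatile.Read(ref _count) >= MaxEdgeTagCacheSize)
        {
            // High-cardinality key space — bypass the cache to cap memory growth
            return factory(key);
        }

        Interlocked.Increment(ref _count);
        return _cache.GetOrAdd(key, factory);
    }
}

class Demo
{
    static void Main()
    {
        var cache = new BoundedTagCache();
        var first = cache.GetOrCreate("topic:orders|group:g1", k => k.Split('|'));
        var second = cache.GetOrCreate("topic:orders|group:g1", k => k.Split('|'));
        Console.WriteLine(ReferenceEquals(first, second)); // True — the array is interned
    }
}
```

The counter is only an approximation of the entry count, but since the goal is a memory cap rather than an exact size, the cheaper bound is the right trade.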

Comment thread tracer/src/Datadog.Trace/DataStreamsMonitoring/EdgeTagCache.cs Outdated
return new PathwayContext(new PathwayHash(hash), pathwayStartNs, edgeStartNs);
}

#if NETCOREAPP3_1_OR_GREATER
Member

After we merge #8476, we can open this up more broadly and make it the only implementation 🙂
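For context, the general shape of the zero-allocation encode/decode path is sketched below — the field layout, names, and fixed 24-byte format are illustrative assumptions, not the PR's actual PathwayContext wire format:

```csharp
using System;
using System.Buffers.Binary;

static class ContextCodecSketch
{
    public const int MaxSize = 8 + 8 + 8; // hash + pathway start + edge start

    // Writes into a caller-supplied buffer, so the caller can stackalloc it
    public static int EncodeInto(Span<byte> destination, ulong hash, long pathwayStartNs, long edgeStartNs)
    {
        BinaryPrimitives.WriteUInt64LittleEndian(destination, hash);
        BinaryPrimitives.WriteInt64LittleEndian(destination.Slice(8), pathwayStartNs);
        BinaryPrimitives.WriteInt64LittleEndian(destination.Slice(16), edgeStartNs);
        return MaxSize;
    }

    public static (ulong Hash, long PathwayStartNs, long EdgeStartNs) Decode(ReadOnlySpan<byte> source) =>
        (BinaryPrimitives.ReadUInt64LittleEndian(source),
         BinaryPrimitives.ReadInt64LittleEndian(source.Slice(8)),
         BinaryPrimitives.ReadInt64LittleEndian(source.Slice(16)));
}

class EncodeDemo
{
    static void Main()
    {
        // stackalloc keeps the scratch buffer on the stack: no per-message byte[] allocation
        Span<byte> buffer = stackalloc byte[ContextCodecSketch.MaxSize];
        var written = ContextCodecSketch.EncodeInto(buffer, 0xDEADBEEFUL, 123, 456);
        var decoded = ContextCodecSketch.Decode(buffer.Slice(0, written));
        Console.WriteLine(decoded.Hash == 0xDEADBEEFUL); // True
    }
}
```

Accepting a Span<byte> rather than returning a byte[] is what lets the produce/consume path avoid an intermediate heap allocation per message on .NET Core 3.1+.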

Member

@andrewlock andrewlock left a comment

Thanks for this! It looks like a good plan overall; there's just the conversion to readonly record struct to simplify things, and the question of whether we can optimize the failure cases to avoid calling the expensive ConcurrentDictionary.Count property.

Comment thread tracer/src/Datadog.Trace/DataStreamsMonitoring/DataStreamsManager.cs Outdated
@kr-igor
Contributor Author

kr-igor commented Apr 21, 2026

Pushed fixes for all comments

@andrewlock andrewlock added the type:performance Performance, speed, latency, resource usage (CPU, memory) label Apr 22, 2026
Member

@andrewlock andrewlock left a comment

LGTM, just one last cleanup we can do (reference equality is default, we don't need a custom comparer)

Just to sense check the limits, we have:

  • ~10 different cache key types currently
  • each array prob being ballpark ~200 bytes (depends on dynamic data, so hard to say)
  • We cache up to ~1000 distinct arrays

So this could raise the "static" memory usage by ~2MB (10 × 200 × 1,000 bytes) if I understand correctly. Given these paths are called many times, we expect a high hit ratio; most of them are hot paths, and this has a clear impact on throughput, so this looks like a great tradeoff overall to me 👍 Thanks!
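As a quick sanity check of that estimate (all three inputs are the reviewer's ballpark figures, not measured values):

```csharp
using System;

class MemoryEstimate
{
    static void Main()
    {
        const int cacheKeyTypes = 10;      // distinct integration cache key types
        const int bytesPerArray = 200;     // ballpark size of one cached edge-tag array
        const int entriesPerCache = 1000;  // per-cache entry cap

        long worstCase = (long)cacheKeyTypes * bytesPerArray * entriesPerCache;
        Console.WriteLine($"{worstCase / 1_000_000.0:F1} MB"); // prints "2.0 MB"
    }
}
```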

Comment on lines +47 to +50
// Keyed by string[] identity (reference equality) — safe because TagCache holds strong
// references to the cached arrays (bounded by MaxEdgeTagCacheSize).
private readonly ConcurrentDictionary<string[], NodeHash> _nodeHashCache =
new(NodeHashCacheKeyComparer.Instance);
Member

The custom comparer is not required - object comparisons use reference equality by default

Suggested change
  // Keyed by string[] identity (reference equality) — safe because TagCache holds strong
  // references to the cached arrays (bounded by MaxEdgeTagCacheSize).
  private readonly ConcurrentDictionary<string[], NodeHash> _nodeHashCache =
-     new(NodeHashCacheKeyComparer.Instance);
+     new();

Comment on lines +447 to +461
}

/// <summary>
/// Reference-equality comparer for string[] keys in <see cref="_nodeHashCache"/>.
/// Two string[] objects are considered equal only when they are the same instance,
/// which is always true for the cached arrays held by <see cref="TagCache{TKey, TValue}"/>.
/// </summary>
private sealed class NodeHashCacheKeyComparer : IEqualityComparer<string[]>
{
internal static readonly NodeHashCacheKeyComparer Instance = new();

public bool Equals(string[]? x, string[]? y) => ReferenceEquals(x, y);

public int GetHashCode(string[] obj) => RuntimeHelpers.GetHashCode(obj);
}
Member

This isn't necessary, equality uses reference equality by default

Suggested change
  }
- /// <summary>
- /// Reference-equality comparer for string[] keys in <see cref="_nodeHashCache"/>.
- /// Two string[] objects are considered equal only when they are the same instance,
- /// which is always true for the cached arrays held by <see cref="TagCache{TKey, TValue}"/>.
- /// </summary>
- private sealed class NodeHashCacheKeyComparer : IEqualityComparer<string[]>
- {
-     internal static readonly NodeHashCacheKeyComparer Instance = new();
-
-     public bool Equals(string[]? x, string[]? y) => ReferenceEquals(x, y);
-
-     public int GetHashCode(string[] obj) => RuntimeHelpers.GetHashCode(obj);
- }

(If you like, prove it to yourself with this! 😄)

var random = new Random();
var dict = new ConcurrentDictionary<string[], int>();

var a = new string[] { "Hello", "World" };
var b = new string[] { "Hello", "World" };

// Same instance: the second call hits the existing entry, so both lines print the same number
Console.WriteLine(dict.GetOrAdd(a, key => random.Next()));
Console.WriteLine(dict.GetOrAdd(a, key => random.Next()));

// Distinct instance with equal contents: default reference equality treats b as a new key
Console.WriteLine(dict.GetOrAdd(b, key => random.Next()));
Console.WriteLine(dict.GetOrAdd(b, key => random.Next()));


Labels

area:data-streams-monitoring type:performance Performance, speed, latency, resource usage (CPU, memory)


2 participants