perf(propagation): tighten tracestate, baggage, and tag inject paths #8234
Merged
BridgeAR merged 2 commits on May 5, 2026
Conversation
Commit SHA: 00d0ec6
Benchmarks
Benchmark execution time: 2026-05-05 13:06:28. Comparing candidate commit 00d0ec6 in PR branch. Found 109 performance improvements and 0 performance regressions! Performance is the same for 1642 metrics, 93 unstable metrics. Improved scenarios:
scenario:datastreams-consume-18
scenario:datastreams-consume-20
scenario:datastreams-consume-22
scenario:datastreams-consume-24
scenario:datastreams-produce-18
scenario:datastreams-produce-20
scenario:datastreams-produce-22
scenario:datastreams-produce-24
scenario:datastreams-produce-high-cardinality-18
scenario:datastreams-produce-high-cardinality-20
scenario:datastreams-produce-high-cardinality-22
scenario:datastreams-produce-high-cardinality-24
scenario:datastreams-produce-manual-checkpoint-18
scenario:datastreams-produce-manual-checkpoint-20
scenario:datastreams-produce-manual-checkpoint-22
scenario:datastreams-produce-manual-checkpoint-24
scenario:datastreams-produce-with-message-size-18
scenario:datastreams-produce-with-message-size-20
scenario:datastreams-produce-with-message-size-22
scenario:datastreams-produce-with-message-size-24
scenario:propagation-extract-18
scenario:propagation-extract-20
scenario:propagation-extract-22
scenario:propagation-extract-24
scenario:propagation-extract-baggage-ascii-18
scenario:propagation-extract-baggage-ascii-20
scenario:propagation-extract-baggage-ascii-22
scenario:propagation-extract-baggage-ascii-24
scenario:propagation-extract-inject-18
scenario:propagation-extract-inject-20
scenario:propagation-extract-inject-22
scenario:propagation-extract-inject-24
scenario:propagation-inject-18
scenario:propagation-inject-20
scenario:propagation-inject-22
scenario:propagation-inject-24
Overall package size
Self size: 5.71 MB
Dependency sizes:

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.1 | 82.56 kB | 817.39 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
W3C baggage extract and `_dd.p.*` tag inject both run on every
traced HTTP request. Several sub-allocations are dropped:
`text_map.js` swaps three single-char-class `replaceAll(/[\xNN]/g, ...)`
regex literals for `replaceAll('=', '~')` / `replaceAll('~', '=')`,
which skip the regex match path. `_injectTags` and
`_injectTraceparent` walk `trace.tags` via `Object.keys(...)`
instead of banned `for-in`; the trace-tags object's prototype
chain isn't ours to trust, and `for-in` enumerates inherited
keys. `_injectBaggageItems` swaps `Object.entries` for
`Object.keys` + indexed read, dropping the per-baggage-item
`[k, v]` tuple. `_extractBaggageItems` caches the `baggageTagKeys`
`Set` on the propagator (rebuilt only when the config array
reference changes, e.g. remote-config rotation) and gates
`decodeURIComponent` behind `value.includes('%')` — a
microbenchmark pins the gated path at 13.4x faster than running
`decodeURIComponent` on plain ASCII baggage and only 2% slower
than the raw call on percent-encoded values.
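The two extract-side guards above can be sketched as follows; the function names and module-level cache fields are illustrative stand-ins, not the actual dd-trace-js internals:

```javascript
let cachedBaggageTagKeys = null
let cachedConfigArray = null

function getBaggageTagKeySet (configArray) {
  // Rebuild the Set only when the config array *reference* changes,
  // e.g. after a remote-config rotation swaps in a new array.
  if (configArray !== cachedConfigArray) {
    cachedConfigArray = configArray
    cachedBaggageTagKeys = new Set(configArray)
  }
  return cachedBaggageTagKeys
}

function decodeBaggageValue (value) {
  // Plain ASCII baggage values contain no '%', so the common path
  // skips decodeURIComponent entirely.
  return value.includes('%') ? decodeURIComponent(value) : value
}
```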
`tracestate.js#forVendor` reuses the `state.toString()` value
computed one line above instead of recomputing it.
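The inject-side changes above (string-literal `replaceAll` and the own-key walk) can be sketched as below; the helper names are hypothetical, not the real dd-trace-js code:

```javascript
// Single-character regex → string literal: same result, no regex machinery.
const escapeEquals = value => value.replaceAll('=', '~')   // was replaceAll(/[=]/g, '~')
const unescapeEquals = value => value.replaceAll('~', '=')

// Own-key walk instead of for-in, which would also enumerate inherited
// enumerable keys from an untrusted prototype chain.
function collectOwnTags (tags) {
  const out = []
  for (const key of Object.keys(tags)) {
    out.push(`${key}=${tags[key]}`)
  }
  return out
}
```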
The original draft also rewrote `tracestate#fromString` to drop
an `Array#unshift` quadratic; #8256 landed a linear parser first
that supersedes those hunks, so they're dropped.
DSM observes every Kafka, SQS, SNS, Kinesis, Pub/Sub, and AMQP
message when enabled, so the per-checkpoint hot path compounds.
Several allocations are removed without changing the wire format:
1. `getSizeOrZero` stopped allocating a fresh Buffer copy of every
string just to read its UTF-8 byte length. `Buffer.byteLength`
returns the same value with no allocation. `getHeadersSize`'s
`Object.entries(...).reduce(...)` becomes a `for (const key of
Object.keys(headers))` loop, dropping the per-header `[k, v]`
tuple and the reducer closure.
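A minimal sketch of the two size helpers after the change; the function names come from the PR, but the bodies here are illustrative:

```javascript
function getSizeOrZero (value) {
  if (typeof value === 'string') {
    // Buffer.byteLength reads the UTF-8 byte count without
    // allocating a Buffer copy of the string.
    return Buffer.byteLength(value, 'utf8')
  }
  if (Buffer.isBuffer(value)) return value.length
  return 0
}

function getHeadersSize (headers) {
  if (headers === undefined) return 0
  let size = 0
  // Plain keys loop: no per-header [key, value] tuple, no reducer closure.
  for (const key of Object.keys(headers)) {
    size += getSizeOrZero(key) + getSizeOrZero(headers[key])
  }
  return size
}
```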
2. `pathway.js#shaHash` extracted the first 8 bytes of SHA-256 by
round-tripping through a 64-char hex string + a 16-char slice +
a hex-decoded Buffer. `digest().subarray(0, 8)` produces the same
bytes directly. `computeHash` now also caches
`hashableEdgeTags.join('')` and `propagationHashBigInt.toString(16)`
once per call (each was computed twice), gates the
`manual_checkpoint:true` filter on `includes(...)` so the common
path skips the alloc, and reuses a module-scope 20-byte scratch
buffer to assemble `encodePathwayContext` with a single
`Buffer.from(subarray)` copy-out instead of seven nested allocs.
3. `setCheckpoint` precomputes `PATHWAY_HEADER_BYTES` from the static
header overhead instead of allocating a temp object, encoding
it, and JSON-stringifying just to read its length. It now reads
the direction from `edgeTags[0]` directly: every in-tree caller
places it there, the `DataStreamsCheckpointer` shape is updated
to match, and the test fixture pinning that arg order is updated
in the same commit.
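The positional direction read can be sketched as below, assuming (as the PR states) that every caller places the direction tag first in `edgeTags`; the helper name is hypothetical:

```javascript
// Read the direction from the first edge tag, e.g. 'direction:out',
// instead of scanning the whole array for it.
function getDirection (edgeTags) {
  const first = edgeTags[0]
  return first && first.startsWith('direction:')
    ? first.slice('direction:'.length)
    : undefined
}
```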
Drive-by fix:
* `recordCheckpoint` reuses the `BigInt` already computed by the
`StatsPoint` returned from `forCheckpoint(...)` instead of running
`readBigUInt64LE` a second time. `setCheckpoint` returns
`undefined` (rather than `null`) on the disabled fast path so
the function shape matches the rest of the file.
* `processor.js` drops the `DsmPathwayCodec` import that the
inlined byte-count made unreachable; `pathway.js` exports
`CONTEXT_PROPAGATION_KEY_BASE64` so the constant calculation is
anchored to the actual header key.
* `encoding.js` adds an `encodeVarintInto(target, offset, value)`
helper so the pathway encoder can write directly into the scratch
buffer instead of allocating a per-varint `Uint8Array` and
copying.
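An `encodeVarintInto(target, offset, value)` helper with that signature could look like the minimal LEB128 sketch below (for unsigned 32-bit values; the actual DSM encoder may differ in range handling):

```javascript
// Write LEB128 varint bytes directly at target[offset] and return the
// new offset, so no per-varint Uint8Array is allocated and copied.
function encodeVarintInto (target, offset, value) {
  while (value >= 0x80) {
    target[offset++] = (value & 0x7F) | 0x80
    value >>>= 7
  }
  target[offset++] = value
  return offset
}
```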
00b73c3 to 00d0ec6
bengl approved these changes on May 5, 2026
sabrenner pushed a commit that referenced this pull request on May 6, 2026
W3C tracestate parse, baggage extract, and `_dd.p.*` tag inject all run on every traced HTTP request. Several sub-allocations and one quadratic parse loop are dropped.

`tracestate.js#fromString` walked `value.matchAll(regex)` and inserted each match at the front of the result with `Array#unshift`, which is O(n) per call. `push` plus a single `reverse()` gives the same final order in O(n); the `Map` then iterates oldest-first so `toString`'s prepend builds the W3C-spec newest-first wire form. `forVendor` reuses the `state.toString()` value computed one line above instead of recomputing it.

`text_map.js` swaps three `replaceAll(/[\xNN]/g, ...)` regex literals matching a single character for `replaceAll('=', '~')` / `replaceAll('~', '=')`, which skip the regex match path. The `m` flag on the extract regex was always dead (no anchors in the pattern).

`_injectTraceparent` walks `trace.tags` via `Object.keys(...)` instead of banned `for-in`; the trace-tags object's prototype chain isn't ours to trust, and `for-in` enumerates inherited keys.

`_injectBaggageItems` swaps `Object.entries` for `Object.keys` + indexed read, dropping the per-baggage-item `[k, v]` tuple. `_extractBaggageItems` caches the `baggageTagKeys` `Set` on the propagator (rebuilt only when the config array reference changes, e.g. remote-config rotation) and gates `decodeURIComponent` behind `value.includes('%')` — a microbenchmark pins the gated path at 13.4x faster than running `decodeURIComponent` on plain ASCII baggage and only 2% slower than the raw call on percent-encoded values.
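The unshift-to-push-plus-reverse change described above can be illustrated in isolation; the helpers below are a sketch of the ordering technique, not the actual parser:

```javascript
// Inserting each parsed entry at the front with unshift is O(n) per call,
// O(n²) over the whole parse.
function frontOrderQuadratic (items) {
  const out = []
  for (const item of items) out.unshift(item)
  return out
}

// push + a single reverse() yields the same final order in O(n) overall.
function frontOrderLinear (items) {
  const out = []
  for (const item of items) out.push(item)
  return out.reverse()
}
```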