v5.111.0 proposal #9150
The OIDC-exchanged token from the npm registry is only valid for the publish operation; using it for npm dist-tag add produced E401. Remove the multi-tag logic and the OIDC exchange entirely: each branch now publishes with a single tag (latest for the current release line, latest-nodeXX for older lines), which is all npm's trusted publishing model supports without a stored token. Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ad (#9086)

Each test cell uploaded its own report to Codecov, so a commit sent ~430 uploads. Codecov silently parks uploads past its ~150-per-commit ceiling in `started` and never merges them, so roughly 40 reports' worth of coverage was dropped from every commit.

The Datadog coverage upload was separately broken: `upload-coverage-artifact` probed for files with `find -maxdepth 1`, but the report lives one level deeper at `coverage/node-<version>/`, so the check found nothing, no `coverage-*` artifact was produced, and `datadog-ci coverage upload` reported nothing while passing green.

All Green already downloads every `coverage-*` artifact to drive the Datadog upload, so it is the one place that sees a whole commit's coverage. It now groups the per-cell reports by integration and uploads ~100 groups to both backends instead of ~430 per-cell reports:

1. `upload-coverage-artifact` recurses for the report files and names each artifact `coverage-<flag>__<job>-<index>` so matrix cells that share a flag (cypress varies `spec` outside its flag) stop clobbering each other.
2. `scripts/group-coverage.mjs` sorts each cell's report into its integration's directory, stripping Node.js and library versions, which are noise for "which integration regressed". Reports are not merged locally (both backends merge same-flag uploads server-side), so each report passes through byte-for-byte and the harness needs no istanbul dependency in All Green's sparse checkout. ~430 cells collapse to ~100 groups.
3. Each cell emits both lcov and istanbul JSON: Codecov reads branch and function coverage from the JSON (its lcov parser ingests only line hits), Datadog reads the lcov and does not ingest the JSON. All Green uploads each format to the backend that reads it, one group per integration, flagged with the integration name.

`master-coverage` still rides every Codecov upload on PRs targeting master so the `codecov/patch` gate fires; reruns de-duplicate to the newest run so a stale rerun's counters are not double-counted.
…9074) The "should emit one kafka.produce span per topicMessages entry" test hard-coded kafka.messages.offsets to start_offset "0". Kafka produce is at-least-once: a transient NOT_LEADER_FOR_PARTITION right after topic creation makes kafkajs retry and advance the broker-assigned base offset past 0, so the span faithfully reports a non-zero offset (CI observed "1") and the assertion never matched before the timeout. The expected offsets are read back from the sendBatch result instead, which still pins the per-topic isolation the test was written for. Each topicMessages entry is its own root span, so the two spans are separate traces the agent may deliver in a single payload in any order; the span lookup now scans every trace rather than only traces[0].
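The span lookup change can be sketched as follows. This is a minimal illustration, assuming an agent payload shaped as an array of traces, each an array of spans; the `findSpan` helper and the `kafka.topic` tag values are hypothetical and not the spec's actual code.

```javascript
// Hypothetical helper: scan every trace in a payload instead of only traces[0].
function findSpan (traces, predicate) {
  for (const trace of traces) {
    for (const span of trace) {
      if (predicate(span)) return span
    }
  }
  return undefined
}

// Two root spans arrive as two separate traces, in no guaranteed order.
const payload = [
  [{ name: 'kafka.produce', meta: { 'kafka.topic': 'topic-b' } }],
  [{ name: 'kafka.produce', meta: { 'kafka.topic': 'topic-a' } }],
]

const spanA = findSpan(payload, span => span.meta['kafka.topic'] === 'topic-a')
const missing = findSpan(payload, span => span.meta['kafka.topic'] === 'topic-c')
```

A lookup limited to `payload[0]` would miss `topic-a` here, which is the ordering hazard the commit describes.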
Key each expanded major by its bare major (`versions/mongodb@3`) instead of a bounded range (`versions/mongodb@>=3.0.0 <4.0.0`). The bare major reads cleanly as a folder name and covers each major's latest, including the floor major's, which the range form dropped. Follows the shared resolver from #9019.

Widening the matrix to every major's latest surfaced several latent failures:

1. A bare-major key resolves to that major's newest version, so a range ending inside its top major overshoots: microgateway-core `>=2.1 <=3.0.0` keyed `3` installed 3.3.7 and the span came back `web.request` instead of `microgateway.request`. The top major keeps the declared range whenever it stops short of the major's ceiling; fully-spanned and lower majors stay bare.
2. `versions/ai@4` and `@langchain/core@0` resolve to versions that have no VCR cassette and would hit the live API (401). A central `brokenVersions` registry drops a matching resolved version and surfaces the reason as a pending test, each entry a stop-gap carrying a TODO.
3. A manifest carrying a `workspace:` protocol dependency was copied verbatim into a generated workspace, so yarn failed with "Couldn't find any versions for X that matches workspace:*". Fall back to the pinned compatible version.
4. The Apollo fetch-failure test gated the error span on `version > '2.3.0'`, a lexicographic compare that breaks once the key is bare (`'2' > '2.3.0'` is false). Compare the resolved version with `semver.gt`.
5. Single-digit keying renames folders that several specs hard-code by range (express, langchain, bedrock runtime, aws-sdk). The bedrock require threw after `agent.load` with no `agent.close`, leaving the Remote Config poll running and hanging the job to the 45-minute timeout; the others silently skipped suites. Point the requires at the renamed folders.
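The lexicographic pitfall from item 4 can be reproduced in isolation. The sketch below uses a hypothetical `gt` helper as a stand-in for `semver.gt`, limited to plain `x.y.z` strings so that it runs without dependencies; it is not the comparison the spec actually uses.

```javascript
// Hypothetical stand-in for semver.gt, handling only plain x.y.z versions.
function gt (a, b) {
  const partsA = a.split('.').map(Number)
  const partsB = b.split('.').map(Number)
  for (let i = 0; i < 3; i++) {
    const diff = (partsA[i] || 0) - (partsB[i] || 0)
    if (diff !== 0) return diff > 0
  }
  return false
}

// String comparison: a bare key is a prefix of the range floor, so it sorts lower.
const lexicographicBareKey = '2' > '2.3.0'
// String comparison also mis-orders two-digit majors ('1' sorts before '2').
const lexicographicTwoDigit = '10.0.0' > '2.3.0'

// Numeric, component-wise comparison of the resolved version.
const numericResolved = gt('2.9.1', '2.3.0')
const numericTwoDigit = gt('10.0.0', '2.3.0')
```

Both string comparisons evaluate to `false`, while the component-wise comparison of a resolved version gives the intended ordering.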
…As (#9101) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…lPropertyName (#8943) `internalPropertyName` carried a hand-maintained full property path (`telemetry.debug`, `remoteConfig.enabled`) that diverged from the canonical env name, so the same configuration was named twice and the two could drift. A new `namespace` field nests the canonical env name under a property path (`telemetry.DD_TELEMETRY_DEBUG`), so the runtime path is derived from the canonical name plus a namespace with no separate alias to maintain. It takes precedence over `configurationNames` and `internalPropertyName` when resolving the path, in the eslint sync rule, and in the type generator. Every group of entries (remoteConfig, telemetry, appsec api-security and sca, profiling, stats, llmobs, iast security-controls, the per-integration llm span limits) moves onto it, and their runtime consumers are updated to the renamed keys. The canonical name telemetry reports is unchanged: the rename only affects how the property path is derived, not which env name is sent. The namespace object is always built from the defaults, so the optional chaining and `?? 0` fallbacks on the api-security accesses guarded a state that cannot occur and are dropped. Drive-by fix: * Exempt integration-test fixture apps from `n/no-extraneous-require`: they `require('dd-trace')` as a customer does, so the rule fired once dd-trace became locally resolvable (yarn link) but stayed silent on a clean install.
…te (#9026) import-in-the-middle scanned the include and exclude arrays once per resolved module — up to ~290 include entries (RegExp.test or string compare) plus a fileURLToPath on every resolve, nearly all against modules that match nothing. Supplying iitm 3.2.0's shouldInclude predicate replaces that scan with a single Set lookup for bare specifiers and one combined RegExp covering every instrumented node_modules path and the configured security-control subpaths, plus one RegExp for the exclusions. Over a mixed resolve corpus this drops the per-resolve matching cost from ~2.5µs to ~25ns (about 100x). The Set also carries each built-in's node: specifier, mirroring iitm's include expansion, so `import 'node:crypto'` stays instrumented alongside `import 'crypto'`. Package names pass through regexpEscape so a metacharacter in a future package name cannot mis-match. The .mjs rewriter loader spec was the repository's only .spec.mjs and no CI job ran it: the misc suite glob matched .spec.js only, and the exercised-tests gate collected .spec.js/.test.mjs but not .spec.mjs, so it could not flag the orphan. 1. Match *.spec.{js,mjs} in test:instrumentations:misc so the spec runs. 2. Widen verify-exercised-tests globs to @(spec|test).@(js|mjs|cjs) so every naming convention is tracked and an unrun one fails the gate. 3. Load the loader through require(esm) where the runtime supports it so its transforms land on nyc's CommonJS instrumentation path; gate on process.features.require_module so Node 18 falls back to import() instead of crashing the suite on the CommonJS compiler's SyntaxError.
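The matching strategy described above (one Set for bare specifiers, one combined RegExp for paths) can be sketched roughly as below. The package lists, the `shouldInclude` argument shape, and the path pattern are assumptions for illustration; the arguments iitm actually passes to its predicate are not reproduced here.

```javascript
// Escape RegExp metacharacters so a package name can only match literally.
function regexpEscape (text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical inputs; the real lists come from the tracer's instrumentations.
const builtins = ['crypto', 'fs']
const packages = ['express', '@scope/pkg.js']

// One Set for bare specifiers, carrying each built-in's node: form as well.
const bareSpecifiers = new Set([
  ...packages,
  ...builtins,
  ...builtins.map(name => `node:${name}`),
])

// One combined RegExp covering every instrumented node_modules path.
const pathPattern = new RegExp(
  `[\\\\/]node_modules[\\\\/](?:${packages.map(regexpEscape).join('|')})[\\\\/]`
)

// Assumed signature: a single specifier or resolved path string.
function shouldInclude (specifierOrPath) {
  return bareSpecifiers.has(specifierOrPath) || pathPattern.test(specifierOrPath)
}
```

The Set lookup and the single `test` call replace a per-entry scan, and escaping keeps the `.` in `pkg.js` from matching an arbitrary character.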
The sampling tests in #9030 build their own taggers with `{ llmobs: { enabled: true } }`, and #8943 renamed that config key to `DD_LLMOBS_ENABLED` everywhere it could see. The two landed in parallel, so #8943 normalized the rest of the file but never saw these four fixtures. On master the tagger now reads `DD_LLMOBS_ENABLED`, finds it undefined, and returns before registering the span; `Tagger.tagMap.get` then yields undefined and the "DROPPED at sampleRate 0" test throws synchronously, aborting the whole `test:llmobs:sdk:ci` run with exit 7. Fixes: https://github.com/DataDog/dd-trace-js/actions/runs/28265509637/job/83751636644
) feat(graphql): migrate instrumentation to orchestrion

Migrates GraphQL instrumentation from shimmer wrappers to orchestrion AST rewriting for graphql execute / parse / validate entry points, including CJS and ESM paths for graphql >=0.10 and @graphql-tools/executor.

Moves resolver instrumentation into the GraphQL execute plugin. The execute plugin now owns per-execute root context, resolver wrapping, resolve-span lifecycle, source tracking, and resolver hook invocation. The old separate resolve plugin is removed.

Preserves and tests the existing cross-feature contracts:
- IAST still receives one apm:graphql:resolve:start publish per resolver call, using the actual GraphQL args object.
- AppSec still receives resolver payloads through datadog:graphql:resolver:start and can abort synchronously through the shared abort controller.
- depth only limits resolve-span creation; IAST/AppSec resolver publishes still happen for depth-gated fields.
- depth-gated resolvers now honor abort signals before falling through the no-span fast path.
- caller-owned execute args and contextValue are preserved without mutation.
- default field resolver behavior matches graphql for primitive parent values.
- graphql-yoga / @graphql-tools/executor execution is instrumented.

Adds public TypeScript declarations for the GraphQL resolve hook and FieldContext payload.

Keeps the implementation orchestrion-only, with no shimmer fallback, and updates the GraphQL long benchmark calibration for the migrated hot path.

Regression coverage was added for:
- resolver abort behavior past the configured depth
- depth: 0 AppSec resolver-channel publishing
- primitive-source defaultFieldResolver parity
- caller-supplied and frozen execute args
- primitive contextValue forwarding
- Yoga normalized executor instrumentation
- IAST/AppSec per-resolver channel cardinality

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
Bumps the test-versions group with 1 update in the /integration-tests/esbuild directory: [openai](https://github.com/openai/openai-node). Updates `openai` from 6.44.0 to 6.45.0 - [Release notes](https://github.com/openai/openai-node/releases) - [Changelog](https://github.com/openai/openai-node/blob/main/CHANGELOG.md) - [Commits](openai/openai-node@v6.44.0...v6.45.0) --- updated-dependencies: - dependency-name: openai dependency-version: 6.45.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: test-versions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…th 10 updates (#9127)

Bumps the cloud-and-messaging group with 10 updates in the /packages/dd-trace/test/plugins/versions directory:

| Package | From | To |
| --- | --- | --- |
| [@aws-sdk/client-bedrock-runtime](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-bedrock-runtime) | `3.1074.0` | `3.1075.0` |
| [@aws-sdk/client-dynamodb](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-dynamodb) | `3.1074.0` | `3.1075.0` |
| [@aws-sdk/client-kinesis](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-kinesis) | `3.1074.0` | `3.1075.0` |
| [@aws-sdk/client-lambda](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-lambda) | `3.1074.0` | `3.1075.0` |
| [@aws-sdk/client-s3](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-s3) | `3.1074.0` | `3.1075.0` |
| [@aws-sdk/client-sfn](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-sfn) | `3.1074.0` | `3.1075.0` |
| [@aws-sdk/client-sns](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-sns) | `3.1074.0` | `3.1075.0` |
| [@aws-sdk/client-sqs](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-sqs) | `3.1074.0` | `3.1075.0` |
| [azure-functions-core-tools](https://github.com/Azure/azure-functions-core-tools) | `4.12.0` | `4.12.1` |
| [durable-functions](https://github.com/Azure/azure-functions-durable-js) | `3.3.1` | `3.4.0` |

Updates `@aws-sdk/client-bedrock-runtime` from 3.1074.0 to 3.1075.0
- [Release notes](https://github.com/aws/aws-sdk-js-v3/releases)
- [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-bedrock-runtime/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-bedrock-runtime)

Updates `@aws-sdk/client-dynamodb` from 3.1074.0 to 3.1075.0
- [Release notes](https://github.com/aws/aws-sdk-js-v3/releases)
- [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-dynamodb/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-dynamodb)

Updates `@aws-sdk/client-kinesis` from 3.1074.0 to 3.1075.0
- [Release notes](https://github.com/aws/aws-sdk-js-v3/releases)
- [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-kinesis/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-kinesis)

Updates `@aws-sdk/client-lambda` from 3.1074.0 to 3.1075.0
- [Release notes](https://github.com/aws/aws-sdk-js-v3/releases)
- [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-lambda/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-lambda)

Updates `@aws-sdk/client-s3` from 3.1074.0 to 3.1075.0
- [Release notes](https://github.com/aws/aws-sdk-js-v3/releases)
- [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-s3/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-s3)

Updates `@aws-sdk/client-sfn` from 3.1074.0 to 3.1075.0
- [Release notes](https://github.com/aws/aws-sdk-js-v3/releases)
- [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-sfn/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-sfn)

Updates `@aws-sdk/client-sns` from 3.1074.0 to 3.1075.0
- [Release notes](https://github.com/aws/aws-sdk-js-v3/releases)
- [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-sns/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-sns)

Updates `@aws-sdk/client-sqs` from 3.1074.0 to 3.1075.0
- [Release notes](https://github.com/aws/aws-sdk-js-v3/releases)
- [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-sqs/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-sqs)

Updates `azure-functions-core-tools` from 4.12.0 to 4.12.1
- [Release notes](https://github.com/Azure/azure-functions-core-tools/releases)
- [Changelog](https://github.com/Azure/azure-functions-core-tools/blob/4.12.1/release_notes.md)
- [Commits](Azure/azure-functions-core-tools@4.12.0...4.12.1)

Updates `durable-functions` from 3.3.1 to 3.4.0
- [Release notes](https://github.com/Azure/azure-functions-durable-js/releases)
- [Commits](Azure/azure-functions-durable-js@v3.3.1...v3.4.0)

---
updated-dependencies:
- dependency-name: "@aws-sdk/client-bedrock-runtime"
  dependency-version: 3.1075.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
- dependency-name: "@aws-sdk/client-dynamodb"
  dependency-version: 3.1075.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
- dependency-name: "@aws-sdk/client-kinesis"
  dependency-version: 3.1075.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
- dependency-name: "@aws-sdk/client-lambda"
  dependency-version: 3.1075.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
- dependency-name: "@aws-sdk/client-s3"
  dependency-version: 3.1075.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
- dependency-name: "@aws-sdk/client-sfn"
  dependency-version: 3.1075.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
- dependency-name: "@aws-sdk/client-sns"
  dependency-version: 3.1075.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
- dependency-name: "@aws-sdk/client-sqs"
  dependency-version: 3.1075.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
- dependency-name: azure-functions-core-tools
  dependency-version: 4.12.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cloud-and-messaging
- dependency-name: durable-functions
  dependency-version: 3.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cloud-and-messaging
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…pdates (#9128) Bumps the test-versions group with 4 updates in the /packages/dd-trace/test/plugins/versions directory: [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node), [pnpm](https://github.com/pnpm/pnpm/tree/HEAD/pnpm11/pnpm), [protobufjs](https://github.com/protobufjs/protobuf.js) and [stripe](https://github.com/stripe/stripe-node). Updates `@types/node` from 26.0.0 to 26.0.1 - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) Updates `pnpm` from 11.8.0 to 11.9.0 - [Release notes](https://github.com/pnpm/pnpm/releases) - [Changelog](https://github.com/pnpm/pnpm/blob/main/pnpm11/pnpm/CHANGELOG.md) - [Commits](https://github.com/pnpm/pnpm/commits/v11.9.0/pnpm11/pnpm) Updates `protobufjs` from 8.6.4 to 8.6.5 - [Release notes](https://github.com/protobufjs/protobuf.js/releases) - [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md) - [Commits](protobufjs/protobuf.js@protobufjs-v8.6.4...protobufjs-v8.6.5) Updates `stripe` from 22.2.3 to 22.3.0 - [Release notes](https://github.com/stripe/stripe-node/releases) - [Changelog](https://github.com/stripe/stripe-node/blob/master/CHANGELOG.md) - [Commits](stripe/stripe-node@v22.2.3...v22.3.0) --- updated-dependencies: - dependency-name: "@types/node" dependency-version: 26.0.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: test-versions - dependency-name: pnpm dependency-version: 11.9.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: test-versions - dependency-name: protobufjs dependency-version: 8.6.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: test-versions - dependency-name: stripe dependency-version: 22.3.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: 
test-versions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#9110) The request helper retries once on a 5xx with a 5–7.5 s jittered delay, but the getKnownTests "should return an error if the request fails" test mocked a single 500 interceptor and used real timers. The retried request had no interceptor and its retry timer always exceeded mocha's 5 s timeout, so the callback never fired and the test timed out. Collapse the retry delay to 0 ms and add a second 500 interceptor so the test exercises the real retry path and asserts both requests are consumed.
…9112) `internalPropertyName` made each supported config carry a second hand-maintained runtime path next to its canonical env name. Drop that alias and derive the config-object path from the canonical name, with Test Optimization entries grouped under `testOptimization` and top-level entries using their canonical key directly. The plugin shared-config boundary still forwards the existing per-plugin keys, including the flat dynamic-instrumentation flag `CiPlugin.configure` receives; plugins do not receive the namespaced tracer config. Drive-by fix: * Drop duplicate benchmark `enabled` leaves left behind by the previous namespace migration.
The NoSQL injection analyzer used `enterWith` to mark the async context, which leaked the marker past the query. A request that ended before its query finished stranded the marker for the next request, so that request's injection went unreported. Two concurrent queries within the same request also saw each other's marker, leaving one unanalyzed. Binding the marker on the query-build channel fixed the leak but lost it on deferred queries. A mongoose query builds, executes, and reaches the driver in three separate async steps. `find().then()`/`.exec()` runs the driver a turn after the synchronous build, outside the build's `runStores` scope, so the driver re-analyzed the same filter and reported the injection twice. Binding the marker around the execution channel instead covers the full async scope that reaches the driver, and `runStores` restores the parent on its own. This re-enables the mongoose nosqli suite on Node 20 + Express 5 (skipped for APPSEC-66705) and the mquery nosqli integration suite (skipped for APPSEC-62431, where the unscoped marker caused each injection to be reported N+1 times). The mquery suite skips mongodb >=7 on Node < 20: that driver reads Web Crypto off the `crypto` global, which Node 18 does not expose by default.
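The difference between an unscoped and a scoped marker can be illustrated with Node's `AsyncLocalStorage`. This is a simplified sketch: it uses `AsyncLocalStorage.run` as a stand-in for the channel's `runStores`, and the `runQuery` function and marker object are hypothetical.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks')

const storage = new AsyncLocalStorage()

// Hypothetical query runner that records the marker it observes.
const seen = []
function runQuery () {
  seen.push(storage.getStore())
}

// run() scopes the marker to the callback and restores the parent afterwards.
storage.run({ analyzed: true }, runQuery)
const afterRun = storage.getStore()

// enterWith() leaves the marker active for whatever executes next in this context.
storage.enterWith({ analyzed: true })
const afterEnterWith = storage.getStore()
```

After `run` returns, the marker is gone; after `enterWith`, it is still visible to unrelated code that follows, which is the kind of leak described above.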
The `depth` filter counted a resolver's full execution path, including the numeric list indices that `collapse` later folds away. The same query therefore reached a different depth depending on whether `collapse` was on: a field one list-hop below the limit was instrumented when collapsing was off and dropped when it was on, even though both describe the same selection-set nesting. Count only selection-set segments (string path keys) toward `depth`, so the limit tracks query structure rather than execution artifacts. This shifts which resolvers are instrumented at a given `depth`, so it is gated behind `DD_MAJOR`: the v5 line keeps the old list-index counting when collapsing is on, and v6 counts selection-set depth only. The `countListIndices` config flag carries the gate so `shouldInstrumentNode` stays free of version checks. Fixes: #7468
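The two counting modes can be sketched as below, assuming the linked-list path shape graphql uses at execution time (`{ prev, key }`, where `key` is a field name or a numeric list index). The `pathDepth` helper is hypothetical; only the `countListIndices` flag name comes from the commit message.

```javascript
// Hypothetical helper: walk a GraphQL execution path and count its depth.
// With countListIndices off, only selection-set segments (string keys) count.
function pathDepth (path, countListIndices) {
  let depth = 0
  for (let node = path; node; node = node.prev) {
    if (countListIndices || typeof node.key === 'string') depth++
  }
  return depth
}

// Execution path for friends[0].name
const path = {
  key: 'name',
  prev: { key: 0, prev: { key: 'friends', prev: undefined } },
}

const legacyDepth = pathDepth(path, true)
const selectionDepth = pathDepth(path, false)
```

Here the legacy count is 3 and the selection-set count is 2, so a `depth` limit of 2 would treat the same field differently under the two modes.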
Adds AppSec support for AWS Lambda to dd-trace-js by introducing DC handlers that let the datadog-lambda-js layer delegate WAF execution to the tracer.
Overall package size

Self size: 6.53 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.2.0 | 104.26 kB | 843.44 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| dc-polyfill | 0.1.11 | 25.74 kB | 25.74 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
1. Remove test/asserts/profile.js. Nothing requires it; the only profile value-type assertion in use is the standalone helper in the profiling agent exporter spec. 2. Fix the "amont" -> "amount" typo in both telemetry heartbeat comments.
* ci(scripts): disable V8 Maglev for Windows test children
Network-heavy specs intermittently abort with STATUS_STACK_BUFFER_OVERRUN (0xC0000409) on Windows: mocha-run-file forces process.exit() the moment mocha finishes, and that races V8's Maglev teardown while libuv still has in-flight sockets from the spec's real HTTP traffic. The child dies with no stderr and no crash report, so mocha-parallel-files only sees a non-zero exit and reports the file as crashed (e.g. inferred_proxy.spec.js, which is just whichever network spec lost the race that run).
--no-maglev sidesteps the faulty tier and can only be passed as a CLI flag, not through NODE_OPTIONS, so the runner injects it into the spawned per-file node processes on win32.
Refs: nodejs/node#62260
* ci(scripts): gate --no-maglev by V8 version
The Windows --no-maglev workaround was appended on every win32 child, but the top-level --maglev/--no-maglev toggle only exists from V8 11 (Node 20). On the supported Node 18 line (V8 10) the flag does not exist, so each spawned spec aborts with `bad option: --no-maglev` (exit 9) before mocha runs, breaking the parallel runner for Windows Node 18. Gate on the running V8 major, which the children inherit via the shared binary.
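A version gate of this kind might look like the sketch below. The `extraExecArgv` helper is hypothetical; the V8 11 threshold and the win32 condition are taken from the commit message, and the version strings in the usage are only examples of the `process.versions.v8` format.

```javascript
// Hypothetical helper: extra CLI flags for spawned per-file node processes.
function extraExecArgv (platform = process.platform, v8Version = process.versions.v8) {
  // process.versions.v8 looks like '11.3.244.8-node.16'; parseInt stops at the first dot.
  const v8Major = Number.parseInt(v8Version, 10)
  // Per the commit message, the --no-maglev toggle only exists from V8 11 (Node 20).
  return platform === 'win32' && v8Major >= 11 ? ['--no-maglev'] : []
}

const node18OnWindows = extraExecArgv('win32', '10.2.154.26-node.36')
const node20OnWindows = extraExecArgv('win32', '11.3.244.8-node.16')
const node20OnLinux = extraExecArgv('linux', '11.3.244.8-node.16')
```

Because the children are spawned from the same binary, reading the version in the parent is enough to decide for all of them.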
Node 20 defines `fs.opendir` / `fs.opendirSync` as lazy getter+setter accessor
properties that resolve the real function on first read. Handing such a property
to `shimmer.wrap` instrumented the getter, so the property access was traced
while the real call ran uninstrumented — IAST then saw no `opendir` operation
and reported no PATH_TRAVERSAL vulnerability. Node 18/22 define these as plain
data properties, which is why the gap was Node-20-only.
Extend `shimmer.wrap`'s existing `replaceGetter` option to cover the lazy
getter+setter case: resolve the value once through the getter and wrap that,
rather than re-implementing the resolution in the fs instrumentation. The
property keeps its original shape — a getter+setter pair stays a getter+setter
pair whose setter still materializes a writable data property on assignment, so
the descriptor remains observationally identical for a downstream consumer that
inspects or overwrites it on that Node.js version.
A getter+setter pair without `replaceGetter` keeps being wrapped in place — the
wrapper becomes the new getter and the original setter is left untouched, as the
`url` instrumentation relies on for the `URL.prototype` `host` / `hostname`
accessors. Only a setter-only property throws. Narrowing the guard to reject
every unguarded getter+setter pair would have thrown inside the `url` hook,
silently dropping that instrumentation and the AppSec / IAST coverage built on
it.
`fs.js` now passes `{ replaceGetter: true }` through its `wrap` / `massWrap`
helpers instead of carrying its own materialization helper.
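The lazy-accessor handling can be sketched in isolation. The code below is not shimmer's implementation: `wrapLazyAccessor` and `fakeFs` are hypothetical, and the accessor is only modelled loosely on the getter+setter shape described above.

```javascript
// Hypothetical sketch: resolve the value once through the getter, wrap it,
// and keep the property as a getter+setter pair.
function wrapLazyAccessor (object, name, wrapper) {
  const descriptor = Object.getOwnPropertyDescriptor(object, name)
  const wrapped = wrapper(descriptor.get.call(object))
  Object.defineProperty(object, name, {
    ...descriptor,
    get () { return wrapped },
    // The original setter is kept, so assignment behaves as before.
    set: descriptor.set,
  })
}

// A lazy accessor whose setter materializes a writable data property.
const fakeFs = {}
Object.defineProperty(fakeFs, 'opendir', {
  configurable: true,
  enumerable: true,
  get () { return path => `opened ${path}` },
  set (value) {
    Object.defineProperty(fakeFs, 'opendir', {
      value, writable: true, configurable: true, enumerable: true,
    })
  },
})

const calls = []
wrapLazyAccessor(fakeFs, 'opendir', original => function (path) {
  calls.push(path) // stands in for the instrumentation hook
  return original.call(this, path)
})

const result = fakeFs.opendir('/tmp')
const shapeAfterWrap = Object.getOwnPropertyDescriptor(fakeFs, 'opendir')
```

Wrapping the getter itself would only observe the property read; wrapping the resolved function is what lets the call be recorded.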
The second sendBatch/send in these tests is a real broker call that the
test expects to succeed after the stub is restored. A fresh topic's first
produce routinely returns the retryable NOT_LEADER_FOR_PARTITION while
metadata propagates; kafkajs normally refreshes metadata and retries it,
but retry:{retries:0} stripped that safety net, surfacing the transient
error as a hard KafkaJSNonRetriableError and flaking CI.
retries:0 bought nothing for the first call it was meant to speed up: the
stubbed UNKNOWN error is non-retryable, so it already fails on the first
attempt regardless of the retry count. Removing it restores the retry on
the real call while leaving the stubbed-rejection assertions unchanged.
The non-native runtime metrics test asserted that runtime.node.gc.pause.by.type.95percentile lands in [0.1ms, 100ms). On a fast or idle runner a gc_type can have a single scavenge sample whose p95 sits below 0.1ms, so the matcher rejected a legitimate value and the test failed on Windows. The bounds exist to catch a unit-conversion regression, not to assert a minimum pause length: a sub-microsecond value would mean the ms->ns conversion was dropped, a value over 100ms that it was left in ms or seconds. Lower the floor to 1µs so it still trips on a dropped conversion while no longer assuming a GC pause takes at least 0.1ms.
The http plugin requires datadog-plugin-google-cloud-pubsub's pubsub-push-subscription module at the top of its own file, so every process that instruments http pulls in that plugin graph even though it is only used on GCP Cloud Run (K_SERVICE set and DD_TRACE_GCP_PUBSUB_PUSH_ENABLED not opted out). Moving the require inside the existing gate keeps the module — and its transitive graph — out of the startup path for every other deployment. Drive-by fix: * Drop the try/catch around the require: the pubsub plugin ships with the tracer, so this require cannot fail independently of the tracer's own load.
…spec (#9078) The Bluebird specs set global.Promise = Bluebird in beforeEach and restored it only in afterEach, holding the mutation across the awaited span round-trip. Under load the subprocess span arrives late, and the shared assertion helper only rejects once a payload has already errored, so the spec hangs to mocha's 5 s timeout while leftover async keeps global.Promise flipped, corrupting context for every later spec in the file. 1. The instrumentation only reads global.Promise synchronously, when the wrapped method runs. Scope the swap to that window via withBluebird() and restore it in a finally before any await. 2. assert.strictEqual(promise.constructor, Bluebird) asserts a property that depends on module-load order and the call-time global, not on any contract the instrumentation makes; it held locally and on Node 18 but saw the native constructor on Node 20. Assert the real contract instead: the promisified call resolves with the expected stdout and still produces the span. 3. Await the span expectation and the subprocess completion together in one Promise.all per spec, so a late or failed span cannot float onto the shared agent and match a later spec's expectation. Drive-by fix: * Drop the dead `delete require.cache[require.resolve('util')]` (util is a builtin). * Replace the try/catch rejection check with assert.rejects.
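The scoped swap from item 1 follows a common try/finally pattern, sketched below. `withGlobalPromise` is a hypothetical generalization of the `withBluebird()` helper named above, and `FakeBluebird` stands in for the real Bluebird constructor so the example has no dependencies.

```javascript
// Stand-in for Bluebird, so the sketch runs without installing it.
class FakeBluebird extends Promise {}

// Hypothetical helper: swap global.Promise only for the synchronous call.
function withGlobalPromise (Impl, fn) {
  const original = global.Promise
  global.Promise = Impl
  try {
    // fn should only perform the synchronous call that reads global.Promise.
    return fn()
  } finally {
    // Restored before the caller can await anything, even if fn throws.
    global.Promise = original
  }
}

const NativePromise = global.Promise
const seenInside = withGlobalPromise(FakeBluebird, () => global.Promise)
const restoredAfter = global.Promise === NativePromise
```

Since the restore runs in `finally`, a throwing or slow spec cannot leave the global flipped for later specs in the file.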
* feat(span-stats): add OTLP metrics export for span stats
Export client-computed span stats as OTLP metrics (dd.trace.span.hits,
dd.trace.span.errors, dd.trace.span.top_level_hits, dd.trace.span.duration)
via a new OtlpStatsExporter alongside the existing Datadog /v0.6/stats
exporter.
Enabled via DD_TRACE_OTEL_METRICS_ENABLED=true, or auto-enabled when both
OTEL_TRACES_EXPORTER=otlp and OTEL_METRICS_EXPORTER=otlp are set. URL and
protocol are derived from the OTLP trace export configuration.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(config): register traceMetrics as internal runtime property
traceMetrics is a computed aggregate derived from OTEL_TRACES_EXPORTER,
OTEL_METRICS_EXPORTER, and DD_TRACE_OTEL_METRICS_ENABLED — not a raw
user-facing key — so it belongs in INTERNAL_RUNTIME_PROPERTIES alongside
sampler and stableConfig.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): guard traceMetrics URL derivation against invalid OTLP endpoint
When hostname is an unbracketed IPv6 address (e.g. ::1), the defaultOtlpBase
is http://::1:4318 which is not a valid URL. The new URL() call in the
traceMetrics block was the first code path to actually parse the string,
causing a TypeError that crashed config construction.
Wrap the URL derivation in a try/catch so that a malformed traces endpoint
falls back to the localhost default without throwing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
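The guard can be sketched as follows. `deriveMetricsUrl` and `DEFAULT_BASE` are hypothetical names, and appending the standard OTLP/HTTP `/v1/metrics` path is an assumption about how the endpoint is derived; only the failing input and the localhost fallback come from the commit message.

```javascript
// Hypothetical default; the tracer's actual fallback value may differ.
const DEFAULT_BASE = 'http://localhost:4318'

// Hypothetical helper: derive a metrics URL from the traces endpoint base.
function deriveMetricsUrl (tracesBase) {
  try {
    return new URL('/v1/metrics', tracesBase).href
  } catch {
    // new URL() throws a TypeError on a malformed base such as http://::1:4318
    return new URL('/v1/metrics', DEFAULT_BASE).href
  }
}

const bracketed = deriveMetricsUrl('http://[::1]:4318')
const unbracketed = deriveMetricsUrl('http://::1:4318')
```

A bracketed IPv6 host parses normally, while the unbracketed form falls back to the default instead of crashing config construction.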
* chore(config): regenerate config types for DD_TRACE_OTEL_METRICS_ENABLED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): add configurationNames to DD_TRACE_OTEL_METRICS_ENABLED entry
The eslint-config-names-sync rule verifies that every leaf property in
TracerOptions (index.d.ts) has a matching configurationNames entry in
supported-configurations.json. The entry for DD_TRACE_OTEL_METRICS_ENABLED
only had internalPropertyName, which is not checked by the rule.
Adding configurationNames: ["traceMetricsEnabled"] ties the two files
together and satisfies the lint check. Regenerated config types to match.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* bug fix
* update post RFC discussion
* bring implementation bug fix
* clean up how configs are set
* feat(otlp-trace-metrics): align span-stats export with the trace-metrics contract
Update the OTLP trace-metrics export to match the agreed RFC/system-test contract:
- Rename the enablement env var to OTEL_CLIENT_STATS_COMPUTATION_ENABLED and add
DD_TRACE_OTEL_SEMANTICS_ENABLED (OTel-semantics mode: emit only OTel attributes, no dd.*).
- Emit a single histogram named traces.span.sdk.metrics.duration.
- Map dimensions to OTel attributes (span.name, span.kind, http.*, rpc.* from grpc tags) and
convey errors via OTel status.code; default mode also adds dd.operation.name, dd.span.type,
dd.origin and dd.span.top_level.
- Add telemetry.sdk.{name,language,version} resource attributes and emit process tags as dd.<key>
(default mode only); gate all dd.* resource attributes behind default mode.
- Drive the flush/export cadence from OTEL_METRIC_EXPORT_INTERVAL and drop the
_DD_TRACE_STATS_WRITER_INTERVAL override.
- Read grpc.status.code from span.metrics (numeric) with a meta fallback.
Update unit tests accordingly and regenerate config types.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(otlp-trace-metrics): report service identity per InstrumentationScope
Partition span-stats data points by service so one OTLP payload can carry
multiple services, each as its own InstrumentationScope with service.name,
service.version and deployment.environment.name. These move off the resource,
which now only carries SDK identity, host.name and dd.* attributes.
Fix the trace-metrics flush cadence at 10s (no longer driven by
OTEL_METRIC_EXPORT_INTERVAL); the internal _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
overrides it in tests only.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(otlp-trace-metrics): apply internal flush interval override
The generic env applier only reads DD_/OTEL_ prefixed vars, so the
internal _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL (which starts with _DD_)
was never wired into config. Read it explicitly so the test-only flush
cadence override takes effect.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(otlp-span-stats): emit fixed explicit-bounds histogram from the DDSketch
Derive the OTLP duration histogram from each group's DDSketch into the
spanmetrics-connector default bounds (in seconds), and drop the duplicate
exact-cell accumulator in span_stats. Each group now emits at most two data
points (ok/error) with a per-group dd.span.top_level heuristic, mirroring
libdatadog.
Co-authored-by: Cursor <cursoragent@cursor.com>
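To illustrate the fixed explicit-bounds idea, here is a rough sketch. The bounds are illustrative placeholders rather than the spanmetrics-connector defaults, and the `bins` input is a hypothetical stand-in for walking a DDSketch (a representative duration in nanoseconds plus a sample count); `toFixedHistogram` is not the real `sketchToFixedHistogram`.

```javascript
// Illustrative bounds in seconds (assumption, not the connector defaults).
const BOUNDS_SECONDS = [0.01, 0.1, 1, 10]

// Hypothetical conversion: fold weighted bins into explicit-bounds buckets.
function toFixedHistogram (bins, bounds = BOUNDS_SECONDS) {
  // An explicit-bounds histogram carries bounds.length + 1 bucket counts;
  // bucket i holds values v with bounds[i - 1] < v <= bounds[i].
  const bucketCounts = new Array(bounds.length + 1).fill(0)
  for (const { valueNs, count } of bins) {
    const seconds = valueNs / 1e9
    let i = 0
    while (i < bounds.length && seconds > bounds[i]) i++
    bucketCounts[i] += count
  }
  return bucketCounts
}
```

The last bucket is the overflow bucket for anything above the largest bound, which is why the result has one more entry than the bounds array.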
* refactor(span-stats): carry service identity as resource attributes
Move service.name/service.version/deployment.environment.name onto the OTLP
resource (the configured default service), emit a single InstrumentationScope,
and add service.name as a data-point attribute only when a span's service
differs from the configured default. Thread DD_SERVICE through the processor so
the transformer can compare against it.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(span-stats): drop redundant dd-trace InstrumentationScope
The exported OTLP metrics no longer carry an InstrumentationScope: a `dd-trace`
scope (name/version) is redundant with the resource's telemetry.sdk.* attributes.
The single scopeMetrics omits the scope field.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(span-stats): datadog.* attribute prefix and OTEL_TRACES_SPAN_METRICS_ENABLED
Rename the OTLP trace-metric attributes from dd.* to datadog.* (operation.name,
span.type, span.top_level, origin, runtime_id, datadog.<process tags>) and rename
the enablement env var OTEL_CLIENT_STATS_COMPUTATION_ENABLED ->
OTEL_TRACES_SPAN_METRICS_ENABLED.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(otlp): set _dd.stats_computed resource attribute on OTLP traces when trace metrics enabled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(span-stats): use timer.unref?.() for Electron compatibility
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
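A small sketch of why the optional call helps, assuming a hypothetical `armFlushTimer` helper: when the scheduler returns a plain numeric handle instead of a Node.js `Timeout` object, there is no `unref` method to call, and `?.()` skips it instead of throwing.

```javascript
// Hypothetical helper; `schedule` is injectable so the numeric-handle case
// can be exercised without a browser-style runtime.
function armFlushTimer (onInterval, intervalMs, schedule = setInterval) {
  const timer = schedule(onInterval, intervalMs)
  // Node.js Timeout objects expose unref(); numeric handles do not.
  timer.unref?.()
  return timer
}
```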
* fix(rebase): restore extractRootTags; rename traceMetricsEnabled; add otelSemanticsEnabled to types
span_format.js: rebase conflict resolved to branch's addTag refactor which no longer exists in
master — revert to explicit typeof checks while keeping the FR06.3 BUG comment.
index.d.ts: rename traceMetricsEnabled -> otlpTraceMetricsEnabled to match supported-configurations.json;
add otelSemanticsEnabled (DD_TRACE_OTEL_SEMANTICS_ENABLED). Fixes eslint-config-names-sync errors.
Regenerate generated-config-types.d.ts from updated inputs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(span-stats): wire OTLP metrics endpoint/protocol and trim dead code
The crash fix: SpanStatsProcessor read config.otelMetricsUrl/otelMetricsProtocol,
which Config never set (dropped in 40014ae), so `new URL(undefined)` threw
ERR_INVALID_URL and crashed tracer init whenever OTLP trace metrics were enabled.
Read the canonical OTEL_EXPORTER_OTLP_METRICS_ENDPOINT/OTEL_EXPORTER_OTLP_METRICS_PROTOCOL
directly instead of introducing redundant alias properties.
Also fix the dead auto-enable check: `this.otelMetricsEnabled` does not exist
(the property is DD_METRICS_OTEL_ENABLED), so `undefined === true` made the
"auto-enable when OTLP traces + OTEL metrics are on" path never trigger.
Minimize the diff vs master without changing behavior:
- drop 4 unused SpanAggStats fields (errorDuration/topLevel*) and their test
- collapse the duplicate JSON/protobuf transformer methods into transform()
- remove two `// BUG` WIP narration comments (reverts the comment-only
span_format.js hunk; tracked separately)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(span-stats): privatize internals, trim telemetry, simplify transformer
- Make _drainBuckets / _toLegacyPayload truly private (#): neither
crosses the class boundary, and the _ prefix left them publicly accessible
- Guard SpanStatsExporter construction behind !otlpTraceMetricsEnabled
so it is never instantiated when the OTLP path is active
- Replace #errorStatus() / #boolAttr() one-shot methods with inline
literals and a module-level ERROR_STATUS_ATTR constant to avoid
per-call allocations
- sketchToFixedHistogram now returns number[] directly; #pushPoint
references EXPLICIT_BOUNDS_SECONDS from the module constant
- Remove this.recordTelemetry calls from OtlpStatsExporter.export —
not part of the OTLP trace-metrics spec
- Rewrite whitebox _drainBuckets test as blackbox: assert buckets are
empty after onInterval() instead of calling the private method
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix configs
Co-authored-by: Munir Abdinur <munir.abdinur@datadoghq.com>
* chore: regenerate config types after supported-configurations.json update
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): use otelSemanticsEnabled config key instead of DD_TRACE_OTEL_SEMANTICS_ENABLED
Our branch maps DD_TRACE_OTEL_SEMANTICS_ENABLED to the internal property
otelSemanticsEnabled via supported-configurations.json internalPropertyName.
The merged master code was reading config.DD_TRACE_OTEL_SEMANTICS_ENABLED
directly, which was undefined in our config layout.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(test): update span.spec.js to use otelSemanticsEnabled config key
The test was setting config.DD_TRACE_OTEL_SEMANTICS_ENABLED but span.js
now reads config.otelSemanticsEnabled (the internal property name mapped
from DD_TRACE_OTEL_SEMANTICS_ENABLED via supported-configurations.json).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): use string default for _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
Schema requires default to be a string or null, not a number literal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): update description for _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL to match registry
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): use short description for _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
The description field maps to Short Description in the config registry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): remove description field from _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
No other int entry with allowed field uses description; may be mutually exclusive in schema.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(otlp-span-stats): address Codex review comments
- Use dd-trace VERSION (not app version) for telemetry.sdk.version resource attribute
- Pass OTEL_EXPORTER_OTLP_METRICS_HEADERS and OTEL_EXPORTER_OTLP_METRICS_TIMEOUT
to OtlpStatsExporter so authenticated/custom endpoints work correctly
- Fix index.d.ts doc: env var is OTEL_TRACES_SPAN_METRICS_ENABLED and
auto-enable condition is DD_METRICS_OTEL_ENABLED (not OTEL_METRICS_EXPORTER=otlp)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): fix import order and no-useless-undefined in span_stats and otlp-span-stats
- Move ../../../version import before ./constants to satisfy import/order rule
- Remove explicit = undefined default for headers param (unicorn/no-useless-undefined)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(config): cover _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL override branch
Adds test that exercises the setAndTrack call inside the conditional
that reads the internal flush interval override from the environment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): align DD_TRACE_OTEL_SEMANTICS_ENABLED with registry definition
The config registry has this entry as a plain boolean with default "false"
and no internalPropertyName. Revert our custom mapping so the entry matches
the registry exactly — the validator compares against the registered definition.
All code that previously accessed config.otelSemanticsEnabled now reads
config.DD_TRACE_OTEL_SEMANTICS_ENABLED directly; the destructuring alias
in span_stats.js preserves the otelSemanticsEnabled local variable name.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): add configurationNames for otelSemanticsEnabled to satisfy eslint-config-names-sync
The rule requires that every option name in index.d.ts has a corresponding
entry in supported-configurations.json (as a key, configurationNames value,
or internalPropertyName). Adding configurationNames: ["otelSemanticsEnabled"]
to DD_TRACE_OTEL_SEMANTICS_ENABLED satisfies this while keeping default: "false"
to match the registry. The generator uses configurationNames[0] as the config
key, so code reverts to config.otelSemanticsEnabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(config): remove internalPropertyName and unnecessary configurationNames
Per reviewer feedback:
- Remove internalPropertyName from _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL; the flush
interval is read directly via getEnvironmentVariable() in #applyCalculated
- Remove configurationNames/internalPropertyName from OTEL_TRACES_SPAN_METRICS_ENABLED
and drop otlpTraceMetricsEnabled as a programmatic option from index.d.ts; use
this.OTEL_TRACES_SPAN_METRICS_ENABLED directly in #applyCalculated instead
- Remove configurationNames from DD_TRACE_OTEL_SEMANTICS_ENABLED and drop
otelSemanticsEnabled as a programmatic option from index.d.ts; all callers
now read config.DD_TRACE_OTEL_SEMANTICS_ENABLED directly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): resolve Unknown Config properties for otlpTraceMetricsEnabled/ddTraceMetricsOtelFlushInterval
- Map setAndTrack to OTEL_TRACES_SPAN_METRICS_ENABLED (a declared config key)
instead of the undeclared otlpTraceMetricsEnabled alias; all call sites
updated to read config.OTEL_TRACES_SPAN_METRICS_ENABLED directly
- Move _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL reading from #applyCalculated
into span_stats.js; import getEnvironmentVariable there directly — removes
the undeclared ddTraceMetricsOtelFlushInterval setAndTrack write
- Update tests to use the env-var key names and remove the now-irrelevant
config override test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): remove blank line before closing brace in config spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(config): generate GeneratedEnvVarConfig interface for env var types
- Extract getBaseType helper from getTypeForEntry to share base type computation
- Add getEnvVarType that only adds undefined when there is no registered default
- Add generateEnvVarConfigTypes to map every env var name (canonical + aliases) to its resolved type
- Append GeneratedEnvVarConfig interface to generated-config-types.d.ts
Rationale: Callers of getValueFromEnvSources need per-env-var typed return values instead of the full config property union, enabling type-safe lookups by literal env var name.
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* fix(span-stats): address PR review comments
- Gate OTLP-only SpanAggKey dimensions (origin, spanKind, rpcMethod,
rpcStatusCode) on otlpTraceMetricsEnabled to avoid inflating legacy
span stats aggregation key cardinality
- Thread _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL through the typed config
system (via setAndTrack/getValueFromEnvSources) instead of reading
the raw env var directly in SpanStatsProcessor
- Add config tests covering OTEL_TRACES_SPAN_METRICS_ENABLED auto-enable
logic (both conditions, explicit override)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(span-stats): pass otlpEnabled=true in transformer test bucket helper
makeBucket is used exclusively by OtlpStatsTransformer tests, so spans
must be keyed with otlpEnabled=true to populate the OTLP-gated fields
(origin, spanKind, rpcMethod, rpcStatusCode).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style(span-stats): minor style and readability cleanups
- Extract flush interval to variable before setAndTrack call
- Remove unnecessary quotes on property key
- Tighten test description wording
Rationale: Small consistency and readability improvements from PR review
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* fix(span-stats): remove rpc.method from otlp span stats aggregation key
- Drop grpc.method.name from SpanAggKey and OtlpStatsTransformer
- rpc.method inflates aggregation key cardinality without sufficient benefit
- Update all affected test assertions
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* refactor(span-stats): remove stale comments and clarify TODO
- Remove redundant inline comments in otlp-span-stats transformer
- Replace misleading comment about OTLP-only dimensions with a TODO
noting origin and spanKind should eventually be included in legacy
client stats aggregation
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* Apply suggestion from @mabdinur
* refactor(span-stats): remove redundant inline comments
- Drop comments that restate what the code already shows
- Keep code self-documenting per project style guidelines
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* fix(opentelemetry): fix max-len lint violation in span_processor.js
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: add otlp-span-stats exporter to CODEOWNERS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(opentelemetry): encapsulate OTLP span stats in opentelemetry/metrics
Move OtlpStatsExporter and OtlpStatsTransformer into opentelemetry/metrics/
so the opentelemetry package is self-contained ahead of potential extraction
into its own npm package.
Key changes:
- Move exporters/otlp-span-stats/{index,transformer}.js to
opentelemetry/metrics/otlp_span_stats_{exporter,transformer}.js
- Move buildResourceAttributes from span_stats.js to opentelemetry/metrics/index.js;
add createOtlpSpanStatsExporter factory there
- Wire OtlpStatsExporter via DI: opentracing/tracer.js creates it when
OTEL_TRACES_SPAN_METRICS_ENABLED and passes it through SpanProcessor to
SpanStatsProcessor — span_stats.js no longer imports from opentelemetry/
- config/index.js mirrors OTEL_TRACES_SPAN_METRICS_ENABLED into
stats.DD_TRACE_STATS_COMPUTATION_ENABLED so downstream checks are unified
- Remove otlpEnabled flag from SpanAggKey/SpanBuckets — origin, spanKind,
rpcStatusCode are always populated
- Remove OTEL-specific check from AgentExporter (relies on mirrored flag)
- Remove CODEOWNERS entry for deleted exporters/otlp-span-stats/ path
- Move tests to test/opentelemetry/metrics/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): align grpc stats with libdatadog and enforce mutual exclusion
- Move GRPC_STATUS_CODE constant to ext/tags.js
- Emit rpc.response.status_code as string (aligns with libdatadog kv_str)
- Use else if in onInterval to make native and OTLP export mutually exclusive
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(opentelemetry): trim JSDoc, privatize transformer, drop empty description
- Make OtlpStatsExporter#transformer a private field (#transformer)
- Remove empty description field from histogram metric
- Trim redundant @param prose in exporter and transformer JSDoc
- Use GRPC_STATUS_CODE import in transformer spec instead of string literal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): suppress self-instrumentation spans from OTLP exporter requests
Wrap sendPayload's HTTP request in legacyStorage.run({ noop: true }) so the
tracer does not instrument its own outbound connections to the OTLP collector.
Without this, tcp.connect client spans for /v1/metrics requests were fed into
the traces.span.sdk.metrics.duration histogram, displacing real span data points
and inflating counts in the system-tests parametric suite.
Same pattern used by exporters/common/request.js and exporters/common/agents.js.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
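The suppression pattern can be sketched with Node's `AsyncLocalStorage`. Everything here is a stand-in: `storage` plays the role of the tracer's `legacyStorage`, and `instrumentedConnect` is a hypothetical hook, not the real tcp instrumentation.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks')

// Stand-in for the tracer's internal storage (assumption).
const storage = new AsyncLocalStorage()
const recordedSpans = []

// Hypothetical instrumentation hook: skip span creation when the current
// store is flagged as noop.
function instrumentedConnect (target) {
  if (storage.getStore()?.noop) return
  recordedSpans.push({ name: 'tcp.connect', target })
}

// The exporter issues its own request inside a noop store, so the hook
// above ignores it.
function sendPayload (target) {
  storage.run({ noop: true }, () => instrumentedConnect(target))
}

instrumentedConnect('db:5432') // application traffic: recorded
sendPayload('collector:4318') // exporter traffic: suppressed
```

Only the application connection ends up in `recordedSpans`, which is the effect the commit relies on to keep exporter traffic out of the histogram.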
* test(opentelemetry): improve patch coverage for span stats OTLP export
- Inline URL parsing into OtlpHttpExporterBase constructor; remove setUrl
(the if(telemetryTags !== undefined) branch was dead — telemetryTags is
always undefined when setUrl was called from the constructor, and no
external caller ever invoked it post-construction)
- Add tests for buildResourceAttributes (sdk identity, runtime-id, OTel-
semantics mode) and createOtlpSpanStatsExporter
- Add tests for HTTP error response and request error paths in sendPayload
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(opentelemetry): emit raw grpc.status.code name for rpc.response.status_code
Prefer the meta status NAME string over the numeric metrics tag and emit it
upper-cased to rpc.response.status_code, aligning with the OTel gRPC semantic
conventions (canonical status name) without any code<->name mapping.
* fix(opentelemetry): restore setUrl method removed by dead-code cleanup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(opentelemetry): trim redundant comments in grpc status mapping
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(opentelemetry): read grpc.status.code from meta only
Drop the numeric metrics fallback; the gRPC status code is the canonical status
NAME and is read from span meta.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(opentelemetry): translate numeric grpc.status.code from metrics to status name
The dd gRPC plugin sets grpc.status.code as a numeric integer via
span.setTag, which span_format.js routes into span.metrics rather than
span.meta. SpanAggKey was reading meta only, so rpcStatusCode was always
empty for real gRPC spans.
Now falls back to span.metrics[GRPC_STATUS_CODE] and translates the
integer to the canonical status name (OK, NOT_FOUND, etc.) using the
gRPC status code table. Meta string takes priority when both are present.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
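The lookup order described above might look roughly like this. The `rpcStatusCode` function is a hypothetical shape; the status names follow the standard gRPC status code table, and the upper-casing of the meta string follows the earlier commit in this series.

```javascript
// Canonical gRPC status names, indexed by numeric code.
const GRPC_STATUS_NAMES = [
  'OK', 'CANCELLED', 'UNKNOWN', 'INVALID_ARGUMENT', 'DEADLINE_EXCEEDED',
  'NOT_FOUND', 'ALREADY_EXISTS', 'PERMISSION_DENIED', 'RESOURCE_EXHAUSTED',
  'FAILED_PRECONDITION', 'ABORTED', 'OUT_OF_RANGE', 'UNIMPLEMENTED',
  'INTERNAL', 'UNAVAILABLE', 'DATA_LOSS', 'UNAUTHENTICATED',
]

const GRPC_STATUS_CODE = 'grpc.status.code'

// Hypothetical resolver: the meta string wins; otherwise translate the
// numeric metrics value; otherwise leave the dimension empty.
function rpcStatusCode (meta = {}, metrics = {}) {
  const fromMeta = meta[GRPC_STATUS_CODE]
  if (typeof fromMeta === 'string' && fromMeta !== '') return fromMeta.toUpperCase()
  const fromMetrics = metrics[GRPC_STATUS_CODE]
  if (Number.isInteger(fromMetrics)) return GRPC_STATUS_NAMES[fromMetrics] ?? ''
  return ''
}
```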
* fix(opentelemetry): split agg key by top-level to fix mixed-bucket OTLP metrics
When a bucket contained both top-level and measured non-top-level spans,
the heuristic (topLevelHits === hits) always resolved to false, causing
the OTLP histogram to be emitted as datadog.span.top_level=false and
dropping top-level traffic from APM metrics.
Adding topLevel as a dimension to SpanAggKey causes top-level and
non-top-level spans to bucket separately. Each bucket is now always
purely top-level or purely non-top-level, so the attribute is always
accurate. The native stats path is unaffected because toJSON() omits
topLevel; the Agent merges groups with identical key fields, preserving
the same Hits/TopLevelHits totals.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): track per-top-level distributions to fix native stats regression
The previous fix added topLevel to SpanAggKey to separate top-level and
non-top-level spans into distinct buckets. This created a real regression
in the native /v0.6/stats path: the Agent's mergeDuplicates() correctly
sums Hits/Errors/Duration from duplicate rows but silently drops
TopLevelHits from the merged-away entry. If the non-top-level row is
processed first and becomes the canonical, TopLevelHits from the
top-level row is lost.
Fix: revert topLevel from SpanAggKey (no more duplicate rows). Instead,
split SpanAggStats into four distributions (topLevelOk, topLevelError,
nonTopLevelOk, nonTopLevelError). The native stats path merges them at
export time so toJSON() produces the same combined OkSummary/ErrorSummary
as before. The OTLP path emits separate data points per top-level status
with the correct datadog.span.top_level attribute. OTel-semantics mode
merges the distributions (no top-level attribute to distinguish them).
Also adds SpanKind, Origin, and RpcStatusCode to the native stats
toJSON() payload so the Agent receives these new dimensions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
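A simplified sketch of the four-distribution split. The field names mirror the commit message, but `Distribution` is a stand-in that tracks only count and sum instead of a DDSketch, and `toNativeRow` / `toOtlpPoints` are hypothetical method names.

```javascript
// Minimal stand-in for a DDSketch (assumption): count and sum only.
class Distribution {
  constructor () {
    this.count = 0
    this.sum = 0
  }

  accept (duration) {
    this.count++
    this.sum += duration
  }
}

class SpanAggStats {
  constructor () {
    this.topLevelOk = new Distribution()
    this.topLevelError = new Distribution()
    this.nonTopLevelOk = new Distribution()
    this.nonTopLevelError = new Distribution()
  }

  record ({ duration, error, topLevel }) {
    const name = (topLevel ? 'topLevel' : 'nonTopLevel') + (error ? 'Error' : 'Ok')
    this[name].accept(duration)
  }

  // Native path: one combined row, so no duplicate rows reach the Agent.
  toNativeRow () {
    const all = [this.topLevelOk, this.topLevelError, this.nonTopLevelOk, this.nonTopLevelError]
    return {
      Hits: all.reduce((n, d) => n + d.count, 0),
      TopLevelHits: this.topLevelOk.count + this.topLevelError.count,
      Errors: this.topLevelError.count + this.nonTopLevelError.count,
      Duration: all.reduce((n, d) => n + d.sum, 0),
    }
  }

  // OTLP path: one data point per non-empty distribution, each tagged with
  // its own top-level flag.
  toOtlpPoints () {
    const points = []
    for (const topLevel of [true, false]) {
      for (const error of [false, true]) {
        const dist = this[(topLevel ? 'topLevel' : 'nonTopLevel') + (error ? 'Error' : 'Ok')]
        if (dist.count > 0) points.push({ topLevel, error, count: dist.count, sum: dist.sum })
      }
    }
    return points
  }
}
```

Keeping the split inside the stats object rather than the aggregation key is what lets the native row stay single while the OTLP points stay accurate.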
* refactor(opentelemetry): split native stats rows by top-level status; use GRPCStatusCode key
toJSON() now returns an array of up to 2 rows (top-level row first, non-top-level
row second). #toLegacyPayload uses flatMap to flatten them. This eliminates the
merge-time DDSketch allocation and ensures TopLevelHits is always non-zero on
the top-level row, so the Agent's mergeDuplicates retains it as the canonical entry.
Duration and Errors are derived from distribution .sum/.count, removing the
redundant this.duration and this.errors accumulators.
GRPCStatusCode matches the agent's msgpack decoder key (confirmed from
pkg/proto/pbgo/trace/stats_gen.go).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(opentelemetry): minimize comments, move GRPC_STATUS_NAMES to constants
- Remove descriptive/narrating comments throughout; keep only non-obvious constraints
- Move GRPC_STATUS_NAMES from span_stats.js into constants.js
- Rename #toLegacyPayload -> #v06Payload
- Remove section comment from ext/tags.js
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): restrict ORIGIN_KEY to synthetics boolean; remove origin from aggregation key
ORIGIN_KEY now only populates SpanAggKey.synthetics. The origin string field
is removed from SpanAggKey, toString(), and the v0.6 payload (Origin is not
a field the agent decodes). In the OTLP path, datadog.origin='synthetics' is
emitted when aggKey.synthetics is true.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: remove pr_description.md
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Munir <munir.abdinur@datadoghq.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
With `flushInterval: 0` (the test-agent config and the AWS Lambda config) the processor armed `setInterval(onInterval, 0)`, which fires on every event-loop tick instead of when a checkpoint is recorded. A tick landing in the window between the test agent tearing its listener down and bringing it back up posts to a dead port; the bucket is cleared on serialize, so that single payload is lost. A producer-only DSM test waiting on exactly one payload then times out.
Honor `flushInterval === 0` as the flush-on-write sentinel the agent and agentless trace exporters already use: skip the timer and push each checkpoint, offset, and transaction the moment it is recorded, while the writer URL is live.
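The sentinel behaviour can be sketched as follows. `CheckpointProcessor`, its `send` callback, and the injectable `schedule` parameter are hypothetical names for illustration, not the actual DSM processor API.

```javascript
// Hypothetical processor: flushInterval === 0 means "flush on every write".
class CheckpointProcessor {
  constructor ({ flushInterval, send, schedule = setInterval }) {
    this.bucket = []
    this.send = send
    this.flushOnWrite = flushInterval === 0
    if (!this.flushOnWrite) {
      // Only arm a timer for a real interval; never setInterval(fn, 0).
      this.timer = schedule(() => this.flush(), flushInterval)
      this.timer.unref?.()
    }
  }

  record (checkpoint) {
    this.bucket.push(checkpoint)
    if (this.flushOnWrite) this.flush()
  }

  flush () {
    if (this.bucket.length === 0) return
    const payload = this.bucket
    this.bucket = []
    this.send(payload)
  }
}
```

With the sentinel, a payload is only ever sent as a direct consequence of a recorded item, so no send can fire on an idle tick while the receiver is restarting.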
* add breaking changes to release proposal
* fetch from master instead
* removes stable path fallback
* fix wrong look up
PR aggregates 30+ merged features, fixes, and refactorings (AWS durable FAILED checkpoint replay, DataStreams flush-on-write, MongoDB nosql scoping, AppSec Lambda WAF, OpenTelemetry OTLP metrics, Vitest no-worker init refactor). All highest-risk behavioral changes carry comprehensive test coverage — attempt normalization, concurrent query isolation, checkpoint handling, and error cases all verified. No concrete bugs found; logic inversions and renamings are consistent and intentional.
📊 Validated against 9 scenarios
🤖 Datadog Autotest · Commit b9822a7
Features
Fixes
Performance
Documentation
Internal (CI, Testing, Benchmarking)