v6.0.0 proposal #9195
The OIDC-exchanged token from the npm registry is only valid for the publish operation; using it for npm dist-tag add produced E401. Remove the multi-tag logic and the OIDC exchange entirely: each branch now publishes with a single tag (latest for the current release line, latest-nodeXX for older lines), which is all npm's trusted publishing model supports without a stored token. Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ad (#9086) Each test cell uploaded its own report to Codecov, so a commit sent ~430 uploads. Codecov silently parks uploads past its ~150-per-commit ceiling in `started` and never merges them, so roughly 40 reports' worth of coverage was dropped from every commit. The Datadog coverage upload was separately broken: `upload-coverage-artifact` probed for files with `find -maxdepth 1`, but the report lives one level deeper at `coverage/node-<version>/`, so the check found nothing, no `coverage-*` artifact was produced, and `datadog-ci coverage upload` reported nothing while passing green. All Green already downloads every `coverage-*` artifact to drive the Datadog upload, so it is the one place that sees a whole commit's coverage. It now groups the per-cell reports by integration and uploads ~100 groups to both backends instead of ~430 per-cell reports: 1. `upload-coverage-artifact` recurses for the report files and names each artifact `coverage-<flag>__<job>-<index>` so matrix cells that share a flag (cypress varies `spec` outside its flag) stop clobbering each other. 2. `scripts/group-coverage.mjs` sorts each cell's report into its integration's directory, stripping Node.js and library versions, which are noise for "which integration regressed". Reports are not merged locally — both backends merge same-flag uploads server-side — so each report passes through byte-for-byte and the harness needs no istanbul dependency in All Green's sparse checkout. ~430 cells collapse to ~100 groups. 3. Each cell emits both lcov and istanbul JSON: Codecov reads branch and function coverage from the JSON (its lcov parser ingests only line hits), Datadog reads the lcov and does not ingest the JSON. All Green uploads each format to the backend that reads it, one group per integration, flagged with the integration name. 
`master-coverage` still rides every Codecov upload on PRs targeting master so the `codecov/patch` gate fires; reruns de-duplicate to the newest run so a stale rerun's counters are not double-counted.
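The grouping step can be pictured with a small sketch. It assumes only the artifact naming given above (`coverage-<flag>__<job>-<index>`); `parseArtifactName` and `groupByFlag` are hypothetical names, and the version stripping done by `scripts/group-coverage.mjs` is left out because its input format is not described here.

```javascript
'use strict'

// Hypothetical parser for the `coverage-<flag>__<job>-<index>` artifact name.
function parseArtifactName (name) {
  const match = /^coverage-(.+)__(.+)-(\d+)$/.exec(name)
  if (!match) return undefined
  return { flag: match[1], job: match[2], index: Number(match[3]) }
}

// Collect artifact names under their flag so each flag is uploaded once per
// backend. Reports are only grouped here, never merged.
function groupByFlag (names) {
  const groups = new Map()
  for (const name of names) {
    const parsed = parseArtifactName(name)
    if (!parsed) continue
    const group = groups.get(parsed.flag) ?? []
    group.push(name)
    groups.set(parsed.flag, group)
  }
  return groups
}

// Illustrative artifact names, not taken from a real run.
const groups = groupByFlag([
  'coverage-cypress__test-cypress-0',
  'coverage-cypress__test-cypress-1',
  'coverage-kafkajs__test-kafkajs-0',
])
```

Two cypress cells that share a flag land in one group, while the index suffix keeps their artifact names distinct.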
…9074) The "should emit one kafka.produce span per topicMessages entry" test hard-coded kafka.messages.offsets to start_offset "0". Kafka produce is at-least-once: a transient NOT_LEADER_FOR_PARTITION right after topic creation makes kafkajs retry and advance the broker-assigned base offset past 0, so the span faithfully reports a non-zero offset (CI observed "1") and the assertion never matched before the timeout. The expected offsets are read back from the sendBatch result instead, which still pins the per-topic isolation the test was written for. Each topicMessages entry is its own root span, so the two spans are separate traces the agent may deliver in a single payload in any order; the span lookup now scans every trace rather than only traces[0].
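A minimal sketch of the order-independent lookup, assuming a payload is an array of traces and each trace is an array of span objects. `findSpan` and the span fields below are hypothetical, not the test's actual helper.

```javascript
'use strict'

// Scan every trace in the payload rather than only traces[0].
function findSpan (traces, predicate) {
  for (const trace of traces) {
    for (const span of trace) {
      if (predicate(span)) return span
    }
  }
  return undefined
}

// Two root spans arrive as separate traces, in whatever order the agent
// happened to deliver them.
const payload = [
  [{ name: 'kafka.produce', meta: { topic: 'topic-b' } }],
  [{ name: 'kafka.produce', meta: { topic: 'topic-a' } }],
]

const spanA = findSpan(payload, span => span.meta.topic === 'topic-a')
```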
Key each expanded major by its bare major (`versions/mongodb@3`) instead of a bounded range (`versions/mongodb@>=3.0.0 <4.0.0`). The bare major reads cleanly as a folder name and covers each major's latest, including the floor major's, which the range form dropped. Follows the shared resolver from #9019. Widening the matrix to every major's latest surfaced several latent failures: 1. A bare-major key resolves to that major's newest version, so a range ending inside its top major overshoots: microgateway-core `>=2.1 <=3.0.0` keyed `3` installed 3.3.7 and the span came back `web.request` instead of `microgateway.request`. The top major keeps the declared range whenever it stops short of the major's ceiling; fully-spanned and lower majors stay bare. 2. `versions/ai@4` and `@langchain/core@0` resolve to versions that have no VCR cassette and would hit the live API (401). A central `brokenVersions` registry drops a matching resolved version and surfaces the reason as a pending test, each entry a stop-gap carrying a TODO. 3. A manifest carrying a `workspace:` protocol dependency was copied verbatim into a generated workspace, so yarn failed with "Couldn't find any versions for X that matches workspace:*". Fall back to the pinned compatible version. 4. The Apollo fetch-failure test gated the error span on `version > '2.3.0'`, a lexicographic compare that breaks once the key is bare (`'2' > '2.3.0'` is false). Compare the resolved version with `semver.gt`. 5. Single-digit keying renames folders that several specs hard-code by range (express, langchain, bedrock runtime, aws-sdk). The bedrock require threw after `agent.load` with no `agent.close`, leaving the Remote Config poll running and hanging the job to the 45-minute timeout; the others silently skipped suites. Point the requires at the renamed folders.
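The string-compare pitfall in item 4 is easy to reproduce. In the sketch below `isGreater` is a hypothetical stand-in for `semver.gt` that only handles plain `major.minor.patch` strings, so the example runs without the `semver` package.

```javascript
'use strict'

// Split a version into numeric segments. Prerelease tags are not handled.
function parseVersion (version) {
  return version.split('.').map(Number)
}

// Numeric, segment-by-segment comparison. Missing segments count as 0.
function isGreater (a, b) {
  const left = parseVersion(a)
  const right = parseVersion(b)
  for (let i = 0; i < 3; i++) {
    const diff = (left[i] || 0) - (right[i] || 0)
    if (diff !== 0) return diff > 0
  }
  return false
}

// String comparison works character by character, so a bare major key or a
// two-digit major sorts before '2.3.0'.
const bareKeyAsString = '2' > '2.3.0'
const tenAsString = '10.0.0' > '2.3.0'

// Comparing a resolved version numerically gives the intended answer.
const resolvedNumeric = isGreater('2.6.0', '2.3.0')
const tenNumeric = isGreater('10.0.0', '2.3.0')
```

Both string comparisons evaluate to `false` and both numeric ones to `true`, which is why the gate has to compare the resolved version rather than the folder key.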
…As (#9101) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…lPropertyName (#8943) `internalPropertyName` carried a hand-maintained full property path (`telemetry.debug`, `remoteConfig.enabled`) that diverged from the canonical env name, so the same configuration was named twice and the two could drift. A new `namespace` field nests the canonical env name under a property path (`telemetry.DD_TELEMETRY_DEBUG`), so the runtime path is derived from the canonical name plus a namespace with no separate alias to maintain. It takes precedence over `configurationNames` and `internalPropertyName` when resolving the path, in the eslint sync rule, and in the type generator. Every group of entries (remoteConfig, telemetry, appsec api-security and sca, profiling, stats, llmobs, iast security-controls, the per-integration llm span limits) moves onto it, and their runtime consumers are updated to the renamed keys. The canonical name telemetry reports is unchanged: the rename only affects how the property path is derived, not which env name is sent. The namespace object is always built from the defaults, so the optional chaining and `?? 0` fallbacks on the api-security accesses guarded a state that cannot occur and are dropped. Drive-by fix: * Exempt integration-test fixture apps from `n/no-extraneous-require`: they `require('dd-trace')` as a customer does, so the rule fired once dd-trace became locally resolvable (yarn link) but stayed silent on a clean install.
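As a rough illustration of the derivation, the sketch below builds the nested config object from defaults. The entry shape (`name`, `namespace`, `default`), the helper names, and the default values are assumptions made for this example, not the actual schema of the supported-configurations file.

```javascript
'use strict'

// Hypothetical entries: `name` is the canonical env name and `namespace` the
// optional property path it nests under.
const entries = [
  { name: 'DD_TELEMETRY_DEBUG', namespace: 'telemetry', default: false },
  { name: 'DD_SERVICE', default: null },
]

// The runtime path is the canonical name, prefixed by the namespace if any.
function propertyPath (entry) {
  return entry.namespace ? `${entry.namespace}.${entry.name}` : entry.name
}

// Always create the namespace objects from the defaults, so consumers can read
// `config.telemetry.DD_TELEMETRY_DEBUG` without optional chaining.
function buildDefaults (entries) {
  const config = {}
  for (const entry of entries) {
    const segments = propertyPath(entry).split('.')
    const leaf = segments.pop()
    let target = config
    for (const segment of segments) {
      target[segment] ??= {}
      target = target[segment]
    }
    target[leaf] = entry.default
  }
  return config
}

const config = buildDefaults(entries)
```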
…te (#9026) import-in-the-middle scanned the include and exclude arrays once per resolved module — up to ~290 include entries (RegExp.test or string compare) plus a fileURLToPath on every resolve, nearly all against modules that match nothing. Supplying iitm 3.2.0's shouldInclude predicate replaces that scan with a single Set lookup for bare specifiers and one combined RegExp covering every instrumented node_modules path and the configured security-control subpaths, plus one RegExp for the exclusions. Over a mixed resolve corpus this drops the per-resolve matching cost from ~2.5µs to ~25ns (about 100x). The Set also carries each built-in's node: specifier, mirroring iitm's include expansion, so `import 'node:crypto'` stays instrumented alongside `import 'crypto'`. Package names pass through regexpEscape so a metacharacter in a future package name cannot mis-match. The .mjs rewriter loader spec was the repository's only .spec.mjs and no CI job ran it: the misc suite glob matched .spec.js only, and the exercised-tests gate collected .spec.js/.test.mjs but not .spec.mjs, so it could not flag the orphan. 1. Match *.spec.{js,mjs} in test:instrumentations:misc so the spec runs. 2. Widen verify-exercised-tests globs to @(spec|test).@(js|mjs|cjs) so every naming convention is tracked and an unrun one fails the gate. 3. Load the loader through require(esm) where the runtime supports it so its transforms land on nyc's CommonJS instrumentation path; gate on process.features.require_module so Node 18 falls back to import() instead of crashing the suite on the CommonJS compiler's SyntaxError.
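The Set-plus-combined-RegExp idea might look like the following sketch. `buildShouldInclude` is a hypothetical name, the predicate is assumed to receive a single specifier or URL string (the arguments iitm actually passes may differ), and the exclusion RegExp and security-control subpaths are left out.

```javascript
'use strict'

// Escape RegExp metacharacters so a package name is matched literally.
function regexpEscape (text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

function buildShouldInclude ({ builtins, packages }) {
  // One Set lookup covers bare specifiers, including the node: form of each
  // built-in.
  const bare = new Set(packages)
  for (const name of builtins) {
    bare.add(name)
    bare.add(`node:${name}`)
  }

  // One combined RegExp covers every instrumented node_modules path.
  // This sketch assumes `packages` is non-empty.
  const alternatives = packages.map(regexpEscape).join('|')
  const pathPattern = new RegExp(`[\\\\/]node_modules[\\\\/](?:${alternatives})[\\\\/]`)

  return function shouldInclude (specifier) {
    return bare.has(specifier) || pathPattern.test(specifier)
  }
}

const shouldInclude = buildShouldInclude({
  builtins: ['crypto'],
  packages: ['express', 'a.b'],
})
```

Because the alternation is built once, each resolve costs one hash lookup and at most one RegExp test, however many packages are instrumented.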
The sampling tests in #9030 build their own taggers with `{ llmobs: { enabled: true } }`, and #8943 renamed that config key to `DD_LLMOBS_ENABLED` everywhere it could see. The two landed in parallel, so #8943 normalized the rest of the file but never saw these four fixtures. On master the tagger now reads `DD_LLMOBS_ENABLED`, finds it undefined, and returns before registering the span; `Tagger.tagMap.get` then yields undefined and the "DROPPED at sampleRate 0" test throws synchronously, aborting the whole `test:llmobs:sdk:ci` run with exit 7. Fixes: https://github.com/DataDog/dd-trace-js/actions/runs/28265509637/job/83751636644
) feat(graphql): migrate instrumentation to orchestrion

Migrates GraphQL instrumentation from shimmer wrappers to orchestrion AST rewriting for graphql execute / parse / validate entry points, including CJS and ESM paths for graphql >=0.10 and @graphql-tools/executor.

Moves resolver instrumentation into the GraphQL execute plugin. The execute plugin now owns per-execute root context, resolver wrapping, resolve-span lifecycle, source tracking, and resolver hook invocation. The old separate resolve plugin is removed.

Preserves and tests the existing cross-feature contracts:
- IAST still receives one apm:graphql:resolve:start publish per resolver call, using the actual GraphQL args object.
- AppSec still receives resolver payloads through datadog:graphql:resolver:start and can abort synchronously through the shared abort controller.
- depth only limits resolve-span creation; IAST/AppSec resolver publishes still happen for depth-gated fields.
- depth-gated resolvers now honor abort signals before falling through the no-span fast path.
- caller-owned execute args and contextValue are preserved without mutation.
- default field resolver behavior matches graphql for primitive parent values.
- graphql-yoga / @graphql-tools/executor execution is instrumented.

Adds public TypeScript declarations for the GraphQL resolve hook and FieldContext payload.

Keeps the implementation orchestrion-only, with no shimmer fallback, and updates the GraphQL long benchmark calibration for the migrated hot path.

Regression coverage was added for:
- resolver abort behavior past the configured depth
- depth: 0 AppSec resolver-channel publishing
- primitive-source defaultFieldResolver parity
- caller-supplied and frozen execute args
- primitive contextValue forwarding
- Yoga normalized executor instrumentation
- IAST/AppSec per-resolver channel cardinality

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
Bumps the test-versions group with 1 update in the /integration-tests/esbuild directory: [openai](https://github.com/openai/openai-node). Updates `openai` from 6.44.0 to 6.45.0 - [Release notes](https://github.com/openai/openai-node/releases) - [Changelog](https://github.com/openai/openai-node/blob/main/CHANGELOG.md) - [Commits](openai/openai-node@v6.44.0...v6.45.0) --- updated-dependencies: - dependency-name: openai dependency-version: 6.45.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: test-versions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…th 10 updates (#9127) Bumps the cloud-and-messaging group with 10 updates in the /packages/dd-trace/test/plugins/versions directory: | Package | From | To | | --- | --- | --- | | [@aws-sdk/client-bedrock-runtime](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-bedrock-runtime) | `3.1074.0` | `3.1075.0` | | [@aws-sdk/client-dynamodb](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-dynamodb) | `3.1074.0` | `3.1075.0` | | [@aws-sdk/client-kinesis](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-kinesis) | `3.1074.0` | `3.1075.0` | | [@aws-sdk/client-lambda](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-lambda) | `3.1074.0` | `3.1075.0` | | [@aws-sdk/client-s3](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-s3) | `3.1074.0` | `3.1075.0` | | [@aws-sdk/client-sfn](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-sfn) | `3.1074.0` | `3.1075.0` | | [@aws-sdk/client-sns](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-sns) | `3.1074.0` | `3.1075.0` | | [@aws-sdk/client-sqs](https://github.com/aws/aws-sdk-js-v3/tree/HEAD/clients/client-sqs) | `3.1074.0` | `3.1075.0` | | [azure-functions-core-tools](https://github.com/Azure/azure-functions-core-tools) | `4.12.0` | `4.12.1` | | [durable-functions](https://github.com/Azure/azure-functions-durable-js) | `3.3.1` | `3.4.0` | Updates `@aws-sdk/client-bedrock-runtime` from 3.1074.0 to 3.1075.0 - [Release notes](https://github.com/aws/aws-sdk-js-v3/releases) - [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-bedrock-runtime/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-bedrock-runtime) Updates `@aws-sdk/client-dynamodb` from 3.1074.0 to 3.1075.0 - [Release notes](https://github.com/aws/aws-sdk-js-v3/releases) - [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-dynamodb/CHANGELOG.md) - 
[Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-dynamodb) Updates `@aws-sdk/client-kinesis` from 3.1074.0 to 3.1075.0 - [Release notes](https://github.com/aws/aws-sdk-js-v3/releases) - [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-kinesis/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-kinesis) Updates `@aws-sdk/client-lambda` from 3.1074.0 to 3.1075.0 - [Release notes](https://github.com/aws/aws-sdk-js-v3/releases) - [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-lambda/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-lambda) Updates `@aws-sdk/client-s3` from 3.1074.0 to 3.1075.0 - [Release notes](https://github.com/aws/aws-sdk-js-v3/releases) - [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-s3/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-s3) Updates `@aws-sdk/client-sfn` from 3.1074.0 to 3.1075.0 - [Release notes](https://github.com/aws/aws-sdk-js-v3/releases) - [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-sfn/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-sfn) Updates `@aws-sdk/client-sns` from 3.1074.0 to 3.1075.0 - [Release notes](https://github.com/aws/aws-sdk-js-v3/releases) - [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-sns/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-sns) Updates `@aws-sdk/client-sqs` from 3.1074.0 to 3.1075.0 - [Release notes](https://github.com/aws/aws-sdk-js-v3/releases) - [Changelog](https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-sqs/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-js-v3/commits/v3.1075.0/clients/client-sqs) Updates `azure-functions-core-tools` from 4.12.0 to 4.12.1 - [Release 
notes](https://github.com/Azure/azure-functions-core-tools/releases) - [Changelog](https://github.com/Azure/azure-functions-core-tools/blob/4.12.1/release_notes.md) - [Commits](Azure/azure-functions-core-tools@4.12.0...4.12.1) Updates `durable-functions` from 3.3.1 to 3.4.0 - [Release notes](https://github.com/Azure/azure-functions-durable-js/releases) - [Commits](Azure/azure-functions-durable-js@v3.3.1...v3.4.0) --- updated-dependencies: - dependency-name: "@aws-sdk/client-bedrock-runtime" dependency-version: 3.1075.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging - dependency-name: "@aws-sdk/client-dynamodb" dependency-version: 3.1075.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging - dependency-name: "@aws-sdk/client-kinesis" dependency-version: 3.1075.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging - dependency-name: "@aws-sdk/client-lambda" dependency-version: 3.1075.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging - dependency-name: "@aws-sdk/client-s3" dependency-version: 3.1075.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging - dependency-name: "@aws-sdk/client-sfn" dependency-version: 3.1075.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging - dependency-name: "@aws-sdk/client-sns" dependency-version: 3.1075.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging - dependency-name: "@aws-sdk/client-sqs" dependency-version: 3.1075.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging - dependency-name: azure-functions-core-tools 
dependency-version: 4.12.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: cloud-and-messaging - dependency-name: durable-functions dependency-version: 3.4.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: cloud-and-messaging ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…pdates (#9128) Bumps the test-versions group with 4 updates in the /packages/dd-trace/test/plugins/versions directory: [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node), [pnpm](https://github.com/pnpm/pnpm/tree/HEAD/pnpm11/pnpm), [protobufjs](https://github.com/protobufjs/protobuf.js) and [stripe](https://github.com/stripe/stripe-node). Updates `@types/node` from 26.0.0 to 26.0.1 - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) Updates `pnpm` from 11.8.0 to 11.9.0 - [Release notes](https://github.com/pnpm/pnpm/releases) - [Changelog](https://github.com/pnpm/pnpm/blob/main/pnpm11/pnpm/CHANGELOG.md) - [Commits](https://github.com/pnpm/pnpm/commits/v11.9.0/pnpm11/pnpm) Updates `protobufjs` from 8.6.4 to 8.6.5 - [Release notes](https://github.com/protobufjs/protobuf.js/releases) - [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md) - [Commits](protobufjs/protobuf.js@protobufjs-v8.6.4...protobufjs-v8.6.5) Updates `stripe` from 22.2.3 to 22.3.0 - [Release notes](https://github.com/stripe/stripe-node/releases) - [Changelog](https://github.com/stripe/stripe-node/blob/master/CHANGELOG.md) - [Commits](stripe/stripe-node@v22.2.3...v22.3.0) --- updated-dependencies: - dependency-name: "@types/node" dependency-version: 26.0.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: test-versions - dependency-name: pnpm dependency-version: 11.9.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: test-versions - dependency-name: protobufjs dependency-version: 8.6.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: test-versions - dependency-name: stripe dependency-version: 22.3.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: 
test-versions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#9110) The request helper retries once on a 5xx with a 5–7.5 s jittered delay, but the getKnownTests "should return an error if the request fails" test mocked a single 500 interceptor and used real timers. The retried request had no interceptor and its retry timer always exceeded mocha's 5 s timeout, so the callback never fired and the test timed out. Collapse the retry delay to 0 ms and add a second 500 interceptor so the test exercises the real retry path and asserts both requests are consumed.
…9112) `internalPropertyName` made each supported config carry a second hand-maintained runtime path next to its canonical env name. Drop that alias and derive the config-object path from the canonical name, with Test Optimization entries grouped under `testOptimization` and top-level entries using their canonical key directly. The plugin shared-config boundary still forwards the existing per-plugin keys, including the flat dynamic-instrumentation flag `CiPlugin.configure` receives; plugins do not receive the namespaced tracer config. Drive-by fix: * Drop duplicate benchmark `enabled` leaves left behind by the previous namespace migration.
The NoSQL injection analyzer used `enterWith` to mark the async context, which leaked the marker past the query. A request that ended before its query finished stranded the marker for the next request, so that request's injection went unreported. Two concurrent queries within the same request also saw each other's marker, leaving one unanalyzed. Binding the marker on the query-build channel fixed the leak but lost it on deferred queries. A mongoose query builds, executes, and reaches the driver in three separate async steps. `find().then()`/`.exec()` runs the driver a turn after the synchronous build, outside the build's `runStores` scope, so the driver re-analyzed the same filter and reported the injection twice. Binding the marker around the execution channel instead covers the full async scope that reaches the driver, and `runStores` restores the parent on its own. This re-enables the mongoose nosqli suite on Node 20 + Express 5 (skipped for APPSEC-66705) and the mquery nosqli integration suite (skipped for APPSEC-62431, where the unscoped marker caused each injection to be reported N+1 times). The mquery suite skips mongodb >=7 on Node < 20: that driver reads Web Crypto off the `crypto` global, which Node 18 does not expose by default.
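The difference between an unscoped and a scoped marker can be shown with Node's `AsyncLocalStorage`. This is only an illustration: `storage.run` stands in for the channel's `runStores`, and the marker object and function names are hypothetical.

```javascript
'use strict'

const { AsyncLocalStorage } = require('node:async_hooks')

const storage = new AsyncLocalStorage()

// Scoped: the marker exists only while `query` runs, then the parent store
// is restored automatically.
function analyzeScoped (query) {
  return storage.run({ analyzed: true }, query)
}

// Unscoped: the marker replaces the current store and stays in place for
// whatever runs next in this execution context.
function analyzeUnscoped () {
  storage.enterWith({ analyzed: true })
}

const insideRun = analyzeScoped(() => storage.getStore())
const afterRun = storage.getStore()

analyzeUnscoped()
const afterEnterWith = storage.getStore()
```

After `run` returns the store is back to `undefined`, while after `enterWith` the marker is still visible to later code, which is the leak described above.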
The `depth` filter counted a resolver's full execution path, including the numeric list indices that `collapse` later folds away. The same query therefore reached a different depth depending on whether `collapse` was on: a field one list-hop below the limit was instrumented when collapsing was off and dropped when it was on, even though both describe the same selection-set nesting. Count only selection-set segments (string path keys) toward `depth`, so the limit tracks query structure rather than execution artifacts. This shifts which resolvers are instrumented at a given `depth`, so it is gated behind `DD_MAJOR`: the v5 line keeps the old list-index counting when collapsing is on, and v6 counts selection-set depth only. The `countListIndices` config flag carries the gate so `shouldInstrumentNode` stays free of version checks. Fixes: #7468
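A small sketch of the two counting modes, assuming a resolver path is an array mixing field names (strings) and list indices (numbers). The function names and the `count <= limit` comparison are illustrative assumptions, not the plugin's actual code.

```javascript
'use strict'

// Depth as query structure: only selection-set segments (string keys) count.
function selectionDepth (path) {
  return path.filter(segment => typeof segment === 'string').length
}

// Depth as execution path: list indices count too (the old behavior).
function executionDepth (path) {
  return path.length
}

// Hypothetical gate: `countListIndices` selects the old counting.
function withinDepth (path, limit, countListIndices) {
  const count = countListIndices ? executionDepth(path) : selectionDepth(path)
  return count <= limit
}

// friends[0].pets[1].name: three levels of nesting, five execution segments.
const path = ['friends', 0, 'pets', 1, 'name']
const structural = selectionDepth(path)
const execution = executionDepth(path)
```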
Adds AppSec support for AWS Lambda to dd-trace-js by introducing DC handlers that let the datadog-lambda-js layer delegate WAF execution to the tracer.
1. Remove test/asserts/profile.js. Nothing requires it; the only profile value-type assertion in use is the standalone helper in the profiling agent exporter spec.
2. Fix the "amont" -> "amount" typo in both telemetry heartbeat comments.
* ci(scripts): disable V8 Maglev for Windows test children

Network-heavy specs intermittently abort with STATUS_STACK_BUFFER_OVERRUN (0xC0000409) on Windows: mocha-run-file forces process.exit() the moment mocha finishes, and that races V8's Maglev teardown while libuv still has in-flight sockets from the spec's real HTTP traffic. The child dies with no stderr and no crash report, so mocha-parallel-files only sees a non-zero exit and reports the file as crashed (e.g. inferred_proxy.spec.js, which is just whichever network spec lost the race that run).

--no-maglev sidesteps the faulty tier and can only be passed as a CLI flag, not through NODE_OPTIONS, so the runner injects it into the spawned per-file node processes on win32.

Refs: nodejs/node#62260

* ci(scripts): gate --no-maglev by V8 version

The Windows --no-maglev workaround was appended on every win32 child, but the top-level --maglev/--no-maglev toggle only exists from V8 11 (Node 20). On the supported Node 18 line (V8 10) the flag does not exist, so each spawned spec aborts with `bad option: --no-maglev` (exit 9) before mocha runs, breaking the parallel runner for Windows Node 18. Gate on the running V8 major, which the children inherit via the shared binary.
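The gate can be sketched as a pure function over the platform and the V8 version string. `extraExecArgv` is a hypothetical name, and the runner's real wiring into the spawned children is not shown.

```javascript
'use strict'

// Return the extra CLI flags for a spawned per-file test child.
// The flag is added only on win32 and only when the V8 major is 11 or newer,
// since older V8 rejects it as a bad option.
function extraExecArgv (platform, v8Version) {
  const v8Major = Number.parseInt(v8Version.split('.')[0], 10)
  return platform === 'win32' && v8Major >= 11 ? ['--no-maglev'] : []
}

// For the current process; children inherit the same binary and V8 version.
const currentFlags = extraExecArgv(process.platform, process.versions.v8)
```

Taking both values as parameters keeps the decision testable on any host, for example `extraExecArgv('win32', '10.2.154.26')` returns an empty array.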
Node 20 defines `fs.opendir` / `fs.opendirSync` as lazy getter+setter accessor
properties that resolve the real function on first read. Handing such a property
to `shimmer.wrap` instrumented the getter, so the property access was traced
while the real call ran uninstrumented — IAST then saw no `opendir` operation
and reported no PATH_TRAVERSAL vulnerability. Node 18/22 define these as plain
data properties, which is why the gap was Node-20-only.
Extend `shimmer.wrap`'s existing `replaceGetter` option to cover the lazy
getter+setter case: resolve the value once through the getter and wrap that,
rather than re-implementing the resolution in the fs instrumentation. The
property keeps its original shape — a getter+setter pair stays a getter+setter
pair whose setter still materializes a writable data property on assignment, so
the descriptor remains observationally identical for a downstream consumer that
inspects or overwrites it on that Node.js version.
A getter+setter pair without `replaceGetter` keeps being wrapped in place — the
wrapper becomes the new getter and the original setter is left untouched, as the
`url` instrumentation relies on for the `URL.prototype` `host` / `hostname`
accessors. Only a setter-only property throws. Narrowing the guard to reject
every unguarded getter+setter pair would have thrown inside the `url` hook,
silently dropping that instrumentation and the AppSec / IAST coverage built on
it.
`fs.js` now passes `{ replaceGetter: true }` through its `wrap` / `massWrap`
helpers instead of carrying its own materialization helper.
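To make the lazy-accessor case concrete, here is a self-contained sketch. It does not use `shimmer`: `makeModule`, `trace`, and `wrapLazyAccessor` are hypothetical helpers that mimic the described behavior on a stand-in object shaped like the Node 20 property.

```javascript
'use strict'

const calls = []

function realOpendir (path) {
  return `dir:${path}`
}

// A module-like object whose `opendir` is a lazy getter+setter pair. The
// setter materializes a writable data property on assignment.
function makeModule () {
  const mod = {}
  Object.defineProperty(mod, 'opendir', {
    configurable: true,
    enumerable: true,
    get () { return realOpendir },
    set (value) {
      Object.defineProperty(mod, 'opendir', {
        value, writable: true, configurable: true, enumerable: true,
      })
    },
  })
  return mod
}

// Stand-in for instrumentation: record the first argument, then delegate.
function trace (fn) {
  return function traced (...args) {
    calls.push(args[0])
    return fn.apply(this, args)
  }
}

// Resolve the value once through the getter and wrap that value, keeping the
// property an accessor with its original setter.
function wrapLazyAccessor (target, name) {
  const descriptor = Object.getOwnPropertyDescriptor(target, name)
  const wrapped = trace(descriptor.get.call(target))
  Object.defineProperty(target, name, { ...descriptor, get () { return wrapped } })
}

const mod = makeModule()
wrapLazyAccessor(mod, 'opendir')
const result = mod.opendir('/tmp')
```

Wrapping the getter itself would have recorded the property read and left the call to `realOpendir` untraced. Here the call is recorded and the accessor shape survives.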
The second sendBatch/send in these tests is a real broker call that the
test expects to succeed after the stub is restored. A fresh topic's first
produce routinely returns the retryable NOT_LEADER_FOR_PARTITION while
metadata propagates; kafkajs normally refreshes metadata and retries it,
but retry:{retries:0} stripped that safety net, surfacing the transient
error as a hard KafkaJSNonRetriableError and flaking CI.
retries:0 bought nothing for the first call it was meant to speed up: the
stubbed UNKNOWN error is non-retryable, so it already fails on the first
attempt regardless of the retry count. Removing it restores the retry on
the real call while leaving the stubbed-rejection assertions unchanged.
The non-native runtime metrics test asserted that runtime.node.gc.pause.by.type.95percentile lands in [0.1ms, 100ms). On a fast or idle runner a gc_type can have a single scavenge sample whose p95 sits below 0.1ms, so the matcher rejected a legitimate value and the test failed on Windows. The bounds exist to catch a unit-conversion regression, not to assert a minimum pause length: a sub-microsecond value would mean the ms->ns conversion was dropped, a value over 100ms that it was left in ms or seconds. Lower the floor to 1µs so it still trips on a dropped conversion while no longer assuming a GC pause takes at least 0.1ms.
… running (#9145) * ci(engines): widen engines.node to >=18 in CI to keep Node 18/20 jobs running engines.node is bumped to >=22 to match the supported runtime range. The runtime guard (packages/dd-trace/src/guardrails/index.js) and the withVersions test helper both read engines.node and bail when the running major is below it, so on Node 18/20 every suite would silently skip (withVersions returns early, the init guard short-circuits, mocha reports 0 passing). CI widens engines.node back to >=18 at the action level (right after actions/setup-node) via scripts/ci/widen-engines-for-ci.js, so Node 18/20 jobs keep exercising real tests. The on-disk package.json that ships is unchanged (engines.node stays >=22). Co-authored-by: Cursor <cursoragent@cursor.com> * ci(engines): only widen engines.node on Node 18/20 jobs The widen step lives in the shared node/setup action, so it ran on every CI job, including integration-guardrails-unsupported, which installs Node 0.8 through 14 on purpose to verify the runtime guard aborts. Two problems: the script used a node:-prefixed require (unsupported before Node 14.18) so it crashed and failed those jobs, and even had it run, rewriting engines.node there would undermine a test that relies on the shipped >=22. Gate the step in the action so it only runs when the installed major is 18 or 20 (the supported majors the >=22 bump newly excludes). Every other version, including the unsupported matrix, keeps the shipped >=22 and the script is never parsed by a runtime too old to run it. Co-authored-by: Cursor <cursoragent@cursor.com> * ci(node): pin oldest alias to Node 18 instead of deriving from engines.node Commit 1a79e10 (prepare to drop Node 18/20) changed the `oldest` version alias to read the minimum major out of package.json engines.node. With engines.node now bumped to >=22, `oldest` resolved to Node 22, which dropped Node 18 from the matrix and shifted the child_process job's legs to {22,20,24,26}. 
The resulting leg order (Node 22 first, then 20) destabilized the Bluebird global-Promise tests on the Node 20 leg, which pass on master's {18,20,24,26}. Pin `oldest` back to a literal 18 so CI keeps testing the oldest version we still exercise, decoupled from the shipped engines floor. This restores the matrix and leg order to match master; the widen step still restores >=18 on the 18/20 legs. Co-authored-by: Cursor <cursoragent@cursor.com> * add early return --------- Co-authored-by: Cursor <cursoragent@cursor.com>
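The widening rule can be summarized in a few lines. This sketch folds the gate and the rewrite into one hypothetical `widenEngines` function for illustration. In the commits above, the gate lives in the action and the rewrite in scripts/ci/widen-engines-for-ci.js.

```javascript
'use strict'

// Return a package manifest whose engines.node is widened to >=18, but only
// when running on Node 18 or 20. Every other major keeps the shipped range.
function widenEngines (pkg, runningMajor) {
  if (runningMajor !== 18 && runningMajor !== 20) return pkg // early return
  return { ...pkg, engines: { ...pkg.engines, node: '>=18' } }
}

// Illustrative manifest, not the real package.json.
const shipped = { name: 'dd-trace', engines: { node: '>=22' } }

const onNode18 = widenEngines(shipped, 18)
const onNode22 = widenEngines(shipped, 22)
const onNode12 = widenEngines(shipped, 12)
```

The input object is never mutated, which mirrors the point that the package.json that ships keeps `>=22`.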
* test(child_process): load the mock agent once per suite The Integration and Bluebird suites called agent.load / agent.close in beforeEach / afterEach, so every test tore down the mock agent and the next brought a new one up on a fresh port. Async child processes spawned in a test finish on their own 'close' event, which can fire after the test resolves — during the teardown gap. The leaked command_execution span then flushed to the previous test's now-closed port (ECONNREFUSED), and a stale assertion handler matched the leftover manual 'parent' span instead, timing the next test out. Under CI load the first miss cascaded across the whole suite. Loading the agent once per describe block in before / after keeps a single port alive for every test in the block, so a late span lands on the still-open agent. The Bluebird global.Promise swap stays per-test in beforeEach / afterEach. Fixes: https://github.com/DataDog/dd-trace-js/actions/runs/28281208791 * test(child_process): drain ls spans in span-maintenance tests The shared mock agent now stays alive across every test in the Integration suite. The "should maintain previous span" cases spawn an async ls child process but never await its command_execution span, so a late flush can land during a later test whose expectSomeSpan carries the same ls expectation and be consumed there, masking a regression in that later command instead of failing. Awaiting the span in the originating test consumes it before the next test runs. Refs: #9113 (comment)
…eration spans (#8595) * workflow(aws-durable-execution-sdk-js): install_package * workflow(aws-durable-execution-sdk-js): generate_app * workflow(aws-durable-execution-sdk-js): compile * workflow(aws-durable-execution-sdk-js): test:att1:iter1:fixer * workflow(aws-durable-execution-sdk-js): test:att1:iter2:fixer * workflow(aws-durable-execution-sdk-js): feature_implement * workflow(aws-durable-execution-sdk-js): get_lint_failures * workflow(aws-durable-execution-sdk-js): lint_and_fix:att1:iter1:fix_lint_errors * workflow(aws-durable-execution-sdk-js): review_cycle:att1:iter1:batch_fix * remove the unnecessary dd-api-key * clean up # Conflicts: # index.d.ts * yarn.lock changed... * fixing yarn.lock * remove the unintended finish() guard * update span names * use a fixed service name instead * update resource names * naming consistency * small fix * Python PR parity * Undo unnecessary changes * Finish error spans on asyncEnd * Simplify orchestrion file * Class/file name changes * Several simplifications and improvements * Do not explicitly set component * Remove includeReplayedTag * Smaller simplifications * Tests simplification * chore: update supported-integrations * More test simplification * Add aws.durable.operation_id and aws.durable.operation_name * Fix checks * Linter * Test simplifications * More test improvements * Lazy thenables + only close this integration's spans * Code simplifications * Fix rebase * Mirror changes in v5 * Test waitForCondition happy path * Comment improvements based on guidelines * supress child context for WaitForCallback * Increase tested version * Address review comments * Avoid patching on the plugin by creating a "settle" channel * Do not skipTime to avoid interfering with tracer's timers * Fix test flakiness * Test durable-execution-sdk-js only on node >=22 * Linter * feat(aws-durable-execution-sdk-js): trace-context checkpoint for cross-invocation continuity Persist the current trace context as a synthetic `_datadog_{N}` STEP 
operation when the SDK suspends to PENDING, so subsequent invocations (read by the upstream datadog-lambda-js wrapper) can resume the same trace. Files: - src/handler.js: install a hook on the SDK's terminationManager.terminate inside bindStart. Save fires only for resumable reasons (PENDING_TERMINATION_REASONS allow-list mirrors the SDK's TerminationReason enum entries that result in Status: PENDING). Gated by DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED (default on; opt out with 'false'/'0'). - src/trace-checkpoint.js: NEW. Datadog-only header inject (private TextMapPropagator with tracePropagationStyle.inject = ['datadog'], shadows the live tracer config), dedup against prior _datadog_N op via JSON.stringify-after-stripping-x-datadog-parent-id, deterministic blake2b stepId so the save is idempotent within an execution. - test/handler.checkpoint.spec.js: unit tests for the termination hook (pending vs non-pending reasons, env-var gate, idempotency, default reason). - test/trace-checkpoint.spec.js: unit tests for the save module (queue START+SUCCEED before terminating, dedup on parent-id-only changes). - test/index.spec.js: integration coverage for SDK safe-paths (single cycle, child-context, step-suspend-step). - packages/dd-trace/src/config/supported-configurations.json and generated-config-types.d.ts: register DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED. * small fixes: align format * test(aws-durable-execution-sdk-js): skip 3 tests that race against TimerScheduler bug Skip wait_for_callback (happy path) and the entire invoke describe block (happy + error). All three fail deterministically in CI under @aws/durable-execution-sdk-js-testing's current TimerScheduler, whose hasScheduledFunction() undercounts in-flight scheduled functions and trips the test orchestrator's "Cannot return PENDING status with no pending operations." validation. Production (real AWS backend) is not affected — the validation is mock-only. 
Fix is open upstream as aws/aws-durable-execution-sdk-js#544; re-enable these tests once a release containing it is pinned in packages/dd-trace/test/plugins/versions/package.json. * refactor(aws-durable-execution-sdk-js): remove kTerminationHookInstalled guard The guard was defensive against a "same terminationManager passed to bindStart twice" scenario that cannot happen in the SDK as it stands — each Lambda invocation calls initializeExecutionContext, which constructs a fresh `new TerminationManager()`, so warm starts share the wrapper closure but not the terminationManager instance. Removing the Symbol + the guard + the explicit "twice across invocations" unit test that only covered a contrived re-entry. Drive-by: fix four pre-existing space-before-function-paren lint errors in the same file. * refactor(aws-durable-execution-sdk-js): anchor checkpoints at the execute span, not its parent Drop the `getParentSpanId` helper and inline the read directly during `state` initialization. While inlining, switch the anchor from the execute span's *parent* (typically `aws.lambda`'s id) to the execute span's *own* id (`span.context().toSpanId()`). Why anchor at the execute span: - It's a span this integration owns and just created, so always defined and never depends on what upstream context happened to be active when `bindStart` fired. - Topology becomes "resumed invocations are continuations of the first execute" — matching the user-facing model of a single durable execution. The old shape made resumes look like sibling Lambda invocations under whatever upstream span happened to be there. - In the no-upstream case the old code already fell through to the propagator default (= execute span's own id) via `if (parentId)` — so this just makes the behavior consistent across environments. Rename for clarity: - `saveTraceContextCheckpointIfUpdated`'s `checkpointAnchorSpanId` parameter -> `firstExecutionSpanId`. 
JSDoc spells out it's only consulted on the very first save; once a prior `_datadog_{N}` exists, the function reuses that checkpoint's `x-datadog-parent-id` verbatim. - The local `latestParentId` (the value carried forward across saves) -> `anchoredSpanId`, reflecting that it IS the anchor we've been using since the first save. - handler.js's `state.parentSpanId` -> `state.firstExecutionSpanId`. Note: dd-trace-py's `_resolve_override_parent_id` currently anchors at the execute span's parent (matching the old JS behavior). A follow-up should bring Python in line with this change so both languages produce the same trace shape. * Revert "test(aws-durable-execution-sdk-js): skip 3 tests that race against TimerScheduler bug" This reverts commit 8baa8ce. * Reapply "test(aws-durable-execution-sdk-js): skip 3 tests that race against TimerScheduler bug" This reverts commit 748a826. * force-disable legacyBaggageEnabled * Rename context variables * feat(aws-durable-execution-sdk-js): add aws.durable.operation_attempt tag Tag aws.durable.step and aws.durable.wait_for_condition spans with the 1-indexed attempt number (1 for the original attempt, 2 for the first retry, etc.), matching the AWS UI's attempt-count convention. Sourced from the SDK's checkpoint StepDetails.Attempt field; defaults to 1 when no checkpoint exists yet (the very first execution before the START checkpoint). Mirrors dd-trace-py #18191. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(aws-durable-execution-sdk-js): use 0-indexed operation_attempt Pass StepDetails.Attempt through as-is (0 = original attempt, 1 = first retry, etc.) instead of remapping with max(1, …). The production AWS Lambda Durable service stores Attempt as "number of prior failed attempts" starting at 0, so the raw value already carries correct 0-indexed semantics. Default to 0 (not 1) when no checkpoint exists yet. 
Matches dd-trace-py after end-to-end verification on a deployed Lambda: CloudWatch logs confirmed server values 0/1/2 for original/retry/retry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Apply suggestions from code review
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
* make installTerminationCheckpointHook real private
* simplify getDatadogOnlyPropagator
* better names
* lint
* refactor(aws-durable-execution-sdk-js): use static retryable property instead of span name
Set
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(aws-durable-execution-sdk-js): derive test RETRYABLE_SPAN_NAMES from plugin static property
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Lint
* remove unnecessary try catch
* return undefined instead of null
* use timers.promises.setImmediate() instead
* Add esbuild bundling acceptance test
* ESM smoke tests
* Update packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
* don't skip the established tests
* also update buildPlugin
* Use the normalized `tracer` getter, not the raw `_tracer`: the latter is the proxy, whose `_config` is undefined. trace-checkpoint reads `tracer._config`, so it needs the unwrapped tracer that actually carries the config.
* lint
* improve the coverage
* Update packages/datadog-plugin-aws-durable-execution-sdk-js/src/checkpoint.js
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
* Address various review comments
* Operation name
* fix(aws-durable-execution-sdk-js): correct operation_attempt on replayed ops
On a SUCCEEDED checkpoint (an op replayed from its stored result), StepDetails.Attempt holds the 1-indexed number of the attempt that succeeded rather than the "prior failed attempts" count used on a live run, so a replayed op reported its attempt one too high. Subtract 1 in that case, floored at 0, so a replay agrees with the original run. Mirrors dd-trace-py#18520.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(aws-durable-execution-sdk-js): gate termination hook install in bindStart
Extract #shouldInstallTerminationHook so bindStart decides whether to install the cross-invocation checkpoint hook, instead of the hook self-gating with early returns. The hook now assumes its preconditions and recomputes the handler args, execute span, and termination manager it wraps. Stub operationName() in the handler checkpoint spec: it reaches into the tracer's nomenclature, which the bare tracer stub doesn't provide.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor and move instrumentation logic to aws-durable-execution-sdk-js so that it's only done once instead of every invocation
* Apply suggestion from @BridgeAR
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
* modify an inefficient check
* simplify comparison
* Update packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
* Update packages/datadog-plugin-aws-durable-execution-sdk-js/src/trace-checkpoint.js
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
* Update packages/datadog-plugin-aws-durable-execution-sdk-js/src/trace-checkpoint.js
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
* enhance getDatadogOnlyPropagator
* docs(aws-durable-execution-sdk-js): avoid hyphen line-break in getOperationAttempt comment
Reflow so "server-maintained" stays on one line, per review feedback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(aws-durable-execution-sdk-js): cover waitForCondition replay attempt normalization
waitForCondition is the other retryable op and shares step's StepDetails.Attempt convention (1-indexed on a SUCCEEDED replay), so add a replay test asserting its operation_attempt normalizes back to 0. Guards against the SDK diverging the two operations' attempt semantics. Per review feedback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(aws-durable-execution-sdk-js): make retryable an explicit required option Drop the `retryable = false` default and the `= {}` fallback in makeContextPlugin so every call site states retryability explicitly, matching the named-boolean-option convention used elsewhere (e.g. createCallbackInstrumentor's { captureResult }). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(aws-durable-execution-sdk-js): spell out "operation" in util.js JSDoc Replace the "op" shorthand with "operation" in the addOpMeta and getOperationAttempt JSDoc prose. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(aws-durable-execution-sdk-js): revert to "op" shorthand in util.js JSDoc "op" is already the established shorthand across the durable plugin, so keep the JSDoc consistent with it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(aws-durable-execution-sdk-js): tighten getOperationAttempt JSDoc Condense the attempt-indexing explanation while keeping the non-obvious parts: the pending-vs-SUCCEEDED dual indexing and the floor-at-0 rationale. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(aws-durable-execution-sdk-js): cover waitForCondition live attempt=1 on second poll A multi-poll waitForCondition (first check returns shouldContinue:true) emits a separate span per poll, the second carrying operation_attempt=1. This exercises getOperationAttempt's live (non-replay) branch at attempt>0 for waitForCondition, the path that would catch the SDK ever indexing its StepDetails.Attempt differently from step's. No production change — verified the value is already correct on the current code. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(aws-durable-execution-sdk-js): declare retryable=false explicitly on RunInChildContextPlugin RunInChildContextPlugin is the one context plugin not built via makeContextPlugin, so it relied on this.constructor.retryable resolving to undefined. Declare it explicitly so every concrete plugin states its retryability, matching the required option on the makeContextPlugin calls. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(aws-durable-execution-sdk-js): resolve next-step lookup once per span start Extract getStepDataForNext, which resolves the next stepId and its checkpoint entry in one pass. addOpMeta and getOperationAttempt now consume the pre-resolved data instead of each calling getNextStepId() + getStepData(), so a retryable span start no longer traverses the SDK internals twice. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(aws-durable-execution-sdk-js): guard getOperationAttempt against non-finite Attempt typeof NaN === 'number', so a NaN StepDetails.Attempt slipped past the old guard and propagated as a NaN metric, which span_format.js silently drops — the operation_attempt metric would vanish instead of defaulting to 0. Number.isFinite rejects NaN and Infinity too, so it falls back to 0 explicitly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Joey Zhao <5253430+joeyzhao2018@users.noreply.github.com> Co-authored-by: Pablo Martínez Bernardo <pablo.martinezbernardo@datadoghq.com> Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de> Co-authored-by: pablomartinezbernardo <134320516+pablomartinezbernardo@users.noreply.github.com>
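The attempt-normalization rules described across the commits above (0-indexed on a live run, 1-indexed on a SUCCEEDED replay, fall back to 0 on a non-finite value) can be sketched roughly as follows. This is an illustration only, not the plugin's code: the `stepData` shape and the `Status` field name are assumptions made for the example, and the real `getOperationAttempt` signature may differ.

```javascript
'use strict'

// Hypothetical sketch. `stepData` mimics a checkpoint entry of the form
// { Status, StepDetails: { Attempt } }; the real SDK shape may differ.
function getOperationAttempt (stepData) {
  const attempt = stepData?.StepDetails?.Attempt

  // typeof NaN === 'number', so a typeof check would let NaN through.
  // Number.isFinite also rejects Infinity, undefined and non-numbers.
  if (!Number.isFinite(attempt)) return 0

  // On a replayed (SUCCEEDED) checkpoint the stored value is the 1-indexed
  // number of the attempt that succeeded, so shift it back and floor at 0.
  if (stepData.Status === 'SUCCEEDED') return Math.max(0, attempt - 1)

  // On a live run the value is already "number of prior failed attempts".
  return attempt
}

module.exports = { getOperationAttempt }
```

With this shape, a live first retry (`Attempt: 1`) and a replay of an op that succeeded on its second attempt (`Status: 'SUCCEEDED'`, `Attempt: 2`) both report 1.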
* chore(release): add v6 into release workflows * fix(sts): add v6 into job_workflow_ref
* feat(span-stats): add OTLP metrics export for span stats
Export client-computed span stats as OTLP metrics (dd.trace.span.hits,
dd.trace.span.errors, dd.trace.span.top_level_hits, dd.trace.span.duration)
via a new OtlpStatsExporter alongside the existing Datadog /v0.6/stats
exporter.
Enabled via DD_TRACE_OTEL_METRICS_ENABLED=true, or auto-enabled when both
OTEL_TRACES_EXPORTER=otlp and OTEL_METRICS_EXPORTER=otlp are set. URL and
protocol are derived from the OTLP trace export configuration.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(config): register traceMetrics as internal runtime property
traceMetrics is a computed aggregate derived from OTEL_TRACES_EXPORTER,
OTEL_METRICS_EXPORTER, and DD_TRACE_OTEL_METRICS_ENABLED — not a raw
user-facing key — so it belongs in INTERNAL_RUNTIME_PROPERTIES alongside
sampler and stableConfig.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): guard traceMetrics URL derivation against invalid OTLP endpoint
When hostname is an unbracketed IPv6 address (e.g. ::1), the defaultOtlpBase
is http://::1:4318 which is not a valid URL. The new URL() call in the
traceMetrics block was the first code path to actually parse the string,
causing a TypeError that crashed config construction.
Wrap the URL derivation in a try/catch so that a malformed traces endpoint
falls back to the localhost default without throwing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
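A minimal sketch of the guard described in this commit, assuming a hypothetical `deriveMetricsUrl` helper and a hypothetical localhost default (neither name nor default path is taken from the actual config code). It shows why `http://::1:4318` has to be caught: the WHATWG URL parser rejects an unbracketed IPv6 host.

```javascript
'use strict'

// Assumed fallback for illustration; the real default lives in the tracer config.
const DEFAULT_METRICS_URL = 'http://localhost:4318/v1/metrics'

// Hypothetical helper: derive a metrics URL from the traces endpoint,
// falling back to the default when the endpoint cannot be parsed.
function deriveMetricsUrl (tracesEndpoint) {
  try {
    const url = new URL(tracesEndpoint) // throws TypeError on an invalid URL
    url.pathname = '/v1/metrics'
    return url.href
  } catch {
    return DEFAULT_METRICS_URL
  }
}

module.exports = { deriveMetricsUrl, DEFAULT_METRICS_URL }
```

A bracketed address such as `http://[::1]:4318` parses normally and keeps its host; only the malformed form takes the fallback path.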
* chore(config): regenerate config types for DD_TRACE_OTEL_METRICS_ENABLED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): add configurationNames to DD_TRACE_OTEL_METRICS_ENABLED entry
The eslint-config-names-sync rule verifies that every leaf property in
TracerOptions (index.d.ts) has a matching configurationNames entry in
supported-configurations.json. The entry for DD_TRACE_OTEL_METRICS_ENABLED
only had internalPropertyName, which is not checked by the rule.
Adding configurationNames: ["traceMetricsEnabled"] ties the two files
together and satisfies the lint check. Regenerated config types to match.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* bug fix
* update post RFC discussion
* bring implementation bug fix
* clean up how configs are set
* feat(otlp-trace-metrics): align span-stats export with the trace-metrics contract
Update the OTLP trace-metrics export to match the agreed RFC/system-test contract:
- Rename the enablement env var to OTEL_CLIENT_STATS_COMPUTATION_ENABLED and add
DD_TRACE_OTEL_SEMANTICS_ENABLED (OTel-semantics mode: emit only OTel attributes, no dd.*).
- Emit a single histogram named traces.span.sdk.metrics.duration.
- Map dimensions to OTel attributes (span.name, span.kind, http.*, rpc.* from grpc tags) and
convey errors via OTel status.code; default mode also adds dd.operation.name, dd.span.type,
dd.origin and dd.span.top_level.
- Add telemetry.sdk.{name,language,version} resource attributes and emit process tags as dd.<key>
(default mode only); gate all dd.* resource attributes behind default mode.
- Drive the flush/export cadence from OTEL_METRIC_EXPORT_INTERVAL and drop the
_DD_TRACE_STATS_WRITER_INTERVAL override.
- Read grpc.status.code from span.metrics (numeric) with a meta fallback.
Update unit tests accordingly and regenerate config types.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(otlp-trace-metrics): report service identity per InstrumentationScope
Partition span-stats data points by service so one OTLP payload can carry
multiple services, each as its own InstrumentationScope with service.name,
service.version and deployment.environment.name. These move off the resource,
which now only carries SDK identity, host.name and dd.* attributes.
Fix the trace-metrics flush cadence at 10s (no longer driven by
OTEL_METRIC_EXPORT_INTERVAL); the internal _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
overrides it in tests only.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(otlp-trace-metrics): apply internal flush interval override
The generic env applier only reads DD_/OTEL_ prefixed vars, so the
internal _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL (which starts with _DD_)
was never wired into config. Read it explicitly so the test-only flush
cadence override takes effect.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(otlp-span-stats): emit fixed explicit-bounds histogram from the DDSketch
Derive the OTLP duration histogram from each group's DDSketch into the
spanmetrics-connector default bounds (in seconds), and drop the duplicate
exact-cell accumulator in span_stats. Each group now emits at most two data
points (ok/error) with a per-group dd.span.top_level heuristic, mirroring
libdatadog.
Co-authored-by: Cursor <cursoragent@cursor.com>
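To illustrate what a fixed explicit-bounds histogram looks like, the sketch below buckets raw durations directly. It does not read from a DDSketch as the commit does, and the bounds are illustrative values, not the spanmetrics-connector defaults. `toFixedHistogram` is a hypothetical name. The bucket layout follows the OTLP convention: `bounds.length + 1` counts, with bucket `i` covering `(bounds[i - 1], bounds[i]]`.

```javascript
'use strict'

// Illustrative bounds in seconds (assumption, not the connector's default list).
const EXPLICIT_BOUNDS_SECONDS = [0.01, 0.1, 1, 10]

// Hypothetical helper: count nanosecond durations into explicit-bounds buckets.
// The final bucket is the overflow bucket for values above the last bound.
function toFixedHistogram (durationsNs, bounds = EXPLICIT_BOUNDS_SECONDS) {
  const counts = new Array(bounds.length + 1).fill(0)
  for (const ns of durationsNs) {
    const seconds = ns / 1e9
    let i = 0
    // Upper bounds are inclusive, so advance only while strictly greater.
    while (i < bounds.length && seconds > bounds[i]) i++
    counts[i]++
  }
  return counts
}

module.exports = { toFixedHistogram, EXPLICIT_BOUNDS_SECONDS }
```

Returning a plain `number[]` keeps the transformer side simple: the counts array can be paired with the module-level bounds constant when a data point is built.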
* refactor(span-stats): carry service identity as resource attributes
Move service.name/service.version/deployment.environment.name onto the OTLP
resource (the configured default service), emit a single InstrumentationScope,
and add service.name as a data-point attribute only when a span's service
differs from the configured default. Thread DD_SERVICE through the processor so
the transformer can compare against it.
Co-authored-by: Cursor <cursoragent@cursor.com>
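The "only when it differs" rule for `service.name` could look roughly like this. The helper name and the plain-object attribute shape are assumptions for the example; real OTLP attributes are key/value lists.

```javascript
'use strict'

// Hypothetical helper: the resource already carries the default service, so a
// data point only needs its own service.name when the span's service differs.
function dataPointAttributes (spanService, defaultService, base = {}) {
  const attributes = { ...base }
  if (spanService !== defaultService) {
    attributes['service.name'] = spanService
  }
  return attributes
}

module.exports = { dataPointAttributes }
```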
* refactor(span-stats): drop redundant dd-trace InstrumentationScope
The exported OTLP metrics no longer carry an InstrumentationScope: a `dd-trace`
scope (name/version) is redundant with the resource's telemetry.sdk.* attributes.
The single scopeMetrics omits the scope field.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(span-stats): datadog.* attribute prefix and OTEL_TRACES_SPAN_METRICS_ENABLED
Rename the OTLP trace-metric attributes from dd.* to datadog.* (operation.name,
span.type, span.top_level, origin, runtime_id, datadog.<process tags>) and rename
the enablement env var OTEL_CLIENT_STATS_COMPUTATION_ENABLED ->
OTEL_TRACES_SPAN_METRICS_ENABLED.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(otlp): set _dd.stats_computed resource attribute on OTLP traces when trace metrics enabled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(span-stats): use timer.unref?.() for Electron compatibility
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
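For context on the optional call: Node.js timers return an object with `unref()`, while browser-like environments (for example an Electron renderer) return a numeric id with no such method. A small sketch, with `startFlushTimer` as a hypothetical name:

```javascript
'use strict'

// Hypothetical helper: start a periodic flush without keeping the process alive.
function startFlushTimer (onInterval, intervalMs) {
  const timer = setInterval(onInterval, intervalMs)
  // Optional call: a no-op where setInterval returns a plain number.
  timer.unref?.()
  return timer
}

module.exports = { startFlushTimer }
```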
* fix(rebase): restore extractRootTags; rename traceMetricsEnabled; add otelSemanticsEnabled to types
span_format.js: rebase conflict resolved to branch's addTag refactor which no longer exists in
master — revert to explicit typeof checks while keeping the FR06.3 BUG comment.
index.d.ts: rename traceMetricsEnabled -> otlpTraceMetricsEnabled to match supported-configurations.json;
add otelSemanticsEnabled (DD_TRACE_OTEL_SEMANTICS_ENABLED). Fixes eslint-config-names-sync errors.
Regenerate generated-config-types.d.ts from updated inputs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(span-stats): wire OTLP metrics endpoint/protocol and trim dead code
The crash fix: SpanStatsProcessor read config.otelMetricsUrl/otelMetricsProtocol,
which Config never set (dropped in 40014ae), so `new URL(undefined)` threw
ERR_INVALID_URL and crashed tracer init whenever OTLP trace metrics were enabled.
Read the canonical OTEL_EXPORTER_OTLP_METRICS_ENDPOINT/OTEL_EXPORTER_OTLP_METRICS_PROTOCOL
directly instead of introducing redundant alias properties.
Also fix the dead auto-enable check: `this.otelMetricsEnabled` does not exist
(the property is DD_METRICS_OTEL_ENABLED), so `undefined === true` made the
"auto-enable when OTLP traces + OTEL metrics are on" path never trigger.
Minimize the diff vs master without changing behavior:
- drop 4 unused SpanAggStats fields (errorDuration/topLevel*) and their test
- collapse the duplicate JSON/protobuf transformer methods into transform()
- remove two `// BUG` WIP narration comments (reverts the comment-only
span_format.js hunk; tracked separately)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(span-stats): privatize internals, trim telemetry, simplify transformer
- Make _drainBuckets / _toLegacyPayload true private (#) — neither
crosses the class boundary; _ prefix implied false publicness
- Guard SpanStatsExporter construction behind !otlpTraceMetricsEnabled
so it is never instantiated when the OTLP path is active
- Replace #errorStatus() / #boolAttr() one-shot methods with inline
literals and a module-level ERROR_STATUS_ATTR constant to avoid
per-call allocations
- sketchToFixedHistogram now returns number[] directly; #pushPoint
references EXPLICIT_BOUNDS_SECONDS from the module constant
- Remove this.recordTelemetry calls from OtlpStatsExporter.export —
not part of the OTLP trace-metrics spec
- Rewrite whitebox _drainBuckets test as blackbox: assert buckets are
empty after onInterval() instead of calling the private method
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix configs
Co-authored-by: Munir Abdinur <munir.abdinur@datadoghq.com>
* chore: regenerate config types after supported-configurations.json update
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): use otelSemanticsEnabled config key instead of DD_TRACE_OTEL_SEMANTICS_ENABLED
Our branch maps DD_TRACE_OTEL_SEMANTICS_ENABLED to the internal property
otelSemanticsEnabled via supported-configurations.json internalPropertyName.
The merged master code was reading config.DD_TRACE_OTEL_SEMANTICS_ENABLED
directly, which was undefined in our config layout.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(test): update span.spec.js to use otelSemanticsEnabled config key
The test was setting config.DD_TRACE_OTEL_SEMANTICS_ENABLED but span.js
now reads config.otelSemanticsEnabled (the internal property name mapped
from DD_TRACE_OTEL_SEMANTICS_ENABLED via supported-configurations.json).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): use string default for _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
Schema requires default to be a string or null, not a number literal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): update description for _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL to match registry
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): use short description for _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
The description field maps to Short Description in the config registry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): remove description field from _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
No other int entry with allowed field uses description; may be mutually exclusive in schema.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(otlp-span-stats): address Codex review comments
- Use dd-trace VERSION (not app version) for telemetry.sdk.version resource attribute
- Pass OTEL_EXPORTER_OTLP_METRICS_HEADERS and OTEL_EXPORTER_OTLP_METRICS_TIMEOUT
to OtlpStatsExporter so authenticated/custom endpoints work correctly
- Fix index.d.ts doc: env var is OTEL_TRACES_SPAN_METRICS_ENABLED and
auto-enable condition is DD_METRICS_OTEL_ENABLED (not OTEL_METRICS_EXPORTER=otlp)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): fix import order and no-useless-undefined in span_stats and otlp-span-stats
- Move ../../../version import before ./constants to satisfy import/order rule
- Remove explicit = undefined default for headers param (unicorn/no-useless-undefined)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(config): cover _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL override branch
Adds test that exercises the setAndTrack call inside the conditional
that reads the internal flush interval override from the environment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): align DD_TRACE_OTEL_SEMANTICS_ENABLED with registry definition
The config registry has this entry as a plain boolean with default "false"
and no internalPropertyName. Revert our custom mapping so the entry matches
the registry exactly — the validator compares against the registered definition.
All code that previously accessed config.otelSemanticsEnabled now reads
config.DD_TRACE_OTEL_SEMANTICS_ENABLED directly; the destructuring alias
in span_stats.js preserves the otelSemanticsEnabled local variable name.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): add configurationNames for otelSemanticsEnabled to satisfy eslint-config-names-sync
The rule requires that every option name in index.d.ts has a corresponding
entry in supported-configurations.json (as a key, configurationNames value,
or internalPropertyName). Adding configurationNames: ["otelSemanticsEnabled"]
to DD_TRACE_OTEL_SEMANTICS_ENABLED satisfies this while keeping default: "false"
to match the registry. The generator uses configurationNames[0] as the config
key, so code reverts to config.otelSemanticsEnabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(config): remove internalPropertyName and unnecessary configurationNames
Per reviewer feedback:
- Remove internalPropertyName from _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL; the flush
interval is read directly via getEnvironmentVariable() in #applyCalculated
- Remove configurationNames/internalPropertyName from OTEL_TRACES_SPAN_METRICS_ENABLED
and drop otlpTraceMetricsEnabled as a programmatic option from index.d.ts; use
this.OTEL_TRACES_SPAN_METRICS_ENABLED directly in #applyCalculated instead
- Remove configurationNames from DD_TRACE_OTEL_SEMANTICS_ENABLED and drop
otelSemanticsEnabled as a programmatic option from index.d.ts; all callers
now read config.DD_TRACE_OTEL_SEMANTICS_ENABLED directly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(config): resolve Unknown Config properties for otlpTraceMetricsEnabled/ddTraceMetricsOtelFlushInterval
- Map setAndTrack to OTEL_TRACES_SPAN_METRICS_ENABLED (a declared config key)
instead of the undeclared otlpTraceMetricsEnabled alias; all call sites
updated to read config.OTEL_TRACES_SPAN_METRICS_ENABLED directly
- Move _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL reading from #applyCalculated
into span_stats.js; import getEnvironmentVariable there directly — removes
the undeclared ddTraceMetricsOtelFlushInterval setAndTrack write
- Update tests to use the env-var key names and remove the now-irrelevant
config override test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): remove blank line before closing brace in config spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(config): generate GeneratedEnvVarConfig interface for env var types
- Extract getBaseType helper from getTypeForEntry to share base type computation
- Add getEnvVarType that only adds undefined when there is no registered default
- Add generateEnvVarConfigTypes to map every env var name (canonical + aliases) to its resolved type
- Append GeneratedEnvVarConfig interface to generated-config-types.d.ts
Rationale: Callers of getValueFromEnvSources need per-env-var typed return values instead of the full config property union, enabling type-safe lookups by literal env var name.
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* fix(span-stats): address PR review comments
- Gate OTLP-only SpanAggKey dimensions (origin, spanKind, rpcMethod,
rpcStatusCode) on otlpTraceMetricsEnabled to avoid inflating legacy
span stats aggregation key cardinality
- Thread _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL through the typed config
system (via setAndTrack/getValueFromEnvSources) instead of reading
the raw env var directly in SpanStatsProcessor
- Add config tests covering OTEL_TRACES_SPAN_METRICS_ENABLED auto-enable
logic (both conditions, explicit override)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
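The auto-enable logic these tests cover (an explicit setting wins; otherwise both conditions must hold) can be sketched as below. The function and parameter names are hypothetical; only the environment variable names in the comments come from the commits above.

```javascript
'use strict'

// Hypothetical resolver for OTEL_TRACES_SPAN_METRICS_ENABLED.
//   explicit           - value of OTEL_TRACES_SPAN_METRICS_ENABLED, or undefined if unset
//   tracesExporter     - value of OTEL_TRACES_EXPORTER
//   metricsOtelEnabled - value of DD_METRICS_OTEL_ENABLED
function resolveSpanMetricsEnabled ({ explicit, tracesExporter, metricsOtelEnabled }) {
  // An explicit true/false always overrides the auto-enable rule.
  if (explicit !== undefined) return explicit
  return tracesExporter === 'otlp' && metricsOtelEnabled === true
}

module.exports = { resolveSpanMetricsEnabled }
```

Comparing against the property that actually exists matters here: a check against a misspelled or missing property evaluates `undefined === true` and silently never enables the feature.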
* fix(span-stats): pass otlpEnabled=true in transformer test bucket helper
makeBucket is used exclusively by OtlpStatsTransformer tests, so spans
must be keyed with otlpEnabled=true to populate the OTLP-gated fields
(origin, spanKind, rpcMethod, rpcStatusCode).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style(span-stats): minor style and readability cleanups
- Extract flush interval to variable before setAndTrack call
- Remove unnecessary quotes on property key
- Tighten test description wording
Rationale: Small consistency and readability improvements from PR review
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* fix(span-stats): remove rpc.method from otlp span stats aggregation key
- Drop grpc.method.name from SpanAggKey and OtlpStatsTransformer
- rpc.method inflates aggregation key cardinality without sufficient benefit
- Update all affected test assertions
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* refactor(span-stats): remove stale comments and clarify TODO
- Remove redundant inline comments in otlp-span-stats transformer
- Replace misleading comment about OTLP-only dimensions with a TODO
noting origin and spanKind should eventually be included in legacy
client stats aggregation
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* Apply suggestion from @mabdinur
* refactor(span-stats): remove redundant inline comments
- Drop comments that restate what the code already shows
- Keep code self-documenting per project style guidelines
This commit made by [/dd:git:commit:quick](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/quick.md)
* fix(opentelemetry): fix max-len lint violation in span_processor.js
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): fix max-len lint violation in span_processor.js
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: add otlp-span-stats exporter to CODEOWNERS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(opentelemetry): encapsulate OTLP span stats in opentelemetry/metrics
Move OtlpStatsExporter and OtlpStatsTransformer into opentelemetry/metrics/
so the opentelemetry package is self-contained ahead of potential extraction
into its own npm package.
Key changes:
- Move exporters/otlp-span-stats/{index,transformer}.js to
opentelemetry/metrics/otlp_span_stats_{exporter,transformer}.js
- Move buildResourceAttributes from span_stats.js to opentelemetry/metrics/index.js;
add createOtlpSpanStatsExporter factory there
- Wire OtlpStatsExporter via DI: opentracing/tracer.js creates it when
OTEL_TRACES_SPAN_METRICS_ENABLED and passes it through SpanProcessor to
SpanStatsProcessor — span_stats.js no longer imports from opentelemetry/
- config/index.js mirrors OTEL_TRACES_SPAN_METRICS_ENABLED into
stats.DD_TRACE_STATS_COMPUTATION_ENABLED so downstream checks are unified
- Remove otlpEnabled flag from SpanAggKey/SpanBuckets — origin, spanKind,
rpcStatusCode are always populated
- Remove OTEL-specific check from AgentExporter (relies on mirrored flag)
- Remove CODEOWNERS entry for deleted exporters/otlp-span-stats/ path
- Move tests to test/opentelemetry/metrics/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): align grpc stats with libdatadog and enforce mutual exclusion
- Move GRPC_STATUS_CODE constant to ext/tags.js
- Emit rpc.response.status_code as string (aligns with libdatadog kv_str)
- Use else if in onInterval to make native and OTLP export mutually exclusive
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(opentelemetry): trim JSDoc, privatize transformer, drop empty description
- Make OtlpStatsExporter#transformer a private field (#transformer)
- Remove empty description field from histogram metric
- Trim redundant @param prose in exporter and transformer JSDoc
- Use GRPC_STATUS_CODE import in transformer spec instead of string literal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): suppress self-instrumentation spans from OTLP exporter requests
Wrap sendPayload's HTTP request in legacyStorage.run({ noop: true }) so the
tracer does not instrument its own outbound connections to the OTLP collector.
Without this, tcp.connect client spans for /v1/metrics requests were fed into
the traces.span.sdk.metrics.duration histogram, displacing real span data points
and inflating counts in the system-tests parametric suite.
Same pattern used by exporters/common/request.js and exporters/common/agents.js.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
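The suppression described in the commit above can be sketched with Node's `AsyncLocalStorage`. This is only an illustration under stated assumptions: `storage`, `instrumentedConnect`, and this `sendPayload` are hypothetical stand-ins, not the tracer's actual `legacyStorage` or exporter code.

```javascript
'use strict'

const { AsyncLocalStorage } = require('node:async_hooks')

// Hypothetical stand-in for the tracer's legacyStorage.
const storage = new AsyncLocalStorage()

// Spans the (hypothetical) instrumentation would have produced.
const recordedSpans = []

// Hypothetical instrumentation hook: it records a span unless the
// current async context carries a store marked `noop`.
function instrumentedConnect (target) {
  const store = storage.getStore()
  if (store?.noop) return
  recordedSpans.push({ name: 'tcp.connect', target })
}

// Hypothetical exporter send: the outbound call runs inside a noop
// store, so the instrumentation hook above skips it.
function sendPayload (url) {
  storage.run({ noop: true }, () => instrumentedConnect(url))
}

instrumentedConnect('app-db:5432') // application traffic: recorded
sendPayload('http://collector:4318/v1/metrics') // exporter traffic: skipped
```

Only the application connection ends up in `recordedSpans`; the exporter's own request never reaches the stats pipeline.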
* test(opentelemetry): improve patch coverage for span stats OTLP export
- Inline URL parsing into OtlpHttpExporterBase constructor; remove setUrl
(the if(telemetryTags !== undefined) branch was dead — telemetryTags is
always undefined when setUrl was called from the constructor, and no
external caller ever invoked it post-construction)
- Add tests for buildResourceAttributes (sdk identity, runtime-id, OTel-
semantics mode) and createOtlpSpanStatsExporter
- Add tests for HTTP error response and request error paths in sendPayload
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(opentelemetry): emit raw grpc.status.code name for rpc.response.status_code
Prefer the meta status NAME string over the numeric metrics tag and emit it
upper-cased to rpc.response.status_code, aligning with the OTel gRPC semantic
conventions (canonical status name) without any code<->name mapping.
* fix(opentelemetry): restore setUrl method removed by dead-code cleanup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(opentelemetry): trim redundant comments in grpc status mapping
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(opentelemetry): read grpc.status.code from meta only
Drop the numeric metrics fallback; the gRPC status code is the canonical status
NAME and is read from span meta.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(opentelemetry): translate numeric grpc.status.code from metrics to status name
The dd gRPC plugin sets grpc.status.code as a numeric integer via
span.setTag, which span_format.js routes into span.metrics rather than
span.meta. SpanAggKey was reading meta only, so rpcStatusCode was always
empty for real gRPC spans.
Now falls back to span.metrics[GRPC_STATUS_CODE] and translates the
integer to the canonical status name (OK, NOT_FOUND, etc.) using the
gRPC status code table. Meta string takes priority when both are present.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
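A minimal sketch of the lookup order the commit above describes: the meta string wins, and the numeric metrics value is translated through the standard gRPC status code table. The function name `rpcStatusCode` and the empty-string result for unknown codes are assumptions made for illustration; the real logic lives in `SpanAggKey`.

```javascript
'use strict'

const GRPC_STATUS_CODE = 'grpc.status.code'

// Canonical gRPC status names, indexed by numeric code (0 through 16).
const GRPC_STATUS_NAMES = [
  'OK', 'CANCELLED', 'UNKNOWN', 'INVALID_ARGUMENT', 'DEADLINE_EXCEEDED',
  'NOT_FOUND', 'ALREADY_EXISTS', 'PERMISSION_DENIED', 'RESOURCE_EXHAUSTED',
  'FAILED_PRECONDITION', 'ABORTED', 'OUT_OF_RANGE', 'UNIMPLEMENTED',
  'INTERNAL', 'UNAVAILABLE', 'DATA_LOSS', 'UNAUTHENTICATED'
]

// Hypothetical helper: resolve the status name for a formatted span.
function rpcStatusCode (span) {
  // A string in span.meta takes priority when both sources are present.
  const fromMeta = span.meta?.[GRPC_STATUS_CODE]
  if (typeof fromMeta === 'string' && fromMeta !== '') {
    return fromMeta.toUpperCase()
  }

  // Numeric tags are routed into span.metrics, so fall back to them.
  const fromMetrics = span.metrics?.[GRPC_STATUS_CODE]
  if (Number.isInteger(fromMetrics)) {
    // Assumption: codes outside the table map to an empty string.
    return GRPC_STATUS_NAMES[fromMetrics] ?? ''
  }

  return ''
}
```

With this shape, a span carrying `metrics['grpc.status.code'] = 5` resolves to `NOT_FOUND`, and a span with no gRPC tag resolves to an empty string.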
* fix(opentelemetry): split agg key by top-level to fix mixed-bucket OTLP metrics
When a bucket contained both top-level and measured non-top-level spans,
the heuristic (topLevelHits === hits) always resolved to false, causing
the OTLP histogram to be emitted as datadog.span.top_level=false and
dropping top-level traffic from APM metrics.
Adding topLevel as a dimension to SpanAggKey causes top-level and
non-top-level spans to bucket separately. Each bucket is now always
purely top-level or purely non-top-level, so the attribute is always
accurate. The native stats path is unaffected because toJSON() omits
topLevel; the Agent merges groups with identical key fields, preserving
the same Hits/TopLevelHits totals.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): track per-top-level distributions to fix native stats regression
The previous fix added topLevel to SpanAggKey to separate top-level and
non-top-level spans into distinct buckets. This created a real regression
in the native /v0.6/stats path: the Agent's mergeDuplicates() correctly
sums Hits/Errors/Duration from duplicate rows but silently drops
TopLevelHits from the merged-away entry. If the non-top-level row is
processed first and becomes the canonical, TopLevelHits from the
top-level row is lost.
Fix: revert topLevel from SpanAggKey (no more duplicate rows). Instead,
split SpanAggStats into four distributions (topLevelOk, topLevelError,
nonTopLevelOk, nonTopLevelError). The native stats path merges them at
export time so toJSON() produces the same combined OkSummary/ErrorSummary
as before. The OTLP path emits separate data points per top-level status
with the correct datadog.span.top_level attribute. OTel-semantics mode
merges the distributions (no top-level attribute to distinguish them).
Also adds SpanKind, Origin, and RpcStatusCode to the native stats
toJSON() payload so the Agent receives these new dimensions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(opentelemetry): split native stats rows by top-level status; use GRPCStatusCode key
toJSON() now returns an array of up to 2 rows (top-level row first, non-top-level
row second). #toLegacyPayload uses flatMap to flatten them. This eliminates the
merge-time DDSketch allocation and ensures TopLevelHits is always non-zero on
the top-level row, so the Agent's mergeDuplicates retains it as the canonical entry.
Duration and Errors are derived from distribution .sum/.count, removing the
redundant this.duration and this.errors accumulators.
GRPCStatusCode matches the agent's msgpack decoder key (confirmed from
pkg/proto/pbgo/trace/stats_gen.go).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
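The row-splitting shape described in the commit above might look roughly like the following. This is a sketch, not the actual implementation: `Distribution` is a hypothetical count/sum stand-in for the DDSketch-backed distributions, and the row carries only the fields needed to show the idea.

```javascript
'use strict'

// Hypothetical stand-in for a sketch: tracks only count and sum.
class Distribution {
  constructor () {
    this.count = 0
    this.sum = 0
  }

  accept (value) {
    this.count++
    this.sum += value
  }
}

class SpanAggStats {
  constructor () {
    this.topLevelOk = new Distribution()
    this.topLevelError = new Distribution()
    this.nonTopLevelOk = new Distribution()
    this.nonTopLevelError = new Distribution()
  }

  record ({ duration, error, topLevel }) {
    const prefix = topLevel ? 'topLevel' : 'nonTopLevel'
    this[prefix + (error ? 'Error' : 'Ok')].accept(duration)
  }

  // Returns up to two rows, top-level first; empty rows are omitted.
  toJSON () {
    const groups = [
      [this.topLevelOk, this.topLevelError, true],
      [this.nonTopLevelOk, this.nonTopLevelError, false]
    ]
    const rows = []
    for (const [ok, err, isTopLevel] of groups) {
      const hits = ok.count + err.count
      if (hits === 0) continue
      rows.push({
        Hits: hits,
        TopLevelHits: isTopLevel ? hits : 0,
        Errors: err.count, // derived from the distribution, no extra accumulator
        Duration: ok.sum + err.sum
      })
    }
    return rows
  }
}

// Flatten the per-key row arrays into one payload list.
function toRows (statsList) {
  return statsList.flatMap(stats => stats.toJSON())
}
```

Because each row is purely top-level or purely non-top-level, `TopLevelHits` is either equal to `Hits` or zero, and no sketches need to be merged at export time.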
* chore(opentelemetry): minimize comments, move GRPC_STATUS_NAMES to constants
- Remove descriptive/narrating comments throughout; keep only non-obvious constraints
- Move GRPC_STATUS_NAMES from span_stats.js into constants.js
- Rename #toLegacyPayload -> #v06Payload
- Remove section comment from ext/tags.js
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(opentelemetry): restrict ORIGIN_KEY to synthetics boolean; remove origin from aggregation key
ORIGIN_KEY now only populates SpanAggKey.synthetics. The origin string field
is removed from SpanAggKey, toString(), and the v0.6 payload (Origin is not
a field the agent decodes). In the OTLP path, datadog.origin='synthetics' is
emitted when aggKey.synthetics is true.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: remove pr_description.md
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Munir <munir.abdinur@datadoghq.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
With `flushInterval: 0` (the test-agent config and the AWS Lambda config) the processor armed `setInterval(onInterval, 0)`, which fires on every event-loop tick instead of when a checkpoint is recorded. A tick landing in the window between the test agent tearing its listener down and bringing it back up posts to a dead port; the bucket is cleared on serialize, so that single payload is lost. A producer-only DSM test waiting on exactly one payload then times out. Honor `flushInterval === 0` as the flush-on-write sentinel the agent and agentless trace exporters already use: skip the timer and push each checkpoint, offset, and transaction the moment it is recorded, while the writer URL is live.
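A minimal sketch of the flush-on-write sentinel, assuming a simplified processor: `CheckpointProcessor`, `record`, and the `send` callback are hypothetical names, not the actual DSM processor API.

```javascript
'use strict'

class CheckpointProcessor {
  constructor ({ flushInterval, send }) {
    this.pending = []
    this.send = send
    // flushInterval === 0 means "flush on write": no timer is armed.
    this.flushOnWrite = flushInterval === 0
    this.timer = undefined

    if (!this.flushOnWrite) {
      this.timer = setInterval(() => this.flush(), flushInterval)
      this.timer.unref() // do not keep the process alive for flushing
    }
  }

  record (checkpoint) {
    this.pending.push(checkpoint)
    // In flush-on-write mode, push the payload the moment it is recorded.
    if (this.flushOnWrite) this.flush()
  }

  flush () {
    if (this.pending.length === 0) return
    const payload = this.pending
    this.pending = [] // the bucket is cleared on serialize
    this.send(payload)
  }

  stop () {
    clearInterval(this.timer)
  }
}
```

With `flushInterval: 0`, a payload is only sent in response to a write, so no send happens on an idle event-loop tick while the receiving port is down.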
Overall package size

Self size: 6.52 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.2.0 | 104.26 kB | 843.44 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| dc-polyfill | 0.1.11 | 25.74 kB | 25.74 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
* add breaking changes to release proposal
* fetch from master instead
* removes stable path fallback
* fix wrong look up
💡 Codex Review
Lines 56 to 58 in c723e09
For non-Jest Test Optimization workers that initialize through ci/init, this still passes the old telemetry.enabled programmatic option. The v6 config table now stores this as telemetry.DD_INSTRUMENTATION_TELEMETRY_ENABLED and no longer registers telemetry.enabled, so this option is ignored and Mocha/Cucumber/Playwright/Vitest workers keep instrumentation telemetry enabled instead of suppressing it as intended, adding extra telemetry work and traffic from every worker process.
All 21 adversarial test scenarios targeting the PR's riskiest behavioral changes pass cleanly: SpanAggStats.toJSON() correctly returns an array (not object), flatMap in #toV06Payload handles it correctly, SpanAggKey gRPC code mapping is correct for all edge cases (code 0, out-of-range, string, missing), sketchToFixedHistogram places spans in the right histogram buckets including overflow, SpanStatsProcessor correctly reads DD_TRACE_STATS_COMPUTATION_ENABLED (ignoring the old enabled key), and makeUtilities reads DD_LANGCHAIN_SPAN_CHAR_LIMIT/DD_VERTEXAI_SPAN_CHAR_LIMIT correctly while ignoring the old camelCase spanCharLimit key. The one failing test in proxy.spec.js (line 410) is pre-existing on the v6.x baseline — DD_AGENT_HOST='' in the sandbox causes url.format({hostname:''}) to produce an invalid URL in Config constructor, the try-catch in proxy.init() swallows it, and the #updateTracing stub is never called. The diff-relative change (renaming payload key tracing → DD_TRACE_ENABLED) is not the cause.
Breaking Changes
Features
Fixes
Performance
Documentation
Internal (CI, Testing, Benchmarking)
Contributors