
feat: add adaptive sampling to node sdk #155

Open
jy-tan wants to merge 1 commit into main from adaptive-sampling

Conversation


@jy-tan jy-tan commented Apr 11, 2026

Summary

Add the Node half of adaptive sampling so the SDK can shed root-request recording load locally based on queue, exporter, event-loop, and memory pressure signals while preserving whole-trace semantics and pre-app-start capture.

This keeps admission decisions local to the SDK process and defers backend-driven rarity sampling in favor of app and exporter protection.

Changes

  • Parse recording.sampling.mode, base_rate, and min_rate while keeping legacy recording.sampling_rate behavior and supporting TUSK_SAMPLING_RATE as a compatibility alias alongside canonical TUSK_RECORDING_SAMPLING_RATE
  • Add AdaptiveSamplingController and wire it into TuskDrift with periodic health updates driven by queue fill, dropped spans, export failures/timeouts, event-loop lag, and memory pressure
  • Replace the opaque batching path with a Drift-owned DriftBatchSpanProcessor so queue health, dropped spans, and export failures are first-class control signals for load shedding
  • Add exporter retry and circuit-breaker resilience in the API span adapter and surface exporter health through TdSpanExporter
  • Gate inbound HTTP and Next.js root requests through adaptive admission and run sampled-out requests under explicit no-record context propagation
  • Preserve pre-app-start capture for the first inbound HTTP request before auto-marking the app ready
  • Update the Node docs for adaptive sampling configuration and add focused tests for controller behavior, env-var precedence, batch-processor health, and HTTP pre-app-start admission
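The env-var precedence described in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's actual parser: `resolveSamplingRate` is a hypothetical name, the canonical-wins-over-alias ordering and the default rate of 1 are assumptions.

```typescript
// Hypothetical sketch: the canonical TUSK_RECORDING_SAMPLING_RATE is assumed
// to win over the legacy TUSK_SAMPLING_RATE alias, which in turn overrides
// the rate read from .tusk/config.yaml.
function resolveSamplingRate(
  env: Record<string, string | undefined>,
  configFileRate?: number,
): number {
  const raw = env["TUSK_RECORDING_SAMPLING_RATE"] ?? env["TUSK_SAMPLING_RATE"];
  if (raw !== undefined) {
    const parsed = Number(raw);
    // Ignore malformed or out-of-range values and fall back to the config file.
    if (Number.isFinite(parsed) && parsed >= 0 && parsed <= 1) {
      return parsed;
    }
  }
  return configFileRate ?? 1; // assumed default: record everything
}
```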


@cubic-dev-ai cubic-dev-ai bot left a comment


5 issues found across 18 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/core/tracing/adapters/ApiSpanAdapter.ts">

<violation number="1" location="src/core/tracing/adapters/ApiSpanAdapter.ts:233">
P2: Timeout failures are not counted because the abort check only matches `AbortError`, while this code aborts with `new Error(...)` as the reason.</violation>
</file>

<file name="docs/nextjs-initialization.md">

<violation number="1" location="docs/nextjs-initialization.md:207">
P2: The startup note overstates pre-ready capture semantics. It says all requests before `markAppAsReady()` are always recorded, but this feature is described as preserving only the first inbound pre-app-start request.</violation>
</file>

<file name="src/core/tracing/TdSpanExporter.ts">

<violation number="1" location="src/core/tracing/TdSpanExporter.ts:102">
P2: Preserve `null` when no export latency has been observed instead of coercing it to `0` during aggregation.</violation>
</file>

<file name="src/core/sampling/AdaptiveSamplingController.ts">

<violation number="1" location="src/core/sampling/AdaptiveSamplingController.ts:205">
P2: When adaptive load shedding drives `effectiveRate` to exactly 0 (hot/warm state with `minRate = 0`), the reason is misreported as `"not_sampled"` instead of `"load_shed"`. This breaks observability: operators cannot distinguish "configured rate is zero" from "adaptive controller suppressed all traffic".</violation>
</file>

<file name="src/core/tracing/DriftBatchSpanProcessor.ts">

<violation number="1" location="src/core/tracing/DriftBatchSpanProcessor.ts:51">
P2: `void this.flushOneBatch()` discards the returned promise. If the exporter throws synchronously inside `export()`, the resulting rejection is unhandled and can crash the process. Add a `.catch()` at these fire-and-forget call sites, or wrap the `this.exporter.export(...)` call in a try/catch that calls `resolve()`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

```typescript
      throw new Error(`Remote export failed: ${parsed.message}`);
    }
  } catch (error) {
    if (error instanceof Error && error.name === "AbortError") {
```

@cubic-dev-ai cubic-dev-ai bot Apr 11, 2026


P2: Timeout failures are not counted because the abort check only matches AbortError, while this code aborts with new Error(...) as the reason.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/core/tracing/adapters/ApiSpanAdapter.ts, line 233:

<comment>Timeout failures are not counted because the abort check only matches `AbortError`, while this code aborts with `new Error(...)` as the reason.</comment>

<file context>
@@ -178,4 +167,76 @@ export class ApiSpanAdapter implements SpanExportAdapter {
+        throw new Error(`Remote export failed: ${parsed.message}`);
+      }
+    } catch (error) {
+      if (error instanceof Error && error.name === "AbortError") {
+        this.timeoutCount += 1;
+        throw error;
</file context>
Suggested change

```diff
-if (error instanceof Error && error.name === "AbortError") {
+if (
+  (error instanceof Error && error.name === "AbortError") ||
+  error === controller.signal.reason
+) {
```
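The suggested condition above can also be packaged as a small predicate, which makes it easy to unit-test. This is a sketch under stated assumptions: `isAbortTimeout` is a hypothetical helper, not part of this PR, and `abortReason` stands in for `controller.signal.reason` from the surrounding code.

```typescript
// Hypothetical predicate: treat an error as a timeout abort if it is a
// DOMException-style AbortError, or if it is the exact reason object that
// was passed to controller.abort(new Error(...)).
function isAbortTimeout(error: unknown, abortReason: unknown): boolean {
  return (
    (error instanceof Error && error.name === "AbortError") ||
    (abortReason !== undefined && error === abortReason)
  );
}
```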

- `recording.sampling.mode` comes from `.tusk/config.yaml` and defaults to `fixed`
- `recording.sampling.min_rate` is only used in `adaptive` mode and defaults to `0.001` when omitted

> **Note:** Requests before `TuskDrift.markAppAsReady()` are always recorded. Sampling applies to normal inbound traffic after startup.

@cubic-dev-ai cubic-dev-ai bot Apr 11, 2026


P2: The startup note overstates pre-ready capture semantics. It says all requests before markAppAsReady() are always recorded, but this feature is described as preserving only the first inbound pre-app-start request.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/nextjs-initialization.md, line 207:

<comment>The startup note overstates pre-ready capture semantics. It says all requests before `markAppAsReady()` are always recorded, but this feature is described as preserving only the first inbound pre-app-start request.</comment>

<file context>
@@ -184,17 +184,31 @@ More context on setting up instrumentations for Next.js apps can be found [here]
+   - `recording.sampling.mode` comes from `.tusk/config.yaml` and defaults to `fixed`
+   - `recording.sampling.min_rate` is only used in `adaptive` mode and defaults to `0.001` when omitted
+
+> **Note:** Requests before `TuskDrift.markAppAsReady()` are always recorded. Sampling applies to normal inbound traffic after startup.
 
 ### Method 1: Init Parameter
</file context>
Suggested change

```diff
-> **Note:** Requests before `TuskDrift.markAppAsReady()` are always recorded. Sampling applies to normal inbound traffic after startup.
+> **Note:** The first inbound request before the SDK marks the app ready is always recorded. Sampling applies to normal inbound traffic after startup.
```

Comment on lines +102 to +105

```typescript
lastExportLatencyMs: Math.max(
  accumulator.lastExportLatencyMs ?? 0,
  snapshot.lastExportLatencyMs ?? 0,
),
```

@cubic-dev-ai cubic-dev-ai bot Apr 11, 2026


P2: Preserve null when no export latency has been observed instead of coercing it to 0 during aggregation.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/core/tracing/TdSpanExporter.ts, line 102:

<comment>Preserve `null` when no export latency has been observed instead of coercing it to `0` during aggregation.</comment>

<file context>
@@ -71,6 +80,39 @@ export class TdSpanExporter implements SpanExporter {
+        failureCount: accumulator.failureCount + snapshot.failureCount,
+        timeoutCount: accumulator.timeoutCount + snapshot.timeoutCount,
+        circuitOpen: accumulator.circuitOpen || snapshot.circuitState === "open",
+        lastExportLatencyMs: Math.max(
+          accumulator.lastExportLatencyMs ?? 0,
+          snapshot.lastExportLatencyMs ?? 0,
</file context>
Suggested change

```diff
-lastExportLatencyMs: Math.max(
-  accumulator.lastExportLatencyMs ?? 0,
-  snapshot.lastExportLatencyMs ?? 0,
-),
+lastExportLatencyMs:
+  snapshot.lastExportLatencyMs === null
+    ? accumulator.lastExportLatencyMs
+    : accumulator.lastExportLatencyMs === null
+      ? snapshot.lastExportLatencyMs
+      : Math.max(accumulator.lastExportLatencyMs, snapshot.lastExportLatencyMs),
```
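The nested ternary above can be factored into a small null-preserving max helper, which keeps the aggregation readable. A sketch only: `maxLatency` is a hypothetical name, not an identifier from this PR.

```typescript
// Hypothetical helper: maximum of two nullable latencies, returning null
// only when neither side has observed an export yet.
function maxLatency(a: number | null, b: number | null): number | null {
  if (a === null) return b;
  if (b === null) return a;
  return Math.max(a, b);
}
```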

```typescript
if (effectiveRate <= 0) {
  return {
    shouldRecord: false,
    reason: this.state === "critical_pause" ? "critical_pause" : "not_sampled",
```

@cubic-dev-ai cubic-dev-ai bot Apr 11, 2026


P2: When adaptive load shedding drives effectiveRate to exactly 0 (hot/warm state with minRate = 0), the reason is misreported as "not_sampled" instead of "load_shed". This breaks observability: operators cannot distinguish "configured rate is zero" from "adaptive controller suppressed all traffic".

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/core/sampling/AdaptiveSamplingController.ts, line 205:

<comment>When adaptive load shedding drives `effectiveRate` to exactly 0 (hot/warm state with `minRate = 0`), the reason is misreported as `"not_sampled"` instead of `"load_shed"`. This breaks observability: operators cannot distinguish "configured rate is zero" from "adaptive controller suppressed all traffic".</comment>

<file context>
@@ -0,0 +1,283 @@
+    if (effectiveRate <= 0) {
+      return {
+        shouldRecord: false,
+        reason: this.state === "critical_pause" ? "critical_pause" : "not_sampled",
+        mode: this.config.mode,
+        state: this.state,
</file context>

```typescript
this.config = config;
this.mode = mode;
this.interval = setInterval(() => {
  void this.flushOneBatch();
```

@cubic-dev-ai cubic-dev-ai bot Apr 11, 2026


P2: void this.flushOneBatch() discards the returned promise. If the exporter throws synchronously inside export(), the resulting rejection is unhandled and can crash the process. Add a .catch() at these fire-and-forget call sites, or wrap the this.exporter.export(...) call in a try/catch that calls resolve().

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/core/tracing/DriftBatchSpanProcessor.ts, line 51:

<comment>`void this.flushOneBatch()` discards the returned promise. If the exporter throws synchronously inside `export()`, the resulting rejection is unhandled and can crash the process. Add a `.catch()` at these fire-and-forget call sites, or wrap the `this.exporter.export(...)` call in a try/catch that calls `resolve()`.</comment>

<file context>
@@ -0,0 +1,152 @@
+    this.config = config;
+    this.mode = mode;
+    this.interval = setInterval(() => {
+      void this.flushOneBatch();
+    }, this.config.scheduledDelayMillis);
+    this.interval.unref?.();
</file context>
