Skip to content

agents,lib,src,test: add traceSampleRate support#430

Open
santigimeno wants to merge 1 commit intonode-v24.x-nsolid-v6.xfrom
santi/tracing_rate
Open

agents,lib,src,test: add traceSampleRate support#430
santigimeno wants to merge 1 commit intonode-v24.x-nsolid-v6.xfrom
santi/tracing_rate

Conversation

@santigimeno
Copy link
Copy Markdown
Member

@santigimeno santigimeno commented Mar 3, 2026

Add end-to-end traceSampleRate handling across config, runtime propagation, tracing decisions, and regression tests.

Why:

  • Enable configurable probabilistic trace sampling with predictable behavior.
  • Ensure consistent semantics across all config entry points.
  • Prevent invalid updates from corrupting current sampling behavior.
  • Keep transaction consistency by deciding sampling at the root span only.

What changed:

  • Added traceSampleRate parsing and normalization in JS config paths with explicit default fallback and finite/range validation in [0, 1].
  • Added native config sanitization for traceSampleRate to reject invalid values before merge, preserving previous valid configuration.
  • Ensured runtime sampling state is synchronized from effective current config after updates to avoid stale shared-memory sample rates.
  • Added gRPC reconfigure support for traceSampleRate in proto and agent mapping, including generated protobuf updates.
  • Updated tracing logic so root spans perform the sampling decision and child spans inherit parent traceFlags.
  • Extended tests for:
    • invalid value handling (including NaN/Infinity)
    • env/package bootstrap behavior
    • partial updates preserving existing traceSampleRate
    • gRPC invalid-update fallback behavior
    • sampling behavior at 0%, 50% (tolerance), and 100%
    • worker-thread sampling behavior
    • explicit parent/child trace consistency assertions

Summary by CodeRabbit

  • New Features

    • Probabilistic root-span sampling (configurable via env, package config, or runtime API); default 1.0 and observable from JS at runtime (exposed sampling value).
    • Runtime reconfiguration preserves and propagates the trace sampling rate across environments.
  • Bug Fixes

    • Invalid, non-finite, or out-of-range sampling values are sanitized/rejected and do not overwrite a prior valid rate.
  • Tests

    • Added tests and fixtures validating config precedence, validation/persistence, reconfigure behavior, sampling statistics and cross-thread sampling.

@santigimeno santigimeno requested a review from RafaelGSS March 3, 2026 16:01
@santigimeno santigimeno self-assigned this Mar 3, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds an optional double traceSampleRate to the reconfigure protobuf and wires it end-to-end: gRPC proto + generated C++ support, runtime validation and propagation into a per-env shared binding, JS config parsing, probabilistic root-span sampling, and tests/fixtures for config precedence, validation, reconfigure, and sampling.

Changes

Trace sampling end-to-end

Layer / File(s) Summary
Data Shape
agents/grpc/proto/reconfigure.proto
Adds optional double traceSampleRate = 14 to ReconfigureBody.
Generated / Serialization
agents/grpc/src/proto/reconfigure.pb.h, agents/grpc/src/proto/reconfigure.pb.cc
Generated C++ support for new field: storage, accessors, has-bit layout, parse table, serialization, merge/clear/size logic, and descriptor metadata updated.
gRPC wiring
agents/grpc/src/grpc_agent.cc
PopulateReconfigureEvent reads traceSampleRate from parsed JSON into ReconfigureBody; reconfigure maps tracesamplerate back into outgoing JSON when present.
Runtime C++ API & binding
src/nsolid/nsolid_api.h, src/nsolid/nsolid_api.cc
Adds EnvList atomic trace_sample_rate_ and per-EnvInst trace_sample_rate_; validation helpers (validate_trace_sample_rate, validate_config); update_tracing_sample_rate to sanitize and propagate rate to EnvInsts; exports JS-visible trace_sample_rate ArrayBuffer (Float64Array).
JS config & tracing logic
lib/nsolid.js, lib/internal/otel/trace.js
Adds DEFAULT_TRACING_SAMPLING_RATE and parseTraceSampleRate(); initialize/update use validated value; root-span sampling becomes probabilistic using binding.trace_sample_rate[0]; child spans inherit parent traceFlags.
Tests & Fixtures
test/fixtures/*, test/parallel/*, test/agents/*, test/addons/*
New fixtures and tests: env/package precedence, config validation (reject out-of-range/non-finite), reconfigure propagation, sampling behavior (including worker threads), and child-span flag inheritance; updated gRPC reconfigure tests to use Number conversion and invalid-rate-preservation checks.

Sequence Diagram

sequenceDiagram
    participant App as JavaScript App
    participant Config as nsolid.config
    participant gRPC as gRPC Agent
    participant EnvList as EnvList (C++)
    participant Binding as binding.trace_sample_rate
    participant Tracer as OpenTelemetry Tracer
    participant Span as Root Span

    App->>Config: updateConfig({ traceSampleRate: X })
    Config->>Config: parseTraceSampleRate(X)
    Config->>gRPC: send reconfigure (traceSampleRate: X)
    gRPC->>EnvList: receive reconfigure
    EnvList->>EnvList: validate_trace_sample_rate -> sanitize
    EnvList->>EnvList: update_tracing_sample_rate(X)
    EnvList->>Binding: write trace_sample_rate (shared buffer)

    App->>Tracer: start root span
    Tracer->>Binding: read trace_sample_rate
    Tracer->>Span: decide sampled = Math.random() < trace_sample_rate
    alt sampled
        Span->>Span: set traceFlags = SAMPLED
    else not sampled
        Span->>Span: set traceFlags = NONE
    end
    Span-->>Tracer: return span
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • RafaelGSS
  • EHortua

Poem

🐰 I tucked a number small and bright,

into proto, C++, JS — tight.
Roots now roll dice, children keep the flag,
shared through envs while devs sip their tea and wag. 🍵🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: adding traceSampleRate support across multiple components (agents, lib, src, test).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch santi/tracing_rate

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
test/agents/test-grpc-reconfigure.mjs (1)

171-208: Consider adding NaN/Infinity invalid-rate cases in this gRPC path test.

This block currently checks only out-of-range finite numbers. Extending it with Number.NaN and ±Infinity would better guard the gRPC reconfigure edge cases already covered in non-gRPC tests.

✅ Minimal test extension
-        const invalidRates = [2, -0.5];
+        const invalidRates = [2, -0.5, Number.NaN, Number.POSITIVE_INFINITY, Number.NEGATIVE_INFINITY];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/agents/test-grpc-reconfigure.mjs` around lines 171 - 208, Update the
test "should preserve previous traceSampleRate for invalid values" to include
NaN and Infinity cases: add Number.NaN, Number.POSITIVE_INFINITY and
Number.NEGATIVE_INFINITY (or +/-Infinity) to the invalidRates array used with
grpcServer.reconfigure and client.config assertions so
grpcServer.reconfigure(agentId, { traceSampleRate: invalidRate }) is exercised
for NaN/Infinity and the existing assert.strictEqual checks that
nsolidConfig.traceSampleRate remained 0.4 still apply.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 1761-1763: PopulateReconfigureEvent currently omits the
traceSampleRate field when building the outgoing reconfigure event body; update
PopulateReconfigureEvent to set body.traceSampleRate from the internal config so
the outgoing event mirrors inbound updates (i.e., when you previously read
body.tracesamplerate() on incoming updates, ensure PopulateReconfigureEvent
calls the corresponding setter to populate body.traceSampleRate in the event).
Locate the PopulateReconfigureEvent function in grpc_agent.cc and add the
traceSampleRate assignment using the same source/field used for other mapped
settings so reconfigure responses include traceSampleRate.

In `@lib/nsolid.js`:
- Around line 1144-1154: The parseTraceSampleRate function currently coerces
unintended types via unary +; update parseTraceSampleRate to only accept inputs
that are typeof 'number' or 'string', explicitly reject booleans and other
types, trim string inputs and return undefined for empty/whitespace-only
strings, then convert the trimmed numeric string (or the number input) to a
numeric value and validate with NumberIsFinite and range checks (>=0 && <=1);
ensure the function returns undefined for invalid types, whitespace-only
strings, NaN, non-finite numbers, or values outside the 0–1 range while
returning the numeric rate for valid inputs.

In `@test/parallel/test-nsolid-config-trace-sample-rate-env.js`:
- Around line 15-18: The test currently spreads process.env into the child env
(env: {...process.env, ...envVars}), which can leak ambient NSOLID_* variables
and make the test nondeterministic; update the env construction in
test/parallel/test-nsolid-config-trace-sample-rate-env.js to explicitly filter
out any keys starting with "NSOLID_" (e.g., build a filteredEnv =
Object.fromEntries(Object.entries(process.env).filter(([k]) =>
!k.startsWith('NSOLID_'))) and then use env: { ...filteredEnv, ...envVars } so
the child process is isolated from ambient NSOLID_* variables while still
allowing envVars to override.

---

Nitpick comments:
In `@test/agents/test-grpc-reconfigure.mjs`:
- Around line 171-208: Update the test "should preserve previous traceSampleRate
for invalid values" to include NaN and Infinity cases: add Number.NaN,
Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY (or +/-Infinity) to the
invalidRates array used with grpcServer.reconfigure and client.config assertions
so grpcServer.reconfigure(agentId, { traceSampleRate: invalidRate }) is
exercised for NaN/Infinity and the existing assert.strictEqual checks that
nsolidConfig.traceSampleRate remained 0.4 still apply.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 369ce34 and 5048dab.

📒 Files selected for processing (15)
  • agents/grpc/proto/reconfigure.proto
  • agents/grpc/src/grpc_agent.cc
  • agents/grpc/src/proto/reconfigure.pb.cc
  • agents/grpc/src/proto/reconfigure.pb.h
  • lib/internal/otel/trace.js
  • lib/nsolid.js
  • src/nsolid/nsolid_api.cc
  • src/nsolid/nsolid_api.h
  • test/addons/nsolid-tracing/test-otel-basic2.js
  • test/agents/test-grpc-reconfigure.mjs
  • test/fixtures/nsolid-trace-sample-rate-package.json
  • test/fixtures/test-nsolid-config-trace-sample-rate-env-script.js
  • test/parallel/test-nsolid-config-trace-sample-rate-env.js
  • test/parallel/test-nsolid-config-trace-sample-rate.js
  • test/parallel/test-nsolid-trace-sample-rate-sampling.js

Comment thread agents/grpc/src/grpc_agent.cc
Comment thread lib/nsolid.js
Comment thread test/parallel/test-nsolid-config-trace-sample-rate-env.js
RafaelGSS
RafaelGSS previously approved these changes Mar 3, 2026
Copy link
Copy Markdown
Member

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RSLGTM

coderabbitai[bot]
coderabbitai Bot previously approved these changes Mar 9, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 4, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/nsolid/nsolid_api.cc (1)

1251-1256: 💤 Low value

Prefer checking config instead of curr to avoid redundant propagation.

Currently, update_tracing_sample_rate() is called whenever traceSampleRate exists in the merged curr config, even if the incoming config didn't include it. This triggers RunCommand to all EnvInst instances on every config update—even for unrelated fields.

Other fields like promiseTracking (line 1202) check config.find() instead, which limits action to updates that actually include that field.

♻️ Suggested fix
-    it = curr.find("traceSampleRate");
-    DCHECK(it == curr.end() || it->is_number());
-    if (it != curr.end() && it->is_number()) {
+    it = config.find("traceSampleRate");
+    if (it != config.end()) {
+      // Validation already ran; curr has the sanitized value
+      auto curr_it = curr.find("traceSampleRate");
+      DCHECK(curr_it != curr.end() && curr_it->is_number());
       update_tracing_sample_rate(*it);
     }

Alternatively, if re-applying the same rate is acceptable, consider adding a short-circuit in update_tracing_sample_rate() to skip propagation when the value hasn't changed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.cc` around lines 1251 - 1256, The current code checks
curr for "traceSampleRate" and calls update_tracing_sample_rate even when the
incoming config didn't include it; change the condition to check
config.find("traceSampleRate") (like promiseTracking does) and only call
update_tracing_sample_rate when the incoming config contains that key, so you
don't propagate RunCommand to all EnvInsts for unrelated updates; alternatively,
if you prefer keeping the check on curr, add an early-return/short-circuit
inside update_tracing_sample_rate to detect and skip re-propagation when the
effective sample rate value hasn't changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/nsolid/nsolid_api.cc`:
- Around line 1251-1256: The current code checks curr for "traceSampleRate" and
calls update_tracing_sample_rate even when the incoming config didn't include
it; change the condition to check config.find("traceSampleRate") (like
promiseTracking does) and only call update_tracing_sample_rate when the incoming
config contains that key, so you don't propagate RunCommand to all EnvInsts for
unrelated updates; alternatively, if you prefer keeping the check on curr, add
an early-return/short-circuit inside update_tracing_sample_rate to detect and
skip re-propagation when the effective sample rate value hasn't changed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7ba3f980-5ecd-4511-aa15-09c8a7a00c29

📥 Commits

Reviewing files that changed from the base of the PR and between 57df292 and ea1123a.

📒 Files selected for processing (15)
  • agents/grpc/proto/reconfigure.proto
  • agents/grpc/src/grpc_agent.cc
  • agents/grpc/src/proto/reconfigure.pb.cc
  • agents/grpc/src/proto/reconfigure.pb.h
  • lib/internal/otel/trace.js
  • lib/nsolid.js
  • src/nsolid/nsolid_api.cc
  • src/nsolid/nsolid_api.h
  • test/addons/nsolid-tracing/test-otel-basic2.js
  • test/agents/test-grpc-reconfigure.mjs
  • test/fixtures/nsolid-trace-sample-rate-package.json
  • test/fixtures/test-nsolid-config-trace-sample-rate-env-script.js
  • test/parallel/test-nsolid-config-trace-sample-rate-env.js
  • test/parallel/test-nsolid-config-trace-sample-rate.js
  • test/parallel/test-nsolid-trace-sample-rate-sampling.js
✅ Files skipped from review due to trivial changes (7)
  • test/fixtures/nsolid-trace-sample-rate-package.json
  • agents/grpc/proto/reconfigure.proto
  • agents/grpc/src/grpc_agent.cc
  • test/parallel/test-nsolid-trace-sample-rate-sampling.js
  • agents/grpc/src/proto/reconfigure.pb.h
  • test/fixtures/test-nsolid-config-trace-sample-rate-env-script.js
  • agents/grpc/src/proto/reconfigure.pb.cc
🚧 Files skipped from review as they are similar to previous changes (7)
  • lib/internal/otel/trace.js
  • test/parallel/test-nsolid-config-trace-sample-rate-env.js
  • test/parallel/test-nsolid-config-trace-sample-rate.js
  • test/addons/nsolid-tracing/test-otel-basic2.js
  • lib/nsolid.js
  • test/agents/test-grpc-reconfigure.mjs
  • src/nsolid/nsolid_api.h

Comment thread src/nsolid/nsolid_api.cc
}

it = curr.find("traceSampleRate");
DCHECK(it == curr.end() || it->is_number());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is DCHECK the best assertion here? I guess we want to log it?

Add end-to-end traceSampleRate handling across config, runtime
propagation, tracing decisions, and regression tests.

Why:
- Enable configurable probabilistic trace sampling with predictable
  behavior.
- Ensure consistent semantics across all config entry points.
- Prevent invalid updates from corrupting current sampling behavior.
- Keep transaction consistency by deciding sampling at the root span
  only.

What changed:
- Added traceSampleRate parsing and normalization in JS config paths
  with explicit default fallback and finite/range validation in [0, 1].
- Added native config sanitization for traceSampleRate to reject invalid
  values before merge, preserving previous valid configuration.
- Ensured runtime sampling state is synchronized from effective current
  config after updates to avoid stale shared-memory sample rates.
- Added gRPC reconfigure support for traceSampleRate in proto and agent
  mapping, including generated protobuf updates.
- Updated tracing logic so root spans perform the sampling decision and
  child spans inherit parent traceFlags.
- Extended tests for:
  - invalid value handling (including NaN/Infinity)
  - env/package bootstrap behavior
  - partial updates preserving existing traceSampleRate
  - gRPC invalid-update fallback behavior
  - sampling behavior at 0%, 50% (tolerance), and 100%
  - worker-thread sampling behavior
  - explicit parent/child trace consistency assertions
@santigimeno santigimeno force-pushed the santi/tracing_rate branch from ea1123a to d46dcf9 Compare May 5, 2026 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants