Skip to content

feat(weave): add centralized trace_sample_rate setting with eval carve-out#7008

Draft
ro31337 wants to merge 1 commit into
masterfrom
roman/trace-sample-rate
Draft

feat(weave): add centralized trace_sample_rate setting with eval carve-out#7008
ro31337 wants to merge 1 commit into
masterfrom
roman/trace-sample-rate

Conversation

@ro31337
Copy link
Copy Markdown
Contributor

@ro31337 ro31337 commented May 29, 2026

JIRA Issue(s)

https://coreweave.atlassian.net/browse/WB-34680

Description

Weave can already sample traces per op with @weave.op(tracing_sample_rate=...), but there is no single switch to limit everything an application sends. This adds a centralized trace_sample_rate setting, configurable with the WEAVE_TRACE_SAMPLE_RATE environment variable or through weave.init(settings=...). It applies to every root call: the decision is made once on the trace root and child calls inherit it, so a trace is always kept or dropped as a whole rather than ending up half-recorded. It composes multiplicatively with the per-op rate (a global 0.5 and a per-op 0.5 keep about 25%), and it is off by default — 1.0 keeps everything, so nothing changes until someone opts in.

Evaluations are deliberately exempt: an Evaluation.evaluate root is never sampled out, so evals always appear in full even at a low rate. A vanished evaluation would be far more surprising than a dropped ad-hoc trace, and because the SDK knows locally whether a trace is an eval, the carve-out is a simple, race-free check at the root that covers both the declarative and imperative eval paths.

Why this approach

Sampling in the SDK, rather than dropping traces on the server, lets the client use the full local trace context to keep evals correctly and to drop a trace before it ever goes over the network — saving bandwidth as well as storage. It also matches the common pattern across the ecosystem, where sampling is client-side and decided at the trace root. It is meant to be set once by whoever owns a deployment, so individual engineers do not each have to remember a rate per op. Writers that talk to the REST or OTEL endpoints directly bypass the SDK and are out of scope here.

Testing

Unit tests cover the setting (default, clamping to [0, 1], env-var coercion, and staying in sync with the override TypedDict) and the behavior: off by default, dropping non-eval roots at 0.0 while the wrapped function still runs, env-var control, multiplicative composition, and the eval carve-out for both the declarative Evaluation.evaluate and the imperative EvaluationLogger paths. Verified locally on both backends — tests/trace/test_trace_settings.py (46 passed) and the sampling and carve-out tests in tests/trace/test_client_trace.py (13 passed on SQLite and ClickHouse) — with the existing op-sampling suite still green.

…e-out

Adds a centralized trace_sample_rate setting (WEAVE_TRACE_SAMPLE_RATE env var
or weave.init) that samples whole traces at the root, composed multiplicatively
with the existing per-op tracing_sample_rate. The decision is made once on the
root call and children inherit it, so a trace is kept or dropped as a whole.

Evaluation roots are never sampled out, so evals are always preserved. Off by
default (1.0), so existing behavior is unchanged until a rate is set.
@wandbot-3000
Copy link
Copy Markdown

wandbot-3000 Bot commented May 29, 2026

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant