pkg/hook: generic pub/sub hook for pipeline observation (SMP benchmark branch)#49883
misteriaud wants to merge 48 commits into main
Conversation
Go Package Import Differences
Baseline: f05ed3b
b08ec01 to 9b9f49d (Compare)
Files inventory check summary
File checks results against ancestor f05ed3b1. Results for datadog-agent_7.80.0~devel.git.255.df6cb5f.pipeline.110390822-1_amd64.deb: No change detected
Static quality checks
✅ Please find below the results from static quality gates.
Successful checks: 4 successful checks with minimal change (< 2 KiB)
On-wire sizes (compressed)
Regression Detector
Regression Detector Results (Metrics dashboard)
Baseline: f05ed3b
Optimization Goals: ✅ No significant changes detected
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +3.13 | [+0.03, +6.23] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | +2.04 | [+0.36, +3.71] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs_hook_8sub | memory utilization | +1.16 | [+0.90, +1.42] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs_hook_6sub | memory utilization | +1.05 | [+0.79, +1.30] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_logs | memory utilization | +1.00 | [+0.89, +1.12] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | +0.72 | [+0.66, +0.79] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.41 | [+0.22, +0.60] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.35 | [+0.11, +0.59] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders_hook_5sub | memory utilization | +0.34 | [+0.29, +0.39] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | +0.26 | [+0.06, +0.46] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.07 | [-0.46, +0.59] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.04 | [-0.36, +0.43] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.03 | [-0.08, +0.13] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.02 | [-0.01, +0.06] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_to_api_hook_0sub | ingress throughput | +0.01 | [-0.19, +0.22] | 1 | Logs |
| ➖ | quality_gate_metrics_logs_hook_10sub | memory utilization | +0.01 | [-0.25, +0.28] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_to_api_hook_1sub | ingress throughput | +0.01 | [-0.20, +0.21] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.01 | [-0.43, +0.44] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_hook_5sub | ingress throughput | +0.01 | [-0.20, +0.21] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_hook_2sub | ingress throughput | +0.00 | [-0.20, +0.21] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.00 | [-0.20, +0.20] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.01 | [-0.21, +0.19] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.12, +0.10] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.02 | [-0.12, +0.08] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.05 | [-0.10, -0.01] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders_hook_2sub | memory utilization | -0.07 | [-0.12, -0.02] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.10 | [-0.15, -0.05] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders_hook_0sub | memory utilization | -0.15 | [-0.20, -0.10] | 1 | Logs |
| ➖ | quality_gate_metrics_logs_hook_4sub | memory utilization | -0.18 | [-0.44, +0.09] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs_hook_2sub | memory utilization | -0.30 | [-0.56, -0.04] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.35 | [-0.51, -0.20] | 1 | Logs |
| ➖ | file_tree | memory utilization | -0.54 | [-0.58, -0.50] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.61 | [-0.76, -0.46] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -1.09 | [-1.24, -0.94] | 1 | Logs |
| ➖ | quality_gate_metrics_logs_hook_0sub | memory utilization | -1.16 | [-1.41, -0.90] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs | memory utilization | -1.24 | [-1.50, -0.97] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders_hook_1sub | memory utilization | -1.70 | [-1.83, -1.57] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 696 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 241.27MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 723 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.16GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.17GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.18GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 138.54MiB ≤ 147MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 465.36MiB ≤ 495MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 182.36MiB ≤ 195MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 354.43 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 376.18MiB ≤ 430MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_0sub | cpu_usage | 10/10 | 342.94 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_0sub | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_0sub | memory_usage | 10/10 | 376.56MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_0sub | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_10sub | cpu_usage | 10/10 | 352.54 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_10sub | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_10sub | memory_usage | 10/10 | 391.50MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_10sub | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_2sub | cpu_usage | 10/10 | 351.65 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_2sub | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_2sub | memory_usage | 10/10 | 389.09MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_2sub | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_4sub | cpu_usage | 10/10 | 354.02 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_4sub | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_4sub | memory_usage | 10/10 | 405.36MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_4sub | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_6sub | cpu_usage | 10/10 | 368.66 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_6sub | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_6sub | memory_usage | 10/10 | 373.03MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_6sub | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_8sub | cpu_usage | 10/10 | 334.14 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_8sub | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_8sub | memory_usage | 10/10 | 374.70MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs_hook_8sub | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_6sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_6sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_6sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_6sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_10sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_10sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_10sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_10sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_8sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_8sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_8sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_8sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_2sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_2sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_2sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_2sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_4sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_4sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_4sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_4sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_0sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_0sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_0sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs_hook_0sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
Introduce pkg/hook, a lightweight generic pub/sub mechanism that lets any component observe data flowing through the metrics and logs pipelines without modifying pipeline code. The hook is non-blocking: slow consumers only drop their own payloads.
- pkg/hook: Hook[T] interface, noop impl, MetricView/LogView interfaces
- Wire hook.Hook[hook.MetricView] into the metric aggregation pipeline (TimeSampler, CheckSampler, NoAggregationStreamWorker)
- Wire hook.Hook[hook.LogView] into the logs pipeline (Processor)
- Register pkg/hook as a standalone Go module
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The TimeSampler.sample() hook already publishes every DogStatsD metric before aggregation, so a dedicated pre-aggregation hook in the DogStatsD server would be redundant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MetricSampleSnapshot concrete struct and NewMetricSampleSnapshot constructor so producers can publish pool-safe value copies instead of pointers into recycled memory.
- Add an atomic.Int32 subscriber counter; Publish returns immediately with no lock when the count is zero, eliminating RLock overhead on the hot path when no consumers are attached.
- Panic in Subscribe if the consumer name is already registered, turning a silent goroutine leak into an immediate programming error.
- Update doc.go example to reflect the new batch-snapshot pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…SIGN.md
- views.go: add LogSampleSnapshot (immutable snapshot of a log message)
and TraceStatsView (read-only interface for aggregated trace stats entries),
cherry-picked from misteriaud/flightrecorder
- hook_test.go: add TestSubscribeWithBuffer_{NoDrops,DropOnOverflow,DefaultOnZeroSize}
to cover the WithBufferSize option, cherry-picked from misteriaud/flightrecorder
- DESIGN.md: new design document for pipeline maintainers covering the
pub/sub model, tap points, mermaid diagram, and ADP scope
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…nchmarks
Adds a new fx component that subscribes hook.bench_subscriber_count times to the metrics pipeline hook at startup. Each callback discards the payload immediately with no allocation, so only the hook delivery mechanism itself is measured in regression experiments.
Default is 0 (no-op: component is present but subscribes nothing). Configure via datadog.yaml: hook.bench_subscriber_count: N
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ribers) Duplicates quality_gate_metrics_logs six times, varying only hook.bench_subscriber_count (0, 2, 4, 6, 8, 10). Together with the hookbenchsubscriber component, these cases let SMP measure whether hook delivery overhead is bounded and linear in subscriber count. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
main added secretsComp (API key refresh on 403) to httpsender.NewHTTPSender after our branch diverged. The rebase dropped it from the call site. Pass nil on this branch — no secrets backend is configured in the SMP benchmark environment, so 403 refresh is never triggered. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
initAgentDemultiplexer gained metricHook as a new last param on this branch; the no_aggregation_stream_worker test was not updated. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
main added secrets.Component to the logAgent struct and its dependencies after our branch diverged. The rebase dropped it, causing tests that construct logAgent directly to fail on the unknown field. Restore the field and wire it through deps; it is not passed to pipeline.NewProvider (which uses nil internally on this branch). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
pkg/logs/pipeline, pkg/logs/sender, and pkg/telemetry were consolidated into larger modules in main after our branch was cut. Remove the stale require and replace directives that pointed to their now-deleted go.mod files. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
pkg/telemetry was a standalone Go module that was merged into the main module in origin/main (its go.mod was deleted). pkg/hook used its package-level NewGauge/NewCounter functions which no longer exist. Replace with direct prometheus GaugeVec/CounterVec via promauto, which is already an indirect dependency. The telemetry surface is identical. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The logs pipeline hook tap point is a follow-up — not part of this SMP benchmark branch. Commits during the rebase partially modified comp/logs-library/pipeline/, pkg/logs/processor/processor.go, and pkg/logs/message/message.go, introducing compilation errors. Reset all these files to origin/main to eliminate the partial changes. The logHook wiring in comp/logs/agent/agentimpl/ is also reset; only the secrets.Component plumbing (which was in origin/main already) remains. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…t_test Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…videComponentConstructor ProvideComponentConstructor rebuilds the Requires struct via reflection, losing the group:"hook" struct tag. This causes dig to report "invalid embedded field: dig.In" at runtime. Use fx.Provide directly so dig sees the original struct with all tags intact. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…2,5 subscribers) Adds metrics-focused sweep variants for: - uds_dogstatsd_to_api - uds_dogstatsd_20mb_12k_contexts_20_senders (heavy load: 12k contexts, 20 senders) These target the exact code path where the hook taps (TimeSampler), without the logs pipeline noise present in quality_gate_metrics_logs. Sweep: 0, 1, 2, 5 subscribers. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Adds go benchmarks to demonstrate that the pkg/hook tap points add
negligible overhead to both pipelines.
DogStatsD (pkg/aggregator/time_sampler_bench_test.go):
BenchmarkTimeSamplerHook/{sample_only,batch32_publish}/{noop_hook,0sub,1sub,5sub}
- sample_only: ~98 ns/op, 0 allocs for all modes (hook append is free)
- batch32_publish overhead vs noop: ~0 ns (0sub), ~15 ns/sample (1sub), ~45 ns/sample (5sub)
Logs pipeline (pkg/logs/processor/processor_bench_test.go):
BenchmarkProcessorHook/{noop_hook,0sub,1sub,5sub}
- noop_hook/0sub: ~880 ns/op, 4 allocs (snapshot not built — HasSubscribers() fast-path)
- 1sub/5sub: ~985 ns/op, 5 allocs (+1 alloc for LogSampleSnapshot; +100 ns/msg)
Also integrates the logs hook tap point (ported from misteriaud/flightrecorder):
- processor.go: batch accumulator (logHookBatch, logHookBatchSize=256), flushed on
batch-full or input-channel-drain; HasSubscribers() gates snapshot construction
- message.go: restore GetTags()/GetHostname() methods needed by LogSampleSnapshot
- pipeline.go/processor_only_provider.go: pass noop hook to processor.New()
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Blogpost-style document covering the DogStatsD pipeline architecture, hook system design, benchmark methodology, and measured results. Key findings:
- 0 allocs/op across all hook modes on the hot path
- Idle overhead (0 subscribers): <0.3 ns — within measurement noise
- Active overhead: ~7.7 ns/sample (1 sub), ~14.6 ns/sample (5 subs) amortized
- As % of full parse+sample pipeline: 2.1% (1 sub) to 4.0% (5 subs) for a 4-tag metric
- Shrinks to <1% for metrics with 16+ tags where parsing dominates
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ERHEAD.md
Adds BenchmarkNoAggWorkerHook and BenchmarkCheckSamplerHook to cover all
three metrics pipeline tap points, addressing reviewer questions:
"Would it be possible to replace this with actual benchmarks?"
"Interesting is the case with no hooks at all."
"Curious to see benchmarks for the three pipelines."
Key findings:
- noop_hook and 0 subscribers are identical on all three paths (~2 ns,
one atomic read). Zero-overhead principle confirmed.
- TimeSampler (accumulator pattern): 0 allocs, +6 ns/sample amortized at 1 sub
- no-agg worker: 0 cost idle; allocates when active (2 allocs/batch, +1542 ns)
- CheckSampler: 0 cost idle; allocates when active (2 allocs/sample, +403 ns)
- no-agg + check run at much lower volume than DogStatsD, so alloc overhead
is acceptable; could be eliminated with the accumulator pattern if needed
Updates OVERHEAD.md with actual measured numbers and full three-pipeline
comparison, replacing the theoretical "10–20 ns" estimate.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Three-panel figure:
- A — hook overhead per pipeline × subscriber count (log scale)
- B — DogStatsD per-point latency breakdown: parse + sample() + hook
- C — sample() cost across all hook modes (within noise, 0 allocs)
Run with: uv run --with matplotlib python3 pkg/hook/plot_overhead.py
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…rhead chart Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
arm64 Linux, go test -bench -benchmem -benchtime=100ms -count=50
All benchmarks: n≥49, CV < 10%
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Data now sourced from bench_results.csv (50 runs, means ± stddev) - Error bars shown on all bars - overhead_b: legend moved to lower-right to avoid overlapping bar labels Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Each chart now has a lead-in paragraph explaining what to look for, followed by the chart itself, then the detailed tables and analysis. Old raw benchmark output block removed (data is in bench_results.csv). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Log scale was unnecessary — the linear scale communicates the zero-overhead principle more directly: idle bars sit visibly at the baseline, TimeSampler active bars are near-zero, and no-agg/check active bars stand tall. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…otations - Labels now use a fixed offset above the bar instead of a log-scale multiplier - Labels that would exceed the ylim are suppressed - Removed "2 allocs/batch" annotation arrows (covered in the text) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The unconditional append inside sample() was copying a 72-byte MetricSampleSnapshot struct on every DogStatsD metric even when no observer was listening. SMP profiler comparison (main vs branch with 0 subscribers) showed +12ms in (*TimeSampler).sample() from this copy alone.
Gating on HasSubscribers() makes the idle path (0 subscribers) match main exactly: one atomic read, then nothing. The append only runs when someone is actually subscribed.
Before: noop/0sub/1sub/5sub all ~82 ns (append unconditional)
After: noop/0sub ~77 ns; 1sub ~82 ns; 5sub ~86 ns
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
df6cb5f to 453ddfc (Compare)
Context
This is the SMP benchmark branch of #47976, rebased cleanly on top of main. Its purpose is to run the SMP regression experiments measuring hook delivery overhead as a function of subscriber count — before merging the main PR. See the original PR (#47976) for the full design rationale, implementation details, and tap-point documentation.

What this branch adds on top of misteriaud/pkg-hook-minimal

comp/hookbenchsubscriber — benchmark hook subscriber component
A new fx component that subscribes hook.bench_subscriber_count times to the metrics pipeline hook at startup. Each callback discards the payload immediately with no allocation, so only the hook delivery mechanism itself is measured. Default is 0 (no-op). Configure via datadog.yaml.

test/regression — overhead sweep cases (0–10 subscribers)
Six copies of quality_gate_metrics_logs, varying only hook.bench_subscriber_count (0, 2, 4, 6, 8, 10). Together these let SMP measure whether overhead is bounded and linear in subscriber count.
- quality_gate_metrics_logs_hook_0sub
- quality_gate_metrics_logs_hook_2sub
- quality_gate_metrics_logs_hook_4sub
- quality_gate_metrics_logs_hook_6sub
- quality_gate_metrics_logs_hook_8sub
- quality_gate_metrics_logs_hook_10sub

Rebase fixes
A few compilation errors introduced by the rebase (main diverged on secretsComp threading through the logs pipeline after our branch was cut) were resolved with minimal changes — nil is passed for secrets where no backend is available, matching the SMP experiment environment.

Do not merge
This branch is for SMP experiments only. The production PR is #47976 (misteriaud/pkg-hook-minimal). If results look good, that PR will be updated to incorporate any findings.