pkg/hook: generic pub/sub hook for pipeline observation (SMP benchmark branch) #49883

Draft
misteriaud wants to merge 48 commits into main from misteriaud/pkg-hook-minimal-smp

@misteriaud
Member

Context

This is the SMP benchmark branch of #47976, rebased cleanly on top of main. Its purpose is to run the SMP regression experiments that measure hook delivery overhead as a function of subscriber count before the main PR is merged.

See the original PR (#47976) for the full design rationale, implementation details, and tap-point documentation.

What this branch adds on top of misteriaud/pkg-hook-minimal

comp/hookbenchsubscriber — benchmark hook subscriber component

A new fx component that subscribes hook.bench_subscriber_count times to the metrics pipeline hook at startup. Each callback discards the payload immediately with no allocation, so only the hook delivery mechanism itself is measured.

Default is 0 (no-op). Configure via datadog.yaml:

hook:
  bench_subscriber_count: 4
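
A minimal sketch of the subscription loop described above, assuming a simplified hook backed by a name-keyed callback map. The `Hook` type, `SubscribeBench`, and the `bench-N` subscriber names are hypothetical stand-ins for illustration; they are not the actual pkg/hook or comp/hookbenchsubscriber API.

```go
package main

import "fmt"

// Hook is a hypothetical, simplified stand-in for the real pkg/hook type.
type Hook[T any] struct {
	subs map[string]func(T)
}

// NewHook returns an empty hook.
func NewHook[T any]() *Hook[T] {
	return &Hook[T]{subs: make(map[string]func(T))}
}

// Subscribe registers a callback under a unique name.
func (h *Hook[T]) Subscribe(name string, cb func(T)) {
	h.subs[name] = cb
}

// Publish hands v to every registered callback.
func (h *Hook[T]) Publish(v T) {
	for _, cb := range h.subs {
		cb(v)
	}
}

// SubscribeBench registers n callbacks that discard the payload without
// allocating, and returns the resulting subscriber count.
// With n == 0 it is a no-op, mirroring the default configuration.
func SubscribeBench[T any](h *Hook[T], n int) int {
	for i := 0; i < n; i++ {
		h.Subscribe(fmt.Sprintf("bench-%d", i), func(T) {})
	}
	return len(h.subs)
}

func main() {
	h := NewHook[int]()
	// 4 stands in for the value read from hook.bench_subscriber_count.
	fmt.Println(SubscribeBench(h, 4)) // prints 4
	h.Publish(42)                     // every callback discards the payload
}
```

Because the callbacks do no work, any cost that shows up in the experiments would come from the delivery path rather than from the consumers.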

test/regression — overhead sweep cases (0–10 subscribers)

Six copies of quality_gate_metrics_logs, varying only hook.bench_subscriber_count (0, 2, 4, 6, 8, 10). Together these let SMP measure whether overhead is bounded and linear in subscriber count.

| Case | Subscribers |
| --- | --- |
| quality_gate_metrics_logs_hook_0sub | 0 |
| quality_gate_metrics_logs_hook_2sub | 2 |
| quality_gate_metrics_logs_hook_4sub | 4 |
| quality_gate_metrics_logs_hook_6sub | 6 |
| quality_gate_metrics_logs_hook_8sub | 8 |
| quality_gate_metrics_logs_hook_10sub | 10 |

Rebase fixes

The rebase introduced a few compilation errors because main changed how secretsComp is threaded through the logs pipeline after our branch was cut. They were resolved with minimal changes: nil is passed for secrets where no backend is available, which matches the SMP experiment environment.

Do not merge

This branch is for SMP experiments only. The production PR is #47976 (misteriaud/pkg-hook-minimal). If results look good, that PR will be updated to incorporate any findings.

@dd-octo-sts
Contributor

dd-octo-sts Bot commented Apr 24, 2026

Go Package Import Differences

Baseline: f05ed3b
Comparison: df6cb5f

binary / os / arch: change
agent / linux / amd64: +4, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
agent / linux / arm64: +4, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
agent / windows / amd64: +4, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
agent / darwin / amd64: +4, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
agent / darwin / arm64: +4, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
agent / aix / ppc64: +5, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
iot-agent / linux / amd64: +5, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
iot-agent / linux / arm64: +5, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
heroku-agent / linux / amd64: +5, -0
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/def
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/fx
+github.com/DataDog/datadog-agent/comp/hookbenchsubscriber/impl
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
cluster-agent / linux / amd64: +1, -0
+github.com/DataDog/datadog-agent/pkg/hook
cluster-agent / linux / arm64: +1, -0
+github.com/DataDog/datadog-agent/pkg/hook
cluster-agent-cloudfoundry / linux / amd64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
cluster-agent-cloudfoundry / linux / arm64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
dogstatsd / linux / amd64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
dogstatsd / linux / arm64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
security-agent / linux / amd64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
security-agent / linux / arm64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
security-agent / windows / amd64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
system-probe / linux / amd64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
system-probe / linux / arm64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
system-probe / windows / amd64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
system-probe / darwin / amd64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
system-probe / darwin / arm64: +2, -0
+github.com/DataDog/datadog-agent/pkg/hook
+github.com/prometheus/client_golang/prometheus/promauto
otel-agent / linux / amd64: +1, -0
+github.com/DataDog/datadog-agent/pkg/hook
otel-agent / linux / arm64: +1, -0
+github.com/DataDog/datadog-agent/pkg/hook

@misteriaud misteriaud changed the base branch from main to misteriaud/pkg-hook-minimal April 24, 2026 15:07
@misteriaud misteriaud changed the base branch from misteriaud/pkg-hook-minimal to main April 24, 2026 15:08
@misteriaud misteriaud force-pushed the misteriaud/pkg-hook-minimal-smp branch from b08ec01 to 9b9f49d Compare April 24, 2026 15:35
@dd-octo-sts dd-octo-sts Bot added the internal Identify a non-fork PR label Apr 24, 2026
@github-actions github-actions Bot added the long review PR is complex, plan time to review it label Apr 24, 2026
@dd-octo-sts
Contributor

dd-octo-sts Bot commented Apr 24, 2026

Files inventory check summary

File checks results against ancestor f05ed3b1:

Results for datadog-agent_7.80.0~devel.git.255.df6cb5f.pipeline.110390822-1_amd64.deb:

No change detected

@dd-octo-sts
Contributor

dd-octo-sts Bot commented Apr 24, 2026

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor f05ed3b
📊 Static Quality Gates Dashboard
🔗 SQG Job

Successful checks

Info

Quality gate Change Size in MiB (prev → curr → max)
agent_deb_amd64 +61.65 KiB (0.01% increase) 739.767 → 739.828 → 750.310
agent_deb_amd64_fips +69.62 KiB (0.01% increase) 698.198 → 698.266 → 702.690
agent_heroku_amd64 +40.09 KiB (0.01% increase) 309.879 → 309.918 → 313.960
agent_msi +62.0 KiB (0.01% increase) 603.421 → 603.482 → 620.770
agent_rpm_amd64 +61.65 KiB (0.01% increase) 739.751 → 739.811 → 750.280
agent_rpm_amd64_fips +69.62 KiB (0.01% increase) 698.181 → 698.249 → 702.670
agent_rpm_arm64 +61.19 KiB (0.01% increase) 717.893 → 717.953 → 724.050
agent_rpm_arm64_fips +57.25 KiB (0.01% increase) 679.352 → 679.408 → 684.460
agent_suse_amd64 +61.65 KiB (0.01% increase) 739.751 → 739.811 → 750.280
agent_suse_amd64_fips +69.62 KiB (0.01% increase) 698.181 → 698.249 → 702.670
agent_suse_arm64 +61.19 KiB (0.01% increase) 717.893 → 717.953 → 724.050
agent_suse_arm64_fips +57.25 KiB (0.01% increase) 679.352 → 679.408 → 684.460
docker_agent_amd64 +61.65 KiB (0.01% increase) 800.226 → 800.286 → 805.870
docker_agent_arm64 +61.41 KiB (0.01% increase) 803.177 → 803.237 → 809.730
docker_agent_jmx_amd64 +61.9 KiB (0.01% increase) 991.145 → 991.206 → 996.590
docker_agent_jmx_arm64 +61.64 KiB (0.01% increase) 982.875 → 982.935 → 989.410
docker_cluster_agent_amd64 +28.13 KiB (0.01% increase) 206.250 → 206.277 → 207.600
docker_dogstatsd_amd64 +24.13 KiB (0.06% increase) 39.343 → 39.366 → 39.540
dogstatsd_deb_amd64 +20.12 KiB (0.07% increase) 30.001 → 30.020 → 30.770
dogstatsd_deb_arm64 +20.12 KiB (0.07% increase) 28.142 → 28.161 → 29.270
dogstatsd_rpm_amd64 +20.12 KiB (0.07% increase) 30.001 → 30.020 → 30.770
dogstatsd_suse_amd64 +20.12 KiB (0.07% increase) 30.001 → 30.020 → 30.770
iot_agent_deb_amd64 +40.09 KiB (0.09% increase) 44.369 → 44.408 → 44.970
iot_agent_deb_arm64 +36.12 KiB (0.09% increase) 41.361 → 41.396 → 42.560
iot_agent_deb_armhf +36.08 KiB (0.08% increase) 42.093 → 42.128 → 42.740
iot_agent_rpm_amd64 +40.09 KiB (0.09% increase) 44.369 → 44.408 → 44.970
iot_agent_suse_amd64 +40.09 KiB (0.09% increase) 44.369 → 44.408 → 44.970
4 successful checks with minimal change (< 2 KiB)
Quality gate Current Size
docker_cluster_agent_arm64 220.383 MiB
docker_cws_instrumentation_amd64 7.142 MiB
docker_cws_instrumentation_arm64 6.689 MiB
docker_dogstatsd_arm64 37.565 MiB
On-wire sizes (compressed)
Quality gate Change Size in MiB (prev → curr → max)
agent_deb_amd64 +20.23 KiB (0.01% increase) 174.839 → 174.859 → 179.160
agent_deb_amd64_fips -42.6 KiB (0.02% reduction) 166.734 → 166.693 → 174.440
agent_heroku_amd64 +12.26 KiB (0.02% increase) 74.890 → 74.902 → 80.310
agent_msi neutral 138.992 MiB → 147.550
agent_rpm_amd64 -37.87 KiB (0.02% reduction) 176.951 → 176.914 → 182.080
agent_rpm_amd64_fips -17.02 KiB (0.01% reduction) 168.083 → 168.066 → 174.140
agent_rpm_arm64 +3.9 KiB (0.00% increase) 159.371 → 159.375 → 163.610
agent_rpm_arm64_fips neutral 151.445 MiB → 156.850
agent_suse_amd64 -37.87 KiB (0.02% reduction) 176.951 → 176.914 → 182.080
agent_suse_amd64_fips -17.02 KiB (0.01% reduction) 168.083 → 168.066 → 174.140
agent_suse_arm64 +3.9 KiB (0.00% increase) 159.371 → 159.375 → 163.610
agent_suse_arm64_fips neutral 151.445 MiB → 156.850
docker_agent_amd64 +24.37 KiB (0.01% increase) 267.146 → 267.169 → 272.990
docker_agent_arm64 +42.72 KiB (0.02% increase) 254.198 → 254.240 → 261.470
docker_agent_jmx_amd64 +16.28 KiB (0.00% increase) 335.811 → 335.827 → 341.610
docker_agent_jmx_arm64 +25.19 KiB (0.01% increase) 318.837 → 318.862 → 326.050
docker_cluster_agent_amd64 +21.65 KiB (0.03% increase) 72.278 → 72.299 → 73.460
docker_cluster_agent_arm64 +6.75 KiB (0.01% increase) 67.755 → 67.761 → 68.680
docker_cws_instrumentation_amd64 neutral 2.999 MiB → 3.330
docker_cws_instrumentation_arm64 neutral 2.729 MiB → 3.090
docker_dogstatsd_amd64 +5.97 KiB (0.04% increase) 15.229 → 15.234 → 15.870
docker_dogstatsd_arm64 +7.69 KiB (0.05% increase) 14.541 → 14.548 → 14.890
dogstatsd_deb_amd64 neutral 7.939 MiB → 8.830
dogstatsd_deb_arm64 +6.26 KiB (0.09% increase) 6.817 → 6.823 → 7.750
dogstatsd_rpm_amd64 +2.82 KiB (0.03% increase) 7.946 → 7.949 → 8.840
dogstatsd_suse_amd64 +2.82 KiB (0.03% increase) 7.946 → 7.949 → 8.840
iot_agent_deb_amd64 +8.37 KiB (0.07% increase) 11.673 → 11.682 → 13.210
iot_agent_deb_arm64 +9.07 KiB (0.09% increase) 9.980 → 9.988 → 11.620
iot_agent_deb_armhf +6.41 KiB (0.06% increase) 10.184 → 10.190 → 11.780
iot_agent_rpm_amd64 +9.2 KiB (0.08% increase) 11.690 → 11.699 → 13.230
iot_agent_suse_amd64 +9.2 KiB (0.08% increase) 11.690 → 11.699 → 13.230

@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Apr 27, 2026

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 2f547ef2-7301-4eeb-b29f-30e635bc2e07

Baseline: f05ed3b
Comparison: 9278148
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI trials links
docker_containers_cpu % cpu utilization +3.13 [+0.03, +6.23] 1 Logs

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI trials links
docker_containers_cpu % cpu utilization +3.13 [+0.03, +6.23] 1 Logs
quality_gate_logs % cpu utilization +2.04 [+0.36, +3.71] 1 Logs bounds checks dashboard
quality_gate_metrics_logs_hook_8sub memory utilization +1.16 [+0.90, +1.42] 1 Logs bounds checks dashboard
quality_gate_metrics_logs_hook_6sub memory utilization +1.05 [+0.79, +1.30] 1 Logs bounds checks dashboard
otlp_ingest_logs memory utilization +1.00 [+0.89, +1.12] 1 Logs
ddot_logs memory utilization +0.72 [+0.66, +0.79] 1 Logs
ddot_metrics_sum_delta memory utilization +0.41 [+0.22, +0.60] 1 Logs
ddot_metrics_sum_cumulativetodelta_exporter memory utilization +0.35 [+0.11, +0.59] 1 Logs
uds_dogstatsd_20mb_12k_contexts_20_senders_hook_5sub memory utilization +0.34 [+0.29, +0.39] 1 Logs
ddot_metrics memory utilization +0.26 [+0.06, +0.46] 1 Logs
file_to_blackhole_0ms_latency egress throughput +0.07 [-0.46, +0.59] 1 Logs
file_to_blackhole_500ms_latency egress throughput +0.04 [-0.36, +0.43] 1 Logs
file_to_blackhole_100ms_latency egress throughput +0.03 [-0.08, +0.13] 1 Logs
quality_gate_idle_all_features memory utilization +0.02 [-0.01, +0.06] 1 Logs bounds checks dashboard
uds_dogstatsd_to_api_hook_0sub ingress throughput +0.01 [-0.19, +0.22] 1 Logs
quality_gate_metrics_logs_hook_10sub memory utilization +0.01 [-0.25, +0.28] 1 Logs bounds checks dashboard
uds_dogstatsd_to_api_hook_1sub ingress throughput +0.01 [-0.20, +0.21] 1 Logs
file_to_blackhole_1000ms_latency egress throughput +0.01 [-0.43, +0.44] 1 Logs
uds_dogstatsd_to_api_hook_5sub ingress throughput +0.01 [-0.20, +0.21] 1 Logs
uds_dogstatsd_to_api_hook_2sub ingress throughput +0.00 [-0.20, +0.21] 1 Logs
uds_dogstatsd_to_api_v3 ingress throughput -0.00 [-0.20, +0.20] 1 Logs
uds_dogstatsd_to_api ingress throughput -0.01 [-0.21, +0.19] 1 Logs
tcp_dd_logs_filter_exclude ingress throughput -0.01 [-0.12, +0.10] 1 Logs
docker_containers_memory memory utilization -0.02 [-0.12, +0.08] 1 Logs
quality_gate_idle memory utilization -0.05 [-0.10, -0.01] 1 Logs bounds checks dashboard
uds_dogstatsd_20mb_12k_contexts_20_senders_hook_2sub memory utilization -0.07 [-0.12, -0.02] 1 Logs
uds_dogstatsd_20mb_12k_contexts_20_senders memory utilization -0.10 [-0.15, -0.05] 1 Logs
uds_dogstatsd_20mb_12k_contexts_20_senders_hook_0sub memory utilization -0.15 [-0.20, -0.10] 1 Logs
quality_gate_metrics_logs_hook_4sub memory utilization -0.18 [-0.44, +0.09] 1 Logs bounds checks dashboard
quality_gate_metrics_logs_hook_2sub memory utilization -0.30 [-0.56, -0.04] 1 Logs bounds checks dashboard
ddot_metrics_sum_cumulative memory utilization -0.35 [-0.51, -0.20] 1 Logs
file_tree memory utilization -0.54 [-0.58, -0.50] 1 Logs
otlp_ingest_metrics memory utilization -0.61 [-0.76, -0.46] 1 Logs
tcp_syslog_to_blackhole ingress throughput -1.09 [-1.24, -0.94] 1 Logs
quality_gate_metrics_logs_hook_0sub memory utilization -1.16 [-1.41, -0.90] 1 Logs bounds checks dashboard
quality_gate_metrics_logs memory utilization -1.24 [-1.50, -0.97] 1 Logs bounds checks dashboard
uds_dogstatsd_20mb_12k_contexts_20_senders_hook_1sub memory utilization -1.70 [-1.83, -1.57] 1 Logs

Bounds Checks: ✅ Passed

perf experiment bounds_check_name replicates_passed observed_value links
docker_containers_cpu simple_check_run 10/10 696 ≥ 26
docker_containers_memory memory_usage 10/10 241.27MiB ≤ 370MiB
docker_containers_memory simple_check_run 10/10 723 ≥ 26
file_to_blackhole_0ms_latency memory_usage 10/10 0.16GiB ≤ 1.20GiB
file_to_blackhole_0ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_1000ms_latency memory_usage 10/10 0.20GiB ≤ 1.20GiB
file_to_blackhole_1000ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_100ms_latency memory_usage 10/10 0.17GiB ≤ 1.20GiB
file_to_blackhole_100ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_500ms_latency memory_usage 10/10 0.18GiB ≤ 1.20GiB
file_to_blackhole_500ms_latency missed_bytes 10/10 0B = 0B
quality_gate_idle intake_connections 10/10 3 ≤ 4 bounds checks dashboard
quality_gate_idle memory_usage 10/10 138.54MiB ≤ 147MiB bounds checks dashboard
quality_gate_idle_all_features intake_connections 10/10 3 ≤ 4 bounds checks dashboard
quality_gate_idle_all_features memory_usage 10/10 465.36MiB ≤ 495MiB bounds checks dashboard
quality_gate_logs intake_connections 10/10 4 ≤ 6 bounds checks dashboard
quality_gate_logs memory_usage 10/10 182.36MiB ≤ 195MiB bounds checks dashboard
quality_gate_logs missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs cpu_usage 10/10 354.43 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs intake_connections 10/10 3 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs memory_usage 10/10 376.18MiB ≤ 430MiB bounds checks dashboard
quality_gate_metrics_logs missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs_hook_0sub cpu_usage 10/10 342.94 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs_hook_0sub intake_connections 10/10 4 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs_hook_0sub memory_usage 10/10 376.56MiB ≤ 475MiB bounds checks dashboard
quality_gate_metrics_logs_hook_0sub missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs_hook_10sub cpu_usage 10/10 352.54 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs_hook_10sub intake_connections 10/10 3 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs_hook_10sub memory_usage 10/10 391.50MiB ≤ 475MiB bounds checks dashboard
quality_gate_metrics_logs_hook_10sub missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs_hook_2sub cpu_usage 10/10 351.65 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs_hook_2sub intake_connections 10/10 3 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs_hook_2sub memory_usage 10/10 389.09MiB ≤ 475MiB bounds checks dashboard
quality_gate_metrics_logs_hook_2sub missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs_hook_4sub cpu_usage 10/10 354.02 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs_hook_4sub intake_connections 10/10 3 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs_hook_4sub memory_usage 10/10 405.36MiB ≤ 475MiB bounds checks dashboard
quality_gate_metrics_logs_hook_4sub missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs_hook_6sub cpu_usage 10/10 368.66 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs_hook_6sub intake_connections 10/10 4 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs_hook_6sub memory_usage 10/10 373.03MiB ≤ 475MiB bounds checks dashboard
quality_gate_metrics_logs_hook_6sub missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs_hook_8sub cpu_usage 10/10 334.14 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs_hook_8sub intake_connections 10/10 4 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs_hook_8sub memory_usage 10/10 374.70MiB ≤ 475MiB bounds checks dashboard
quality_gate_metrics_logs_hook_8sub missed_bytes 10/10 0B = 0B bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
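
The three criteria above can be sketched as a small predicate. This is an illustration of the stated rule only, not the Regression Detector's actual code; the `Experiment` type and `IsRegression` function are hypothetical names, and the sample rows copy values from the table above (the `Erratic` flags are assumed from the "Experiments ignored for regressions" list).

```go
package main

import (
	"fmt"
	"math"
)

// Experiment holds the per-experiment values reported in the table.
type Experiment struct {
	Name         string
	DeltaMeanPct float64 // estimated Δ mean %
	CILow        float64 // lower bound of the 90% confidence interval
	CIHigh       float64 // upper bound of the 90% confidence interval
	Erratic      bool    // true when the experiment is configured as erratic
}

// effectSizeTolerance is the |Δ mean %| threshold stated in the report.
const effectSizeTolerance = 5.0

// IsRegression applies the three criteria: the effect is large enough,
// the confidence interval excludes zero, and the experiment is not erratic.
func IsRegression(e Experiment) bool {
	bigEnough := math.Abs(e.DeltaMeanPct) >= effectSizeTolerance
	excludesZero := e.CILow > 0 || e.CIHigh < 0
	return bigEnough && excludesZero && !e.Erratic
}

func main() {
	rows := []Experiment{
		{"docker_containers_cpu", 3.13, 0.03, 6.23, true},
		{"quality_gate_logs", 2.04, 0.36, 3.71, false},
		{"quality_gate_metrics_logs_hook_8sub", 1.16, 0.90, 1.42, false},
	}
	for _, r := range rows {
		fmt.Printf("%s regression=%v\n", r.Name, IsRegression(r))
	}
}
```

None of the sample rows qualifies: each has a confidence interval that excludes zero, but every |Δ mean %| is below the 5.00% tolerance.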

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_6sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_6sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_6sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_6sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_10sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_10sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_10sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_10sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_8sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_8sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_8sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_8sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_2sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_2sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_2sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_2sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_4sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_4sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_4sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_4sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_0sub, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_0sub, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_0sub, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs_hook_0sub, bounds check missed_bytes: 10/10 replicas passed. Gate passed.

misteriaud and others added 9 commits April 29, 2026 14:49
Introduce pkg/hook, a lightweight generic pub/sub mechanism that lets
any component observe data flowing through the metrics and logs pipelines
without modifying pipeline code. The hook is non-blocking: slow consumers
only drop their own payloads.

- pkg/hook: Hook[T] interface, noop impl, MetricView/LogView interfaces
- Wire hook.Hook[hook.MetricView] into the metric aggregation pipeline
  (TimeSampler, CheckSampler, NoAggregationStreamWorker)
- Wire hook.Hook[hook.LogView] into the logs pipeline (Processor)
- Register pkg/hook as a standalone Go module

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
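
To illustrate the non-blocking property described in this commit message, here is a minimal sketch in which each consumer owns a buffered channel and a full buffer causes a drop for that consumer only. It assumes a channel-based design for simplicity; the `Hook`, `Subscribe`, `Publish`, and `Dropped` names and signatures are hypothetical and may differ from the real pkg/hook interface.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// subscriber holds one consumer's private buffer and drop counter.
type subscriber[T any] struct {
	ch      chan T
	dropped atomic.Int64
}

// Hook is a hypothetical, simplified pub/sub hook (not the real pkg/hook API).
type Hook[T any] struct {
	mu   sync.RWMutex
	subs map[string]*subscriber[T]
}

// NewHook returns a hook with no consumers.
func NewHook[T any]() *Hook[T] {
	return &Hook[T]{subs: make(map[string]*subscriber[T])}
}

// Subscribe registers a consumer with its own buffer and returns the
// channel it should read from.
func (h *Hook[T]) Subscribe(name string, bufSize int) <-chan T {
	s := &subscriber[T]{ch: make(chan T, bufSize)}
	h.mu.Lock()
	h.subs[name] = s
	h.mu.Unlock()
	return s.ch
}

// Publish never blocks: if a consumer's buffer is full, the payload is
// dropped for that consumer only and its drop counter is incremented.
func (h *Hook[T]) Publish(v T) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	for _, s := range h.subs {
		select {
		case s.ch <- v:
		default:
			s.dropped.Add(1)
		}
	}
}

// Dropped reports how many payloads were dropped for the named consumer.
func (h *Hook[T]) Dropped(name string) int64 {
	h.mu.RLock()
	defer h.mu.RUnlock()
	if s, ok := h.subs[name]; ok {
		return s.dropped.Load()
	}
	return 0
}

func main() {
	h := NewHook[int]()
	fast := h.Subscribe("fast", 8)
	slow := h.Subscribe("slow", 1) // never drained here, so it overflows
	for i := 0; i < 3; i++ {
		h.Publish(i)
	}
	// prints "3 1 0 2": the slow consumer lost two payloads, the fast one none.
	fmt.Println(len(fast), len(slow), h.Dropped("fast"), h.Dropped("slow"))
}
```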
The TimeSampler.sample() hook already publishes every DogStatsD metric
before aggregation, so a dedicated pre-aggregation hook in the DogStatsD
server would be redundant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MetricSampleSnapshot concrete struct and NewMetricSampleSnapshot
  constructor so producers can publish pool-safe value copies instead of
  pointers into recycled memory.
- Add an atomic.Int32 subscriber counter; Publish returns immediately
  with no lock when the count is zero, eliminating RLock overhead on
  the hot path when no consumers are attached.
- Panic in Subscribe if the consumer name is already registered, turning
  a silent goroutine leak into an immediate programming error.
- Update doc.go example to reflect the new batch-snapshot pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
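
The zero-subscriber fast path and the duplicate-name panic from this commit message could look roughly like the sketch below. It is an assumption-laden illustration: the field names, the callback-based `Subscribe` signature, and the panic message are hypothetical, and only the general technique (an atomic counter checked before taking the read lock) comes from the commit text.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hook is a hypothetical, simplified hook (not the real pkg/hook type).
type Hook[T any] struct {
	count atomic.Int32 // number of registered consumers
	mu    sync.RWMutex
	subs  map[string]func(T)
}

// NewHook returns a hook with no consumers.
func NewHook[T any]() *Hook[T] {
	return &Hook[T]{subs: make(map[string]func(T))}
}

// Subscribe registers cb under name and panics if the name is already taken,
// so a double registration fails loudly instead of going unnoticed.
func (h *Hook[T]) Subscribe(name string, cb func(T)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if _, dup := h.subs[name]; dup {
		panic(fmt.Sprintf("hook: consumer %q is already subscribed", name))
	}
	h.subs[name] = cb
	h.count.Add(1)
}

// HasSubscribers is a lock-free check that producers can use to skip work.
func (h *Hook[T]) HasSubscribers() bool {
	return h.count.Load() > 0
}

// Publish returns after a single atomic load when nobody is subscribed,
// so the idle hot path never touches the RWMutex.
func (h *Hook[T]) Publish(v T) {
	if h.count.Load() == 0 {
		return
	}
	h.mu.RLock()
	defer h.mu.RUnlock()
	for _, cb := range h.subs {
		cb(v)
	}
}

func main() {
	h := NewHook[int]()
	h.Publish(1) // no consumers: returns on the fast path
	sum := 0
	h.Subscribe("adder", func(v int) { sum += v })
	h.Publish(2)
	fmt.Println(h.HasSubscribers(), sum) // prints "true 2"
}
```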
misteriaud and others added 29 commits April 29, 2026 14:51
…SIGN.md

- views.go: add LogSampleSnapshot (immutable snapshot of a log message)
  and TraceStatsView (read-only interface for aggregated trace stats entries),
  cherry-picked from misteriaud/flightrecorder
- hook_test.go: add TestSubscribeWithBuffer_{NoDrops,DropOnOverflow,DefaultOnZeroSize}
  to cover the WithBufferSize option, cherry-picked from misteriaud/flightrecorder
- DESIGN.md: new design document for pipeline maintainers covering the
  pub/sub model, tap points, mermaid diagram, and ADP scope

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…nchmarks

Adds a new fx component that subscribes hook.bench_subscriber_count times
to the metrics pipeline hook at startup. Each callback discards the payload
immediately with no allocation, so only the hook delivery mechanism itself
is measured in regression experiments.

Default is 0 (no-op: component is present but subscribes nothing).
Configure via datadog.yaml: hook.bench_subscriber_count: N

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ribers)

Duplicates quality_gate_metrics_logs six times, varying only
hook.bench_subscriber_count (0, 2, 4, 6, 8, 10). Together with the
hookbenchsubscriber component, these cases let SMP measure whether hook
delivery overhead is bounded and linear in subscriber count.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
main added secretsComp (API key refresh on 403) to httpsender.NewHTTPSender
after our branch diverged. The rebase dropped it from the call site.

Pass nil on this branch — no secrets backend is configured in the SMP
benchmark environment, so 403 refresh is never triggered.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
initAgentDemultiplexer gained metricHook as a new last param on this
branch; the no_aggregation_stream_worker test was not updated.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
main added secrets.Component to the logAgent struct and its dependencies
after our branch diverged. The rebase dropped it, causing tests that
construct logAgent directly to fail on the unknown field.

Restore the field and wire it through deps; it is not passed to
pipeline.NewProvider (which uses nil internally on this branch).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
pkg/logs/pipeline, pkg/logs/sender, and pkg/telemetry were consolidated
into larger modules in main after our branch was cut. Remove the stale
require and replace directives that pointed to their now-deleted go.mod files.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
pkg/telemetry was a standalone Go module that was merged into the main
module in origin/main (its go.mod was deleted). pkg/hook used its
package-level NewGauge/NewCounter functions which no longer exist.

Replace with direct prometheus GaugeVec/CounterVec via promauto, which is
already an indirect dependency. The telemetry surface is identical.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The logs pipeline hook tap point is a follow-up — not part of this
SMP benchmark branch. Commits during the rebase partially modified
comp/logs-library/pipeline/, pkg/logs/processor/processor.go, and
pkg/logs/message/message.go, introducing compilation errors.

Reset all these files to origin/main to eliminate the partial changes.
The logHook wiring in comp/logs/agent/agentimpl/ is also reset; only
the secrets.Component plumbing (which was in origin/main already) remains.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…t_test

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…videComponentConstructor

ProvideComponentConstructor rebuilds the Requires struct via reflection,
losing the group:"hook" struct tag. This causes dig to report
"invalid embedded field: dig.In" at runtime. Use fx.Provide directly
so dig sees the original struct with all tags intact.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…2,5 subscribers)

Adds metrics-focused sweep variants for:
- uds_dogstatsd_to_api
- uds_dogstatsd_20mb_12k_contexts_20_senders (heavy load: 12k contexts, 20 senders)

These target the exact code path where the hook taps (TimeSampler),
without the logs pipeline noise present in quality_gate_metrics_logs.
Sweep: 0, 1, 2, 5 subscribers.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Adds Go benchmarks to demonstrate that the pkg/hook tap points add
negligible overhead to both pipelines.

DogStatsD (pkg/aggregator/time_sampler_bench_test.go):
  BenchmarkTimeSamplerHook/{sample_only,batch32_publish}/{noop_hook,0sub,1sub,5sub}
  - sample_only: ~98 ns/op, 0 allocs for all modes (hook append is free)
  - batch32_publish overhead vs noop: ~0 ns (0sub), ~15 ns/sample (1sub), ~45 ns/sample (5sub)

Logs pipeline (pkg/logs/processor/processor_bench_test.go):
  BenchmarkProcessorHook/{noop_hook,0sub,1sub,5sub}
  - noop_hook/0sub: ~880 ns/op, 4 allocs (snapshot not built — HasSubscribers() fast-path)
  - 1sub/5sub: ~985 ns/op, 5 allocs (+1 alloc for LogSampleSnapshot; +100 ns/msg)

Also integrates the logs hook tap point (ported from misteriaud/flightrecorder):
  - processor.go: batch accumulator (logHookBatch, logHookBatchSize=256), flushed on
    batch-full or input-channel-drain; HasSubscribers() gates snapshot construction
  - message.go: restore GetTags()/GetHostname() methods needed by LogSampleSnapshot
  - pipeline.go/processor_only_provider.go: pass noop hook to processor.New()

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
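
The batch accumulator described in the commit above (flush when the batch is full or when the input channel is drained) could look roughly like the following. This is a simplified sketch under stated assumptions, not the processor.go code: `run`, `publish`, and the small `batchSize` are hypothetical, and drain detection uses `len(in) == 0` on a buffered channel.

```go
package main

import "fmt"

// batchSize is kept small for illustration; the commit uses logHookBatchSize=256.
const batchSize = 4

// run accumulates messages from in and hands them to publish in batches.
// A batch is flushed when it is full or when the input channel is
// momentarily empty, so entries never wait for a batch to fill up.
// The slice passed to publish is reused; a callback must copy it to keep it.
func run(in <-chan string, publish func([]string)) {
	batch := make([]string, 0, batchSize)
	flush := func() {
		if len(batch) > 0 {
			publish(batch)
			batch = batch[:0]
		}
	}
	for msg := range in {
		batch = append(batch, msg)
		if len(batch) == batchSize || len(in) == 0 {
			flush()
		}
	}
	flush() // nothing is left behind once the channel is closed
}

func main() {
	in := make(chan string, 10)
	for i := 0; i < 10; i++ {
		in <- fmt.Sprintf("log %d", i)
	}
	close(in)

	var sizes []int
	run(in, func(b []string) { sizes = append(sizes, len(b)) })
	fmt.Println(sizes) // [4 4 2]
}
```

Flushing on drain bounds delivery latency under light load, while the size cap bounds memory under heavy load.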
Blogpost-style document covering the DogStatsD pipeline architecture,
hook system design, benchmark methodology, and measured results.

Key findings:
- 0 allocs/op across all hook modes on the hot path
- Idle overhead (0 subscribers): <0.3 ns — within measurement noise
- Active overhead: ~7.7 ns/sample (1 sub), ~14.6 ns/sample (5 subs) amortized
- As % of full parse+sample pipeline: 2.1% (1 sub) to 4.0% (5 subs) for a 4-tag metric
- Shrinks to <1% for metrics with 16+ tags where parsing dominates

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ERHEAD.md

Adds BenchmarkNoAggWorkerHook and BenchmarkCheckSamplerHook to cover all
three metrics pipeline tap points, addressing reviewer questions:

  "Would it be possible to replace this with actual benchmarks?"
  "Interesting is the case with no hooks at all."
  "Curious to see benchmarks for the three pipelines."

Key findings:
  - noop_hook and 0 subscribers are identical on all three paths (~2 ns,
    one atomic read). Zero-overhead principle confirmed.
  - TimeSampler (accumulator pattern): 0 allocs, +6 ns/sample amortized at 1 sub
  - no-agg worker: 0 cost idle; allocates when active (2 allocs/batch, +1542 ns)
  - CheckSampler: 0 cost idle; allocates when active (2 allocs/sample, +403 ns)
  - no-agg + check run at much lower volume than DogStatsD, so alloc overhead
    is acceptable; could be eliminated with the accumulator pattern if needed

Updates OVERHEAD.md with actual measured numbers and full three-pipeline
comparison, replacing the theoretical "10–20 ns" estimate.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Three-panel figure:
  A — hook overhead per pipeline × subscriber count (log scale)
  B — DogStatsD per-point latency breakdown: parse + sample() + hook
  C — sample() cost across all hook modes (within noise, 0 allocs)

Run with: uv run --with matplotlib python3 pkg/hook/plot_overhead.py

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…rhead chart

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
arm64 Linux, go test -bench -benchmem -benchtime=100ms -count=50
All benchmarks: n≥49, CV < 10%

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Data now sourced from bench_results.csv (50 runs, means ± stddev)
- Error bars shown on all bars
- overhead_b: legend moved to lower-right to avoid overlapping bar labels

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Each chart now has a lead-in paragraph explaining what to look for,
followed by the chart itself, then the detailed tables and analysis.
Old raw benchmark output block removed (data is in bench_results.csv).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Log scale was unnecessary — the linear scale communicates the zero-overhead
principle more directly: idle bars sit visibly at the baseline, TimeSampler
active bars are near-zero, and no-agg/check active bars stand tall.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…otations

- Labels now use a fixed offset above the bar instead of a log-scale multiplier
- Labels that would exceed the ylim are suppressed
- Removed "2 allocs/batch" annotation arrows (covered in the text)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The unconditional append inside sample() was copying a 72-byte
MetricSampleSnapshot struct on every DogStatsD metric even when no
observer was listening. SMP profiler comparison (main vs branch with
0 subscribers) showed +12ms in (*TimeSampler).sample() from this copy
alone.

Gating on HasSubscribers() makes the idle path (0 subscribers) match
main exactly: one atomic read, then nothing. The append only runs when
someone is actually subscribed.

Before: noop/0sub/1sub/5sub all ~82 ns (append unconditional)
After:  noop/0sub ~77 ns; 1sub ~82 ns; 5sub ~86 ns

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
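
The gating described in the commit above can be sketched as follows. This is an illustration under assumptions rather than the TimeSampler code: `hook`, `sampler`, and `snapshot` are hypothetical simplified types, and only the `HasSubscribers()` name comes from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// snapshot stands in for MetricSampleSnapshot; the fields are illustrative.
type snapshot struct {
	name  string
	value float64
}

// hook is a hypothetical minimal hook that only tracks its subscriber count.
type hook struct {
	subscribers atomic.Int32
}

// HasSubscribers costs a single atomic load.
func (h *hook) HasSubscribers() bool { return h.subscribers.Load() > 0 }

// sampler accumulates snapshots for later delivery to subscribers.
type sampler struct {
	hook  *hook
	batch []snapshot
}

// sample would do the regular sampling work (omitted here) and copies a
// snapshot into the batch only when at least one subscriber is listening.
func (s *sampler) sample(name string, value float64) {
	if s.hook.HasSubscribers() {
		s.batch = append(s.batch, snapshot{name: name, value: value})
	}
}

func main() {
	s := &sampler{hook: &hook{}}

	s.sample("requests", 1)
	fmt.Println(len(s.batch)) // 0: idle path, nothing copied

	s.hook.subscribers.Add(1)
	s.sample("requests", 2)
	fmt.Println(len(s.batch)) // 1: active path appends
}
```

With zero subscribers the only extra work per sample is the atomic load, which matches the idle-path behaviour the commit reports.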
@misteriaud misteriaud force-pushed the misteriaud/pkg-hook-minimal-smp branch from df6cb5f to 453ddfc Compare April 29, 2026 12:52