[pull] master from DataDog:master#554

Merged
pull[bot] merged 6 commits into ConnectionMaster:master from DataDog:master on May 21, 2026

pull[bot] commented May 21, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

nubtron and others added 6 commits May 21, 2026 09:15
* Fix Cilium e2e metric readiness

* Refine Cilium metric readiness wait

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
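
The commits above name a metric readiness wait for the Cilium E2E tests without showing it. As a rough illustration of the pattern, here is a minimal polling sketch. `wait_for_metrics`, its parameters, and the `fetch` callable are hypothetical and are not taken from the Cilium test code.

```python
import time
from typing import Callable, Iterable


def wait_for_metrics(
    fetch: Callable[[], str],
    required: Iterable[str],
    timeout: float = 60.0,
    poll_interval: float = 2.0,
) -> str:
    """Poll `fetch()` until every name in `required` appears in its output.

    Hypothetical helper: `fetch` stands in for whatever scrapes the metrics
    endpoint. Returns the last payload on success and raises TimeoutError
    once `timeout` seconds pass without all names showing up.
    """
    names = list(required)
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch()
        # Loose substring check; a stricter version would parse metric names.
        missing = [name for name in names if name not in payload]
        if not missing:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"metrics not ready after {timeout}s: {missing}")
        time.sleep(poll_interval)
```

Polling against a deadline, rather than sleeping a fixed amount, lets a fast environment pass quickly while a slow one still gets the full timeout.
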
* Spell out that the qa-label check refers to the Agent release cycle

* Add changelog entry

* Use the fixed changelog type for a zero-impact wording change

* Inline the long error string so ruff 0.11.10 stops complaining
* Drop the CSI driver Python check in favor of Go

* Drop changelog

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
* turn on public docs

* change key
…etadata.csv (#23770)

* Add missing envoy.vhost.vcluster.upstream_rq_time.99_5percentile to metadata.csv

* Add changelog entry for #23770

* Remove changelog entry (metadata-only change)

* Exercise Envoy listener immediately before E2E check scrape

Add a function-scoped exercise_envoy fixture that issues HTTP requests
to the listener right before each E2E test reads /stats. Without this,
the time between env setup (where the conftest's requests previously
lived) and the agent's check invocation can span several of Envoy's
5s flush windows, by which point the histogram interval values have
been reset to nan and the parser silently drops them.

Also temporarily drop the metadata entry for
envoy.vhost.vcluster.upstream_rq_time.99_5percentile to confirm CI
now reliably catches missing metadata.
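
The nan-dropping behaviour can be reproduced in isolation. The sketch below is a hypothetical parser, not the Envoy check's actual code. It assumes Envoy's text-format `/stats` histogram lines pair each percentile as `Pxx(interval,cumulative)`, keeps only the interval values, and skips nan ones; the `99_5percentile`-style suffix mapping mirrors the metric name in metadata.csv.

```python
import math
import re

# Assumed shape of one histogram line in Envoy's text /stats output:
#   <name>: P0(<interval>,<cumulative>) P25(...) ... P99.5(...) ...
_PAIR = re.compile(r"P([\d.]+)\(([^,]+),([^)]+)\)")


def parse_histogram_line(line: str) -> tuple[str, dict[str, float]]:
    """Return (stat name, {percentile suffix: interval value}).

    Percentiles whose interval value is nan are skipped: if no samples
    landed in the most recent flush window, they never reach the check.
    """
    name, _, rest = line.partition(":")
    values: dict[str, float] = {}
    for pct, interval, _cumulative in _PAIR.findall(rest):
        value = float(interval)  # float() accepts Envoy's literal "nan"
        if math.isnan(value):
            continue
        # "99.5" -> "99_5percentile", "50" -> "50percentile"
        values[pct.replace(".", "_") + "percentile"] = value
    return name.strip(), values
```

For example, `parse_histogram_line("vhost.example.vcluster.other.upstream_rq_time: P50(nan,3) P99.5(12.5,11)")` returns `('vhost.example.vcluster.other.upstream_rq_time', {'99_5percentile': 12.5})`: the P50 entry is dropped because its interval value is nan.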

* Restore conftest warm-up requests for integration tests

The integration test (test_check) relies on Envoy having processed
traffic before the check runs to assert metrics like
envoy.cluster.ext_authz.error.count. Keep the dd_environment warm-up
requests for that and have exercise_envoy re-fire just before each E2E
scrape.

* Use exercise_envoy fixture for integration tests too

Move the Envoy listener warm-up out of dd_environment and into the
function-scoped exercise_envoy fixture so it's shared by both the
integration tests (which previously relied on a side-effect inside
dd_environment) and the E2E tests. Single source of truth for "make
sure Envoy has traffic before this test runs."

* Wait for an Envoy stats flush after exercising the listener

Firing the requests immediately before the agent's scrape isn't enough —
Envoy only rolls samples into the histogram interval view at each 5s
flush, and the parser drops percentiles whose interval value is nan.
Sleep 6s so the scrape lands after the flush that captured the samples
but before the next empty flush resets them.
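
The 6s figure can be sanity-checked with a simplified timing model. This model is an assumption, not Envoy's implementation: flushes occur every `FLUSH_INTERVAL` seconds, and each flush replaces the interval view with the samples recorded since the previous flush. `scrape_sees_burst` is a hypothetical helper.

```python
FLUSH_INTERVAL = 5.0  # seconds; the flush period cited in the commits


def scrape_sees_burst(first_flush: float, delay: float,
                      interval: float = FLUSH_INTERVAL) -> bool:
    """Would a scrape `delay` seconds after a burst at t=0 read non-empty
    interval percentiles?

    `first_flush` (0 < first_flush <= interval) is when the first flush after
    the burst happens; it captures the burst. The following flush, at
    first_flush + interval, has no samples and resets the values to nan.
    """
    return first_flush <= delay < first_flush + interval
```

Sweeping `first_flush` in 0.1s steps with `delay=6.0` shows the scrape succeeds only when the first flush lands more than 1s after the burst (1.1s through 5.0s here). If it lands sooner, the next, empty flush has already reset the values by the time of the scrape.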

* Add envoy.vhost.vcluster.upstream_rq_time.99_5percentile to metadata.csv

Envoy 1.14+ emits a 99.5th percentile by default for all histograms,
including vhost.vcluster.upstream_rq_time. The other upstream_rq_time
families (cluster, cluster.external, etc.) already carry this entry;
this one was overlooked when those were added.
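
The CI signal these commits rely on amounts to a set difference between the metric names a check emits and the `metric_name` column of metadata.csv. A minimal stand-alone version follows. It assumes a CSV header containing `metric_name`; `missing_from_metadata` and the sample rows are illustrative, not copied from the real file.

```python
import csv
import io


def missing_from_metadata(metadata_csv: str, emitted: set[str]) -> set[str]:
    """Return emitted metric names that have no row in metadata.csv."""
    reader = csv.DictReader(io.StringIO(metadata_csv))
    documented = {row["metric_name"] for row in reader}
    return emitted - documented


# Illustrative rows only.
METADATA = """metric_name,metric_type
envoy.cluster.upstream_rq_time.99_5percentile,gauge
envoy.vhost.vcluster.upstream_rq_time.99percentile,gauge
"""

emitted = {
    "envoy.cluster.upstream_rq_time.99_5percentile",
    "envoy.vhost.vcluster.upstream_rq_time.99_5percentile",
}
```

Here `missing_from_metadata(METADATA, emitted)` returns `{'envoy.vhost.vcluster.upstream_rq_time.99_5percentile'}`. The check only sees names that were actually emitted, so if the percentile is dropped for being nan, the missing entry goes unnoticed. That is why the traffic fixture matters.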

* Drive continuous traffic for one full Envoy flush interval

The previous single burst + 6s sleep relied on Envoy's flush cycle
aligning with the test's request time. While that landed in the safe
window in practice, the alignment isn't designed — it depends on
docker_run timing happening to be a multiple of the flush interval.
Spreading requests across the window removes that dependency: the most
recent completed flush always has samples, so the interval percentiles
are never reset to nan.
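
Under a simplified flush model, the single burst and continuous traffic can be compared across every flush phase. The model is an assumption, not Envoy's implementation: flushes happen at a fixed but unknown phase, and the scrape reads the window that closed most recently. `last_window_has_samples`, the 0.5s spacing, and the 6s scrape time are illustrative choices.

```python
import math

FLUSH_INTERVAL = 5.0  # seconds; the flush period cited in the commits


def last_window_has_samples(
    request_times: list[float],
    phase: float,
    scrape_time: float,
    interval: float = FLUSH_INTERVAL,
) -> bool:
    """Flushes are modelled at phase + k * interval for every integer k.

    The scrape reads the window (end - interval, end] that closed at the
    most recent flush at or before `scrape_time`.
    """
    end = phase + math.floor((scrape_time - phase) / interval) * interval
    return any(end - interval < t <= end for t in request_times)


phases = [p / 10 for p in range(50)]  # flush phases 0.0 .. 4.9 s

# One burst at t=0, scraped 6 s later.
burst_ok = [last_window_has_samples([0.0], p, 6.0) for p in phases]

# A request every 0.5 s across one full flush interval (t = 0.0 .. 5.0),
# scraped 1 s after the last request.
steady = [i * 0.5 for i in range(11)]
steady_ok = [last_window_has_samples(steady, p, 6.0) for p in phases]
```

With these numbers the single burst fails for 11 of the 50 phases. Spreading requests across one full interval leaves the most recent window non-empty for every phase, as long as the scrape happens less than one flush interval after the last request.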

* Temporarily remove 99_5percentile metadata to validate continuous-load fixture

* Derive exercise_envoy timings from a flush-interval constant

* Restore envoy.vhost.vcluster.upstream_rq_time.99_5percentile in metadata.csv

* Document safe-scrape budget of exercise_envoy

* Move exercise_envoy to a background thread

Replace the synchronous loop+sleep fixture with a threading.Thread +
Event so requests keep firing through the entire test, including while
the agent's check is in flight. This removes the finite "safe scrape
window" the previous approach relied on — every flush window during the
test, including those that close mid-scrape, now has samples.

Also drop the 99_5percentile metadata entry temporarily to validate the
fixture continues to reliably trigger emission on master CI.
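
The thread-plus-Event pattern can be sketched with the standard library alone. In this PR the logic lives in the function-scoped `exercise_envoy` pytest fixture. The version below is written as a context manager so it runs without pytest. `background_traffic`, `send_to_listener`, the placeholder URL, and the request spacing are hypothetical.

```python
import threading
import urllib.request
from contextlib import contextmanager
from typing import Callable, Iterator

FLUSH_INTERVAL = 5.0  # seconds; Envoy's stats flush period as cited in the commits
REQUEST_SPACING = FLUSH_INTERVAL / 10  # hypothetical: ~10 requests per flush window


def send_to_listener(url: str = "http://localhost:8000/") -> None:
    """Hypothetical request helper; the URL is a placeholder."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        resp.read()


@contextmanager
def background_traffic(
    send: Callable[[], None], spacing: float = REQUEST_SPACING
) -> Iterator[None]:
    """Call `send()` every `spacing` seconds on a daemon thread until the
    with-block exits, so traffic keeps flowing while the check runs."""
    stop = threading.Event()

    def run() -> None:
        while not stop.is_set():
            try:
                send()
            except Exception:
                pass  # one failed request should not end the traffic loop
            # Sleeps for `spacing`, but returns at once when stop is set.
            stop.wait(spacing)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join(timeout=spacing + 1)
```

Around a check run this reads `with background_traffic(send_to_listener): ...`. Using `Event.wait` instead of `time.sleep` lets teardown stop the loop immediately rather than waiting out the current spacing.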

* Restore envoy.vhost.vcluster.upstream_rq_time.99_5percentile in metadata.csv

pull[bot] locked and limited conversation to collaborators May 21, 2026
pull[bot] added the ⤵️ pull label May 21, 2026
pull[bot] merged commit 656fac7 into ConnectionMaster:master May 21, 2026
pull[bot] temporarily deployed to release May 21, 2026 14:27 Inactive

5 participants