[pull] master from DataDog:master #554
Merged
* Fix Cilium e2e metric readiness
* Refine Cilium metric readiness wait
* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Spell out that the qa-label check refers to the Agent release cycle
* Add changelog entry
* Use the fixed changelog type for a zero-impact wording change
* Inline the long error string so ruff 0.11.10 stops complaining

* drop csi driver python check in favor of go
* drop changelog

Co-authored-by: Cursor <cursoragent@cursor.com>

* turn on public docs
* change key

…etadata.csv (#23770)

* Add missing envoy.vhost.vcluster.upstream_rq_time.99_5percentile to metadata.csv
* Add changelog entry for #23770
* Remove changelog entry (metadata-only change)
* Exercise Envoy listener immediately before E2E check scrape

  Add a function-scoped exercise_envoy fixture that issues HTTP requests to the listener right before each E2E test reads /stats. Without this, the time between env setup (where the conftest's requests previously lived) and the agent's check invocation can span several of Envoy's 5s flush windows, by which point the histogram interval values have been reset to nan and the parser silently drops them.

  Also temporarily drop the metadata entry for envoy.vhost.vcluster.upstream_rq_time.99_5percentile to confirm CI now reliably catches missing metadata.
* Restore conftest warm-up requests for integration tests

  The integration test (test_check) relies on Envoy having processed traffic before the check runs to assert metrics like envoy.cluster.ext_authz.error.count. Keep the dd_environment warm-up requests for that and have exercise_envoy re-fire just before each E2E scrape.
* Use exercise_envoy fixture for integration tests too

  Move the Envoy listener warm-up out of dd_environment and into the function-scoped exercise_envoy fixture so it's shared by both the integration tests (which previously relied on a side effect inside dd_environment) and the E2E tests. Single source of truth for "make sure Envoy has traffic before this test runs."
* Wait for an Envoy stats flush after exercising the listener

  Firing the requests immediately before the agent's scrape isn't enough: Envoy only rolls samples into the histogram interval view at each 5s flush, and the parser drops percentiles whose interval value is nan. Sleep 6s so the scrape lands after the flush that captured the samples but before the next empty flush resets them.

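The nan-dropping behavior described in the commits above can be sketched as follows. This is a minimal illustration only, not the integration's actual parser: the function name interval_percentiles is hypothetical, and the input is assumed to follow the P<quantile>(<interval>,<cumulative>) layout that Envoy's /stats output uses for histograms.

```python
import math
import re

# Matches one "P<quantile>(<interval>,<cumulative>)" entry from a histogram line.
_ENTRY = re.compile(r"P(?P<q>[\d.]+)\((?P<interval>[^,]+),(?P<cumulative>[^)]+)\)")


def interval_percentiles(histogram_text):
    """Return {quantile: interval_value}, skipping entries whose interval is nan.

    Hypothetical helper: it mirrors the "drop nan interval values" behavior
    described in the commit messages, nothing more.
    """
    result = {}
    for match in _ENTRY.finditer(histogram_text):
        value = float(match.group("interval"))
        if math.isnan(value):
            # No samples landed in the last flush window, so nothing is emitted.
            continue
        result[match.group("q")] = value
    return result
```

With an idle listener every interval value is nan, so the function returns an empty dict and a percentile such as 99.5 never shows up; that is why the tests need traffic inside the most recent flush window.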
* Add envoy.vhost.vcluster.upstream_rq_time.99_5percentile to metadata.csv

  Envoy 1.14+ emits a 99.5th percentile by default for all histograms, including vhost.vcluster.upstream_rq_time. The other upstream_rq_time families (cluster, cluster.external, etc.) already carry this entry; this one was overlooked when those were added.
* Drive continuous traffic for one full Envoy flush interval

  The previous single burst + 6s sleep relied on Envoy's flush cycle aligning with the test's request time. While that landed in the safe window in practice, the alignment isn't by design: it depends on docker_run timing happening to be a multiple of the flush interval. Spreading requests across the window removes that dependency: the most recent completed flush always has samples, so the interval percentiles are never reset to nan.
* Temporarily remove 99_5percentile metadata to validate continuous-load fixture
* Derive exercise_envoy timings from a flush-interval constant
* Restore envoy.vhost.vcluster.upstream_rq_time.99_5percentile in metadata.csv
* Document safe-scrape budget of exercise_envoy
* Move exercise_envoy to a background thread

  Replace the synchronous loop+sleep fixture with a threading.Thread + Event so requests keep firing through the entire test, including while the agent's check is in flight. This removes the finite "safe scrape window" the previous approach relied on: every flush window during the test, including those that close mid-scrape, now has samples.

  Also drop the 99_5percentile metadata entry temporarily to validate the fixture continues to reliably trigger emission on master CI.
* Restore envoy.vhost.vcluster.upstream_rq_time.99_5percentile in metadata.csv

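The threading.Thread + Event approach from the last commits above might look roughly like the sketch below. It is not the fixture from the PR: background_traffic, send_request, period, and FLUSH_INTERVAL_SECONDS are hypothetical names, and the request itself is passed in as a callable so the sketch does not assume any particular HTTP client or listener URL.

```python
import threading
from contextlib import contextmanager

# Assumed from the commit messages: Envoy flushes stats every 5 seconds.
FLUSH_INTERVAL_SECONDS = 5.0


@contextmanager
def background_traffic(send_request, period=FLUSH_INTERVAL_SECONDS / 10):
    """Call send_request() repeatedly on a background thread until the block exits."""
    stop = threading.Event()
    first_sent = threading.Event()

    def worker():
        while not stop.is_set():
            try:
                send_request()
            except Exception:
                # A failed request is ignored so the loop keeps generating traffic.
                pass
            first_sent.set()
            # Event.wait doubles as an interruptible sleep between requests.
            stop.wait(period)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    try:
        # Make sure at least one request was attempted before the test body runs.
        first_sent.wait(timeout=FLUSH_INTERVAL_SECONDS)
        yield
    finally:
        stop.set()
        thread.join()
```

A function-scoped pytest fixture could wrap this by entering the context manager, yielding to the test, and letting the finally clause stop the thread on teardown. Because requests continue while the check runs, no flush window that closes during the test is empty.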
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)