[pull] master from DataDog:master#613

Merged
pull[bot] merged 5 commits into ConnectionMaster:master from DataDog:master
Jun 22, 2026

Conversation

pull[bot] commented Jun 22, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

jaypatel7-crest and others added 5 commits June 22, 2026 14:27
* Add Palo Alto Networks Cortex XSOAR integration

* Updated assets

* Resolve CI failures

* Resolve CI failures

* Resolve CI failures

* Resolve CI failures

* Resolve CI failures

* Resolve CI failures

* Address codex review comment

* Addressed review comments

* Renamed monitors

* Updated manifest

---------

Co-authored-by: savandalasaniya-crest <savan.dalasaniya@crestdata.ai>
Co-authored-by: carlos-turiegano-dd <carlos.turiegano@datadoghq.com>
Co-authored-by: Gustavo Mora <tavo.mora92@hotmail.com>
* Ignore non-package tags when auto-detecting release packages

Release branches carry sentinel *-bootstrap-* tags (e.g. dk-bootstrap-7.78.0)
that do not map to a real package. Auto-detection from tags at HEAD treated
these as unknown packages and aborted the release with 'Unknown packages'.
Auto-detect now ignores tags with no matching package, while manual package
selection still errors on unknown names.

* Shorten resolve_packages docstring

* Split resolve_packages docstring into two lines
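
The behaviour described in this commit can be sketched as follows. This is a minimal illustration, not the actual release tooling: the signature of `resolve_packages`, the `KNOWN_PACKAGES` set, and the `<package>-<version>` tag format are all assumptions made for the example.

```python
# Hypothetical sketch; names and tag format are assumptions, not the real tooling.
KNOWN_PACKAGES = {"mysql", "mongo", "flink"}  # illustrative package names


def package_from_tag(tag: str) -> str:
    """Return the package part of an assumed '<package>-<version>' tag."""
    return tag.rsplit("-", 1)[0]


def resolve_packages(selected: list[str], head_tags: list[str]) -> list[str]:
    """Resolve the packages to release.

    Manual selection errors on unknown names; auto-detection skips unmatched tags.
    """
    if selected:
        unknown = sorted(set(selected) - KNOWN_PACKAGES)
        if unknown:
            raise ValueError(f"Unknown packages: {', '.join(unknown)}")
        return list(selected)

    detected = []
    for tag in head_tags:
        name = package_from_tag(tag)
        # Sentinel tags such as dk-bootstrap-7.78.0 map to no package and are ignored.
        if name in KNOWN_PACKAGES and name not in detected:
            detected.append(name)
    return detected
```

Under these assumptions, `resolve_packages([], ["dk-bootstrap-7.78.0", "mysql-15.0.0"])` yields only `mysql`, while `resolve_packages(["dk-bootstrap"], [])` still raises.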
…onotonic_count (#24125)

* fix(mysql): submit index usage metrics as monotonic_count instead of gauge

count_read/count_update/count_delete in performance_schema.table_io_waits_summary_by_index_usage
are cumulative counters. Submitting them as gauge caused mysql.index.reads/updates/deletes to
show as monotonically increasing raw totals rather than delta counts per collection interval.

* Add changelog entry for #24125

* Update mysql index metrics metadata and changelog for #24125

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix metadata.csv: use count (backend type) for monotonic_count index metrics

monotonic_count submissions map to count in the Datadog backend.
assert_metrics_using_metadata with check_submission_type=True applies
METRIC_TYPE_SUBMISSION_TO_BACKEND_MAP before comparing, so metadata.csv
must use count not monotonic_count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix E2E test: skip usage metric assertions when collection_interval prevents emission

monotonic_count metrics with collection_interval=300 are never emitted in E2E:
check_rate=True runs the check twice but the second run skips the query
(300s not elapsed), so no delta is flushed. Use at_least=0 in E2E mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
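
To illustrate why a cumulative counter reads differently as a gauge and as a monotonic count, here is a simplified model. `MonotonicCounter` is a hypothetical stand-in, not the Agent's implementation: it assumes the first sample only establishes a baseline and that a decrease is treated as a reset with no delta. The sample values are made up.

```python
class MonotonicCounter:
    """Simplified model of monotonic_count handling (assumption, not Agent code)."""

    def __init__(self):
        self._last = None  # previous raw sample, if any

    def submit(self, value):
        """Return the delta to flush, or None when there is nothing to flush."""
        previous, self._last = self._last, value
        if previous is None or value < previous:
            # First sample (baseline only) or counter reset: no delta in this model.
            return None
        return value - previous


raw_samples = [100, 130, 175]  # cumulative count_read values, made up for the example
counter = MonotonicCounter()
deltas = [counter.submit(v) for v in raw_samples]
print(deltas)  # [None, 30, 45]
```

A gauge would report the raw totals 100, 130, 175. With a single sample, as in the E2E case where the second run skips the query, the model returns only `None`, which is consistent with the commit's observation that no delta is flushed.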
* Remove Mongo queryStats server-side sort

* Add changelog for Mongo queryStats sort removal

* Fix Mongo test import ordering

* Add Mongo queryStats test return type
* [flink] add OpenMetrics-based collection mode

Adds an alternative collection mode that scrapes Flink's
flink-metrics-prometheus reporter via OpenMetricsBaseCheckV2,
complementing the existing Datadog HTTP Reporter push-based mode.

Motivation: the existing push-based collection requires the
Datadog API key to live inside Flink's configuration. On Kubernetes
deployments managed by the Flink Operator, flink-conf.yaml is
mounted from a read-only ConfigMap, which makes secret injection
brittle and forces workarounds like the vals plugin in ArgoCD.

With agent-side OpenMetrics scraping the API key lives with the
Datadog Agent, which is the standard pattern for K8s deployments
and fits cleanly with External Secrets Operator + Secret refs.

Includes:
- FlinkCheck inheriting from OpenMetricsBaseCheckV2
- METRIC_MAP covering the JVM and core jobmanager / taskmanager
  metrics (representative subset; full coverage to follow once
  the approach is agreed)
- Updated configuration spec and conf.yaml.example
- Unit tests with a fixture covering the mapped metrics
- README updated to document both collection modes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [flink] fix CI validations

- Regenerate conf.yaml.example via 'ddev validate config flink -s'
  (drops two stray lines from the openmetrics_endpoint comment block)
- Add auto-generated config_models/ via 'ddev validate models flink -s'
- Replace non-ASCII em-dashes in README.md with ASCII hyphens
- Rename changelog fragment to use the PR number ('23857.added' instead
  of 'openmetrics-collection.added') so check_pr.py can parse it

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [flink] complete the integration: full METRIC_MAP, E2E test, hostname

- Expand METRIC_MAP from a representative subset (~20) to the full
  87 metrics declared in metadata.csv (script-generated)
- Add tests/compose/docker-compose.yaml with jobmanager + taskmanager
  configured to expose Flink's flink-metrics-prometheus reporter, plus
  tests/test_e2e.py asserting core JVM and cluster metrics
- Add hatch.toml so 'ddev test' picks up the integration
- Set hostname_label + exclude_labels for 'host' in the check's
  default config: Flink's Prometheus reporter labels every series
  with 'host' to identify the source JM/TM, which collides with
  Datadog's reserved hostname tag. Promote it to the metric's
  hostname and drop it from the tag set.
- Tweak unit-test fixtures/expectations to match (gauges instead
  of monotonic counts since Flink emits all counter-like metrics as
  Prometheus gauges)

Verified locally: ddev validate all flink (passes), ddev test flink -m
unit (2 passed), and docker compose up on the new compose file with
/metrics returning Prometheus output as expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
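
The effect of setting `hostname_label` and `exclude_labels` for `host` can be modelled in plain Python. This is only a sketch of the described outcome, not the code in OpenMetricsBaseCheckV2; the function `promote_host_label` and the sample label names other than `host` are hypothetical.

```python
def promote_host_label(labels, hostname_label="host", exclude_labels=("host",)):
    """Model of the described behaviour: use one label as hostname, drop excluded labels.

    Returns a (hostname, tags) tuple. Hypothetical helper for illustration only.
    """
    hostname = labels.get(hostname_label)
    tags = sorted(
        f"{name}:{value}"
        for name, value in labels.items()
        if name not in exclude_labels
    )
    return hostname, tags


# Labels as they might appear on a scraped series (values are illustrative).
sample_labels = {"host": "taskmanager-1", "tm_id": "abc123"}
print(promote_host_label(sample_labels))  # ('taskmanager-1', ['tm_id:abc123'])
```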

* [flink] fix lint and sync CI configuration

- Add file-level `# flake8: noqa: E501` to flink/metrics.py for the
  METRIC_MAP dict (mirrors the temporal integration's pattern).
- Regenerate .codecov.yml and .github/workflows/test-all.yml via
  `ddev validate ci --sync` so the new flink target is registered.

* [flink] fix raw Prometheus name for numRecordsOutPerSecond

Flink's Prometheus reporter emits the throughput meter as
`numRecordsOutPerSecond` (full word), per the upstream metric reference:
https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/metrics/.
metadata.csv however documents the DD-side metric with the truncated
suffix `PerSec` — a pre-existing asymmetry inherited from the push
reporter. Spotted by Codex on the PR.

Map the raw `_PerSecond` keys to the truncated DD-side `_PerSec` names
for both `flink_task_numRecordsOutPerSecond` and
`flink_operator_numRecordsOutPerSecond`. Extend the unit fixture and
assertions to lock the mapping in so it can't drift again.
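
A regression guard of the kind described above might look like the sketch below. The two raw keys come from the commit message; the right-hand Datadog-side names and the helper `misnamed_rate_keys` are assumptions for illustration, so check metadata.csv for the real names.

```python
# Raw Prometheus name -> Datadog-side name.
# The right-hand values are assumed for illustration only.
RATE_METRICS = {
    "flink_task_numRecordsOutPerSecond": "task.numRecordsOutPerSec",
    "flink_operator_numRecordsOutPerSecond": "operator.numRecordsOutPerSec",
}


def misnamed_rate_keys(metric_map):
    """Return raw keys that use the truncated 'PerSec' suffix instead of 'PerSecond'."""
    return sorted(key for key in metric_map if key.endswith("PerSec"))


# An empty result means no raw key uses the truncated suffix.
print(misnamed_rate_keys(RATE_METRICS))  # []
```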

* [flink] use Flink's actual Prometheus reporter scope names

Flink's Prometheus reporter ignores `metrics.scope.task` and
`metrics.scope.operator` overrides -- it uses Flink's internal logical
scope names (`taskmanager_job_task`, `taskmanager_job_task_operator`)
with a hardcoded `flink_` prefix. Verified by scraping a live
docker-compose + StateMachineExample job: the old MAP keys
`flink_task_*` and `flink_operator_*` never match, silently dropping
~35 metrics.

- Rename all task-scope and operator-scope keys in METRIC_MAP to the
  long `flink_taskmanager_job_task[_operator]_*` form. DD-side names
  (right side of map) are unchanged so metadata.csv still applies.
- Drop the `metrics.scope.*` overrides from the test compose -- they
  were no-ops for the Prometheus reporter.
- Drop the equivalent block from the README OpenMetrics section and
  add a note explaining that the Prometheus reporter ignores those
  overrides. The legacy Datadog HTTP Reporter section keeps its
  scope-remap block, where it's still required.
- Update the unit fixture to mirror the raw names Flink actually
  emits so the unit assertion remains a meaningful regression guard.

Live-scrape re-validation: 0 unmatched keys for the workload covered;
the 8 still-unmatched MAP keys (commitsFailed, currentInput2Watermark,
etc.) are conditional metrics not emitted by the smoke job.
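
The key rename and the unmatched-key check from this commit can be sketched as below. The prefixes follow the commit message; the helper names, the sample metric names, and the right-hand Datadog-side values are hypothetical.

```python
# Old short prefix -> logical scope prefix described in the commit message.
SCOPE_PREFIXES = {
    "flink_operator_": "flink_taskmanager_job_task_operator_",
    "flink_task_": "flink_taskmanager_job_task_",
}


def to_reporter_name(key):
    """Rewrite a short-scope key to the long form; leave other keys unchanged."""
    for old, new in SCOPE_PREFIXES.items():
        if key.startswith(old):
            return new + key[len(old):]
    return key


def unmatched_keys(metric_map, scraped_names):
    """Return map keys that never appear in a scrape (these would be silently dropped)."""
    return sorted(set(metric_map) - set(scraped_names))


# Illustrative data: the metric names and mapped values are examples, not the full map.
old_map = {
    "flink_task_numRecordsOut": "task.numRecordsOut",
    "flink_operator_numRecordsOut": "operator.numRecordsOut",
    "flink_jobmanager_numRunningJobs": "jobmanager.numRunningJobs",
}
new_map = {to_reporter_name(key): value for key, value in old_map.items()}
scraped = [
    "flink_taskmanager_job_task_numRecordsOut",
    "flink_taskmanager_job_task_operator_numRecordsOut",
    "flink_jobmanager_numRunningJobs",
]
print(unmatched_keys(old_map, scraped))  # the two short-scope keys
print(unmatched_keys(new_map, scraped))  # []
```

Only the left side of the map changes, so the Datadog-side names stay as they were.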

* [flink] register integration in code-coverage.datadog.yml

After rebasing onto master, the upstream removal of .codecov.yml
(Datadog Code Coverage migration, PR #23360) means the flink entry
that ddev validate ci --sync added to .codecov.yml is gone. The new
code-coverage.datadog.yml needs the same registration; without it
the CI validation reports "Code coverage config has 1 missing service".

* [flink] update test_is_logs_only fixture in datadog_checks_dev

The unit test in datadog_checks_dev/tests/tooling/test_utils.py uses
'flink' as a known-logs-only fixture to exercise the is_logs_only()
helper. After this PR adds OpenMetrics-based metric collection (with
init_config:/instances:), flink is no longer logs-only, so the test
fails. Swap the fixture to 'cisco_asa', which remains a pure-logs
integration on master.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull[bot] locked and limited conversation to collaborators Jun 22, 2026
pull[bot] added the ⤵️ pull label Jun 22, 2026
pull[bot] merged commit f0bd45e into ConnectionMaster:master Jun 22, 2026
1 check passed
pull[bot] had a problem deploying to typo-squatting-release June 23, 2026 06:51 (Failure)
5 participants