Skip to content

[pull] master from DataDog:master#622

Merged
pull[bot] merged 5 commits into
ConnectionMaster:masterfrom
DataDog:master
Jun 25, 2026
Merged

[pull] master from DataDog:master#622
pull[bot] merged 5 commits into
ConnectionMaster:masterfrom
DataDog:master

Conversation

@pull

@pull pull Bot commented Jun 25, 2026

Copy link
Copy Markdown

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

piochelepiotr and others added 5 commits June 25, 2026 09:19
…tions (#24013)

* kafka_consumer: collect Kafka Connect connector metrics and configurations

Adds a new KafkaConnectCollector (connectors.py) that queries the Kafka
Connect REST API and Amazon MSK Connect to surface connector health,
task status, and configuration into DSM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: add changelog entry for PR #24013

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address review comments for Kafka Connect monitoring

- Gate connector collection on enable_cluster_monitoring flag
- Fix dedup hash: exclude collection_timestamp from hashed content
- Scope cache keys per-URL to avoid collisions across multiple Connect endpoints
- Remove kafka_connect_aws_msk_cluster_arn option (boto3 API doesn't support ARN filtering)
- Add MSK can_connect service check with aws_region tag
- Add isinstance guard for older Kafka Connect workers returning list responses
- Fix type hints and improve exception messages
- Add 15 unit tests covering dedup, metrics, service checks, and compat
- Regenerate config_models and conf.yaml.example after spec.yaml changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix service_checks integration field to match manifest display name

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix license headers and service_checks integration name

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: sort metadata.csv connector metrics alphabetically

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix metadata.csv metric sort order (connector metrics in alphabetical position)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: remove connector service check, consolidate task metrics, add connectivity to heartbeat

- Replace connector.can_connect service check with connectivity status in the
  cluster monitoring heartbeat (connect_api_status field)
- Consolidate 4 task-state gauges (tasks_running/failed/paused/unassigned) into
  a single connector.tasks metric tagged with task_state
- collect() now returns dict[str, bool] (endpoint → connected)
- Heartbeat is sent after connector collection so it includes connect_api_status
- Update tests to match new API

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: add Kafka Connect connector monitoring dashboard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: expand connector unit test coverage to 100%

Add tests covering _configure_session TLS/auth paths, _collect_rest success path,
_collect_plugins, _get_items_to_fetch/_mark_items_fetched cache logic, _fetch_oidc_token,
_refresh_oauth_token, MSK collect path with mocked boto3, _get_tags, and all exception paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: add Confluent Cloud Connect support

Add a _collect_confluent_cloud() method that constructs the Confluent Cloud
Connect REST API URL (api.confluent.cloud/connect/v1/environments/{env}/clusters/{cluster})
and delegates to the existing _collect_rest() path, which is already compatible
with the Confluent Cloud API response format.

New config params:
  kafka_connect_confluent_cloud_environment_id
  kafka_connect_confluent_cloud_cluster_id
  kafka_connect_confluent_cloud_url (optional, defaults to https://api.confluent.cloud)

Authenticate using kafka_connect_username (Cloud API key ID) and
kafka_connect_password (Cloud API key secret).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: remove MSK Connect support, fix Confluent Cloud guard

Drop the MSK Connect (boto3) collection path — it uses a different API,
returns fewer metrics, and is better handled in a follow-up PR. Also
fixes the caller-side guard in kafka_consumer.py which was missing the
Confluent Cloud fields, causing CC-only configs to silently produce no
metrics.

Also simplifies a convoluted OIDC test assertion flagged by ruff.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address round-2 review comments

- Fix dashboard "Connectors by State" paused/failed series: use count:
  aggregator instead of sum: (value is always 0 for non-running connectors)
- Fix "Connectors by Type" toplist to count all connectors, not just running
- Fix e.g., comma in spec.yaml/conf.yaml.example (documentation style)
- OAuth token failure now returns endpoints as False so heartbeat reports
  connectivity failure instead of silently omitting connect_api_status
- Task state bucketing now dynamic: covers RESTARTING/DESTROYED in addition
  to RUNNING/FAILED/PAUSED/UNASSIGNED
- Wire up CONNECTOR_CONFIG_CACHE_MAX_SIZE in _get_events_to_send to bound
  the config-event cache
- Type _parse_connect_urls value parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address round-3 review comments

- Add credential redaction in _truncate_connector_config (SENSITIVE_KEY_PATTERN)
- Fix OAuth failure on Confluent Cloud-only setup: include CC key in failure dict
- Fix connect_status None sentinel: distinguish "not configured" from "crashed"
- Normalize task_state casing to lowercase in connector.task.running metric
- Add max_cache_size param to _get_events_to_send for consistency with _mark_items_fetched
- Rename self.CONFIGS_REFRESH_INTERVAL/JITTER to lowercase _configs_refresh_interval/_jitter
- Add type hints to KafkaConnectCollector.__init__
- Parameterize _parse_connect_urls signature with typed generics
- Parametrize TLS and Confluent Cloud URL construction tests
- Add test for OAuth failure on CC-only deployment
- Fix changelog entry: remove MSK Connect claim, add Confluent Cloud
- Backtick URL in kafka_connect_confluent_cloud_url description
- Add requests as explicit pyproject.toml dependency
- Remove demo docker-compose-connect-demo.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: extract shared cache helpers into EventCacheMixin

Move _get_items_to_fetch, _mark_items_fetched, _get_events_to_send,
_get_tags, and _original_cluster_id_field into a new EventCacheMixin so
ClusterMetadataCollector and KafkaConnectCollector share one copy.

Also removes the dashboard JSON (shipped too early, no product sign-off)
and renames CONFIGS_REFRESH_INTERVAL/JITTER to private _configs_refresh_*
to match the convention already used in KafkaConnectCollector.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: revert unrelated MSK KRaft note from kafka_cluster_id_override description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: replace requests.Session with check.http in KafkaConnectCollector

Use the same internal HTTP client (check.http) that schema registry uses,
passing auth/TLS/OAuth as per-request kwargs via ChainMap override rather
than managing a separate requests.Session. Drop the requests dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: capture full connector config instead of key allowlist

Remove the IMPORTANT_CONNECTOR_CONFIG_KEYS allowlist and the 30-key cap.
Send the complete connector config, redacting only sensitive key names
(passwords, secrets, keys, JAAS config) with [hidden].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: remove Confluent Cloud Connect (moved to a follow-up PR)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update kafka_consumer/metadata.csv

Co-authored-by: Eva Parish <eva.parish@datadoghq.com>

* kafka_consumer: fix AttributeError when cluster monitoring enabled without kafka_connect_url

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address round-4 review findings

- Remove orphaned kafka_connect_confluent_cloud_* fields from config_models
  (instance.py, defaults.py); regenerate conf.yaml.example
- Fix broken test assertions in test_cluster_metadata.py: rename
  CONFIGS_REFRESH_INTERVAL/JITTER -> _configs_refresh_interval/jitter,
  move EVENT_CACHE_TTL import from instance attr to module constant
- Fix schema-registry credential leakage onto Connect requests: explicitly
  set auth=None and clear Authorization header when Connect has no auth,
  preventing ChainMap fallback from inheriting SR credentials
- Fix metadata.csv orientation for kafka.connector.running and
  kafka.connector.task.running from -1 (lower-is-better) to 1 (higher-is-better)
- Add EventCacheMixin._init_cache_intervals() helper; both collectors
  now call it instead of duplicating the jitter derivation formula
- Improve OAuth bail-out: fall back to existing valid token on transient
  refresh failure instead of marking all endpoints unreachable
- Extract _collect_connect_status() from check() in kafka_consumer.py
- Add integration glue tests: connect_api_status heartbeat field presence,
  collector exception degrades to empty dict
- Add multi-URL partial-failure test for KafkaConnectCollector.collect()
- Tighten bare dict annotations to dict[str, Any] in connectors.py
- Fix kafka_connect_tls_verify spec description parenthetical

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix remaining CONFIGS_REFRESH_INTERVAL references and formatting

- Fix two remaining self.CONFIGS_REFRESH_INTERVAL/JITTER references in
  cluster_metadata.py (schema compatibility fetch cadence) that were missed
  in the rename to _configs_refresh_interval/_configs_refresh_jitter
- Reformat cluster_metadata.py and connectors.py via ruff format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: replace EventCacheMixin with CacheHelper composition

Replace the mixin-based cache sharing pattern with an explicit CacheHelper
object. Each collector (`ClusterMetadataCollector`, `KafkaConnectCollector`)
holds `self.cache: CacheHelper` instead of inheriting from `EventCacheMixin`,
making the dependency visible at construction time and eliminating the hidden
`_init_cache_intervals` contract that caused missed-rename bugs.

Move `_get_tags` and `_original_cluster_id_field` to standalone module
functions in `cluster_metadata.py`; `connectors.py` imports them directly.
Delete `event_cache_mixin.py` and add `cache.py` with `CacheHelper` and
`EVENT_CACHE_TTL`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix missing spaces after commas in _get_tags calls

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: per-request HTTP kwargs, revert helper functions, connector metric changes

- Schema Registry HTTP settings (auth, TLS, OAuth token) now stored as
  per-request kwargs instead of mutating shared check.http.options, preventing
  credential leakage into Kafka Connect requests on the same HTTP session
- Reverted _get_tags/_original_cluster_id_field to instance methods, removing
  the module-level extraction that increased diff size unnecessarily
- Extended SENSITIVE_KEY_PATTERN to cover token, passphrase, keyfile,
  connection.url, basic.auth.user.info, _key$ suffixes, and private.key
- Removed connector.running metric; added connector_status tag on
  connector.task.count instead
- Added connector_full_class tag to connector metrics
- Removed kafka_connect_collect_task_metrics config option; task metrics
  (connector.task.running) now always collected
- Fixed cache.py linting (collapsed get_events_to_send signature)
- Removed stale inline comments

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: remove connector.count and connector.running metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: keep connector.count, drop connector.running

connector.running is redundant with the connector_status tag on
connector.task.count; connector.count gives a fleet-level view not
derivable from per-connector metrics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: sync instance.py with ddev 17.0.1 formatting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address review comments on Connect monitoring PR

- Move _get_tags and _original_cluster_id_field to KafkaConfig, eliminating
  duplication between ClusterMetadataCollector and KafkaConnectCollector
- Drop redundant connector_status tag from connector.task.count (connector_state
  already carries the same information)
- Remove module docstring from cache.py
- Fix backtick formatting in metadata.csv connector.tasks description
- Move make_collector fixture and SAMPLE_CONNECTORS_RESPONSE to conftest.py
- Remove section-divider comments from test_connectors.py
- Refactor sensitive-key redaction and timestamp dedup tests to call collect()
  instead of the private _emit_connector_config_events

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix import sort order in conftest.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: use dd_run_check in connector unit tests

Refactor test_sensitive_keys_redacted_in_config_event and
test_collection_timestamp_excluded_from_hash to run through the full
KafkaCheck.check() path via dd_run_check, and add missing hashlib import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix lint issues after dd_run_check refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix ruff formatting in cluster_metadata.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address review comments on connector tests

- Move seed_mock_kafka_client to conftest.py alongside SAMPLE_CONNECTORS_RESPONSE
- Remove # Arrange / # Act / # Assert comments from test_connectors.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: rewrite private-method tests to go through collect()

Replace direct calls to _emit_connector_metrics, _collect_rest, and
_collect_plugins in test_connectors.py with calls to the public
collect() method. Also configure config._get_tags and
_original_cluster_id_field return values in make_collector so that
tag-asserting tests work correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address round-5 review findings

- Add enable_cluster_monitoring prerequisite to kafka_connect_url
  description in spec.yaml (matches sibling schema_registry_url pattern)
- Extract _build_http_kwargs module-level helper shared by connectors.py
  and cluster_metadata.py to eliminate duplicated auth/TLS logic
- Remove stale Authorization:'' masking in _get_request_kwargs (the
  http.options ChainMap leak path it guarded was removed in this PR)
- Add word boundaries to 'token' in SENSITIVE_KEY_PATTERN so keys like
  oauth.token.endpoint.uri are no longer redacted
- Lowercase connector_state tag to match task_state casing
- Merge two sequential for-task-in-tasks loops in _emit_connector_metrics
- Add comment on eager KafkaConnectCollector construction in __init__
- Add test pinning no-token _get_request_kwargs emits no Authorization
- Add aggregator-backed test cross-checking connector metrics against metadata.csv

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix formatting and add backticks to task_state in metadata.csv

- Auto-format connectors.py (ruff trailing whitespace)
- Add backticks around task_state in metadata.csv description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: drive new Connect monitoring tests through dd_run_check

Rewrite the new test_connectors.py so every test exercises the public
KafkaCheck.check() entrypoint via dd_run_check and asserts on emitted
metrics and DSM events, instead of calling collector/cache internals.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: move CONNECT_URL constant to common.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Eva Parish <eva.parish@datadoghq.com>
* Tag Kueue queues from workloadmeta

* Add Kueue tagger changelog

* test fixes

* Fix linter

* Tag LocalQueue resource metrics from workloadmeta.

* Cover LocalQueue resource tag enrichment.

* Reset Kueue tagger stub between tests.

* Cover scoped Kueue tagger enrichment.
* Wait for Tekton trigger metrics in E2E

* Add type hints to Tekton metric wait helper
…24158)

Adds account configuration for the OAuth authorization-code flow, sourced
from the canonical definitions in dd-source
(domains/web-integrations/libs/go/oauthstaticconfig + account-config schemas).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ba EdgeConnect (#24130)

* Validate appliance IP addresses from Orchestrator response in HPE Aruba EdgeConnect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add changelog entry for PR #24130

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Use neutral names in non-IP appliance test fixtures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix E2E test setup: use static appliance IP on docker network

The fake appliance now listens on port 443 inside the container and is
assigned a static IP (192.168.200.10) on the docker-compose network.
The orchestrator returns this plain IP, which passes strict
ipaddress.ip_address() validation. The conftest health check uses the
host-exposed port mapping separately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Rework E2E appliance setup: DNS resolution, no hardcoded subnet

The fake orchestrator now resolves the appliance IP via Docker DNS
(socket.gethostbyname("dd-appliance")) at request time instead of
reading a hardcoded IP from an env var. The fake appliance listens on
port 443 inside the container so the check connects via standard HTTPS.
Removes the custom network subnet and static IP assignment from
docker-compose.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Use get_container_ip to resolve appliance container IP for E2E tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove socket fallback from fake_orch now that APPLIANCE_IP is always set

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert to DNS resolution for appliance IP in fake orchestrator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add recommendation to configure appliance_ips allowlist in README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Extract appliance IP validation into _parse_appliances method

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@pull pull Bot locked and limited conversation to collaborators Jun 25, 2026
@pull pull Bot added the ⤵️ pull label Jun 25, 2026
@pull pull Bot merged commit cb40f3f into ConnectionMaster:master Jun 25, 2026
1 check passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants