[pull] master from DataDog:master #622
Merged
…tions (#24013)

* kafka_consumer: collect Kafka Connect connector metrics and configurations
  Adds a new KafkaConnectCollector (connectors.py) that queries the Kafka Connect REST API and Amazon MSK Connect to surface connector health, task status, and configuration into DSM.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: add changelog entry for PR #24013
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address review comments for Kafka Connect monitoring
  - Gate connector collection on enable_cluster_monitoring flag
  - Fix dedup hash: exclude collection_timestamp from hashed content
  - Scope cache keys per-URL to avoid collisions across multiple Connect endpoints
  - Remove kafka_connect_aws_msk_cluster_arn option (boto3 API doesn't support ARN filtering)
  - Add MSK can_connect service check with aws_region tag
  - Add isinstance guard for older Kafka Connect workers returning list responses
  - Fix type hints and improve exception messages
  - Add 15 unit tests covering dedup, metrics, service checks, and compat
  - Regenerate config_models and conf.yaml.example after spec.yaml changes
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix service_checks integration field to match manifest display name
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix license headers and service_checks integration name
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: sort metadata.csv connector metrics alphabetically
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix metadata.csv metric sort order (connector metrics in alphabetical position)
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: remove connector service check, consolidate task metrics, add connectivity to heartbeat
  - Replace connector.can_connect service check with connectivity status in the cluster monitoring heartbeat (connect_api_status field)
  - Consolidate 4 task-state gauges (tasks_running/failed/paused/unassigned) into a single connector.tasks metric tagged with task_state
  - collect() now returns dict[str, bool] (endpoint → connected)
  - Heartbeat is sent after connector collection so it includes connect_api_status
  - Update tests to match new API
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: add Kafka Connect connector monitoring dashboard
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: expand connector unit test coverage to 100%
  Add tests covering _configure_session TLS/auth paths, _collect_rest success path, _collect_plugins, _get_items_to_fetch/_mark_items_fetched cache logic, _fetch_oidc_token, _refresh_oauth_token, MSK collect path with mocked boto3, _get_tags, and all exception paths.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: add Confluent Cloud Connect support
  Add a _collect_confluent_cloud() method that constructs the Confluent Cloud Connect REST API URL (api.confluent.cloud/connect/v1/environments/{env}/clusters/{cluster}) and delegates to the existing _collect_rest() path, which is already compatible with the Confluent Cloud API response format.
  New config params:
    kafka_connect_confluent_cloud_environment_id
    kafka_connect_confluent_cloud_cluster_id
    kafka_connect_confluent_cloud_url (optional, defaults to https://api.confluent.cloud)
  Authenticate using kafka_connect_username (Cloud API key ID) and kafka_connect_password (Cloud API key secret).
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: remove MSK Connect support, fix Confluent Cloud guard
  Drop the MSK Connect (boto3) collection path: it uses a different API, returns fewer metrics, and is better handled in a follow-up PR. Also fixes the caller-side guard in kafka_consumer.py which was missing the Confluent Cloud fields, causing CC-only configs to silently produce no metrics. Also simplifies a convoluted OIDC test assertion flagged by ruff.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address round-2 review comments
  - Fix dashboard "Connectors by State" paused/failed series: use count: aggregator instead of sum: (value is always 0 for non-running connectors)
  - Fix "Connectors by Type" toplist to count all connectors, not just running
  - Fix e.g., comma in spec.yaml/conf.yaml.example (documentation style)
  - OAuth token failure now returns endpoints as False so heartbeat reports connectivity failure instead of silently omitting connect_api_status
  - Task state bucketing now dynamic: covers RESTARTING/DESTROYED in addition to RUNNING/FAILED/PAUSED/UNASSIGNED
  - Wire up CONNECTOR_CONFIG_CACHE_MAX_SIZE in _get_events_to_send to bound the config-event cache
  - Type _parse_connect_urls value parameter
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address round-3 review comments
  - Add credential redaction in _truncate_connector_config (SENSITIVE_KEY_PATTERN)
  - Fix OAuth failure on Confluent Cloud-only setup: include CC key in failure dict
  - Fix connect_status None sentinel: distinguish "not configured" from "crashed"
  - Normalize task_state casing to lowercase in connector.task.running metric
  - Add max_cache_size param to _get_events_to_send for consistency with _mark_items_fetched
  - Rename self.CONFIGS_REFRESH_INTERVAL/JITTER to lowercase _configs_refresh_interval/_jitter
  - Add type hints to KafkaConnectCollector.__init__
  - Parameterize _parse_connect_urls signature with typed generics
  - Parametrize TLS and Confluent Cloud URL construction tests
  - Add test for OAuth failure on CC-only deployment
  - Fix changelog entry: remove MSK Connect claim, add Confluent Cloud
  - Backtick URL in kafka_connect_confluent_cloud_url description
  - Add requests as explicit pyproject.toml dependency
  - Remove demo docker-compose-connect-demo.yaml
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: extract shared cache helpers into EventCacheMixin
  Move _get_items_to_fetch, _mark_items_fetched, _get_events_to_send, _get_tags, and _original_cluster_id_field into a new EventCacheMixin so ClusterMetadataCollector and KafkaConnectCollector share one copy. Also removes the dashboard JSON (shipped too early, no product sign-off) and renames CONFIGS_REFRESH_INTERVAL/JITTER to private _configs_refresh_* to match the convention already used in KafkaConnectCollector.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: revert unrelated MSK KRaft note from kafka_cluster_id_override description
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: replace requests.Session with check.http in KafkaConnectCollector
  Use the same internal HTTP client (check.http) that schema registry uses, passing auth/TLS/OAuth as per-request kwargs via ChainMap override rather than managing a separate requests.Session. Drop the requests dependency.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: capture full connector config instead of key allowlist
  Remove the IMPORTANT_CONNECTOR_CONFIG_KEYS allowlist and the 30-key cap. Send the complete connector config, redacting only sensitive key names (passwords, secrets, keys, JAAS config) with [hidden].
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: remove Confluent Cloud Connect (moved to a follow-up PR)
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update kafka_consumer/metadata.csv
  Co-authored-by: Eva Parish <eva.parish@datadoghq.com>

* kafka_consumer: fix AttributeError when cluster monitoring enabled without kafka_connect_url
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address round-4 review findings
  - Remove orphaned kafka_connect_confluent_cloud_* fields from config_models (instance.py, defaults.py); regenerate conf.yaml.example
  - Fix broken test assertions in test_cluster_metadata.py: rename CONFIGS_REFRESH_INTERVAL/JITTER -> _configs_refresh_interval/jitter, move EVENT_CACHE_TTL import from instance attr to module constant
  - Fix schema-registry credential leakage onto Connect requests: explicitly set auth=None and clear Authorization header when Connect has no auth, preventing ChainMap fallback from inheriting SR credentials
  - Fix metadata.csv orientation for kafka.connector.running and kafka.connector.task.running from -1 (lower-is-better) to 1 (higher-is-better)
  - Add EventCacheMixin._init_cache_intervals() helper; both collectors now call it instead of duplicating the jitter derivation formula
  - Improve OAuth bail-out: fall back to existing valid token on transient refresh failure instead of marking all endpoints unreachable
  - Extract _collect_connect_status() from check() in kafka_consumer.py
  - Add integration glue tests: connect_api_status heartbeat field presence, collector exception degrades to empty dict
  - Add multi-URL partial-failure test for KafkaConnectCollector.collect()
  - Tighten bare dict annotations to dict[str, Any] in connectors.py
  - Fix kafka_connect_tls_verify spec description parenthetical
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix remaining CONFIGS_REFRESH_INTERVAL references and formatting
  - Fix two remaining self.CONFIGS_REFRESH_INTERVAL/JITTER references in cluster_metadata.py (schema compatibility fetch cadence) that were missed in the rename to _configs_refresh_interval/_configs_refresh_jitter
  - Reformat cluster_metadata.py and connectors.py via ruff format
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: replace EventCacheMixin with CacheHelper composition
  Replace the mixin-based cache sharing pattern with an explicit CacheHelper object. Each collector (`ClusterMetadataCollector`, `KafkaConnectCollector`) holds `self.cache: CacheHelper` instead of inheriting from `EventCacheMixin`, making the dependency visible at construction time and eliminating the hidden `_init_cache_intervals` contract that caused missed-rename bugs. Move `_get_tags` and `_original_cluster_id_field` to standalone module functions in `cluster_metadata.py`; `connectors.py` imports them directly. Delete `event_cache_mixin.py` and add `cache.py` with `CacheHelper` and `EVENT_CACHE_TTL`.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix missing spaces after commas in _get_tags calls
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: per-request HTTP kwargs, revert helper functions, connector metric changes
  - Schema Registry HTTP settings (auth, TLS, OAuth token) now stored as per-request kwargs instead of mutating shared check.http.options, preventing credential leakage into Kafka Connect requests on the same HTTP session
  - Reverted _get_tags/_original_cluster_id_field to instance methods, removing the module-level extraction that increased diff size unnecessarily
  - Extended SENSITIVE_KEY_PATTERN to cover token, passphrase, keyfile, connection.url, basic.auth.user.info, _key$ suffixes, and private.key
  - Removed connector.running metric; added connector_status tag on connector.task.count instead
  - Added connector_full_class tag to connector metrics
  - Removed kafka_connect_collect_task_metrics config option; task metrics (connector.task.running) now always collected
  - Fixed cache.py linting (collapsed get_events_to_send signature)
  - Removed stale inline comments
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: remove connector.count and connector.running metrics
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: keep connector.count, drop connector.running
  connector.running is redundant with the connector_status tag on connector.task.count; connector.count gives a fleet-level view not derivable from per-connector metrics.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: sync instance.py with ddev 17.0.1 formatting
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address review comments on Connect monitoring PR
  - Move _get_tags and _original_cluster_id_field to KafkaConfig, eliminating duplication between ClusterMetadataCollector and KafkaConnectCollector
  - Drop redundant connector_status tag from connector.task.count (connector_state already carries the same information)
  - Remove module docstring from cache.py
  - Fix backtick formatting in metadata.csv connector.tasks description
  - Move make_collector fixture and SAMPLE_CONNECTORS_RESPONSE to conftest.py
  - Remove section-divider comments from test_connectors.py
  - Refactor sensitive-key redaction and timestamp dedup tests to call collect() instead of the private _emit_connector_config_events
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix import sort order in conftest.py
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: use dd_run_check in connector unit tests
  Refactor test_sensitive_keys_redacted_in_config_event and test_collection_timestamp_excluded_from_hash to run through the full KafkaCheck.check() path via dd_run_check, and add missing hashlib import.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix lint issues after dd_run_check refactor
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix ruff formatting in cluster_metadata.py
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address review comments on connector tests
  - Move seed_mock_kafka_client to conftest.py alongside SAMPLE_CONNECTORS_RESPONSE
  - Remove # Arrange / # Act / # Assert comments from test_connectors.py
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: rewrite private-method tests to go through collect()
  Replace direct calls to _emit_connector_metrics, _collect_rest, and _collect_plugins in test_connectors.py with calls to the public collect() method. Also configure config._get_tags and _original_cluster_id_field return values in make_collector so that tag-asserting tests work correctly.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: address round-5 review findings
  - Add enable_cluster_monitoring prerequisite to kafka_connect_url description in spec.yaml (matches sibling schema_registry_url pattern)
  - Extract _build_http_kwargs module-level helper shared by connectors.py and cluster_metadata.py to eliminate duplicated auth/TLS logic
  - Remove stale Authorization:'' masking in _get_request_kwargs (the http.options ChainMap leak path it guarded was removed in this PR)
  - Add word boundaries to 'token' in SENSITIVE_KEY_PATTERN so keys like oauth.token.endpoint.uri are no longer redacted
  - Lowercase connector_state tag to match task_state casing
  - Merge two sequential for-task-in-tasks loops in _emit_connector_metrics
  - Add comment on eager KafkaConnectCollector construction in __init__
  - Add test pinning no-token _get_request_kwargs emits no Authorization
  - Add aggregator-backed test cross-checking connector metrics against metadata.csv
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: fix formatting and add backticks to task_state in metadata.csv
  - Auto-format connectors.py (ruff trailing whitespace)
  - Add backticks around task_state in metadata.csv description
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: drive new Connect monitoring tests through dd_run_check
  Rewrite the new test_connectors.py so every test exercises the public KafkaCheck.check() entrypoint via dd_run_check and asserts on emitted metrics and DSM events, instead of calling collector/cache internals.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* kafka_consumer: move CONNECT_URL constant to common.py
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Eva Parish <eva.parish@datadoghq.com>
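
One of the review fixes in the commit above excludes collection_timestamp from the content that is hashed for deduplication. The sketch below illustrates why that matters; it is not the integration's code. The function name `config_fingerprint`, the event shape, and the choice of SHA-256 over sorted JSON are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Any


def config_fingerprint(event: dict[str, Any]) -> str:
    """Return a stable hash of an event, ignoring its collection timestamp.

    Hypothetical helper: if the timestamp were part of the hashed content,
    every collection run would produce a new hash and nothing would dedup.
    """
    # Drop the volatile field before hashing (field name taken from the commit message).
    stable = {k: v for k, v in event.items() if k != "collection_timestamp"}
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(stable, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = {"connector": "s3-sink", "config": {"tasks.max": "2"}, "collection_timestamp": 1700000000}
second = {"connector": "s3-sink", "config": {"tasks.max": "2"}, "collection_timestamp": 1700000060}
changed = {"connector": "s3-sink", "config": {"tasks.max": "3"}, "collection_timestamp": 1700000120}

same = config_fingerprint(first) == config_fingerprint(second)
differs = config_fingerprint(first) != config_fingerprint(changed)
```

With this shape, two runs that observe an unchanged connector config yield the same fingerprint and can be skipped, while a real config change yields a new one.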
* Tag Kueue queues from workloadmeta
* Add Kueue tagger changelog
* test fixes
* Fix linter
* Tag LocalQueue resource metrics from workloadmeta.
* Cover LocalQueue resource tag enrichment.
* Reset Kueue tagger stub between tests.
* Cover scoped Kueue tagger enrichment.
* Wait for Tekton trigger metrics in E2E
* Add type hints to Tekton metric wait helper
…24158)

Adds account configuration for the OAuth authorization-code flow, sourced from the canonical definitions in dd-source (domains/web-integrations/libs/go/oauthstaticconfig + account-config schemas).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ba EdgeConnect (#24130)

* Validate appliance IP addresses from Orchestrator response in HPE Aruba EdgeConnect
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add changelog entry for PR #24130
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Use neutral names in non-IP appliance test fixtures
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix E2E test setup: use static appliance IP on docker network
  The fake appliance now listens on port 443 inside the container and is assigned a static IP (192.168.200.10) on the docker-compose network. The orchestrator returns this plain IP, which passes strict ipaddress.ip_address() validation. The conftest health check uses the host-exposed port mapping separately.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Rework E2E appliance setup: DNS resolution, no hardcoded subnet
  The fake orchestrator now resolves the appliance IP via Docker DNS (socket.gethostbyname("dd-appliance")) at request time instead of reading a hardcoded IP from an env var. The fake appliance listens on port 443 inside the container so the check connects via standard HTTPS. Removes the custom network subnet and static IP assignment from docker-compose.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Use get_container_ip to resolve appliance container IP for E2E tests
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove socket fallback from fake_orch now that APPLIANCE_IP is always set
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert to DNS resolution for appliance IP in fake orchestrator
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add recommendation to configure appliance_ips allowlist in README
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Extract appliance IP validation into _parse_appliances method
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
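
The commit above mentions strict ipaddress.ip_address() validation of appliance addresses returned by the Orchestrator. A minimal sketch of that kind of check follows. The function name `filter_valid_ips` and its input shape are hypothetical; this is not the check's `_parse_appliances` method, only an illustration of how the standard-library call rejects hostnames and host:port strings.

```python
import ipaddress
from typing import Any, Iterable


def filter_valid_ips(candidates: Iterable[Any]) -> list[str]:
    """Keep only values that are plain IPv4/IPv6 address strings.

    Hypothetical helper for illustration. Non-string values are rejected up
    front because ipaddress.ip_address() would also accept integers.
    """
    valid: list[str] = []
    for value in candidates:
        if not isinstance(value, str):
            continue
        try:
            # Raises ValueError for hostnames, host:port pairs, and other junk.
            valid.append(str(ipaddress.ip_address(value)))
        except ValueError:
            continue
    return valid


accepted = filter_valid_ips(
    ["192.168.200.10", "dd-appliance", "192.168.200.10:443", None, "::1"]
)
```

In this sketch invalid entries are dropped silently; a real check might prefer to log them so a misbehaving or compromised Orchestrator response is visible.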
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)