[pull] master from DataDog:master #488
Merged
* [nifi] Scaffold NiFi integration
Bootstraps the NiFi Agent integration with:
- Configuration spec with NiFi-specific options (api_url, bulletin
collection, cardinality controls) plus HTTP template for auth/TLS
- Auto-generated config models from spec
- Manifest with events enabled, metric prefix nifi., process signatures
- Minimal metadata.csv with nifi.can_connect sentinel metric
- Stub check class and smoke test
- hatch.toml targeting NiFi 2.8
Part of AI-6668.
* Fix metadata.csv header and clarify process_groups default
- Remove extra sample_tags column from metadata.csv to match repo convention
- Clarify process_groups description: omitting defaults to root only
* [nifi] Add Docker test environment
- docker-compose.yaml: NiFi 2.8.0 with single-user auth and HTTPS
- setup-flow.sh: Creates happy-path (GenerateFlowFile→LogMessage) and
error-path (GenerateFlowFile→PutFile /nonexistent) test flows via REST API
- conftest.py: dd_environment fixture with CheckDockerLogs and WaitFor
conditions for NiFi readiness and flow setup
- common.py: Shared test constants, credentials, and instance configs
Part of AI-6668.
* Address review: use HTTP readiness check, remove dead code
- Replace CheckDockerLogs with WaitFor(wait_for_nifi) using HTTP poll
since we can't verify exact NiFi 2.x startup log messages
- NiFi returns 401 on unauthenticated requests, which confirms it's up
- Add capture=True to run_command for better CI diagnostics
- Remove unused E2E_METADATA (deferred to PR 9)
* [nifi] Add API client with token auth and health metrics
- NiFiApi class: token-based auth (POST /access/token, expects 201),
automatic retry on 401, version caching from /flow/about
- NifiCheck.check(): auth, version tagging, can_connect gauge (1/0),
cluster health (connected_node_count, total_node_count, is_healthy)
- 9 unit tests covering auth success/failure, token refresh, cluster
healthy/degraded/standalone, version caching
- Remove auth_token override from spec (HTTP template handles it)
Part of AI-6668.
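The token flow described above (POST /access/token expecting 201, then one automatic retry on 401) can be sketched roughly as follows. This is a minimal illustration, not the integration's NiFiApi class: `Response`, `Transport`, and `TokenClient` are hypothetical names, and the transport is an injected callable so the sketch runs without a NiFi server.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response (status code and body only)."""
    status_code: int
    body: str = ""


# Hypothetical transport signature: (method, path, headers, data) -> Response
Transport = Callable[[str, str, dict, Optional[dict]], Response]


class TokenClient:
    """Sketch of token auth with a single retry on 401 (assumed design)."""

    def __init__(self, transport: Transport, username: str, password: str):
        self._send = transport
        self._username = username
        self._password = password
        self._token: Optional[str] = None

    def _authenticate(self) -> None:
        creds = {"username": self._username, "password": self._password}
        resp = self._send("POST", "/access/token", {}, creds)
        # The commit message states that a successful token request returns 201.
        if resp.status_code != 201:
            raise RuntimeError(f"authentication failed with status {resp.status_code}")
        self._token = resp.body

    def request(self, path: str) -> Response:
        # Auth is handled here so callers never manage token state themselves.
        if self._token is None:
            self._authenticate()
        resp = self._send("GET", path, {"Authorization": f"Bearer {self._token}"}, None)
        if resp.status_code == 401:
            # The token may have expired: fetch a new one and retry exactly once.
            self._authenticate()
            resp = self._send("GET", path, {"Authorization": f"Bearer {self._token}"}, None)
        return resp
```

Retrying only once keeps a genuinely bad credential from looping: a second 401 is returned to the caller as-is.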
* Fix auth_token and header handling from review
- Remove auth_token param (HTTP template's auth_token is a complex
reader/writer mapping, not a simple string)
- Use extra_headers instead of headers to merge with RequestsWrapper
defaults rather than overriding them
* Address py-code-reviewer and codex findings
- Fix dead code: remove raise_for_status() before explicit raise in
_authenticate(), use single raise with response= kwarg
- Fix encapsulation: move _ensure_auth() into _request() so callers
don't need to manage auth state
- Add self.log.exception() in check failure path for diagnostics
- Remove unused endpoint constants (YAGNI)
- Use int() idiom for bool-to-int gauge value
- Move imports to top level, use specific HTTPError in pytest.raises
- Replace dict-as-counter with iter()/next() pattern
* Add test for no-auth mode (agint-review finding)
* AI-6668 Add system diagnostics metrics
Collect from /system-diagnostics:
- JVM heap used/max/utilization, non-heap used
- Total/daemon threads, CPU load, available processors
- GC collection count and time (monotonic_count, tagged gc_name)
- FlowFile/content/provenance repo used/free/utilization
Includes parsing of NiFi's percentage strings (e.g., "20.0%").
4 new unit tests, 19 new metrics in metadata.csv.
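As a rough illustration of the percentage-string parsing mentioned above, a helper along these lines would turn "20.0%" into a float. The name `parse_percent` is hypothetical and this sketch only covers well-formed input; anything non-numeric raises ValueError here.

```python
def parse_percent(value: str) -> float:
    """Convert a percentage string such as "20.0%" to a float (20.0)."""
    # Strip surrounding whitespace first, then the trailing percent sign.
    return float(value.strip().rstrip("%"))
```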
* AI-6668 Add flow status and process group metrics
Collect from /flow/status (controller summary) and
/flow/process-groups/{id}/status?recursive=true:
Flow: active threads, flowfiles/bytes queued, running/stopped/
invalid/disabled processor counts.
Process group (tagged process_group_name, process_group_id):
flowfiles queued, bytes queued, bytes read/written, flowfiles
received/sent/transferred, active threads.
Recursively flattens nested process groups.
15 new metrics, 3 new unit tests.
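The recursive flattening step could look something like the sketch below. The dict shape used here (a `children` list on each group) is a simplification assumed for illustration; the real NiFi status response nests its snapshots differently, and `flatten_groups` is a hypothetical name.

```python
from typing import Iterator


def flatten_groups(group: dict) -> Iterator[dict]:
    """Yield a process group, then every descendant, depth-first."""
    yield group
    # "children" is an assumed key for this sketch, not the NiFi field name.
    for child in group.get("children", []):
        yield from flatten_groups(child)
```

Each yielded group can then be tagged and submitted independently, which is what makes per-group metrics possible from a single recursive response.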
* AI-6668 Add opt-in connection and processor metrics
Connection metrics (opt-in collect_connection_metrics, default false):
queued count/bytes, backpressure utilization, flowfiles in/out.
Tagged: connection_name, source_name, destination_name, process_group_id.
Processor metrics (opt-in collect_processor_metrics, default false):
flowfiles in/out, bytes read/written, task count, processing time,
active threads, run status.
Tagged: processor_name, processor_type, process_group_id.
Both extracted from the existing recursive process group response
(no new API calls). Cardinality controlled by max_connections and
max_processors (default 200 each).
14 new metrics, 4 new unit tests.
* AI-6668 Add bulletin events with persistent cache dedup
Poll /flow/bulletin-board and forward NiFi bulletins as Datadog
events. Dedup by tracking last-seen bulletin ID via persistent cache.
Features:
- Filter by bulletin_min_level (default WARNING)
- Skip unreadable bulletins (canRead=false)
- Cap at max_bulletins_per_cycle (default 100)
- Configurable via collect_bulletins (default true)
- Event alert_type mapped from bulletin level (ERROR->error, else warning)
- Persistent cache survives Agent restart
6 new unit tests with mocked persistent cache for test isolation.
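A minimal sketch of the watermark-based dedup, level filter, and per-cycle cap listed above. The flat bulletin dicts (`id`, `level`, `canRead`), the level ordering, and the choice to advance the watermark past capped bulletins are all assumptions made for this example, and `select_new_bulletins` is a hypothetical name.

```python
# Assumed severity ordering, lowest to highest.
LEVELS = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}


def select_new_bulletins(bulletins, last_seen_id, min_level="WARNING", max_per_cycle=100):
    """Return (bulletins to forward, new watermark) for one check cycle."""
    threshold = LEVELS[min_level]
    fresh = []
    for item in sorted(bulletins, key=lambda b: b["id"]):
        if item["id"] <= last_seen_id:
            continue  # already forwarded in an earlier cycle
        if not item.get("canRead", False):
            continue  # skip bulletins the user cannot read
        if LEVELS.get(item.get("level"), -1) < threshold:
            continue  # below the configured minimum level
        fresh.append(item)
    selected = fresh[:max_per_cycle]
    # Assumed behavior: move the watermark to the highest ID on the board so
    # filtered or capped bulletins are not reconsidered next cycle.
    highest = max((b["id"] for b in bulletins), default=last_seen_id)
    return selected, max(highest, last_seen_id)
```

The returned watermark is what would be written to the persistent cache so that dedup survives an Agent restart.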
* AI-6668 Add integration tests and fix Basic Auth conflict
Integration tests against real Docker NiFi:
- test_check: full run, verify core metrics emitted
- test_check_with_connection_metrics: opt-in connection metrics
- test_check_with_processor_metrics: opt-in processor metrics
- test_auth_failure: wrong password emits can_connect=0
- test_metadata_metrics: all emitted metrics match metadata.csv
Fix: disable RequestsWrapper's automatic Basic Auth. NiFi uses
its own token auth (POST /access/token for JWT), not HTTP Basic.
The HTTP template's username/password fields were triggering
Basic Auth headers that conflicted with Bearer token auth.
* AI-6668 Add E2E test, log config, and README
- E2E test via dd_agent_check for Agent container testing
- Log pipeline config for nifi-app.log (grok parser for NiFi log format)
- README with setup instructions, metric/event/service check docs
* Add NiFi overview dashboard
Minimal OOTB dashboard with:
- Can Connect status widget
- JVM Heap Utilization timeseries
- Running Processors count
- FlowFiles Queued timeseries
- GC Collection Rate by gc_name
- Active Threads timeseries
- Repository Utilization (flowfile + content)
* Add missing copyright headers to config_models files
* Fix timestamp coercion, parse_utilization N/A handling, and process group dedup
- Fix bulletin timestamp: use astimezone(utc) for aware datetimes instead of
no-op replace(tzinfo=dt.tzinfo)
- Fix _parse_utilization: return 0.0 for None, empty string, and 'N/A'
(NiFi returns 'N/A' when metrics are unavailable)
- Fix process group collection: track visited IDs to prevent duplicate metrics
when process_groups config lists both a parent and its descendant
- Add tests for all three fixes
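To illustrate the timestamp fix: calling `replace(tzinfo=dt.tzinfo)` on an aware datetime leaves it unchanged, while `astimezone(timezone.utc)` converts the wall-clock time to UTC. The helper name `normalize_to_utc` and the decision to treat naive values as already being UTC are assumptions of this sketch.

```python
from datetime import datetime, timedelta, timezone


def normalize_to_utc(dt: datetime) -> datetime:
    """Return an aware datetime expressed in UTC."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive values are already in UTC.
        return dt.replace(tzinfo=timezone.utc)
    # Aware values are converted, which shifts the wall-clock fields.
    return dt.astimezone(timezone.utc)


aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
unchanged = aware.replace(tzinfo=aware.tzinfo)  # the no-op described above
converted = normalize_to_utc(aware)  # 12:00+02:00 becomes 10:00 UTC
```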
* Add nifi to .ddev/config.toml manifest overrides
Registers display name, metrics prefix, and supported platforms
for the manifest-less NiFi integration per the PubPlatform guide.
* Address code review feedback: metric mappings, service check, auth, and caps
- Extract repeated metric submission into mapping tuples in constants.py
with _submit_gauges/_submit_monotonic_counts helpers
- Change can_connect from gauge to service check (aligns with RFC and
repo conventions)
- Only disable RequestsWrapper auth when NiFi credentials are provided,
preserving reverse-proxy auth configurations
- Map bulletin alert_type correctly for all severity levels (INFO/DEBUG
were incorrectly mapped to 'warning')
- Apply max_connections/max_processors caps globally across all process
groups instead of per-group
- Remove version caching in NiFiApi.get_about() so nifi_version tag
stays accurate after upgrades
- Move RUN_STATUS_MAP to constants.py alongside other metric mappings
- Add debug log for unparseable bulletin timestamps
- Expand E2E test from 4 to 48 metric assertions with
assert_all_metrics_covered() and bulletin event assertion
- Add truncation warning and INFO alert_type unit tests
- Fix all direct dict access (gc["name"], repo["identifier"]) to use
.get() with defaults
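The difference between a per-group cap and a global cap can be sketched as below: a single running budget is shared across all process groups, and a flag reports whether anything was dropped so a truncation warning can be logged. `take_with_global_cap` is a hypothetical helper, not the integration's code.

```python
def take_with_global_cap(groups, limit):
    """Keep at most `limit` items across all groups, in order.

    Returns (kept_items, truncated) where truncated is True if any item
    was dropped because the shared budget ran out.
    """
    kept = []
    total = 0
    for items in groups:
        total += len(items)
        remaining = limit - len(kept)
        if remaining > 0:
            kept.extend(items[:remaining])
    return kept, total > limit
```

With a per-group cap of 200 and ten groups, up to 2000 tagged series could be emitted; the shared budget bounds the total at 200 regardless of how many groups exist.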
* Fix CI: sort metadata.csv, use apache_nifi integration name, relax E2E bulletin assertion
- metadata.csv was unsorted and used 'nifi' instead of 'apache_nifi'
in the integration column
- E2E bulletin event assertion was timing-dependent; replaced with
a comment noting the expectation without a hard assertion
* Add nifi to labeler.yml for integration PR label auto-detection
* Fix CI validation failures
- metadata.csv: use 'nifi' integration column (matches datadog-assets expectation)
- .ddev/config.toml: add metadata integration override to reconcile display name
- manifest.json: add required 'owner' field (agent-integrations)
- instance.py: regenerate config model with repo ddev (adds SECURE_FIELD_NAMES)
- changelog.d: rename 1.added to 23110.added to match PR number
* Fix CI validation failures: license headers, changelog name, CI config sync
- Move license headers above ABOUTME comments in all 8 Python files
(validator requires license as first lines)
- Rename changelog.d/1.added to 23110.added to match PR number
- Sync CI config: add Apache NiFi to .codecov.yml and test-all.yml
* Address review feedback: Validating state, bulletin ID reset, pg_id dedup
- Add 'Validating': -3 to RUN_STATUS_MAP so it is distinct from Invalid (-1)
- Detect bulletin ID reset after NiFi restart by comparing max board ID
against cached watermark; clear watermark when reset is detected
- Only add process group IDs to visited set when actually present, so
ID-less groups don't silently block each other
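One way to picture the reset detection: if the highest bulletin ID currently on the board is lower than the cached watermark, the IDs have presumably restarted and the watermark is cleared. The exact comparison used by the integration is not shown in the commit message, so the logic and the name `effective_watermark` are assumptions.

```python
def effective_watermark(cached_id, board_ids):
    """Return the watermark to use this cycle, or None if a reset is detected."""
    if cached_id is None or not board_ids:
        # Nothing to compare against: keep whatever is cached.
        return cached_id
    if max(board_ids) < cached_id:
        # Board IDs are below the cache, so NiFi likely restarted its counter.
        return None
    return cached_id
```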
* Emit nifi.can_connect as a gauge per RFC
The RFC specifies nifi.can_connect as a gauge (1 = API reachable,
0 = unreachable) to replace service checks, which are soft-deprecated.
The implementation had drifted back to service_check; this aligns code,
tests, and metadata.csv with the RFC.
Resolves the CI validate-metrics_metadata failure: the manifest
declared nifi.can_connect as the check metric but it was missing
from metadata.csv.
* Move dashboard, logs, service_checks, manifest assets to Developer Platform
Per reviewer guidance, new integrations should manage these assets
through the Developer Platform rather than committing them to
integrations-core. The only asset kept in the repo is
assets/configuration/spec.yaml. Drops manifest.json, images/,
assets/dashboards/, assets/logs/, assets/service_checks.json, and
the stale service_checks reference in the README.
Signed-off-by: sarah-witt <sarah.witt@datadoghq.com>
…23372) * Harden dependency wheel promotion against PR code execution Signed-off-by: iliakur <ilia.kurenkov@datadoghq.com> * add changelog Signed-off-by: iliakur <ilia.kurenkov@datadoghq.com> --------- Signed-off-by: iliakur <ilia.kurenkov@datadoghq.com>
…lows (#23378) * don't use --base flag if running test agent workflow * Also cover test-agent-target * persistent context env var
* Harden XML plan parsing * Add changelog * Avoid instantiating at module level for multi threaded use * Fix formatting
* Parameterize database query in SQL Server schema collection * Lint * Changelog
* updated fci query * added tests * removed unused variable * minor changes to the test * added changelog * updated comment
…ection (#23389)
* Improve robustness of database name handling in schema collection
Switch `_get_databases` to use `%s` parameterized placeholders passed to
`cursor.execute` instead of formatting values directly into the query
string. This applies to the autodiscovery IN clause as well as the
exclude/include regex filters, ensuring database names and filter
patterns are always properly handled regardless of their content.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Add changelog entry for query parameterization fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Escape percent literal in DATABASE_INFORMATION_QUERY for parameterized execution
psycopg treats % as a placeholder prefix whenever params are passed to
cursor.execute. The LIKE wildcard in the base query must be %% so it
renders as a literal % after substitution.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Simplify _get_databases query construction
Build query and params before acquiring the connection, flatten
autodiscovery nesting, and use list multiplication for placeholder
generation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Parameterize explain statement queries in postgres integration
Replace dollar-quoting string interpolation with psycopg bound
parameters when calling the explain function, improving robustness of
query handling.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Add changelog entry for postgres explain statement fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
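The query-construction approach described in the schema-collection commit above (`%s` placeholders generated by list multiplication, with the LIKE wildcard doubled to `%%`) might look roughly like this. The SQL text is a made-up example rather than the integration's DATABASE_INFORMATION_QUERY, and `build_database_query` is a hypothetical name.

```python
def build_database_query(names, exclude_regex=None):
    """Build (query, params) without formatting any value into the SQL text."""
    # %% renders as a literal % once the driver substitutes parameters.
    query = "SELECT datname FROM pg_database WHERE datname NOT LIKE 'template%%'"
    params = []
    if names:
        # One %s per name, generated by list multiplication.
        placeholders = ", ".join(["%s"] * len(names))
        query += f" AND datname IN ({placeholders})"
        params.extend(names)
    if exclude_regex is not None:
        query += " AND datname !~ %s"
        params.append(exclude_regex)
    return query, tuple(params)
```

The result would be passed as `cursor.execute(query, params)`, so a database name containing quotes or SQL keywords travels only as a bound value and never becomes part of the statement.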
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)