Commit 3250ec3
Add lparstats check (AIX only) (DataDog#23451)
* Add lparstats check (AIX only)
Port the lparstats check from datadog-unix-agent to integrations-core,
updated for Python 3 and datadog-checks-base.
This check collects IBM POWER LPAR performance metrics on AIX via the
`lparstat` command:
- Memory statistics (system.lpar.memory.*)
- Hypervisor call statistics (system.lpar.hypervisor.*)
- I/O memory entitlements (system.lpar.memory.entitlement.*)
- SPURR processor utilization (system.lpar.spurr.*)
The manifest explicitly declares "Supported OS::AIX" because this check
relies on lparstat, which is exclusive to IBM AIX on POWER hardware.
* Fix validation issues: license headers, CHANGELOG, metadata sort, spec.yaml, labeler, ci
* Generate config models, sync conf.yaml.example, sync CI, fix lint
* Add missing [project.optional-dependencies] section to pyproject.toml
* Add basic unit tests with mocked lparstat output
* Fix test: remove dd_environment fixture (no Docker needed)
* Remove unused pytest import
* Fix test: remove assert_all_metrics_covered, check key metrics only
* Add dd_environment fixture and rebase onto master
* Fix manifest: remove invalid Supported OS::AIX tag, add auto_install and source_type_id
* Apply review nits: f-strings, x.split(), hypervisor guard, dev status 5, remove setup.cfg, skip e2e on non-AIX
* lparstats: declare SPURR .pct metrics as fraction not percent
Values emitted by collect_spurr are in the [0,1] range (e.g. 0.015),
not [0,100], so unit_name=percent was incorrect.
* lparstats: set curated_metric=core for system.lpar.memory.physb
This is the manifest's primary check metric; marking it as core aligns
with the convention used by other integrations.
* lparstats: always apply DEFAULT_TIMEOUT, not only under sudo
A hung lparstat call can block the check regardless of whether sudo is
in use; unconditionally setting the timeout is the safer default.
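A minimal sketch of applying the timeout unconditionally, assuming a helper shaped like the one below. The name `run_cmd`, the 5-second value, and the `-1` return code on timeout are illustrative assumptions, not the check's actual code.

```python
import subprocess
import sys

DEFAULT_TIMEOUT = 5  # seconds; assumed value for illustration


def run_cmd(cmd, timeout=DEFAULT_TIMEOUT):
    """Run cmd and return (stdout, stderr, returncode).

    The timeout is passed on every call, whether or not sudo is involved.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Report a hung command as a failure instead of blocking the caller.
        return "", f"timed out after {timeout}s", -1
    return proc.stdout, proc.stderr, proc.returncode


if __name__ == "__main__":
    out, err, rc = run_cmd([sys.executable, "-c", "print('ok')"])
    print(rc, out.strip())
```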
* lparstats: strip % from field names in collect_memory_entitlements
collect_memory already strips % from its fields; apply the same
treatment in collect_memory_entitlements so a %-suffixed column in
lparstat -m -eR output produces a valid metric name.
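For illustration, the normalization could look roughly like this. The helper name `metric_name` and the column `%used` are hypothetical, and the real check may build names differently.

```python
def metric_name(prefix, field):
    """Build a metric name from an lparstat column header, dropping any '%'."""
    return f"{prefix}.{field.replace('%', '').lower()}"


# A %-prefixed column no longer leaks '%' into the metric name.
print(metric_name("system.lpar.memory.entitlement", "%used"))
```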
* lparstats: add type hints to all callables
* lparstats: replace HYPERVISOR_IDX_METRIC_MAP dict with a tuple
Contiguous integer keys 0..4 are just positional indices; a tuple
is simpler and removes the need for a dict-lookup pattern.
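A small sketch of the positional-tuple pattern described above. The five metric suffixes are placeholders chosen for the example, not the names used by the check.

```python
# Placeholder suffixes; position i corresponds to column i of a data row.
HYPERVISOR_METRICS = (
    "n_calls",
    "time_spent.total",
    "time_spent.hyp",
    "call_time.avg",
    "call_time.max",
)


def row_to_metrics(values):
    """Pair each column value with the metric suffix at the same position."""
    return {name: float(value) for name, value in zip(HYPERVISOR_METRICS, values)}
```

With a dict keyed 0..4 the same pairing needs an index lookup per column; `zip` over a tuple expresses the positional relationship directly.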
* lparstats: patch subprocess.run in tests instead of private _run_cmd
Coupling tests to the internal _run_cmd helper is fragile; patching the
public subprocess.run interface is more stable and matches the project
testing guidelines.
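As a rough sketch of the approach, a test can patch `subprocess.run` and hand back a canned `CompletedProcess`. The function `read_lparstat` and the sample output are stand-ins invented for this example.

```python
import subprocess
from unittest import mock


def read_lparstat():
    """Stand-in for code under test: runs lparstat and returns output lines."""
    proc = subprocess.run(["lparstat", "-m"], capture_output=True, text=True, timeout=5)
    return proc.stdout.splitlines()


def test_read_lparstat():
    fake = subprocess.CompletedProcess(
        args=["lparstat", "-m"], returncode=0, stdout="physb hpi\n0.01 0\n", stderr=""
    )
    # Patch the public entry point rather than a private helper.
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert read_lparstat() == ["physb hpi", "0.01 0"]
    run.assert_called_once()
```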
* lparstats: use dd_run_check fixture instead of check.check(instance)
* lparstats: add tests for hypervisor and memory-entitlements collectors
Both collectors had zero coverage because the default instance fixture
disables them. Add a dedicated test that enables both and asserts that
expected metrics are emitted.
* lparstats: add tag assertions for all collectors
Assert that memory/SPURR metrics carry no tags and that hypervisor
(call:<name>) and entitlement (iompn:<name>) tags are present.
* lparstats: guard os.getuid() with hasattr check
os.getuid() does not exist on Windows; wrap it so the module can be
imported on non-Unix platforms without raising AttributeError.
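The guard can be sketched as follows; the function name `running_as_root` is an assumption made for the example.

```python
import os


def running_as_root():
    """Return True only when os.getuid exists and reports uid 0."""
    # os.getuid is absent on Windows, so check before calling it.
    return hasattr(os, "getuid") and os.getuid() == 0
```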
* lparstats: check returncode in all collectors, add lparstats.can_collect service check
Each collector now inspects the lparstat return code and skips metric
emission (with a warning) on failure instead of silently parsing empty
output. A new lparstats.can_collect service check is emitted OK when
all enabled collectors succeed and CRITICAL if any lparstat invocation
exits non-zero.
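The status rule described above reduces to something like the sketch below. The function name is hypothetical; the numeric values mirror Datadog's service check statuses (OK is 0, CRITICAL is 2).

```python
OK, CRITICAL = 0, 2  # Datadog service check status values


def can_collect_status(returncodes):
    """OK when every enabled collector's lparstat call exited 0, else CRITICAL."""
    return OK if all(rc == 0 for rc in returncodes) else CRITICAL
```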
* lparstats: fix fragile SPURR actual-vs-normalized column split
The previous split used idx > len(fields)/2, which silently breaks if
lparstat -E ever changes its column count. Split at the freq column
instead (it reliably separates actual from normalized), with a fallback
warning if freq is absent.
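A sketch of splitting on the freq column, assuming the header has already been tokenized into a list of field names. The function name and the `None` return for a missing freq column are illustrative; in the check the fallback is a warning.

```python
def split_spurr_fields(fields):
    """Split header fields into (actual, normalized) around the 'freq' column.

    Returns None when 'freq' is missing so the caller can warn and skip.
    """
    try:
        idx = fields.index("freq")
    except ValueError:
        return None
    return fields[:idx], fields[idx + 1:]
```

Unlike a midpoint split, this keeps working if columns are added on either side of freq.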
* lparstats: extract _lparstat_rows helper to deduplicate parsing prefix
All four collectors shared the same _run_cmd → splitlines → filter →
slice pattern. _lparstat_rows(cmd, start_idx, ...) centralises it and
returns (rows, stderr, returncode) so callers can still inspect the
exit code.
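The shared prefix might be factored out along these lines. This is only a sketch: the command runner is injected as a parameter to keep the example self-contained, and the filter is assumed to drop blank lines, neither of which is confirmed by the commit.

```python
def lparstat_rows(run_cmd, cmd, start_idx):
    """Run cmd, drop blank lines, skip the first start_idx lines.

    Returns (rows, stderr, returncode) so callers can still check the exit code.
    """
    stdout, stderr, returncode = run_cmd(cmd)
    lines = [line for line in stdout.splitlines() if line.strip()]
    return lines[start_idx:], stderr, returncode


# Usage with a fake runner standing in for the real command execution.
def fake_run(cmd):
    return "header\n\n-----\nrow1 1 2\nrow2 3 4\n", "", 0


rows, err, rc = lparstat_rows(fake_run, ["lparstat", "-H"], start_idx=2)
print(rows, rc)
```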
* lparstats: fix curated_metric value for system.lpar.memory.physb
Valid values are cpu and memory; core is not accepted by the validator.
* lparstats: disable e2e env, remove dd_environment fixture
Set e2e-env = false in hatch.toml so CI does not try to spin up an
e2e environment for an AIX-only check that can never run on Linux CI.
Remove the now-redundant dd_environment fixture from conftest.py.
* lparstats: set owner to agent-integrations 9a1a9d61
Parent: e199cb7
2 files changed
Lines changed: 2 additions & 2 deletions