CleanCloud is policy-as-code. Drop a cleancloud.yaml in your repository root and it is auto-detected on every scan. Version-control it alongside your infrastructure — every exception is a git-reviewable approval.
Quick links: Exit codes · Filtering precedence · Rule IDs · CI/CD integration · Real-world examples · Troubleshooting
CleanCloud looks for a config file in this order:
1. `--config path/to/cleancloud.yaml` (CLI flag, highest priority)
2. `cleancloud.yaml` in the current working directory
3. `.cleancloud/config.yaml` in the current working directory
4. No config: all rules enabled with default settings
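
The lookup order above can be sketched in a few lines of Python. This is only an illustration of the documented precedence, not CleanCloud's actual implementation; the function name `find_config` and its signature are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Default locations, relative to the working directory, in lookup order.
DEFAULT_CANDIDATES = ("cleancloud.yaml", ".cleancloud/config.yaml")


def find_config(cli_path: Optional[str], cwd: Path) -> Optional[Path]:
    """Hypothetical helper: return the config file to load, or None."""
    if cli_path is not None:
        # An explicit --config always wins, even if default files exist.
        return Path(cli_path)
    for name in DEFAULT_CANDIDATES:
        candidate = cwd / name
        if candidate.is_file():
            return candidate
    # No config found: all rules run with their default settings.
    return None
```

Returning `None` rather than raising mirrors the documented behavior that a missing config is not an error.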
Filters are applied in this order (strongest first):
| Priority | Layer | What it does |
|---|---|---|
| 1 | Exceptions | Explicit human approvals — bypass all other filters |
| 2 | Tag filtering | Suppress findings on tagged/labelled resources |
| 3 | Rule enable/disable | Skip rules entirely before the scan runs |
| 4 | Rule params | Tune rule thresholds (e.g. idle_days) |
| 5 | Defaults | Global fallbacks for min_cost, confidence, override_risk_level |
| 6 | Thresholds | CI/CD exit-code policy (applied after all findings are collected) |
Exceptions are absolute. A resource in the exceptions list is never re-evaluated by any downstream filter (tag rules, min_cost, confidence, or thresholds). This invariant holds regardless of other config — exceptions represent human approvals, not cost policy.
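
As a rough mental model, the post-scan layers behave like a chain of early returns. The sketch below is an assumption-laden simplification: `Finding` and `classify` are hypothetical names, exceptions are reduced to exact resource IDs, tag rules to key-only matches, and the relative order of the `min_cost` and `confidence` checks is a guess. Layers 3 and 4 are omitted because they act before any finding exists.

```python
from dataclasses import dataclass, field

CONFIDENCE_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


@dataclass
class Finding:
    # Hypothetical, minimal finding shape used only for this sketch.
    rule_id: str
    resource_id: str
    confidence: str
    cost: float
    tags: dict = field(default_factory=dict)


def classify(finding, excepted_ids, ignored_tag_keys, min_cost, min_confidence):
    """Return the first layer that suppresses the finding, or 'reported'."""
    if finding.resource_id in excepted_ids:
        return "exception"        # layer 1: nothing downstream is consulted
    if any(key in finding.tags for key in ignored_tag_keys):
        return "tag_filtering"    # layer 2
    if finding.cost < min_cost:
        return "min_cost"         # layer 5 (defaults or per-rule override)
    if CONFIDENCE_ORDER[finding.confidence] < CONFIDENCE_ORDER[min_confidence]:
        return "confidence"       # layer 5
    return "reported"             # survives to threshold evaluation (layer 6)
```

A finding that matches both an exception and a tag rule is attributed to the exception, which is the invariant described above.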
```yaml
version: 1

# ── Defaults ────────────────────────────────────────────────────────────────
# Applied to every rule unless overridden at the rule level.
defaults:
  min_cost: 10                   # suppress findings below $10/month (per-finding)
  confidence: MEDIUM             # suppress LOW confidence findings globally
  # override_risk_level: HIGH    # rarely used globally; prefer per-rule

# ── Tag filtering ────────────────────────────────────────────────────────────
tag_filtering:
  enabled: true
  mode: exclude                  # "exclude" suppresses findings on matched resources
  ignore:
    - key: env
      values: [production, staging]   # list of values to match
    - key: cleancloud-ignore          # key-only match (any value)

# ── Rule configuration ───────────────────────────────────────────────────────
rules:
  aws.resource.untagged:
    enabled: false

  aws.rds.instance.idle:
    enabled: true
    min_cost: 100                # suppress findings below $100/month (overrides default)
    confidence: MEDIUM           # suppress LOW confidence findings for this rule
    params:
      idle_days_threshold: 21    # flag after 21 days idle (default: 14)

  aws.sagemaker.endpoint.idle:
    override_risk_level: HIGH    # override `risk` field for display/reporting only

# ── Exceptions ────────────────────────────────────────────────────────────────
exceptions:
  - rule_id: aws.ec2.instance.stopped
    resource_id: i-0abc1234567890def
    reason: "Bastion, started on demand"

  - rule_id: aws.rds.instance.idle
    resource_id: "db-test-*"     # glob: suppresses all db-test-* resources
    reason: "Test databases are ephemeral"

  - rule_id: aws.ebs.unattached
    resource_id: "vol-*"
    account_id: "111111111111"   # narrow to a specific AWS account
    region: us-east-1            # narrow to a specific region
    reason: "Legacy volumes in legacy account"

# ── Thresholds ─────────────────────────────────────────────────────────────────
thresholds:
  fail_on_confidence: HIGH       # exit 2 if any HIGH confidence finding remains
  fail_on_cost: 500              # exit 2 if total estimated waste >= $500/month
  fail_on_findings: false        # exit 2 on any finding (usually too noisy for CI)
```

Execution context defaults. CLI flags always take precedence; these values are fallbacks for omitted flags, so you can run `cleancloud scan --config cleancloud.yaml` with no other flags.
```yaml
scan:
  provider: aws            # default provider (overridden by --provider)
  regions: auto            # auto-detect active regions (equivalent to --all-regions)
  # regions: us-east-1     # or pin to a single region (equivalent to --region)
```

| Field | Type | Description |
|---|---|---|
| `provider` | `aws` \| `azure` \| `gcp` | Default provider. CLI `--provider` overrides this. |
| `regions` | `"auto"` or region string | `auto` = all active regions. A string = single region. CLI `--region` / `--all-regions` override this. |

Note: `scan` is execution context, not policy. It controls where to scan. Policy sections (`rules`, `exceptions`, `thresholds`) control what to evaluate.
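
The "CLI flag first, `scan` section second" rule could be expressed as follows. This is a minimal sketch: `resolve_scan` is a hypothetical helper, and it assumes an omitted CLI flag is represented as `None`.

```python
# Keys of the `scan` section that have a CLI counterpart.
SCAN_KEYS = ("provider", "regions")


def resolve_scan(cli_flags: dict, config: dict) -> dict:
    """Hypothetical helper: merge CLI flags over the config's `scan` section."""
    scan_section = config.get("scan") or {}
    resolved = {}
    for key in SCAN_KEYS:
        cli_value = cli_flags.get(key)
        # A flag that was actually passed wins; otherwise fall back to config.
        resolved[key] = cli_value if cli_value is not None else scan_section.get(key)
    return resolved
```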
Global fallbacks applied to every rule unless the rule has its own setting.
| Field | Type | Description |
|---|---|---|
| `min_cost` | float | Suppress findings with `estimated_monthly_cost_usd` below this value. Per-finding. |
| `confidence` | `LOW` \| `MEDIUM` \| `HIGH` | Suppress findings below this confidence level. |
| `override_risk_level` | `LOW` \| `MEDIUM` \| `HIGH` | Override the `risk` field on all findings (display only). |
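
One way to picture the fallback between `defaults` and a per-rule setting is a two-step lookup. The helper name `effective_setting` is hypothetical, and the sketch assumes the config has already been parsed into plain dictionaries.

```python
def effective_setting(config: dict, rule_id: str, name: str):
    """Hypothetical helper: per-rule value if set, else the global default."""
    rule_cfg = (config.get("rules") or {}).get(rule_id) or {}
    if rule_cfg.get(name) is not None:
        return rule_cfg[name]
    # Falls through to None when neither level sets the field.
    return (config.get("defaults") or {}).get(name)
```

With `defaults.min_cost: 10` and a rule-level `min_cost: 100` on `aws.rds.instance.idle`, the lookup yields 100 for that rule and 10 for every other rule.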
Suppress findings on resources that carry specific tags or labels.
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable/disable tag filtering. |
| `mode` | string | `"exclude"` | `"exclude"` suppresses matched resources. `"include"` (allowlist) is planned. |
| `ignore` | list | `[]` | List of `{key, values?}` tag rules. `values` accepts a list of strings to match; omit (or leave empty) to match any value (key-only match). |
Precedence: Tag filtering runs after exceptions. Explicitly-excepted resources are not re-suppressed by a tag rule.
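
The matching semantics of an `ignore` entry can be sketched as below. The function names are hypothetical, and the sketch assumes exact, case-sensitive comparison of keys and values, which the text above does not specify.

```python
def tag_rule_matches(rule: dict, tags: dict) -> bool:
    """Hypothetical helper: does one `ignore` entry match a resource's tags?"""
    key = rule["key"]
    if key not in tags:
        return False
    values = rule.get("values") or []
    # Missing or empty `values` means a key-only match (any value).
    return not values or tags[key] in values


def is_ignored(ignore_rules: list, tags: dict) -> bool:
    """A resource is suppressed if any single rule matches."""
    return any(tag_rule_matches(rule, tags) for rule in ignore_rules)
```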
Enable/disable rules, tune parameters, and override confidence/cost thresholds per rule.
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Set `false` to skip this rule entirely (pre-scan). |
| `min_cost` | float | `null` | Suppress findings below this monthly cost. Overrides `defaults.min_cost`. |
| `confidence` | `LOW` \| `MEDIUM` \| `HIGH` | `null` | Minimum confidence to report. Overrides `defaults.confidence`. |
| `override_risk_level` | `LOW` \| `MEDIUM` \| `HIGH` | `null` | Override the `risk` field on findings. Display/reporting only; does not affect `fail_on_confidence`. |
| `params` | dict | `{}` | Pass named parameters to the rule function. Unknown keys or wrong types raise an error at scan start. |
Rule IDs must match exactly — a typo raises an error with a suggestion:
```
Unknown rule ID 'aws.rds.instnace.idle' (did you mean 'aws.rds.instance.idle'?)
```
See rules.md for the full list of rule IDs and their supported params.
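
A "did you mean" check of this kind can be built on Python's standard `difflib`. Whether CleanCloud does it this way is an assumption; `check_rule_id` and the shortened list of known IDs are illustrative only.

```python
import difflib


def check_rule_id(rule_id: str, known_ids: list) -> None:
    """Hypothetical validator: raise on an unknown rule ID, with a suggestion."""
    if rule_id in known_ids:
        return
    message = f"Unknown rule ID '{rule_id}'"
    # get_close_matches returns the best candidates above a similarity cutoff.
    close = difflib.get_close_matches(rule_id, known_ids, n=1)
    if close:
        message += f" (did you mean '{close[0]}'?)"
    raise ValueError(message)
```

Failing at config load, before any cloud API call, keeps a typo from silently leaving a rule at its default settings.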
Common params:
| Param | Rule ID | Default | Description |
|---|---|---|---|
| `idle_days_threshold` | `aws.elbv2.alb.idle` | 14 | Days of zero traffic before flagging (ALB) |
| `idle_days_threshold` | `aws.elbv2.nlb.idle` | 14 | Days of zero traffic before flagging (NLB) |
| `idle_days_threshold` | `aws.elb.clb.idle` | 14 | Days of zero traffic before flagging (CLB) |
| `idle_days_threshold` | `aws.ec2.nat_gateway.idle` | 14 | Days of zero traffic before flagging |
| `idle_days_threshold` | `aws.rds.instance.idle` | 14 | Days of no connections before flagging |
| `idle_days_threshold` | `aws.sagemaker.endpoint.idle` | 14 | Days of no observed InvokeEndpoint traffic before flagging |
| `idle_days_threshold` | `aws.sagemaker.notebook.idle` | 14 | Days of stale control-plane timestamp state before flagging |
| `idle_days_threshold` | `aws.sagemaker.studio_app.idle` | 7 | Days since the last usable Studio app activity timestamp before flagging |
| `long_running_hours_threshold` | `aws.sagemaker.training_job.long_running` | 24 | Hours before an InProgress SageMaker training job is flagged |
| `idle_days` | `azure.ml.compute_instance.idle` | 14 | Days since last control-plane activity before flagging |
| `idle_days` | `azure.sql.database.idle` | 14 | Days of no connections before flagging |
| `idle_days` | `azure.app_service.idle` | 14 | Days of zero requests before flagging |
| `days_unused` | `azure.container_registry.unused` | 90 | Days with zero successful pulls and pushes before flagging |
| `max_age_days` | `aws.ec2.ami.old` | 180 | Age in days before flagging |
| `max_age_days` | `aws.ebs.snapshot.old` | 90 | Age in days before flagging |
| `max_age_days` | `aws.rds.snapshot.old` | 90 | Age in days before flagging |
| `max_age_days` | `aws.ec2.eni.detached` | 60 | Age in days before flagging |
| `max_age_days` | `aws.ec2.instance.stopped` | 30 | Days stopped before flagging |
| `max_age_days` | `azure.compute.snapshot.old` | 90 | Age in days for the higher-confidence snapshot review band |
| `max_age_days` | `gcp.compute.snapshot.old` | 90 | Age in days before flagging |
| `max_age_days` | `gcp.compute.vm.stopped` | 30 | Days stopped before flagging |
| `days_unattached` | `aws.ec2.elastic_ip.unattached` | 30 | Days unattached before flagging |
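
Since `params` are passed to the rule function as named arguments, validating them against the function signature is one plausible way to reject unknown keys and wrong types before the scan starts. The sketch below assumes exactly that; `rds_instance_idle` is a stand-in rule function and `validate_params` is a hypothetical helper, neither taken from CleanCloud.

```python
import inspect


def rds_instance_idle(idle_days_threshold: int = 14):
    """Stand-in for a rule function; the real one would query the cloud API."""
    return []


def validate_params(rule_fn, params: dict) -> None:
    """Hypothetical check: reject unknown keys and values of the wrong type."""
    accepted = inspect.signature(rule_fn).parameters
    for name, value in params.items():
        if name not in accepted:
            raise ValueError(f"Unknown param '{name}' for {rule_fn.__name__}")
        default = accepted[name].default
        if default is inspect.Parameter.empty:
            continue  # no default to infer the expected type from
        if type(value) is not type(default):
            raise TypeError(
                f"Param '{name}' expects {type(default).__name__}, "
                f"got {type(value).__name__}"
            )
```

Under this model, `idle_days: 21` on a rule whose parameter is `idle_days_threshold` fails fast instead of being ignored.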
Suppress findings for specific resources. Exceptions are absolute — they run first in the filtering pipeline and a matched finding is never re-evaluated by any downstream filter (min_cost, confidence, tag rules, or CI thresholds). This is by design: an exception represents an explicit human approval, not a policy tuning parameter.
| Field | Type | Required | Description |
|---|---|---|---|
| `rule_id` | string | ✅ | Exact rule ID match. |
| `resource_id` | string | ✅ | Glob pattern supported: `*`, `?`, `[seq]`. E.g. `"test-*"`, `"*-staging"`. |
| `reason` | string | - | Human-readable justification. Recommended for auditability. |
| `account_id` | string | - | Narrow to a specific AWS account ID, GCP project ID, or Azure subscription ID. If omitted, matches any account. |
| `region` | string | - | Narrow to a specific region (e.g. `us-east-1`). If omitted, matches any region. |
| `expires_at` | string | - | ISO date `YYYY-MM-DD`. Exception is skipped (with a stderr warning) after this date. Prevents an exception graveyard. |
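
The fields in this table combine as a logical AND. The sketch below uses `fnmatch.fnmatchcase`, whose `*`, `?` and `[seq]` wildcards match the documented glob syntax. It is an illustration only: `exception_applies` is a hypothetical name, findings are modelled as plain dictionaries, and treating the `expires_at` date itself as still valid is an assumption.

```python
from datetime import date
from fnmatch import fnmatchcase


def exception_applies(exc: dict, finding: dict, today: date) -> bool:
    """Hypothetical matcher for one exception entry against one finding."""
    expires_at = exc.get("expires_at")
    if expires_at and today > date.fromisoformat(expires_at):
        return False  # expired exceptions are skipped
    if exc["rule_id"] != finding["rule_id"]:
        return False  # rule IDs must match exactly
    if not fnmatchcase(finding["resource_id"], exc["resource_id"]):
        return False  # glob match on the resource ID
    for scope in ("account_id", "region"):
        # An omitted scope field matches anything.
        if exc.get(scope) is not None and exc[scope] != finding.get(scope):
            return False
    return True
```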
Examples:
```yaml
exceptions:
  # Exact match
  - rule_id: aws.ec2.instance.stopped
    resource_id: i-0abc1234567890def
    reason: "Bastion host, started on demand"

  # Glob: suppress all test databases
  - rule_id: aws.rds.instance.idle
    resource_id: "db-test-*"
    reason: "Test databases are ephemeral"

  # Scoped to one account + region
  - rule_id: aws.ebs.unattached
    resource_id: "vol-*"
    account_id: "111111111111"
    region: us-west-2
    reason: "Archive volumes in legacy account"
```

Override the default scan category in config; this is equivalent to `--category` on the CLI. The CLI flag takes precedence when explicitly set.
| Field | Type | Default | Description |
|---|---|---|---|
| `include` | list | `["hygiene"]` | Categories to run. Values: `hygiene`, `ai`, `all`. `[hygiene, ai]` is equivalent to `all`. |
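
The equivalence between `[hygiene, ai]` and `all` amounts to expanding `all` into the concrete category set. A minimal sketch, assuming `hygiene` and `ai` are the only concrete categories and that an unknown name is an error (both are assumptions; `expand_categories` is a hypothetical helper):

```python
ALL_CATEGORIES = frozenset({"hygiene", "ai"})


def expand_categories(include=None) -> frozenset:
    """Hypothetical helper: normalize `categories.include` to a concrete set."""
    if include is None:
        return frozenset({"hygiene"})  # documented default
    selected = set()
    for name in include:
        if name == "all":
            selected |= ALL_CATEGORIES
        elif name in ALL_CATEGORIES:
            selected.add(name)
        else:
            raise ValueError(f"Unknown category '{name}'")
    return frozenset(selected)
```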
Example:
```yaml
categories:
  include: [hygiene, ai]   # same as: cleancloud scan --category all
```

Config-file equivalents of the `--fail-on-*` CLI flags. CLI flags take precedence when both are set.
| Field | Type | Default | Description |
|---|---|---|---|
| `fail_on_findings` | bool | `false` | Exit 2 if any findings remain after filtering. |
| `fail_on_confidence` | `LOW` \| `MEDIUM` \| `HIGH` | `null` | Exit 2 if any finding has confidence ≥ this level. |
| `fail_on_cost` | float | `null` | Exit 2 if total `estimated_monthly_cost_usd` across all findings ≥ this value. |
| Code | Meaning |
|---|---|
| `0` | Scan completed, no policy violation |
| `1` | Unexpected error (bug or infrastructure failure) |
| `2` | Policy violation: one or more threshold conditions breached |
| `3` | Permission error: insufficient IAM/RBAC permissions to complete scan |
Threshold evaluation order (all conditions are OR; first breach wins):

1. `fail_on_findings: true`: any remaining finding triggers exit 2
2. `fail_on_confidence: X`: any finding with confidence ≥ X triggers exit 2
3. `fail_on_cost: X`: total estimated waste ≥ X triggers exit 2

If multiple thresholds are configured, evaluation stops at the first one that triggers; the exit code is 2 in every case. There is no AND mode.
Important: override_risk_level in rules does NOT affect fail_on_confidence. Thresholds evaluate signal strength (confidence), not the display risk label.
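
The threshold logic can be summarized as three independent checks evaluated in order. The following is a sketch under stated assumptions: findings are plain dictionaries carrying `confidence`, `risk` and `estimated_monthly_cost_usd`, a missing cost counts as zero, and `exit_code` is a hypothetical function name. It covers only exit codes 0 and 2.

```python
CONFIDENCE_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def exit_code(findings, fail_on_findings=False, fail_on_confidence=None, fail_on_cost=None) -> int:
    """Hypothetical evaluator: return 2 on the first breached threshold, else 0."""
    if fail_on_findings and findings:
        return 2
    if fail_on_confidence is not None:
        floor = CONFIDENCE_ORDER[fail_on_confidence]
        # Only `confidence` is consulted; the `risk` label plays no part.
        if any(CONFIDENCE_ORDER[f["confidence"]] >= floor for f in findings):
            return 2
    if fail_on_cost is not None:
        total = sum(f.get("estimated_monthly_cost_usd") or 0.0 for f in findings)
        if total >= fail_on_cost:
            return 2
    return 0
```

A finding with `risk: HIGH` but `confidence: MEDIUM` does not trip `fail_on_confidence: HIGH` here, which is the behavior the note above describes.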
Multi-account scope is configured via CLI flags today. YAML-based scope configuration is planned.
```bash
# Scan all AWS Organization accounts
cleancloud scan --provider aws --org --all-regions --config cleancloud.yaml

# Scan specific accounts
cleancloud scan --provider aws --accounts 111111111111,222222222222 --config cleancloud.yaml
```

Scoping exceptions to specific accounts is one of the most powerful (and underused) features. In a 50-account org, the same resource ID prefix can exist in multiple accounts. Without `account_id`, an exception would suppress findings across all of them:
```yaml
exceptions:
  # ❌ Suppresses vol-* in ALL accounts, which is almost never what you want
  - rule_id: aws.ebs.unattached
    resource_id: "vol-*"
    reason: "Archive volumes"

  # ✅ Scoped to a single account + region: precise and auditable
  - rule_id: aws.ebs.unattached
    resource_id: "vol-*"
    account_id: "111111111111"   # archive account
    region: us-west-2
    reason: "Archive volumes in legacy account; migration planned Q3"

  # ✅ Suppress all test databases across a dedicated test account
  - rule_id: aws.rds.instance.idle
    resource_id: "db-*"
    account_id: "222222222222"   # test/dev account
    reason: "Dev account databases are intentionally ephemeral"
```

Rule tuning via `params` also works consistently across accounts: a single `cleancloud.yaml` at the org root applies to all accounts in the scan.
These three concepts are distinct:
| Concept | Field | Set by | Affects |
|---|---|---|---|
| Confidence | `finding.confidence` | Rule logic (signal strength) | Filtering, CI/CD thresholds, sorting |
| Risk | `finding.risk` | Rule logic (cost/impact estimate) | Display only |
| `override_risk_level` | config `override_risk_level` | Policy config | Overrides `finding.risk` for display; does NOT affect confidence or thresholds |
Use thresholds.fail_on_confidence (not override_risk_level) for CI/CD gates.
Run specific rule categories via CLI:
```bash
cleancloud scan --provider aws --category hygiene   # default: infrastructure waste
cleancloud scan --provider aws --category ai        # AI/ML waste (SageMaker, AML, Vertex)
cleancloud scan --provider aws --category all       # all rules
```

The same selection can be set in the config file with `categories.include`.
Tag your prod/staging resources with env: production / env: staging and suppress them globally:
```yaml
tag_filtering:
  enabled: true
  mode: exclude
  ignore:
    - key: env
      values: [production, staging]
    - key: cleancloud-ignore   # opt-out any resource with this tag (any value)
```

Findings on tagged resources are suppressed before thresholds are evaluated.
```yaml
thresholds:
  fail_on_cost: 500          # exit 2 if total estimated waste >= $500/month
  fail_on_confidence: HIGH   # exit 2 if any HIGH confidence finding remains
```

Run weekly in CI:
```bash
cleancloud scan --provider aws --org --all-regions --output json --output-file findings.json
```

```yaml
exceptions:
  # Known keep-alive, reviewed quarterly
  - rule_id: aws.ec2.instance.stopped
    resource_id: i-0abc1234567890def
    reason: "Bastion host, started on demand"
    expires_at: "2026-12-31"

  # Suppress all test databases (glob)
  - rule_id: aws.rds.instance.idle
    resource_id: "db-test-*"
    reason: "Test databases are intentionally ephemeral"

  # Scope to one account + region (avoid suppressing across all accounts)
  - rule_id: aws.ebs.unattached
    resource_id: "vol-*"
    account_id: "111111111111"
    region: us-west-2
    reason: "Archive volumes; migration planned Q3"
```

```yaml
scan:
  provider: aws
  regions: auto

defaults:
  min_cost: 10         # suppress noise below $10/month
  confidence: MEDIUM   # skip LOW confidence findings

rules:
  aws.rds.instance.idle:
    min_cost: 100      # only flag RDS instances with at least $100/month estimated cost
    params:
      idle_days_threshold: 21   # require 21 days idle (default: 14)

  aws.sagemaker.endpoint.idle:
    override_risk_level: HIGH   # escalate risk label for visibility in reports

  aws.sagemaker.notebook.idle:
    params:
      idle_days_threshold: 21   # flag notebooks with stale control-plane timestamps for 21+ days

  aws.sagemaker.studio_app.idle:
    params:
      idle_days_threshold: 10   # flag Studio apps with no usable activity signal for 10+ days

  aws.sagemaker.training_job.long_running:
    params:
      long_running_hours_threshold: 36   # review InProgress training jobs after 36 hours

  aws.resource.untagged:
    enabled: false     # team manages tags separately

thresholds:
  fail_on_confidence: HIGH
  fail_on_cost: 500
```

Commit this to your repo root and run:
```bash
cleancloud scan --org --all-regions
```

Create one config per environment and pass it with `--config`:
`configs/prod.yaml` (strict):

```yaml
defaults:
  confidence: MEDIUM
thresholds:
  fail_on_confidence: HIGH
  fail_on_cost: 200
```

`configs/staging.yaml` (lenient):
```yaml
defaults:
  min_cost: 50
thresholds:
  fail_on_cost: 1000
```

```bash
cleancloud scan --provider aws --config configs/prod.yaml --all-regions
cleancloud scan --provider aws --config configs/staging.yaml --region us-east-1
```

Start conservative, tighten later:
```yaml
defaults:
  confidence: HIGH   # only surface HIGH confidence findings initially
  min_cost: 50       # ignore small findings while calibrating
```

Silence a noisy rule without disabling it:
```yaml
rules:
  aws.resource.untagged:
    confidence: HIGH   # only report HIGH confidence untagged findings
```

Different thresholds for different environments: use separate config files per environment and pass them with `--config`:
```bash
cleancloud scan --provider aws --config configs/staging.yaml
cleancloud scan --provider aws --config configs/production.yaml
```