Commit a0ee7d2

AWS & Azure : fixed issues reported by users (#167)
1 parent 7ac8a6d commit a0ee7d2

17 files changed

Lines changed: 456 additions & 22 deletions

README.fr.md

Lines changed: 4 additions & 4 deletions
@@ -193,7 +193,7 @@ Pas encore de compte cloud ? `cleancloud demo` affiche un exemple de sortie sans
 - **Détection du gaspillage IA/ML sur les 3 clouds :** endpoints, notebooks, Studio apps et training jobs SageMaker ; clusters AML Compute et instances ML ; endpoints en ligne Azure ML et services Azure AI Search ; endpoints, instances Workbench et training jobs Vertex AI. Les ressources GPU sont mises en avant comme candidats de revue à risque plus élevé. Les outils natifs n'indiquent pas toujours quoi examiner — CleanCloud le fait. Opt-in via `--category ai`
 - **Gouvernance policy-as-code :** `cleancloud.yaml` pour la configuration par règle, les exceptions avec dates d'expiration, les seuils de coût et de confiance, les exclusions par tag — versionné aux côtés de votre infrastructure. Chaque exception est une approbation auditée dans git.
 - **Application de politique (opt-in) :** `--fail-on-confidence HIGH` ou `--fail-on-cost 500` — appliquer des seuils de gaspillage en CI/CD sur un planning, géré par les équipes platform ou FinOps
-- **45 règles de détection sélectives et haut signal :** volumes orphelins, bases de données inactives, instances arrêtées, registres inutilisés, et plus — conçues pour éviter les faux positifs en environnements IaC, chacune avec une estimation de coût déterministe
+- **46 règles de détection sélectives et haut signal :** volumes orphelins, bases de données inactives, instances arrêtées, registres inutilisés, et plus — conçues pour éviter les faux positifs en environnements IaC, chacune avec une estimation de coût déterministe
 - **Scan multi-comptes (AWS) :** scannez des AWS Organizations entières en une exécution — fichier de config, IDs inline, ou auto-découverte via `--org`
 - **Scan multi-abonnements (Azure) :** scannez tous les abonnements Azure en parallèle — auto-découverte via Management Group, détail des coûts par abonnement inclus
 - **Scan multi-projets (GCP) :** scannez tous les projets GCP accessibles en parallèle — auto-découverte via Application Default Credentials, détail des coûts par projet inclus
@@ -278,7 +278,7 @@ L'infrastructure IA/ML inactive est la source de gaspillage cloud invisible à l
 | Cluster AML Compute Azure (GPU) | 600 – 15 000 $ / mois |
 | Instance de calcul Azure ML (GPU) | 600 – 15 000+ $ / mois |
 | Endpoint en ligne Azure ML (GPU) | 200 – 2 600+ $ / mois |
-| Azure AI Search (Standard+) | 261 – 4 028+ $ / mois |
+| Azure AI Search (Basic+) | 261 – 4 028+ $ / mois |
 | Déploiement Azure OpenAI Provisionné (PTU) | 1 460+ $ / PTU / mois |
 | Endpoint Vertex AI Online Prediction (GPU) | 449 – 23 000+ $ / mois |
 | Instance Vertex AI Workbench (GPU) | 449 – 8 000+ $ / mois |
@@ -528,7 +528,7 @@ Oui. CleanCloud n'a besoin d'accès réseau qu'aux endpoints API de votre cloud
 
 ## Ce que CleanCloud détecte
 
-45 règles pour AWS, Azure et GCP — conservatrices, haut signal, conçues pour éviter les faux positifs en environnements IaC.
+46 règles pour AWS, Azure et GCP — conservatrices, haut signal, conçues pour éviter les faux positifs en environnements IaC.
 
 **AWS :**
 - Compute : instances arrêtées 30+ jours (charges EBS continuent)
@@ -545,7 +545,7 @@ Oui. CleanCloud n'a besoin d'accès réseau qu'aux endpoints API de votre cloud
 - Réseau : adresses IP publiques inutilisées, Load Balancers vides (HIGH), App Gateways vides (HIGH), VNet Gateways inactives
 - Plateforme : App Service Plans vides (HIGH), bases de données SQL inactives (HIGH), App Services inactifs, Container Registries inutilisés
 - Gouvernance : ressources sans tags
-- IA/ML *(opt-in : `--category ai`)* : clusters de calcul AML avec capacité baseline non nulle et aucune activité depuis 14+ jours — clusters GPU flaggés risque HIGH ($600–$15K/mois) ; instances de calcul Azure ML Running sans activité depuis 14+ jours — instances GPU flaggées risque CRITICAL ($600–$15K+/mois) ; endpoints en ligne ML managés sans requête de scoring depuis 7+ jours — endpoints GPU flaggés HIGH/CRITICAL (200–2 600+$/mois) ; services AI Search (Standard+) sans requête depuis 30+ jours — facturés par SKU × réplicas × partitions (261–4 028+$/mois) ; déploiements Azure OpenAI provisionnés (PTUs) sans requête API depuis 7+ jours — facturés ~1 460 $/PTU/mois en on-demand quel que soit le trafic
+- IA/ML *(opt-in : `--category ai`)* : clusters de calcul AML avec capacité baseline non nulle et aucune activité depuis 14+ jours — clusters GPU flaggés risque HIGH ($600–$15K/mois) ; instances de calcul Azure ML Running sans activité depuis 14+ jours — instances GPU flaggées risque CRITICAL ($600–$15K+/mois) ; endpoints en ligne ML managés sans requête de scoring depuis 7+ jours — endpoints GPU flaggés HIGH/CRITICAL (200–2 600+$/mois) ; services AI Search (Basic+) sans requête depuis 90+ jours — facturés par SKU × réplicas × partitions (261–4 028+$/mois) ; déploiements Azure OpenAI provisionnés (PTUs) sans requête API depuis 7+ jours — facturés ~1 460 $/PTU/mois en on-demand quel que soit le trafic
 
 **GCP :**
 - Compute : instances VM arrêtées 30+ jours (charges disque continuent) (HIGH)

README.md

Lines changed: 4 additions & 4 deletions
@@ -193,7 +193,7 @@ No cloud account yet? `cleancloud demo` shows sample output without any credenti
 - **AI/ML waste detection across all 3 clouds:** idle SageMaker endpoints, notebook instances, Studio apps, and long-running training jobs; AML compute clusters and instances; Azure ML online endpoints and AI Search services; Vertex AI endpoints, Workbench instances, and training jobs. GPU-backed resources are highlighted as higher-risk review candidates. Native cost tools don't surface these — CleanCloud does. Opt-in via `--category ai`
 - **Policy-as-code governance:** `cleancloud.yaml` for per-rule config, exceptions with expiry dates, cost and confidence thresholds, tag-based exclusions — version-controlled alongside your infrastructure. Every exception is a git-reviewable approval.
 - **Governance enforcement (opt-in):** `--fail-on-confidence HIGH` or `--fail-on-cost 500` — enforce waste thresholds in CI/CD on a schedule, owned by platform or FinOps teams
-- **45 curated, high-signal detection rules:** orphaned volumes, idle databases, stopped instances, unused registries, and more — designed to avoid false positives in IaC environments, each with a deterministic cost estimate
+- **46 curated, high-signal detection rules:** orphaned volumes, idle databases, stopped instances, unused registries, and more — designed to avoid false positives in IaC environments, each with a deterministic cost estimate
 - **Multi-account scanning (AWS):** scan entire AWS Organizations in one run — config file, inline IDs, or auto-discovery via `--org`
 - **Multi-subscription scanning (Azure):** scan all Azure subscriptions in parallel — auto-discovery via Management Group, per-subscription cost breakdown included
 - **Multi-project scanning (GCP):** scan all accessible GCP projects in parallel — auto-discovery via Application Default Credentials, per-project cost breakdown included
@@ -278,7 +278,7 @@ Idle AI/ML infrastructure is the fastest-growing source of invisible cloud spend
 | Azure AML compute cluster (GPU) | $600 – $15,000 / month |
 | Azure ML Compute Instance (GPU) | $600 – $15,000+ / month |
 | Azure ML Online Endpoint (GPU-backed) | $200 – $2,600+ / month |
-| Azure AI Search (Standard+) | $261 – $4,028+ / month |
+| Azure AI Search (Basic+) | $261 – $4,028+ / month |
 | Azure OpenAI Provisioned Deployment (PTU) | $1,460+ / PTU / month |
 | Vertex AI Online Prediction endpoint (GPU) | $449 – $23,000+ / month |
 | Vertex AI Workbench instance (GPU) | $449 – $8,000+ / month |
@@ -528,7 +528,7 @@ Yes. CleanCloud only needs network access to your cloud provider's API endpoints
 
 ## What CleanCloud Detects
 
-45 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.
+46 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.
 
 **AWS:**
 - Compute: stopped instances 30+ days (EBS charges continue)
@@ -545,7 +545,7 @@ Yes. CleanCloud only needs network access to your cloud provider's API endpoints
 - Network: unused public IPs, empty load balancers (HIGH), empty App Gateways (HIGH), idle VNet Gateways
 - Platform: empty App Service Plans (HIGH), idle SQL databases (HIGH), idle App Services, unused Container Registries
 - Governance: untagged resources
-- AI/ML *(opt-in: `--category ai`)*: idle AML compute clusters with non-zero baseline capacity and no workload activity 14+ days — GPU clusters flagged HIGH risk ($600–$15K/month); idle Compute Instances with no control-plane activity 14+ days — GPU instances CRITICAL risk ($600–$15K+/month); idle ML managed online endpoints with zero scoring requests 7+ days — GPU-backed endpoints flagged HIGH/CRITICAL ($200–$2,600+/month); idle AI Search services (Standard+) with zero queries 30+ days — billed per SKU × replicas × partitions ($261–$4,028+/month); idle Azure OpenAI provisioned deployments (PTUs) with zero API requests 7+ days — bills ~$1,460/PTU/month on-demand regardless of traffic
+- AI/ML *(opt-in: `--category ai`)*: idle AML compute clusters with non-zero baseline capacity and no workload activity 14+ days — GPU clusters flagged HIGH risk ($600–$15K/month); idle Compute Instances with no control-plane activity 14+ days — GPU instances CRITICAL risk ($600–$15K+/month); idle ML managed online endpoints with zero scoring requests 7+ days — GPU-backed endpoints flagged HIGH/CRITICAL ($200–$2,600+/month); idle AI Search services (Basic+) with zero queries 90+ days — billed per SKU × replicas × partitions ($261–$4,028+/month); idle Azure OpenAI provisioned deployments (PTUs) with zero API requests 7+ days — bills ~$1,460/PTU/month on-demand regardless of traffic
 
 **GCP:**
 - Compute: stopped instances 30+ days (disk charges continue) (HIGH)

cleancloud/doctor/aws.py

Lines changed: 55 additions & 0 deletions
@@ -230,6 +230,7 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
     info("Permissions required (attach to your IAM role or user):")
     info(" ec2:DescribeVolumes")
     info(" ec2:DescribeSnapshots")
+    info(" ec2:DescribeSnapshotAttribute")
     info(" ec2:DescribeRegions")
     info(" ec2:DescribeAddresses")
     info(" ec2:DescribeNetworkInterfaces")
@@ -239,6 +240,8 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
     info(" ec2:DescribeSecurityGroups")
     info(" rds:DescribeDBInstances")
     info(" rds:DescribeDBSnapshots")
+    info(" rds:DescribeDBSnapshotAttributes")
+    info(" cloudtrail:LookupEvents")
     info(" elasticloadbalancing:DescribeLoadBalancers")
     info(" elasticloadbalancing:DescribeTargetGroups")
     info(" logs:DescribeLogGroups")
@@ -409,6 +412,22 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
             permissions_failed.append(("ec2:DescribeSnapshots", str(e)))
             warn(f"ec2:DescribeSnapshots - {e}")
 
+        try:
+            _snaps = ec2.describe_snapshots(OwnerIds=["self"], MaxResults=5).get("Snapshots", [])
+            if _snaps:
+                ec2.describe_snapshot_attribute(
+                    SnapshotId=_snaps[0]["SnapshotId"], Attribute="createVolumePermission"
+                )
+            permissions_tested.append("ec2:DescribeSnapshotAttribute")
+            success("ec2:DescribeSnapshotAttribute")
+        except Exception as e:
+            if "AccessDenied" in str(e) or "not authorized" in str(e).lower():
+                permissions_failed.append(("ec2:DescribeSnapshotAttribute", str(e)))
+                warn(f"ec2:DescribeSnapshotAttribute - {e}")
+            else:
+                permissions_tested.append("ec2:DescribeSnapshotAttribute")
+                success("ec2:DescribeSnapshotAttribute")
+
         try:
             ec2.describe_regions()
             permissions_tested.append("ec2:DescribeRegions")
@@ -483,6 +502,24 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
             permissions_failed.append(("rds:DescribeDBSnapshots", str(e)))
             warn(f"rds:DescribeDBSnapshots - {e}")
 
+        try:
+            _rds_snaps = rds.describe_db_snapshots(MaxRecords=20, SnapshotType="manual").get(
+                "DBSnapshots", []
+            )
+            if _rds_snaps:
+                rds.describe_db_snapshot_attributes(
+                    DBSnapshotIdentifier=_rds_snaps[0]["DBSnapshotIdentifier"]
+                )
+            permissions_tested.append("rds:DescribeDBSnapshotAttributes")
+            success("rds:DescribeDBSnapshotAttributes")
+        except Exception as e:
+            if "AccessDenied" in str(e) or "not authorized" in str(e).lower():
+                permissions_failed.append(("rds:DescribeDBSnapshotAttributes", str(e)))
+                warn(f"rds:DescribeDBSnapshotAttributes - {e}")
+            else:
+                permissions_tested.append("rds:DescribeDBSnapshotAttributes")
+                success("rds:DescribeDBSnapshotAttributes")
+
         # Test ELB permissions
         try:
             elbv2 = session.client("elbv2", region_name=region)
@@ -563,6 +600,24 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
             permissions_failed.append(("s3:GetBucketTagging", str(e)))
             warn(f"s3:GetBucketTagging - {e}")
 
+        # Test CloudTrail permissions (aws.ec2.instance.stopped — stopped-duration probe)
+        try:
+            from datetime import datetime, timedelta
+            from datetime import timezone as _tz
+
+            cloudtrail = session.client("cloudtrail", region_name=region)
+            _now = datetime.now(_tz.utc)
+            cloudtrail.lookup_events(
+                StartTime=_now - timedelta(hours=1),
+                EndTime=_now,
+                MaxResults=1,
+            )
+            permissions_tested.append("cloudtrail:LookupEvents")
+            success("cloudtrail:LookupEvents")
+        except Exception as e:
+            permissions_failed.append(("cloudtrail:LookupEvents", str(e)))
+            warn(f"cloudtrail:LookupEvents - {e}")
+
     except Exception:
         fail("CleanCloud cannot run safely with missing read-only permissions")
