Merged
8 changes: 4 additions & 4 deletions README.fr.md
@@ -193,7 +193,7 @@ Pas encore de compte cloud ? `cleancloud demo` affiche un exemple de sortie sans
- **Détection du gaspillage IA/ML sur les 3 clouds :** endpoints, notebooks, Studio apps et training jobs SageMaker ; clusters AML Compute et instances ML ; endpoints en ligne Azure ML et services Azure AI Search ; endpoints, instances Workbench et training jobs Vertex AI. Les ressources GPU sont mises en avant comme candidats de revue à risque plus élevé. Les outils natifs n'indiquent pas toujours quoi examiner — CleanCloud le fait. Opt-in via `--category ai`
- **Gouvernance policy-as-code :** `cleancloud.yaml` pour la configuration par règle, les exceptions avec dates d'expiration, les seuils de coût et de confiance, les exclusions par tag — versionné aux côtés de votre infrastructure. Chaque exception est une approbation auditée dans git.
- **Application de politique (opt-in) :** `--fail-on-confidence HIGH` ou `--fail-on-cost 500` — appliquer des seuils de gaspillage en CI/CD sur un planning, géré par les équipes platform ou FinOps
-- **45 règles de détection sélectives et haut signal :** volumes orphelins, bases de données inactives, instances arrêtées, registres inutilisés, et plus — conçues pour éviter les faux positifs en environnements IaC, chacune avec une estimation de coût déterministe
+- **46 règles de détection sélectives et haut signal :** volumes orphelins, bases de données inactives, instances arrêtées, registres inutilisés, et plus — conçues pour éviter les faux positifs en environnements IaC, chacune avec une estimation de coût déterministe
- **Scan multi-comptes (AWS) :** scannez des AWS Organizations entières en une exécution — fichier de config, IDs inline, ou auto-découverte via `--org`
- **Scan multi-abonnements (Azure) :** scannez tous les abonnements Azure en parallèle — auto-découverte via Management Group, détail des coûts par abonnement inclus
- **Scan multi-projets (GCP) :** scannez tous les projets GCP accessibles en parallèle — auto-découverte via Application Default Credentials, détail des coûts par projet inclus
@@ -278,7 +278,7 @@ L'infrastructure IA/ML inactive est la source de gaspillage cloud invisible à l
| Cluster AML Compute Azure (GPU) | 600 – 15 000 $ / mois |
| Instance de calcul Azure ML (GPU) | 600 – 15 000+ $ / mois |
| Endpoint en ligne Azure ML (GPU) | 200 – 2 600+ $ / mois |
-| Azure AI Search (Standard+) | 261 – 4 028+ $ / mois |
+| Azure AI Search (Basic+) | 261 – 4 028+ $ / mois |
| Déploiement Azure OpenAI Provisionné (PTU) | 1 460+ $ / PTU / mois |
| Endpoint Vertex AI Online Prediction (GPU) | 449 – 23 000+ $ / mois |
| Instance Vertex AI Workbench (GPU) | 449 – 8 000+ $ / mois |
@@ -528,7 +528,7 @@ Oui. CleanCloud n'a besoin d'accès réseau qu'aux endpoints API de votre cloud

## Ce que CleanCloud détecte

-45 règles pour AWS, Azure et GCP — conservatrices, haut signal, conçues pour éviter les faux positifs en environnements IaC.
+46 règles pour AWS, Azure et GCP — conservatrices, haut signal, conçues pour éviter les faux positifs en environnements IaC.

**AWS :**
- Compute : instances arrêtées 30+ jours (charges EBS continuent)
@@ -545,7 +545,7 @@ Oui. CleanCloud n'a besoin d'accès réseau qu'aux endpoints API de votre cloud
- Réseau : adresses IP publiques inutilisées, Load Balancers vides (HIGH), App Gateways vides (HIGH), VNet Gateways inactives
- Plateforme : App Service Plans vides (HIGH), bases de données SQL inactives (HIGH), App Services inactifs, Container Registries inutilisés
- Gouvernance : ressources sans tags
-- IA/ML *(opt-in : `--category ai`)* : clusters de calcul AML avec capacité baseline non nulle et aucune activité depuis 14+ jours — clusters GPU flaggés risque HIGH ($600–$15K/mois) ; instances de calcul Azure ML Running sans activité depuis 14+ jours — instances GPU flaggées risque CRITICAL ($600–$15K+/mois) ; endpoints en ligne ML managés sans requête de scoring depuis 7+ jours — endpoints GPU flaggés HIGH/CRITICAL (200–2 600+$/mois) ; services AI Search (Standard+) sans requête depuis 30+ jours — facturés par SKU × réplicas × partitions (261–4 028+$/mois) ; déploiements Azure OpenAI provisionnés (PTUs) sans requête API depuis 7+ jours — facturés ~1 460 $/PTU/mois en on-demand quel que soit le trafic
+- IA/ML *(opt-in : `--category ai`)* : clusters de calcul AML avec capacité baseline non nulle et aucune activité depuis 14+ jours — clusters GPU flaggés risque HIGH ($600–$15K/mois) ; instances de calcul Azure ML Running sans activité depuis 14+ jours — instances GPU flaggées risque CRITICAL ($600–$15K+/mois) ; endpoints en ligne ML managés sans requête de scoring depuis 7+ jours — endpoints GPU flaggés HIGH/CRITICAL (200–2 600+$/mois) ; services AI Search (Basic+) sans requête depuis 90+ jours — facturés par SKU × réplicas × partitions (261–4 028+$/mois) ; déploiements Azure OpenAI provisionnés (PTUs) sans requête API depuis 7+ jours — facturés ~1 460 $/PTU/mois en on-demand quel que soit le trafic

**GCP :**
- Compute : instances VM arrêtées 30+ jours (charges disque continuent) (HIGH)
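
Note on the AI Search line in the hunk above: it describes billing as SKU × replicas × partitions. A minimal Python sketch of that arithmetic follows. The `SearchService` class, the `monthly_cost` helper, and the per-unit prices are illustrative assumptions, not CleanCloud's actual estimator or Azure's published rates.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SearchService:
    """Hypothetical stand-in for an Azure AI Search service."""
    sku: str
    replicas: int
    partitions: int


def monthly_cost(service: SearchService, unit_price_by_sku: Dict[str, float]) -> float:
    """Estimate monthly cost as per-unit SKU price x replicas x partitions."""
    if service.replicas < 1 or service.partitions < 1:
        raise ValueError("replicas and partitions must both be >= 1")
    search_units = service.replicas * service.partitions
    return unit_price_by_sku[service.sku] * search_units


# Placeholder price, chosen only to make the arithmetic easy to follow.
# Real prices depend on SKU, region, and date.
EXAMPLE_PRICES = {"standard": 100.0}

svc = SearchService(sku="standard", replicas=3, partitions=2)
print(monthly_cost(svc, EXAMPLE_PRICES))  # 100.0 * 3 * 2 = 600.0
```

Because cost scales with the product of replicas and partitions, an idle service with several of each can cost many times the base SKU price, which is consistent with the wide range quoted in the README.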
8 changes: 4 additions & 4 deletions README.md
@@ -193,7 +193,7 @@ No cloud account yet? `cleancloud demo` shows sample output without any credenti
- **AI/ML waste detection across all 3 clouds:** idle SageMaker endpoints, notebook instances, Studio apps, and long-running training jobs; AML compute clusters and instances; Azure ML online endpoints and AI Search services; Vertex AI endpoints, Workbench instances, and training jobs. GPU-backed resources are highlighted as higher-risk review candidates. Native cost tools don't surface these — CleanCloud does. Opt-in via `--category ai`
- **Policy-as-code governance:** `cleancloud.yaml` for per-rule config, exceptions with expiry dates, cost and confidence thresholds, tag-based exclusions — version-controlled alongside your infrastructure. Every exception is a git-reviewable approval.
- **Governance enforcement (opt-in):** `--fail-on-confidence HIGH` or `--fail-on-cost 500` — enforce waste thresholds in CI/CD on a schedule, owned by platform or FinOps teams
-- **45 curated, high-signal detection rules:** orphaned volumes, idle databases, stopped instances, unused registries, and more — designed to avoid false positives in IaC environments, each with a deterministic cost estimate
+- **46 curated, high-signal detection rules:** orphaned volumes, idle databases, stopped instances, unused registries, and more — designed to avoid false positives in IaC environments, each with a deterministic cost estimate
- **Multi-account scanning (AWS):** scan entire AWS Organizations in one run — config file, inline IDs, or auto-discovery via `--org`
- **Multi-subscription scanning (Azure):** scan all Azure subscriptions in parallel — auto-discovery via Management Group, per-subscription cost breakdown included
- **Multi-project scanning (GCP):** scan all accessible GCP projects in parallel — auto-discovery via Application Default Credentials, per-project cost breakdown included
@@ -278,7 +278,7 @@ Idle AI/ML infrastructure is the fastest-growing source of invisible cloud spend
| Azure AML compute cluster (GPU) | $600 – $15,000 / month |
| Azure ML Compute Instance (GPU) | $600 – $15,000+ / month |
| Azure ML Online Endpoint (GPU-backed) | $200 – $2,600+ / month |
-| Azure AI Search (Standard+) | $261 – $4,028+ / month |
+| Azure AI Search (Basic+) | $261 – $4,028+ / month |
| Azure OpenAI Provisioned Deployment (PTU) | $1,460+ / PTU / month |
| Vertex AI Online Prediction endpoint (GPU) | $449 – $23,000+ / month |
| Vertex AI Workbench instance (GPU) | $449 – $8,000+ / month |
@@ -528,7 +528,7 @@ Yes. CleanCloud only needs network access to your cloud provider's API endpoints

## What CleanCloud Detects

-45 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.
+46 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.

**AWS:**
- Compute: stopped instances 30+ days (EBS charges continue)
@@ -545,7 +545,7 @@ Yes. CleanCloud only needs network access to your cloud provider's API endpoints
- Network: unused public IPs, empty load balancers (HIGH), empty App Gateways (HIGH), idle VNet Gateways
- Platform: empty App Service Plans (HIGH), idle SQL databases (HIGH), idle App Services, unused Container Registries
- Governance: untagged resources
-- AI/ML *(opt-in: `--category ai`)*: idle AML compute clusters with non-zero baseline capacity and no workload activity 14+ days — GPU clusters flagged HIGH risk ($600–$15K/month); idle Compute Instances with no control-plane activity 14+ days — GPU instances CRITICAL risk ($600–$15K+/month); idle ML managed online endpoints with zero scoring requests 7+ days — GPU-backed endpoints flagged HIGH/CRITICAL ($200–$2,600+/month); idle AI Search services (Standard+) with zero queries 30+ days — billed per SKU × replicas × partitions ($261–$4,028+/month); idle Azure OpenAI provisioned deployments (PTUs) with zero API requests 7+ days — bills ~$1,460/PTU/month on-demand regardless of traffic
+- AI/ML *(opt-in: `--category ai`)*: idle AML compute clusters with non-zero baseline capacity and no workload activity 14+ days — GPU clusters flagged HIGH risk ($600–$15K/month); idle Compute Instances with no control-plane activity 14+ days — GPU instances CRITICAL risk ($600–$15K+/month); idle ML managed online endpoints with zero scoring requests 7+ days — GPU-backed endpoints flagged HIGH/CRITICAL ($200–$2,600+/month); idle AI Search services (Basic+) with zero queries 90+ days — billed per SKU × replicas × partitions ($261–$4,028+/month); idle Azure OpenAI provisioned deployments (PTUs) with zero API requests 7+ days — bills ~$1,460/PTU/month on-demand regardless of traffic

**GCP:**
- Compute: stopped instances 30+ days (disk charges continue) (HIGH)
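
Note on the governance flags in the README: `--fail-on-confidence HIGH` and `--fail-on-cost 500` are described as CI/CD thresholds. The sketch below shows one way such a gate could be evaluated. The `Finding` shape, the LOW < MEDIUM < HIGH ordering, and the choice to compare the cost threshold against the summed monthly estimate are assumptions for illustration. The README does not specify how CleanCloud implements these flags.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed ordering of confidence levels (not confirmed by the README).
CONFIDENCE_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical scan finding."""
    rule_id: str
    confidence: str
    monthly_cost_usd: float


def should_fail(
    findings: Iterable[Finding],
    fail_on_confidence: Optional[str] = None,
    fail_on_cost: Optional[float] = None,
) -> bool:
    """Return True if any configured threshold is met or exceeded."""
    items = list(findings)
    if fail_on_confidence is not None:
        floor = CONFIDENCE_ORDER[fail_on_confidence]
        if any(CONFIDENCE_ORDER[f.confidence] >= floor for f in items):
            return True
    if fail_on_cost is not None:
        # Assumption: the threshold applies to the total estimated cost.
        if sum(f.monthly_cost_usd for f in items) >= fail_on_cost:
            return True
    return False


findings = [
    Finding("aws.ec2.instance.stopped", "MEDIUM", 120.0),
    Finding("example.rule.hypothetical", "LOW", 400.0),
]
print(should_fail(findings, fail_on_confidence="HIGH"))  # False
print(should_fail(findings, fail_on_cost=500))           # True
```

A CI job could map a `True` result to a non-zero exit code so that the pipeline step fails when either threshold is crossed.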
55 changes: 55 additions & 0 deletions cleancloud/doctor/aws.py
@@ -230,6 +230,7 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
info("Permissions required (attach to your IAM role or user):")
info(" ec2:DescribeVolumes")
info(" ec2:DescribeSnapshots")
+info(" ec2:DescribeSnapshotAttribute")
info(" ec2:DescribeRegions")
info(" ec2:DescribeAddresses")
info(" ec2:DescribeNetworkInterfaces")
@@ -239,6 +240,8 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
info(" ec2:DescribeSecurityGroups")
info(" rds:DescribeDBInstances")
info(" rds:DescribeDBSnapshots")
+info(" rds:DescribeDBSnapshotAttributes")
+info(" cloudtrail:LookupEvents")
info(" elasticloadbalancing:DescribeLoadBalancers")
info(" elasticloadbalancing:DescribeTargetGroups")
info(" logs:DescribeLogGroups")
@@ -409,6 +412,22 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
             permissions_failed.append(("ec2:DescribeSnapshots", str(e)))
             warn(f"ec2:DescribeSnapshots - {e}")

+        try:
+            _snaps = ec2.describe_snapshots(OwnerIds=["self"], MaxResults=5).get("Snapshots", [])
+            if _snaps:
+                ec2.describe_snapshot_attribute(
+                    SnapshotId=_snaps[0]["SnapshotId"], Attribute="createVolumePermission"
+                )
+            permissions_tested.append("ec2:DescribeSnapshotAttribute")
+            success("ec2:DescribeSnapshotAttribute")
+        except Exception as e:
+            if "AccessDenied" in str(e) or "not authorized" in str(e).lower():
+                permissions_failed.append(("ec2:DescribeSnapshotAttribute", str(e)))
+                warn(f"ec2:DescribeSnapshotAttribute - {e}")
+            else:
+                permissions_tested.append("ec2:DescribeSnapshotAttribute")
+                success("ec2:DescribeSnapshotAttribute")
+
         try:
             ec2.describe_regions()
             permissions_tested.append("ec2:DescribeRegions")
@@ -483,6 +502,24 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
             permissions_failed.append(("rds:DescribeDBSnapshots", str(e)))
             warn(f"rds:DescribeDBSnapshots - {e}")

+        try:
+            _rds_snaps = rds.describe_db_snapshots(MaxRecords=20, SnapshotType="manual").get(
+                "DBSnapshots", []
+            )
+            if _rds_snaps:
+                rds.describe_db_snapshot_attributes(
+                    DBSnapshotIdentifier=_rds_snaps[0]["DBSnapshotIdentifier"]
+                )
+            permissions_tested.append("rds:DescribeDBSnapshotAttributes")
+            success("rds:DescribeDBSnapshotAttributes")
+        except Exception as e:
+            if "AccessDenied" in str(e) or "not authorized" in str(e).lower():
+                permissions_failed.append(("rds:DescribeDBSnapshotAttributes", str(e)))
+                warn(f"rds:DescribeDBSnapshotAttributes - {e}")
+            else:
+                permissions_tested.append("rds:DescribeDBSnapshotAttributes")
+                success("rds:DescribeDBSnapshotAttributes")
+
         # Test ELB permissions
         try:
             elbv2 = session.client("elbv2", region_name=region)
@@ -563,6 +600,24 @@ def run_aws_doctor(profile: Optional[str], region: Optional[str] = None) -> None
             permissions_failed.append(("s3:GetBucketTagging", str(e)))
             warn(f"s3:GetBucketTagging - {e}")

+        # Test CloudTrail permissions (aws.ec2.instance.stopped — stopped-duration probe)
+        try:
+            from datetime import datetime, timedelta
+            from datetime import timezone as _tz
+
+            cloudtrail = session.client("cloudtrail", region_name=region)
+            _now = datetime.now(_tz.utc)
+            cloudtrail.lookup_events(
+                StartTime=_now - timedelta(hours=1),
+                EndTime=_now,
+                MaxResults=1,
+            )
+            permissions_tested.append("cloudtrail:LookupEvents")
+            success("cloudtrail:LookupEvents")
+        except Exception as e:
+            permissions_failed.append(("cloudtrail:LookupEvents", str(e)))
+            warn(f"cloudtrail:LookupEvents - {e}")
+
     except Exception:
         fail("CleanCloud cannot run safely with missing read-only permissions")
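
Note on the new snapshot-attribute probes: both use the same pattern, where an authorization error counts as a failed permission and any other error still counts as tested. A hedged sketch of how that pattern could be factored into one helper is shown below. `is_access_denied`, `run_probe`, and `FakeClientError` are hypothetical names, not part of this PR. The sketch assumes the exception carries a botocore-style `response["Error"]["Code"]` dict and falls back to the same string matching the diff uses.

```python
from typing import Callable, List, Tuple

# Error codes commonly returned by AWS services for authorization failures.
ACCESS_DENIED_CODES = {"AccessDenied", "AccessDeniedException", "UnauthorizedOperation"}


def is_access_denied(exc: Exception) -> bool:
    """Return True if the exception looks like an authorization failure."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        code = response.get("Error", {}).get("Code", "")
        if code in ACCESS_DENIED_CODES:
            return True
    # Fallback: the same message matching used in the diff.
    message = str(exc)
    return "AccessDenied" in message or "not authorized" in message.lower()


def run_probe(
    permission: str,
    call: Callable[[], object],
    tested: List[str],
    failed: List[Tuple[str, str]],
) -> None:
    """Run one probe; only authorization errors count as a failed permission."""
    try:
        call()
    except Exception as exc:
        if is_access_denied(exc):
            failed.append((permission, str(exc)))
            return
    tested.append(permission)


class FakeClientError(Exception):
    """Test double that mimics the shape of a botocore ClientError."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.response = {"Error": {"Code": code, "Message": message}}


def denied() -> None:
    raise FakeClientError("UnauthorizedOperation", "You are not authorized to perform this operation.")


tested: List[str] = []
failed: List[Tuple[str, str]] = []
run_probe("ec2:DescribeSnapshotAttribute", denied, tested, failed)
print(tested, [name for name, _ in failed])  # [] ['ec2:DescribeSnapshotAttribute']
```

Checking the structured error code first avoids depending on message wording, while the string fallback keeps the behavior shown in the diff for exceptions that carry no `response` attribute.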
