cleancloud-io
diff --git a/README.fr.md b/README.fr.md
Lines changed: 5 additions & 4 deletions
diff --git a/README.md b/README.md
Lines changed: 5 additions & 4 deletions
diff --git a/cleancloud/demo/command.py b/cleancloud/demo/command.py
Lines changed: 15 additions & 5 deletions
diff --git a/cleancloud/demo/findings.py b/cleancloud/demo/findings.py
Lines changed: 58 additions & 0 deletions
diff --git a/cleancloud/doctor/azure.py b/cleancloud/doctor/azure.py
Lines changed: 120 additions & 0 deletions
diff --git a/cleancloud/doctor/runner.py b/cleancloud/doctor/runner.py
Lines changed: 8 additions & 2 deletions
diff --git a/cleancloud/output/summary.py b/cleancloud/output/summary.py
Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,7 @@ C'est CleanCloud. Scannez vos environnements AWS, Azure et GCP, obtenez des find
 | Hygiène multi-comptes / multi-abonnements / multi-projets | ❌ | ✅ | ✅ |
 | Application planifiée et CI/CD (codes de sortie) | ❌ | ❌ | ✅ |
 
-- **31 règles de détection sélectives et haut signal :** volumes orphelins, bases de données inactives, instances arrêtées, registres inutilisés, et plus — conçues pour éviter les faux positifs en environnements IaC, chacune avec une estimation de coût déterministe. Les règles IA/ML (SageMaker) sont opt-in via `--category ai`
+- **32 règles de détection sélectives et haut signal :** volumes orphelins, bases de données inactives, instances arrêtées, registres inutilisés, et plus — conçues pour éviter les faux positifs en environnements IaC, chacune avec une estimation de coût déterministe. Les règles IA/ML (SageMaker, Azure ML) sont opt-in via `--category ai`
 - **Gouvernance et application de politique (opt-in) :** `--fail-on-confidence HIGH` ou `--fail-on-cost 100` — appliquer des seuils de gaspillage sur un planning, géré par les équipes platform ou FinOps
 - **Scan multi-comptes (AWS) :** scannez des AWS Organizations entières en une exécution — fichier de config, IDs inline, ou auto-découverte via `--org`
 - **Scan multi-abonnements (Azure) :** scannez tous les abonnements Azure en parallèle — auto-découverte via Management Group, détail des coûts par abonnement inclus
@@ -171,7 +171,7 @@ Pas encore de compte cloud ? `cleancloud demo` affiche un exemple de sortie sans
 | Flag | Fonction |
 |---|---|
 | `--provider aws\|azure\|gcp` | Fournisseur cloud à scanner *(obligatoire)* |
-| `--category hygiene\|ai\|all` | Catégorie de règles : `hygiene` (défaut), `ai` (SageMaker, AWS uniquement) ou `all` (hygiene + IA) |
+| `--category hygiene\|ai\|all` | Catégorie de règles : `hygiene` (défaut), `ai` (SageMaker sur AWS, AML Compute sur Azure) ou `all` (hygiene + IA) |
 | `--region REGION` | Scanner une seule région |
 | `--all-regions` | Toutes les régions actives — AWS/Azure uniquement |
 | **AWS multi-comptes** | |
@@ -328,7 +328,7 @@ Pour des exemples de sortie complets incluant `doctor`, JSON, CSV et markdown :
 
 ## Ce que CleanCloud détecte
 
-30 règles pour AWS, Azure et GCP — conservatives, haut signal, conçues pour éviter les faux positifs en environnements IaC.
+32 règles pour AWS, Azure et GCP — conservatives, haut signal, conçues pour éviter les faux positifs en environnements IaC.
 
 **AWS :**
 - Compute : instances arrêtées 30+ jours (charges EBS continuent)
@@ -345,6 +345,7 @@ Pour des exemples de sortie complets incluant `doctor`, JSON, CSV et markdown :
 - Réseau : adresses IP publiques inutilisées, Load Balancers vides (HIGH), App Gateways vides (HIGH), VNet Gateways inactives
 - Plateforme : App Service Plans vides (HIGH), bases de données SQL inactives (HIGH), App Services inactifs, Container Registries inutilisés
 - Gouvernance : ressources sans tags
+- IA/ML *(opt-in : `--category ai`)* : clusters de calcul AML avec capacité baseline non nulle et aucune activité depuis 14+ jours — clusters GPU flaggés risque HIGH ($600–$15K/mois)
 
 **GCP :**
 - Compute : instances VM arrêtées 30+ jours (charges disque continuent) (HIGH)
@@ -583,7 +584,7 @@ Guide complet : [Configuration GCP →](docs/gcp.md)
 
 **Policy-as-code** — `cleancloud.yaml` avec packs de règles, exceptions par équipe, et seuils de coût en config — la principale demande de gouvernance FinOps pour 2025/2026
 
-**Plus de règles IA/ML** — clusters de calcul Azure ML inactifs, endpoints Vertex AI inactifs, instances de notebook SageMaker inutilisées, artefacts d'entraînement orphelins
+**Plus de règles IA/ML** — endpoints Vertex AI inactifs, instances de notebook SageMaker inutilisées, artefacts d'entraînement orphelins
 
 **Plus de règles AWS** — lacunes de cycle de vie S3, Redshift inactif, fuite de coût NAT Gateway (services internes routant via NAT au lieu de VPC endpoints — S3, DynamoDB, ECR, SSM), VPC endpoints inutilisés
 
 
@@ -30,7 +30,7 @@ That's CleanCloud. Scan your AWS, Azure, and GCP environments, get specific acti
 | Multi-account / multi-subscription / multi-project | ❌ | ✅ | ✅ |
 | CI/CD and scheduled enforcement (exit codes) | ❌ | ❌ | ✅ |
 
-- **31 curated, high-signal detection rules:** orphaned volumes, idle databases, stopped instances, unused registries, and more — designed to avoid false positives in IaC environments, each with a deterministic cost estimate. AI/ML rules (SageMaker) are opt-in via `--category ai`
+- **32 curated, high-signal detection rules:** orphaned volumes, idle databases, stopped instances, unused registries, and more — designed to avoid false positives in IaC environments, each with a deterministic cost estimate. AI/ML rules (SageMaker, Azure ML) are opt-in via `--category ai`
 - **Governance enforcement (opt-in):** `--fail-on-confidence HIGH` or `--fail-on-cost 100` — enforce waste thresholds on a schedule, owned by platform or FinOps teams
 - **Multi-account scanning (AWS):** scan entire AWS Organizations in one run — config file, inline IDs, or auto-discovery via `--org`
 - **Multi-subscription scanning (Azure):** scan all Azure subscriptions in parallel — auto-discovery via Management Group, per-subscription cost breakdown included
@@ -217,7 +217,7 @@ Run:
 | Flag | What it does |
 |---|---|
 | `--provider aws\|azure\|gcp` | Cloud provider to scan *(required)* |
-| `--category hygiene\|ai\|all` | Rule category: `hygiene` (default), `ai` (SageMaker, AWS-only), or `all` (hygiene + AI) |
+| `--category hygiene\|ai\|all` | Rule category: `hygiene` (default), `ai` (SageMaker on AWS, AML Compute on Azure), or `all` (hygiene + AI) |
 | `--region REGION` | Scan a single region |
 | `--all-regions` | Scan all active regions — AWS/Azure only |
 | **AWS multi-account** | |
@@ -338,7 +338,7 @@ For full output examples including `doctor`, JSON, CSV, and markdown: [`docs/exa
 
 ## What CleanCloud Detects
 
-31 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.
+32 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.
 
 **AWS:**
 - Compute: stopped instances 30+ days (EBS charges continue)
@@ -355,6 +355,7 @@ For full output examples including `doctor`, JSON, CSV, and markdown: [`docs/exa
 - Network: unused public IPs, empty load balancers (HIGH), empty App Gateways (HIGH), idle VNet Gateways
 - Platform: empty App Service Plans (HIGH), idle SQL databases (HIGH), idle App Services, unused Container Registries
 - Governance: untagged resources
+- AI/ML *(opt-in: `--category ai`)*: idle AML compute clusters with non-zero baseline capacity and no workload activity 14+ days — GPU clusters flagged HIGH risk ($600–$15K/month)
 
 **GCP:**
 - Compute: stopped instances 30+ days (disk charges continue) (HIGH)
@@ -593,7 +594,7 @@ Full setup guide: [GCP setup →](docs/gcp.md)
 
 **Policy-as-code** — `cleancloud.yaml` with rule packs, per-team exceptions, and cost thresholds in config — the top FinOps governance ask for 2025/2026
 
-**More AI/ML waste rules** — Azure ML compute clusters idle, Vertex AI endpoints idle, SageMaker notebook instances running unused, orphaned training artifacts
+**More AI/ML waste rules** — Vertex AI endpoints idle, SageMaker notebook instances running unused, orphaned training artifacts
 
 **More AWS rules** — S3 lifecycle gaps, Redshift idle, NAT Gateway cost leakage (internal services routing through NAT instead of VPC endpoints — S3, DynamoDB, ECR, SSM), unused VPC endpoints
 
 
@@ -7,6 +7,7 @@
     ALL_FINDINGS,
     AWS_AI_FINDINGS,
     AWS_FINDINGS,
+    AZURE_AI_FINDINGS,
     AZURE_FINDINGS,
     GCP_FINDINGS,
 )
@@ -25,7 +26,7 @@
     "--category",
     type=click.Choice(["hygiene", "ai"]),
     default="hygiene",
-    help="Rule category to demo: hygiene (default) or ai (SageMaker)",
+    help="Rule category to demo: hygiene (default) or ai (SageMaker on AWS, AML Compute on Azure)",
 )
 def demo(provider: Optional[str], category: str):
     """Show realistic sample findings without cloud credentials."""
@@ -36,9 +37,18 @@ def demo(provider: Optional[str], category: str):
     click.echo("=" * 60)
 
     if category == "ai":
-        findings = AWS_AI_FINDINGS
-        regions = ["us-east-1"]
-        region_mode = "explicit"
+        if provider == "aws":
+            findings = AWS_AI_FINDINGS
+            regions = ["us-east-1"]
+            region_mode = "explicit"
+        elif provider == "azure":
+            findings = AZURE_AI_FINDINGS
+            regions = ["East US"]
+            region_mode = "all"
+        else:
+            findings = AWS_AI_FINDINGS + AZURE_AI_FINDINGS
+            regions = ["us-east-1", "East US"]
+            region_mode = "all"
     elif provider == "aws":
         findings = AWS_FINDINGS
         regions = ["us-east-1", "us-west-2", "eu-west-1"]
@@ -60,7 +70,7 @@ def demo(provider: Optional[str], category: str):
 
     summary = build_summary(findings)
     summary["scanned_at"] = datetime.now(timezone.utc).isoformat()
-    summary["provider"] = provider or ("aws" if category == "ai" else "mixed")
+    summary["provider"] = provider or "mixed"
     summary["regions_scanned"] = regions
 
     _print_summary(summary, region_selection_mode=region_mode)
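
Review note: the `--category ai` branch above now depends on the provider. The sketch below restates that selection as a standalone function so the three outcomes are easy to check. The function name `select_ai_demo` and the placeholder lists are hypothetical and stand in for the real `AWS_AI_FINDINGS` and `AZURE_AI_FINDINGS`; this is an illustration of the hunk, not code from the PR.

```python
from typing import List, Optional, Tuple

# Placeholder data standing in for the real finding lists (assumption).
AWS_AI = ["aws-ai-finding"]
AZURE_AI = ["azure-aml-finding"]


def select_ai_demo(provider: Optional[str]) -> Tuple[List[str], List[str], str]:
    """Return (findings, regions, region_mode) for the AI demo category."""
    if provider == "aws":
        return AWS_AI, ["us-east-1"], "explicit"
    if provider == "azure":
        return AZURE_AI, ["East US"], "all"
    # Any other value, including None and "gcp", gets the combined view.
    return AWS_AI + AZURE_AI, ["us-east-1", "East US"], "all"


def summary_provider(provider: Optional[str]) -> str:
    """Mirror of the new summary label: fall back to 'mixed' when unset."""
    return provider or "mixed"
```

As written in the hunk, `--provider gcp --category ai` would fall through to the combined AWS and Azure findings while the summary still reports `gcp`; that may or may not be intended.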
 
@@ -615,6 +615,64 @@
 
 ALL_FINDINGS: List[Finding] = AWS_FINDINGS + AZURE_FINDINGS + GCP_FINDINGS
 
+AZURE_AI_FINDINGS: List[Finding] = [
+    Finding(
+        provider="azure",
+        rule_id="azure.aml.compute.idle",
+        resource_type="azure.aml.compute",
+        resource_id=(
+            "/subscriptions/29d91ee0-922f-483a-a81f-1a5eff4ecfa2"
+            "/resourceGroups/rg-ml-platform"
+            "/providers/Microsoft.MachineLearningServices"
+            "/workspaces/ml-platform-prod"
+            "/computes/gpu-train-cluster"
+        ),
+        region="East US",
+        title="Idle Azure ML Compute Cluster (Baseline Capacity Waste for 21 Days)",
+        summary=(
+            "AML compute cluster 'gpu-train-cluster' in workspace 'ml-platform-prod' "
+            "is configured to keep 2 node(s) always running (min_node_count=2) but no "
+            "workload activity was observed for 21 days — baseline capacity waste."
+        ),
+        reason="AML compute cluster has min_node_count=2 with no workload activity for 21 days",
+        risk=RiskLevel.HIGH,
+        confidence=ConfidenceLevel.HIGH,
+        detected_at=_NOW,
+        details={
+            "cluster_name": "gpu-train-cluster",
+            "workspace_name": "ml-platform-prod",
+            "resource_group": "rg-ml-platform",
+            "vm_size": "Standard_NC6s_v3",
+            "min_node_count": 2,
+            "is_gpu": True,
+            "age_days": 21,
+            "idle_window_days": 21,
+            "idle_days_threshold": 14,
+            "estimated_monthly_cost": "~$4,406/month",
+            "cost_estimate_type": "mapped",
+        },
+        evidence=Evidence(
+            signals_used=[
+                "Cluster configured with non-zero baseline capacity but no workload observed for 21 days (Azure Monitor: Active Nodes)",
+                "Baseline cost driver: min_node_count=2 (always-on compute — billed continuously)",
+                "Compute type: AmlCompute",
+                "Cluster age: 21 days",
+                "VM size: Standard_NC6s_v3",
+                "GPU cluster with no workload — high-cost idle state",
+            ],
+            signals_not_checked=[
+                "Scheduled or periodic training jobs",
+                "Jobs submitted outside the observation window",
+                "Planned future usage",
+                "Cluster configured with min_node_count for warm-start latency",
+                "Cluster reserved for interactive development",
+            ],
+            time_window="21 days",
+        ),
+        estimated_monthly_cost_usd=4406.0,
+    ),
+]
+
 AWS_AI_FINDINGS: List[Finding] = [
     Finding(
         provider="aws",
 
@@ -301,6 +301,9 @@ def run_azure_doctor() -> None:
     info("    - Microsoft.Insights/metrics/read")
     info("    - Microsoft.Resources/subscriptions/read")
     info("    - Microsoft.Resources/resources/read")
+    info("  AI/ML rules (opt-in via --category ai):")
+    info("    - Microsoft.MachineLearningServices/workspaces/read")
+    info("    - Microsoft.MachineLearningServices/workspaces/computes/read")
 
     # Summary
     info("")
@@ -317,5 +320,122 @@ def run_azure_doctor() -> None:
 
     info("")
     success("AZURE ENVIRONMENT READY FOR CLEANCLOUD")
+    info("")
+    info("Tip: To also validate AI/ML permissions (Azure ML rules), run:")
+    info("  cleancloud doctor --provider azure --category ai")
+    info("=" * 70)
+    info("")
+
+
+def run_azure_ai_doctor(subscription_id: str = None) -> None:
+    """Validate Azure permissions for --category ai (Azure ML compute rules)."""
+    info("")
+    info("=" * 70)
+    info("AZURE AI/ML PERMISSION VALIDATION")
+    info("=" * 70)
+    info("")
+    info("Validating permissions for: cleancloud scan --provider azure --category ai")
+    info("")
+
+    try:
+        from azure.mgmt.machinelearningservices import AzureMachineLearningWorkspaces
+
+        credential = DefaultAzureCredential()
+    except Exception as e:
+        fail(f"Azure ML SDK import or credential setup failed (install azure-mgmt-machinelearningservices, configure credentials, then re-run doctor): {e}")
+        return
+
+    # Resolve a subscription to test against
+    try:
+        sub_client = SubscriptionClient(credential)
+        subscriptions = list(sub_client.subscriptions.list())
+        if not subscriptions:
+            fail("No accessible Azure subscriptions found")
+            return
+        test_sub = subscription_id or subscriptions[0].subscription_id
+        success(f"Using subscription: {test_sub}")
+    except Exception as e:
+        fail(f"Failed to list subscriptions: {e}")
+        return
+
+    info("")
+    info("Permission Checks")
+    info("-" * 70)
+
+    permissions_tested = []
+    permissions_failed = []
+
+    # Check: Microsoft.MachineLearningServices/workspaces/read
+    try:
+        ml_client = AzureMachineLearningWorkspaces(
+            credential=credential,
+            subscription_id=test_sub,
+        )
+        workspaces = list(ml_client.workspaces.list_by_subscription())
+        permissions_tested.append("Microsoft.MachineLearningServices/workspaces/read")
+        success(
+            f"Microsoft.MachineLearningServices/workspaces/read "
+            f"({len(workspaces)} workspace(s) found)"
+        )
+    except Exception as e:
+        permissions_failed.append(("Microsoft.MachineLearningServices/workspaces/read", str(e)))
+        warn(f"Microsoft.MachineLearningServices/workspaces/read — {e}")
+        workspaces = []
+
+    # Check: Microsoft.MachineLearningServices/workspaces/computes/read
+    if workspaces:
+        try:
+            ws = workspaces[0]
+            rg = ws.id.split("/")[ws.id.lower().split("/").index("resourcegroups") + 1]
+            list(ml_client.compute.list(rg, ws.name))
+            permissions_tested.append("Microsoft.MachineLearningServices/workspaces/computes/read")
+            success("Microsoft.MachineLearningServices/workspaces/computes/read")
+        except Exception as e:
+            permissions_failed.append(
+                ("Microsoft.MachineLearningServices/workspaces/computes/read", str(e))
+            )
+            warn(f"Microsoft.MachineLearningServices/workspaces/computes/read — {e}")
+    else:
+        info(
+            "  Skipping computes/read check — no workspaces found to test against "
+            "(permission may still be present)"
+        )
+
+    # Check: Microsoft.Insights/metrics/read (already required by hygiene rules)
+    try:
+        from azure.mgmt.monitor import MonitorManagementClient
+
+        monitor = MonitorManagementClient(credential=credential, subscription_id=test_sub)
+        # A lightweight call — list metric definitions for a subscription-level scope
+        monitor.metric_definitions.list(
+            f"/subscriptions/{test_sub}",
+        )
+        permissions_tested.append("Microsoft.Insights/metrics/read")
+        success("Microsoft.Insights/metrics/read")
+    except Exception as e:
+        permissions_failed.append(("Microsoft.Insights/metrics/read", str(e)))
+        warn(f"Microsoft.Insights/metrics/read — {e}")
+
+    info("")
+    info("=" * 70)
+    total = len(permissions_tested) + len(permissions_failed)
+    info(f"Permissions: {len(permissions_tested)}/{total} passed")
+
+    if permissions_failed:
+        info("")
+        for perm, _ in permissions_failed:
+            warn(f"  missing: {perm}")
+        info("")
+        info("Assign the AI role to your service principal:")
+        info("  az role definition create --role-definition security/azure/ai-readonly-role.json")
+        info('  az role assignment create --assignee <APP_ID> --role "CleanCloudAIReadOnly" \\')
+        info("    --scope /subscriptions/<SUBSCRIPTION_ID>")
+        info("Then re-run: cleancloud doctor --provider azure --category ai")
+        info("")
+        warn("AZURE AI/ML PERMISSIONS INCOMPLETE")
+    else:
+        info("")
+        success("AZURE AI/ML PERMISSIONS READY")
+        info("Run: cleancloud scan --provider azure --category ai")
     info("=" * 70)
     info("")
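
Review note: `run_azure_ai_doctor` derives the resource group from the workspace ID with a one-line split and index lookup. A minimal sketch of the same case-insensitive lookup as a named helper follows. The helper name `resource_group_from_id` is hypothetical and not part of the PR; it assumes the usual `/subscriptions/<id>/resourceGroups/<name>/...` ID layout and raises a clear error when that segment is missing.

```python
def resource_group_from_id(resource_id: str) -> str:
    """Extract the resource group name from an Azure resource ID."""
    parts = resource_id.split("/")
    lowered = [part.lower() for part in parts]
    try:
        # The group name is the segment right after 'resourceGroups'.
        position = lowered.index("resourcegroups")
        name = parts[position + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no resource group segment in: {resource_id!r}") from None
    if not name:
        raise ValueError(f"empty resource group name in: {resource_id!r}")
    return name
```

Called with the demo finding's ID from `findings.py`, this returns `rg-ml-platform`. The original name casing is kept because only the lookup is lowercased.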
@@ -2,7 +2,7 @@
 from typing import Optional
 
 from cleancloud.doctor.aws import run_aws_ai_doctor, run_aws_doctor
-from cleancloud.doctor.azure import run_azure_doctor
+from cleancloud.doctor.azure import run_azure_ai_doctor, run_azure_doctor
 from cleancloud.doctor.common import DoctorError, info, success
 from cleancloud.doctor.gcp import run_gcp_doctor
 
@@ -66,7 +66,13 @@ def run_doctor(
                     info("   The --region parameter is only used for AWS provider")
                     info("")
 
-                run_azure_doctor()
+                if category == "ai":
+                    run_azure_ai_doctor()
+                elif category == "all":
+                    run_azure_doctor()
+                    run_azure_ai_doctor()
+                else:
+                    run_azure_doctor()
                 results[p] = {"status": "passed", "error": None}
 
             elif p == "gcp":
 
@@ -143,6 +143,7 @@ def _print_summary(summary: dict, region_selection_mode: str = None, multi_accou
             missing = skipped.get("missing_permissions", "")
             # Strip verbose prefix if present
             missing = missing.replace("Missing required IAM permissions: ", "")
+            missing = missing.replace("Missing required permissions: ", "")
             click.echo(f"  - {rule_name}")
             if missing:
                 click.echo(f"      needs: {missing}")
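
Review note: the two `replace` calls above remove the verbose text wherever it occurs in the message, not only at the start. If prefix-only stripping is what is wanted, one possible shape is sketched below. The helper name `strip_permission_prefix` and the `PREFIXES` tuple are hypothetical; only the two prefix strings come from the hunk.

```python
PREFIXES = (
    "Missing required IAM permissions: ",
    "Missing required permissions: ",
)


def strip_permission_prefix(message: str) -> str:
    """Drop a known verbose prefix, leaving the rest of the message intact."""
    for prefix in PREFIXES:
        if message.startswith(prefix):
            return message[len(prefix):]
    return message
```

Keeping the prefixes in one tuple means a new provider's wording can be added without another chained `replace`.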
@@ -162,7 +163,7 @@ def _print_summary(summary: dict, region_selection_mode: str = None, multi_accou
             )
         if has_azure:
             click.echo(
-                "  Azure: https://github.com/cleancloud-io/cleancloud/blob/main/security/azure-readonly-role.json"
+                "  Azure: https://github.com/cleancloud-io/cleancloud/tree/main/security/azure/"
             )
             click.echo(
                 "  Run 'cleancloud doctor --provider azure' to validate permissions after updating."