|
| 1 | +--- |
| 2 | +name: terraform-gcp-skill |
| 3 | +description: "Architect, provision, and troubleshoot production-grade Google Cloud infrastructure using Terraform and OpenTofu. Use to design landing zones (Shared VPCs, Folders), deploy core services (GKE, Cloud Run, Cloud SQL), implement IAM least-privilege, and manage GCS-backed state. Enforces Google’s Cloud Foundation Fabric patterns and rigorous validation protocols to ensure secure, idempotent, and scalable deployments across environments." |
| 4 | +version: "1.0.0" |
| 5 | +--- |
| 6 | + |
| 7 | +# Terraform GCP Skill |
| 8 | + |
| 9 | +This skill provides expert-level guidance for architecting, deploying, and managing GCP infrastructure using Terraform. |
| 10 | + |
| 11 | +## 🎯 When to Use This Skill |
| 12 | +- Designing GCP Landing Zones (Projects, Folders, Shared VPCs). |
| 13 | +- Provisioning Google Cloud resources (GKE, Cloud Run, Cloud SQL, Spanner). |
| 14 | +- Setting up remote state management via **GCS Backend**. |
| 15 | +- Implementing IAM least-privilege via Terraform. |
| 16 | +- Troubleshooting `terraform plan` discrepancies in GCP. |
| 17 | + |
| 18 | +## 🏗️ Core Architecture: GCS Backend & State |
| 19 | +**Always** use a GCS backend for state. Local state is strictly for local prototyping and must never be committed. |
| 20 | + |
| 21 | +### Standard Backend Configuration |
| 22 | +```hcl |
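| 23 | +terraform { |
| 24 | +  backend "gcs" { |
| 25 | +    bucket = "tf-state-example-project" # placeholder literal; variables are not allowed here |
| 26 | +    prefix = "terraform/state/prod"     # placeholder environment segment |
| 27 | +  } |
| 28 | +} |
| 29 | +``` |
| 30 | + |
| 31 | +Note: Terraform does not evaluate variables or locals inside a `backend` block, so use literal values or supply per-environment values at init time, for example `terraform init -backend-config="prefix=terraform/state/prod"`. The GCS bucket must have Object Versioning enabled to allow recovery from accidental state corruption or overlapping writes. |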
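
As a minimal sketch of the versioning requirement above, the state bucket itself could be declared as follows, typically in a separate bootstrap configuration so that the bucket exists before `terraform init` runs. The resource label `tf_state`, the variables `project_id` and `region`, and the naming scheme are illustrative assumptions rather than part of this skill.

```hcl
# Hypothetical bootstrap configuration for the state bucket.
variable "project_id" {
  type        = string
  description = "Project that owns the state bucket (assumed variable)."
}

variable "region" {
  type        = string
  description = "Bucket location, such as a region or multi-region (assumed variable)."
}

resource "google_storage_bucket" "tf_state" {
  # Interpolation is fine here: this is a resource, not a backend block.
  name     = "tf-state-${var.project_id}"
  project  = var.project_id
  location = var.region

  # Keep the bucket private and IAM-only.
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  # Object Versioning lets earlier state files be restored.
  versioning {
    enabled = true
  }

  # Guard against deleting the bucket through Terraform.
  lifecycle {
    prevent_destroy = true
  }
}
```

Keeping the bucket in its own configuration avoids the circular dependency of storing state in a bucket that the same state manages.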
| 32 | + |
| 33 | +## 🛠️ Execution Protocol (Safety First) |
| 34 | +The Agent must follow this lifecycle for every infrastructure change to ensure idempotency and prevent production outages: |
| 35 | + |
| 36 | +1. Initialize (`terraform init`): |
| 37 | + |
| 38 | + - Verify backend connectivity. |
| 39 | + |
| 40 | + - Ensure provider plugins are downloaded and versions are locked in `.terraform.lock.hcl`. |
| 41 | + |
| 42 | +2. Validate (`terraform validate` & `tflint`): |
| 43 | + |
| 44 | + - Check for internal HCL consistency. |
| 45 | + |
| 46 | + - Run `tflint` with the Google plugin to catch cloud-specific deprecated fields or non-optimal configurations. |
| 47 | + |
| 48 | +3. Plan (`terraform plan -out=tfplan`): |
| 49 | + |
| 50 | + - Generate an execution plan and save it to `tfplan` so that the apply step runs exactly what was reviewed. |
| 51 | + |
| 52 | + - Mandatory Step: The Agent must summarize the plan for the user, specifically highlighting any Destroy actions. |
| 53 | + |
| 54 | +4. Apply (`terraform apply tfplan`): |
| 55 | + |
| 56 | + - Strict Constraint: Only run the apply after the user provides explicit manual confirmation. |
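
For the `tflint` part of the Validate step, a `.tflint.hcl` file along these lines could enable the Google ruleset. This is a sketch: the pinned version is an assumption and should be replaced with a release you have verified.

```hcl
# .tflint.hcl (placed at the root of the Terraform configuration)
plugin "google" {
  enabled = true
  # Assumed version; pin to a release of the ruleset you have checked.
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-google"
}
```

Running `tflint --init` once downloads the plugin, after which `tflint` picks it up on each run.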
| 57 | + |
| 58 | +## 🔍 Live Provider Documentation Lookup |
| 59 | +To prevent hallucinations regarding new fields or deprecated attributes, the Agent must check every field against the actual provider schema rather than relying on memory. |
| 60 | + |
| 61 | +### Schema Inspection via CLI |
| 62 | +The primary command for retrieving the machine-readable schema of the currently initialized providers is: |
| 63 | + |
| 64 | +```bash |
| 65 | +$ terraform providers schema -json |
| 66 | +``` |
| 67 | + |
| 68 | + |
| 69 | +## 🛡️ GCP-Specific Standards |
| 70 | +1. Resource Naming & "Main" Pattern |
| 71 | +To maintain a clean module interface, use the identifier `main` for singleton resources (the primary resource the module is named after). Use underscores in Terraform identifiers and hyphens in GCP resource names. |
| 72 | + |
| 73 | + ```hcl |
| 74 | + resource "google_compute_network" "main" { |
| 75 | + name = "${var.prefix}-vpc" |
| 76 | + auto_create_subnetworks = false |
| 77 | + } |
| 78 | + ``` |
| 79 | + |
| 80 | +2. IAM Management |
| 82 | + - Avoid `google_project_iam_policy`: This resource is authoritative and replaces the entire IAM policy for the project, which makes it a common cause of accidental lockouts. |
| 83 | + |
| 84 | + - Prefer `google_project_iam_member` or `google_project_iam_binding`: |
| 85 | + - `google_project_iam_member` is additive and safely grants a role to a single member. |
| 86 | + - `google_project_iam_binding` is authoritative for a single role. It's useful for managing all members of a role, but be aware it overwrites existing members for that role. |
| 87 | + - Shared VPC: Always distinguish between Host projects (where the network lives) and Service projects (where resources consume the network). |
| 88 | + - Private Google Access: Subnets should set `private_ip_google_access = true` so that VMs without external IPs can still reach Google APIs. |
| 89 | + - Workload Identity: Prefer GKE Workload Identity over static Service Account JSON keys. |
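
The IAM and subnet guidance above might look like the following in practice. This is a minimal sketch: the service account name, the role, the CIDR range, and the variables `project_id`, `region`, and `prefix` are assumptions chosen for illustration.

```hcl
variable "project_id" {
  type = string
}

variable "region" {
  type = string
}

variable "prefix" {
  type = string
}

# Hypothetical runtime identity for a workload.
resource "google_service_account" "app" {
  project      = var.project_id
  account_id   = "${var.prefix}-app"
  display_name = "Application runtime identity"
}

# Additive grant: only this member/role pair is managed,
# other members of the role are left untouched.
resource "google_project_iam_member" "app_log_writer" {
  project = var.project_id
  role    = "roles/logging.logWriter"
  member  = "serviceAccount:${google_service_account.app.email}"
}

resource "google_compute_network" "main" {
  project                 = var.project_id
  name                    = "${var.prefix}-vpc"
  auto_create_subnetworks = false
}

# Subnet with Private Google Access enabled.
resource "google_compute_subnetwork" "main" {
  project                  = var.project_id
  name                     = "${var.prefix}-subnet"
  region                   = var.region
  network                  = google_compute_network.main.id
  ip_cidr_range            = "10.10.0.0/24" # assumed range
  private_ip_google_access = true
}
```

Using `google_project_iam_member` here means a plan never proposes removing members that were granted the role outside this configuration.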
| 90 | + |
| 91 | +## 📂 Directory Structure |
| 92 | +Follow this standard to ensure compatibility with Antigravity (AGY) discovery and Google best practices: |
| 93 | + |
| 94 | +``` |
| 95 | +. |
| 96 | +├── main.tf # Entry point / Resource definitions |
| 97 | +├── variables.tf # Typed variables with units and descriptions |
| 98 | +├── outputs.tf # Resource ID outputs (no direct input pass-through) |
| 99 | +├── versions.tf # Provider version pinning |
| 100 | +├── network.tf # (Optional) Grouped networking resources |
| 101 | +├── examples/ # Example usage for modules |
| 102 | +├── files/ # Static files (startup scripts, etc.) |
| 103 | +├── templates/ # .tftpl templates |
| 104 | +├── scripts/ # Scripts called by Terraform |
| 105 | +└── helpers/ # Scripts NOT called by Terraform |
| 106 | +``` |
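
As an illustration of the provider pinning that `versions.tf` is meant to hold, a sketch could look like this. The specific version constraints are assumptions and should match whatever the project has actually tested against.

```hcl
# versions.tf
terraform {
  # Assumed minimum; 1.6 is the first release with `terraform test`.
  required_version = ">= 1.6.0"

  required_providers {
    google = {
      source = "hashicorp/google"
      # Assumed constraint: allow minor and patch updates within one major version.
      version = "~> 6.0"
    }
  }
}
```

The exact versions resolved from these constraints are recorded in `.terraform.lock.hcl`, which should be committed alongside the configuration.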
| 107 | +## ⚠️ Anti-Patterns (Do NOT do these) |
| 110 | + - ❌ Service Account Keys: Never generate or store `.json` keys. Use Workload Identity Federation or the default metadata server. |
| 111 | + |
| 112 | + - ❌ Hardcoded IDs: Never hardcode Project IDs. Use variables or a `data "google_project"` source. |
| 113 | + |
| 114 | + - ❌ Broad Scopes: Avoid cloud-platform scopes for GKE nodes; use fine-grained IAM roles instead. |
| 115 | + |
| 116 | +## 🧪 Testing Strategy |
| 117 | + - Static Analysis: Use `checkov`, `trivy`, or `terrascan` to catch insecure GCP configurations (e.g., public GCS buckets). |
| 118 | + |
| 119 | + - Integration Testing: Use `terraform test` (Terraform v1.6+) to assert that GCP labels and network tags are correctly applied to resources before full deployment. |
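
A label assertion with `terraform test` could be sketched as below. It assumes the module under test declares a `google_storage_bucket.main` resource whose `labels` come from a `labels` input variable. Those names, the file path, and the label values are all hypothetical.

```hcl
# tests/labels.tftest.hcl (hypothetical file name)

# Input values passed to the module under test.
variables {
  prefix = "demo"
  labels = {
    environment = "test"
    team        = "platform"
  }
}

run "bucket_labels_are_applied" {
  # Plan only: nothing is created in GCP for this check.
  command = plan

  assert {
    condition     = google_storage_bucket.main.labels["environment"] == "test"
    error_message = "The bucket is missing the expected environment label."
  }
}
```

Even in plan mode the provider still has to initialize, so credentials and a target project are assumed to be available. A check on network tags would follow the same pattern with an `assert` on the relevant resource's `tags` attribute.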