Commit 3a55fe4

docs: add full IaC migration guide to terraform/main.tf
Covers step-by-step migration order, resource-by-resource import vs create breakdown, risks (Cloud SQL replacement detection, secret version timing, VPC CIDR conflicts, IAM propagation lag, local state loss), and what stays intact vs what changes for the running application. https://claude.ai/code/session_01SRRzCWrpwgMpdYFurMVn7m
1 parent cdb3ddb commit 3a55fe4

1 file changed

Lines changed: 156 additions & 0 deletions

terraform/main.tf
@@ -1,3 +1,159 @@
# =============================================================================
# MIGRATION GUIDE: Moving from Manual / CI-managed Infrastructure to Full IaC
# =============================================================================
#
# CURRENT STATE
# -------------
# The following resources exist in GCP but are managed manually or by the CI
# pipeline (not by Terraform):
#   - Cloud SQL instance: querypal-db (manually created)
#   - Cloud SQL database: querypal (manually created)
#   - Cloud Run services: querypal-backend, querypal-frontend (deployed by CI)
#   - Secrets: stored in GitHub Secrets (not in Secret Manager)
#
# WHAT TERRAFORM NOW MANAGES
# ---------------------------
# This configuration manages the infrastructure layer only. The CI pipeline
# (.github/workflows/google-cloudrun-docker.yml) continues to own image builds
# and Cloud Run deployments.
#
# Resource                  | Terraform action   | App impact
# --------------------------|--------------------|---------------------------
# VPC connector             | CREATE (new)       | None until next CI deploy
# Secret Manager secrets    | CREATE (new)       | None until values added
# Cloud Run service account | CREATE (new)       | None until next CI deploy
# IAM bindings              | CREATE (new)       | None
# Cloud SQL instance        | IMPORT (existing)  | Zero (instance untouched)
# Cloud SQL database        | IMPORT (existing)  | Zero (database untouched)
#
# =============================================================================
# MIGRATION STEPS (run once, in this order)
# =============================================================================
#
# STEP 1: Verify Terraform config matches your actual Cloud SQL instance.
#
# Before importing, check that database.tf reflects the real instance:
#   - tier (e.g. db-f1-micro)
#   - database_version (POSTGRES_15)
#   - region (europe-west1)
#   - backup settings, flags, IP config
#
# To inspect the current instance:
#     gcloud sql instances describe querypal-db --format=json
#
# If anything in database.tf does not match, fix it BEFORE importing.
# After import, any mismatch will show as a planned change. Most settings
# (tier, flags, backup) can be modified in place, although a tier change and
# some flag changes restart the instance. Changing database_version may
# force recreation, depending on the provider version; avoid it.
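#
# A narrower query can make the comparison easier. This is a sketch: it
# assumes the field names of the Cloud SQL Admin API instance resource and
# prints the selected values on one line:
#     gcloud sql instances describe querypal-db \
#       --format="value(settings.tier,databaseVersion,region,settings.backupConfiguration.enabled,settings.ipConfiguration.ipv4Enabled)"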
#
# STEP 2: Import existing Cloud SQL resources into Terraform state.
#
# This registers the existing instance with Terraform without touching it.
# No data is moved, no connections are interrupted, and the application keeps
# running throughout.
#
#     cd terraform
#     cp terraform.tfvars.example terraform.tfvars
#     terraform init
#     ./import.sh
#
# After import, run `terraform plan`. The plan should show no changes (or
# only safe in-place updates to settings you deliberately changed in the
# config). If you see a resource scheduled for REPLACEMENT, stop and fix the
# config. Do not apply until the plan is clean.
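#
# One way to script the "plan is clean" check (a sketch, not part of
# import.sh): `terraform state list` shows which resources are now tracked,
# and `terraform plan -detailed-exitcode` exits 0 when there are no changes,
# 1 on error, and 2 when changes are pending:
#     terraform state list
#     terraform plan -detailed-exitcode
#     echo "plan exit code: $?"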
#
# STEP 3: Apply Terraform to create new resources.
#
#     terraform apply
#
# This creates: VPC connector, Secret Manager secrets (empty), Cloud Run SA,
# and all IAM bindings. Nothing here touches the running application.
#
# The VPC connector takes 2–5 minutes to provision. The apply will wait.
#
# STEP 4: Populate Secret Manager with the actual secret values.
#
# The secrets created in Step 3 are empty shells. Cloud Run will refuse to
# start if it tries to mount a secret with no versions. Populate them NOW,
# before triggering a new CI deployment:
#
#     for SECRET_ID in \
#       querypal-azure-tenant-id \
#       querypal-azure-client-id \
#       querypal-azure-client-secret \
#       querypal-gemini-api-key \
#       querypal-db-user \
#       querypal-db-pass; do
#       printf 'Enter value for %s: ' "${SECRET_ID}"
#       read -rs VALUE   # -s hides the typed value (bash)
#       echo
#       # printf '%s' writes the value as-is, with no trailing newline
#       printf '%s' "${VALUE}" | gcloud secrets versions add "${SECRET_ID}" --data-file=-
#     done
#
# Verify each secret has at least one version:
#     gcloud secrets versions list querypal-gemini-api-key
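#
# To check all six secrets in one pass, a loop along these lines may help.
# It is a sketch: it assumes the secret IDs used in Step 4 and that the
# version state is reported as ENABLED, and it prints a warning for any
# secret without an enabled version:
#     for SECRET_ID in \
#       querypal-azure-tenant-id querypal-azure-client-id \
#       querypal-azure-client-secret querypal-gemini-api-key \
#       querypal-db-user querypal-db-pass; do
#       VERSIONS=$(gcloud secrets versions list "${SECRET_ID}" \
#         --filter="state=ENABLED" --format="value(name)")
#       if [ -z "${VERSIONS}" ]; then
#         echo "WARNING: ${SECRET_ID} has no enabled version"
#       fi
#     done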
#
# STEP 5: Trigger a CI deployment (push to production branch).
#
# The updated workflow now uses:
#   --set-secrets      (reads from Secret Manager instead of env vars)
#   --vpc-connector    (attaches both services to the VPC)
#   --ingress=internal (backend becomes unreachable from public internet)
#   --service-account  (uses the new dedicated Cloud Run SA)
#
# The frontend will be publicly reachable as before. The backend will only
# accept traffic that arrives through the VPC connector (from the frontend
# nginx proxy). Direct requests to the backend Cloud Run URL from the internet
# are rejected by Google's frontend with an error status (403 or 404) and
# never reach the container.
#
# Monitor the deployment:
#     gcloud run services describe querypal-backend --region=europe-west1
#     gcloud run services describe querypal-frontend --region=europe-west1
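#
# Two quick checks after the deploy (a sketch; it assumes the status.url and
# status.latestReadyRevisionName fields of the service description, and that
# curl runs from a machine outside the VPC):
#     gcloud run services describe querypal-backend --region=europe-west1 \
#       --format="value(status.latestReadyRevisionName)"
#     BACKEND_URL=$(gcloud run services describe querypal-backend \
#       --region=europe-west1 --format="value(status.url)")
#     curl -s -o /dev/null -w "%{http_code}\n" "${BACKEND_URL}"
# The first command names the revision currently ready to serve. The curl
# call should print a 4xx status rather than 200 once ingress is internal.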
#
# STEP 6: Clean up GitHub Secrets (optional but recommended).
#
# Once the application is verified working with Secret Manager, delete the
# now-unused GitHub Secrets from the repository settings:
#     AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET,
#     GEMINI_API_KEY, DB_USER, DB_PASS
#
# =============================================================================
# RISKS AND THINGS TO WATCH FOR
# =============================================================================
#
# Cloud SQL import divergence
# If `terraform plan` after import shows resource replacement (destroy +
# create) for the Cloud SQL instance, do NOT apply. Terraform cannot recreate
# a Cloud SQL instance in place: it would destroy the instance and all data.
# deletion_protection = true in database.tf will block the destroy, but it is
# safer to fix the config discrepancy first.
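#
# To list any resources a plan would destroy or replace, the saved plan can
# be inspected as JSON. This is a sketch: it assumes jq is installed, and
# tfplan is only an illustrative file name:
#     terraform plan -out=tfplan
#     terraform show -json tfplan \
#       | jq -r '.resource_changes[] | select(.change.actions | index("delete")) | .address'
# Empty output means no resource in the plan carries a delete action.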
#
# Secret versions must exist before deployment
# If Step 5 runs before Step 4 completes, Cloud Run will fail to start
# because the secret mount has no versions. The previous revision stays live
# (Cloud Run only switches traffic after the new revision is healthy), so the
# application continues to work, but the deploy will fail or time out. Fix:
# add the missing secret version, then re-deploy.
#
# VPC connector CIDR must not overlap existing subnets
# The connector reserves 10.8.0.0/28. If your VPC already has a subnet in
# that range, change vpc_connector_cidr in variables.tf before applying.
# Check existing ranges:
#     gcloud compute networks subnets list --filter="region:europe-west1"
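#
# Existing Serverless VPC Access connectors hold their own /28 ranges, so it
# may be worth listing them for the same region as well (a sketch):
#     gcloud compute networks vpc-access connectors list --region=europe-west1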
#
# Cloud Run SA permissions propagate with eventual consistency
# IAM bindings usually take effect within a couple of minutes of `terraform
# apply`, occasionally longer. If the first CI deploy fails with a permission
# error immediately after applying Terraform, wait a few minutes and retry.
#
# Terraform state is local by default
# The state file (terraform.tfstate) is gitignored. If you lose it, you lose
# the link between Terraform and the real GCP resources, and Terraform will
# try to recreate everything. Enable the GCS backend (commented out below)
# before running in a team or CI environment.
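#
# A possible sequence for moving the local state to GCS. This is a sketch:
# YOUR_STATE_BUCKET is a placeholder, and the backend block must be
# uncommented and pointed at the same bucket before the last command:
#     gcloud storage buckets create gs://YOUR_STATE_BUCKET --location=europe-west1
#     gcloud storage buckets update gs://YOUR_STATE_BUCKET --versioning
#     terraform init -migrate-state
# Object versioning on the bucket keeps earlier copies of the state file,
# which helps if a state write goes wrong.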
#
# =============================================================================

terraform {
  required_version = ">= 1.5"
