Error: 403 Request had insufficient authentication scopes.
[reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
domain: "googleapis.com"
metadata {
key: "service"
value: "generativelanguage.googleapis.com"
}
Your service account on Cloud Run is missing the IAM role required to access Vertex AI. When USE_VERTEX_AI=true, the backend's service account needs the roles/aiplatform.user role.
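When triaging this error from logs, it can help to pull out the reason and the service that rejected the call before changing any permissions. The following is a minimal sketch that parses the error text shown above; `parse_error_details` and `SAMPLE` are hypothetical names used only for illustration, not part of the backend.

```python
import re

# Error text copied from the message above.
SAMPLE = '''Error: 403 Request had insufficient authentication scopes.
[reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
domain: "googleapis.com"
metadata {
key: "service"
value: "generativelanguage.googleapis.com"
}'''


def parse_error_details(text: str) -> dict:
    """Extract reason, domain, and service from a Google API error dump."""
    details = {}
    for field in ("reason", "domain"):
        match = re.search(rf'{field}:\s*"([^"]+)"', text)
        if match:
            details[field] = match.group(1)
    # The service name is stored as a key/value pair inside the metadata block.
    match = re.search(r'key:\s*"service"\s*value:\s*"([^"]+)"', text)
    if match:
        details["service"] = match.group(1)
    return details


details = parse_error_details(SAMPLE)
print(details)
```

Knowing which service produced the 403 makes it easier to confirm that the API being enabled and the role being granted match the endpoint the backend actually calls.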
For Windows (PowerShell):
.\fix-vertex-ai-permissions.ps1 -ProjectId "legalmind-486106"
For Linux/Mac (Bash):
chmod +x fix-vertex-ai-permissions.sh
./fix-vertex-ai-permissions.sh legalmind-486106
If the scripts don't work, run these commands directly:
- Enable required APIs:
  gcloud services enable aiplatform.googleapis.com --project=legalmind-486106
  gcloud services enable generativelanguage.googleapis.com --project=legalmind-486106
- Grant Vertex AI User role:
  gcloud projects add-iam-policy-binding legalmind-486106 \
    --member="serviceAccount:legalmind-backend@legalmind-486106.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
- Restart your Cloud Run service:
  gcloud run services update legalmind-backend \
    --region=us-central1 \
    --update-env-vars="USE_VERTEX_AI=true"
- Backend goes offline after a period of inactivity
- App becomes inaccessible
- Cloud Run service shows "inactive"
- Cold starts: Cloud Run scales to zero when inactive
- Memory issues: Container using too much memory (crashes on startup)
- Startup timeout: Initialization taking too long
- Lifespan issues: Improper async handling in startup/shutdown
Update Dockerfile to use Python 3.11 slim (already done):
FROM python:3.11-slim
The backend uses async initialization that can time out. Check backend/api/app_new.py:
Current settings in main_new.py:
port = int(os.environ.get("PORT", 8000))
uvicorn.run(
    "api.app_new:app",
    host="0.0.0.0",
    port=port,
    reload=False,  # Never use reload=True in production!
    log_level="warning",
)
Cloud Run needs a health check. Your API should expose a /health endpoint:
The endpoint should be added to backend/api/endpoints_new.py:
@router.get("/health", tags=["Health"])
async def health():
    """Health check endpoint for Cloud Run."""
    return {
        "status": "ok",
        "service": "legalmind-backend",
        "timestamp": datetime.utcnow().isoformat()
    }
Add health check configuration:
gcloud run deploy legalmind-backend \
--image gcr.io/legalmind-486106/legalmind-backend:latest \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--http2 \
--timeout 60 \
--memory 1Gi \
--cpu 1 \
--min-instances 1 \
--health-startup-port 8000 \
--health-startup-initial-delay 120 \
--health-startup-timeout 30 \
--health-startup-failure-threshold 5
Key parameters:
- --min-instances 1: Keep at least 1 instance warm to prevent cold starts
- --memory 1Gi: Allocate enough memory
- --cpu 1: Allocate enough CPU
- --health-startup-*: Configure Startup Health Check
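To keep these resource settings consistent between deployments, the command can be assembled from one place. The sketch below is an illustration only: `build_deploy_command` is a hypothetical helper, and it uses just the `--image`, `--region`, `--memory`, `--cpu`, `--timeout`, and `--min-instances` flags from the command above.

```python
import shlex


def build_deploy_command(service, image, region, *, memory="1Gi", cpu=1,
                         timeout_s=60, min_instances=1):
    """Return the gcloud argument list for a Cloud Run deploy."""
    return [
        "gcloud", "run", "deploy", service,
        "--image", image,
        "--region", region,
        "--memory", memory,                     # enough RAM to survive startup
        "--cpu", str(cpu),
        "--timeout", str(timeout_s),            # request timeout in seconds
        "--min-instances", str(min_instances),  # 1 keeps an instance warm
    ]


cmd = build_deploy_command(
    "legalmind-backend",
    "gcr.io/legalmind-486106/legalmind-backend:latest",
    "us-central1",
)
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute the deploy; printing it first makes it easy to review what will run.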
- Frontend gets connection refused
- API returns 503 or connection timeout
- Backend URL returns no response
- Check service status:
  gcloud run services describe legalmind-backend --region=us-central1
- Check logs:
  gcloud run services logs read legalmind-backend --region=us-central1 --limit=50
- Test endpoint directly:
  curl https://legalmind-backend-<id>.us-central1.run.app/health
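- Check CORS settings: Ensure ALLOWED_ORIGINS includes your frontend URL. Use --update-env-vars rather than --set-env-vars, because --set-env-vars removes every other environment variable on the service:
  gcloud run services update legalmind-backend \
    --region=us-central1 \
    --update-env-vars="ALLOWED_ORIGINS=https://legalmind-frontend-<id>.us-central1.run.app"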
- Increase timeout:
  gcloud run services update legalmind-backend --region=us-central1 --timeout=60
- Increase memory:
  gcloud run services update legalmind-backend --region=us-central1 --memory=1Gi
- Check CORS configuration in backend/api/app_new.py:
  origins = settings.allowed_origins.split(",")
  app.add_middleware(
      CORSMiddleware,
      allow_origins=origins if settings.allowed_origins else ["*"],
      allow_credentials=True,
      allow_methods=["*"],
      allow_headers=["*"],
  )
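A plain `split(",")` keeps stray whitespace and empty entries, so a value such as "https://a.example, https://b.example" yields an origin with a leading space that will never match a browser's Origin header. As a minimal sketch, assuming the setting is a comma-separated string, the parsing could be made more forgiving; `parse_allowed_origins` is a hypothetical helper, not existing backend code.

```python
def parse_allowed_origins(raw):
    """Split a comma-separated origin list, trimming whitespace and empties.

    Falls back to a wildcard when nothing usable is configured, mirroring
    the fallback in the middleware setup above.
    """
    origins = [item.strip() for item in (raw or "").split(",") if item.strip()]
    return origins or ["*"]


print(parse_allowed_origins("https://a.example, https://b.example"))
print(parse_allowed_origins(""))
```

The result could then be passed as `allow_origins` when registering the middleware.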
- Service account has roles/aiplatform.user role
- APIs enabled: aiplatform.googleapis.com, generativelanguage.googleapis.com
- Environment variable: USE_VERTEX_AI=true
- Environment variable: GOOGLE_CLOUD_PROJECT=legalmind-486106
- Health check endpoint available at /health
- --min-instances 1 set on Cloud Run
- --memory 1Gi or higher allocated
- --timeout 60 or higher set
- CORS configured for frontend URL
- Logs checked for startup errors
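The environment-variable items in this checklist can be verified automatically at startup. The following is a sketch under the assumption that the backend reads these values from the process environment; `REQUIRED_ENV` and `check_env` are hypothetical names introduced for illustration.

```python
import os

# Variables from the checklist; None means any non-empty value is accepted.
REQUIRED_ENV = {"USE_VERTEX_AI": "true", "GOOGLE_CLOUD_PROJECT": None}


def check_env(env):
    """Return a list of human-readable problems found in `env`."""
    problems = []
    for name, expected in REQUIRED_ENV.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif expected is not None and value.lower() != expected:
            problems.append(f"{name} should be {expected!r}, got {value!r}")
    if not env.get("ALLOWED_ORIGINS"):
        problems.append("ALLOWED_ORIGINS is not set; CORS may fall back to a wildcard")
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print("WARNING:", problem)
```

Logging these warnings at startup puts configuration mistakes at the top of the Cloud Run logs, where they are easy to spot.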
# Build locally
docker build -t gcr.io/legalmind-486106/legalmind-backend:latest .
# Push to registry
docker push gcr.io/legalmind-486106/legalmind-backend:latest
# Deploy to Cloud Run with proper config
gcloud run deploy legalmind-backend \
--image gcr.io/legalmind-486106/legalmind-backend:latest \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--memory 1Gi \
--cpu 1 \
--timeout 60 \
--min-instances 1 \
--set-env-vars "GOOGLE_CLOUD_PROJECT=legalmind-486106,USE_VERTEX_AI=true,DEBUG=false"
gcloud run services logs read legalmind-backend --region us-central1 --follow
gcloud run services describe legalmind-backend --region=us-central1
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=legalmind-backend" \
--limit=50 \
--format=json
- Check logs:
  gcloud run services logs read legalmind-backend --limit=100
- Verify permissions:
  gcloud projects get-iam-policy legalmind-486106 \
    --flatten="bindings[].members" \
    --filter="bindings.members:legalmind-backend@legalmind-486106.iam.gserviceaccount.com" \
    --format="table(bindings.role)"
- Test locally first:
  python backend/main_new.py
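Once the backend is running locally or on Cloud Run, the /health endpoint can be polled until it reports ready. This is a minimal sketch using only the standard library; `http_fetch` and `wait_for_health` are hypothetical helpers, and the URL in the usage comment assumes the default local port of 8000 from main_new.py.

```python
import json
import time
import urllib.request


def http_fetch(url: str, timeout_s: float = 5.0) -> dict:
    """GET `url` and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(fetch, attempts: int = 5, delay_s: float = 1.0,
                    sleep=time.sleep) -> bool:
    """Call `fetch` until it returns {"status": "ok"} or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (OSError, ValueError, AttributeError):
            # Connection refused, HTTP error, bad JSON, or a non-dict body.
            pass
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


# Usage (assumes the backend is listening locally on port 8000):
# ready = wait_for_health(lambda: http_fetch("http://localhost:8000/health"))
```

Taking the fetch function as a parameter keeps the retry logic testable without a running server.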