| 1 | +# Database Seeding Guide |
| 2 | + |
| 3 | +This guide explains how to populate the ControlPlane database with default data (locations, tenants, policies, etc.). |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The SDK provides seed functions that initialize the database with commonly needed default data. All seed functions are **idempotent**: each one checks for existing data before creating records, so it can be called repeatedly without creating duplicates. |
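
The check-before-insert pattern described above can be sketched as follows. This is a minimal illustration against an in-memory SQLite table, not the SDK's implementation: the `seed_rows` helper, the single-column `locations` schema, and the region names are assumptions made for the example.

```python
import sqlite3


def seed_rows(conn, names):
    """Insert each name unless it already exists; return created/skipped counts."""
    created = skipped = 0
    for name in names:
        exists = conn.execute(
            "SELECT 1 FROM locations WHERE name = ?", (name,)
        ).fetchone()
        if exists:
            skipped += 1  # record already present, do not duplicate it
        else:
            conn.execute("INSERT INTO locations (name) VALUES (?)", (name,))
            created += 1
    conn.commit()
    return {"created": created, "skipped": skipped, "total": created + skipped}


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (name TEXT PRIMARY KEY)")

first = seed_rows(conn, ["westeurope", "northeurope"])
second = seed_rows(conn, ["westeurope", "northeurope"])
print(first)
print(second)
```

Running the seed twice leaves the table unchanged after the first call; only the `created` and `skipped` counts differ between the two results.
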
| 8 | + |
| 9 | +## Available Seeds |
| 10 | + |
| 11 | +### 1. **Locations** (`seed_locations`) |
| 12 | +- **Purpose**: Populate available cloud regions and extended locations (edge zones) |
| 13 | +- **Includes**: 18 standard Azure regions + 6 extended locations (CDN edge zones) |
| 14 | +- **Source**: `DEFAULT_LOCATIONS` constant from `itl_controlplane_sdk.core.models.base.constants` |
| 15 | +- **Tables**: `locations`, `extended_locations` |
| 16 | + |
| 17 | +### 2. **Default Tenant** (`seed_default_tenant`) |
| 18 | +- **Purpose**: Create the default "ITL" tenant for resource scoping |
| 19 | +- **ID**: `ITL` (from `DEFAULT_TENANT`) |
| 20 | +- **Table**: `tenants` |
| 21 | +- **Required before**: Any other seed that needs a `tenant_id` |
| 22 | + |
| 23 | +### 3. **Management Groups** (`seed_default_management_groups`) |
| 24 | +- **Purpose**: Create standard management group hierarchy |
| 25 | +- **Includes**: |
| 26 | + - Root |
| 27 | + - Infrastructure |
| 28 | + - Workloads |
| 29 | + - Platform |
| 30 | +- **Table**: `management_groups` |
| 31 | +- **Requires**: Default tenant to exist |
| 32 | + |
| 33 | +### 4. **Policies** (`seed_default_policies`) |
| 34 | +- **Purpose**: Create foundational governance policies |
| 35 | +- **Includes**: |
| 36 | + - Enforce Encryption at Rest |
| 37 | + - Require RBAC |
| 38 | + - Enforce Resource Tagging |
| 39 | + - Audit Logging |
| 40 | +- **Table**: `policies` |
| 41 | +- **Requires**: Default tenant to exist |
| 42 | + |
| 43 | +## Usage |
| 44 | + |
| 45 | +### From Python Code |
| 46 | + |
| 47 | +```python |
| 48 | +from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession |
| 49 | +from sqlalchemy.orm import sessionmaker |
| 50 | +from itl_controlplane_sdk.core.services.seed import SeedService |
| 51 | + |
| 52 | +# Create an async session factory (the `async with` block below must run inside a coroutine) |
| 53 | +database_url = "postgresql+asyncpg://user:pass@localhost/dbname" |
| 54 | +engine = create_async_engine(database_url) |
| 55 | +async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False) |
| 56 | + |
| 57 | +async with async_session() as session: |
| 58 | + # Seed all data |
| 59 | + results = await SeedService.seed_all(session) |
| 60 | + |
| 61 | + # Or seed individual components |
| 62 | + await SeedService.seed_default_tenant(session) |
| 63 | + await SeedService.seed_locations(session) |
| 64 | + await SeedService.seed_default_management_groups(session) |
| 65 | + await SeedService.seed_default_policies(session) |
| 66 | +``` |
| 67 | + |
| 68 | +### From CLI |
| 69 | + |
| 70 | +```bash |
| 71 | +# Seed all data at once |
| 72 | +python -m itl_controlplane_sdk.cli.seed all |
| 73 | + |
| 74 | +# Seed individual components |
| 75 | +python -m itl_controlplane_sdk.cli.seed tenants |
| 76 | +python -m itl_controlplane_sdk.cli.seed locations |
| 77 | +python -m itl_controlplane_sdk.cli.seed management-groups |
| 78 | +python -m itl_controlplane_sdk.cli.seed policies |
| 79 | + |
| 80 | +# With custom options |
| 81 | +python -m itl_controlplane_sdk.cli.seed \ |
| 82 | + --database-url postgresql+asyncpg://user:pass@localhost/dbname \ |
| 83 | + --tenant-id MyTenant \ |
| 84 | + all |
| 85 | +``` |
| 86 | + |
| 87 | +### Environment Variables |
| 88 | + |
| 89 | +- `DATABASE_URL`: PostgreSQL connection string |
| 90 | + - Default: `postgresql+asyncpg://controlplane:controlplane@localhost:5432/controlplane` |
| 91 | +- Tenant ID: No environment variable; the default is `ITL` (`DEFAULT_TENANT_ID`) |
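
As a rough sketch of how the `DATABASE_URL` fallback could be resolved in application code: the `resolve_database_url` helper below is hypothetical and is not part of the SDK; only the variable name and the default value come from this guide.

```python
import os

# Default taken from the Environment Variables list above.
DEFAULT_DATABASE_URL = (
    "postgresql+asyncpg://controlplane:controlplane@localhost:5432/controlplane"
)


def resolve_database_url(environ=None):
    """Return DATABASE_URL from the environment, or the documented default."""
    env = os.environ if environ is None else environ
    # An empty string is treated the same as an unset variable.
    return env.get("DATABASE_URL") or DEFAULT_DATABASE_URL


print(resolve_database_url({}))
```

Passing the mapping as a parameter keeps the helper easy to test without modifying the real process environment.
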
| 92 | + |
| 93 | +### Return Values |
| 94 | + |
| 95 | +Each seed function returns a dictionary of record counts. For example, a run that creates 18 records and skips 6 existing ones returns: |
| 96 | + |
| 97 | +```python |
| 98 | +{ |
| 99 | + "created": 18, # Number of new records created |
| 100 | + "skipped": 6, # Number of existing records (not duplicated) |
| 101 | + "total": 24 # Total records processed |
| 102 | +} |
| 103 | +``` |
| 104 | + |
| 105 | +## Execution Order |
| 106 | + |
| 107 | +Call the seed functions in this order so that each dependency exists before the seeds that need it: |
| 108 | + |
| 109 | +1. **Tenant** → Creates default tenant (other seeds depend on this) |
| 110 | +2. **Locations** → Independent, can run anytime |
| 111 | +3. **Management Groups** → Depends on tenant |
| 112 | +4. **Policies** → Depends on tenant |
| 113 | + |
| 114 | +The `seed_all()` function handles this ordering automatically. |
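
The dependency list above can also be expressed as a small graph and ordered mechanically. The sketch below uses the standard library's `graphlib`; it only illustrates the ordering constraint and is not a description of how `seed_all()` is implemented. The `SEED_DEPENDENCIES` mapping and `seed_order` helper are names invented for this example.

```python
from graphlib import TopologicalSorter

# Each seed maps to the set of seeds it depends on, as listed above.
SEED_DEPENDENCIES = {
    "tenant": set(),
    "locations": set(),
    "management_groups": {"tenant"},
    "policies": {"tenant"},
}


def seed_order(dependencies):
    """Return one valid execution order in which dependencies come first."""
    return list(TopologicalSorter(dependencies).static_order())


order = seed_order(SEED_DEPENDENCIES)
print(order)
```

Any order produced this way places `tenant` before `management_groups` and `policies`, while `locations` may appear at any position because it has no dependencies.
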
| 115 | + |
| 116 | +## Integration Examples |
| 117 | + |
| 118 | +### In Provider Initialization |
| 119 | + |
| 120 | +```python |
| 121 | +# In Core Provider startup (alembic_upgrade, get_session, logger, and start_services are assumed to be application-defined) |
| 122 | +from itl_controlplane_sdk.core.services.seed import SeedService |
| 123 | + |
| 124 | +async def startup(): |
| 125 | + # Initialize database first |
| 126 | + await alembic_upgrade() |
| 127 | + |
| 128 | + # Then seed with defaults |
| 129 | + async with get_session() as session: |
| 130 | + results = await SeedService.seed_all(session) |
| 131 | + logger.info("Database seeded: %s", results) |
| 132 | + |
| 133 | + # Then start provider services |
| 134 | + await start_services() |
| 135 | +``` |
| 136 | + |
| 137 | +### In Migration Hooks |
| 138 | + |
| 139 | +```python |
| 140 | +# In alembic env.py after migrations |
| 141 | +def run_migrations_online() -> None: |
| 142 | + # ... existing migration code ... |
| 143 | + |
| 144 | + with connectable.begin() as connection: |
| 145 | + context.configure(connection=connection, target_metadata=target_metadata) |
| 146 | + |
| 147 | + with context.begin_transaction(): |
| 148 | + context.run_migrations() |
| 149 | + |
| 150 | + # Seed defaults after migrations (`environment` and `seed_defaults` are assumed to be application-defined) |
| 151 | + if environment == "development": |
| 152 | + asyncio.run(seed_defaults(connection)) |
| 153 | +``` |
| 154 | + |
| 155 | +### In Docker Entrypoint |
| 156 | + |
| 157 | +```bash |
| 158 | +#!/bin/bash |
| 159 | +set -e |
| 160 | + |
| 161 | +# Run migrations |
| 162 | +alembic upgrade head |
| 163 | + |
| 164 | +# Seed initial data |
| 165 | +python -m itl_controlplane_sdk.cli.seed all |
| 166 | + |
| 167 | +# Start services |
| 168 | +python -m provider.main |
| 169 | +``` |
| 170 | + |
| 171 | +## Data Consistency |
| 172 | + |
| 173 | +All seed functions maintain data consistency: |
| 174 | + |
| 175 | +- **Idempotency**: Check record existence before insert |
| 176 | +- **Foreign Keys**: Respect all FK constraints |
| 177 | +- **Transactions**: Commit atomically or roll back on error |
| 178 | +- **Timestamps**: Set created_at and updated_at |
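
The transaction guarantee listed above (commit everything or roll back everything) can be illustrated with a minimal sketch. It uses an in-memory SQLite database rather than the SDK's PostgreSQL models; the `seed_atomically` helper and the two-table schema are assumptions made for the example.

```python
import sqlite3


def seed_atomically(conn, tenant_id, group_names):
    """Insert a tenant and its groups in one transaction; undo all on error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO tenants (id) VALUES (?)", (tenant_id,))
            for name in group_names:
                conn.execute(
                    "INSERT INTO management_groups (name, tenant_id) VALUES (?, ?)",
                    (name, tenant_id),
                )
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE tenants (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE management_groups ("
    "name TEXT PRIMARY KEY, tenant_id TEXT NOT NULL REFERENCES tenants(id))"
)

ok = seed_atomically(conn, "ITL", ["Root", "Platform"])
# The duplicate group name raises an error, so the "Other" tenant is rolled back too.
failed = seed_atomically(conn, "Other", ["Workloads", "Workloads"])
tenant_count = conn.execute("SELECT COUNT(*) FROM tenants").fetchone()[0]
group_count = conn.execute("SELECT COUNT(*) FROM management_groups").fetchone()[0]
print(ok, failed, tenant_count, group_count)
```

The second call fails partway through, and neither its tenant row nor its first group row remains in the database afterwards.
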
| 179 | + |
| 180 | +## Example Output |
| 181 | + |
| 182 | +``` |
| 183 | +INFO - Starting database seed process... |
| 184 | +INFO - ✓ Created default tenant: ITL |
| 185 | +INFO - ✓ Seeded 24 locations (0 already existed) |
| 186 | +INFO - ✓ Seeded 4 management groups (0 already existed) |
| 187 | +INFO - ✓ Seeded 4 policies (0 already existed) |
| 188 | +INFO - ✓ Database seeding completed successfully |
| 189 | + |
| 190 | +Database seed results: |
| 191 | + tenant: {'created': 1, 'skipped': 0} |
| 192 | + locations: {'created': 24, 'skipped': 0, 'total': 24} |
| 193 | + management_groups: {'created': 4, 'skipped': 0} |
| 194 | + policies: {'created': 4, 'skipped': 0} |
| 195 | +``` |
| 196 | + |
| 197 | +## Troubleshooting |
| 198 | + |
| 199 | +### Connection Issues |
| 200 | + |
| 201 | +``` |
| 202 | +sqlalchemy.exc.OperationalError: (asyncpg.exceptions.CannotConnectNowError) |
| 203 | +``` |
| 204 | + |
| 205 | +**Solution**: Ensure PostgreSQL is running and DATABASE_URL is correct: |
| 206 | + |
| 207 | +```bash |
| 208 | +# Test connection |
| 209 | +python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('postgresql://...'))" |
| 210 | +``` |
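
One detail worth noting when testing the connection directly: the `+asyncpg` driver suffix in `postgresql+asyncpg://` is SQLAlchemy URL syntax, whereas `asyncpg.connect` expects a plain `postgresql://` DSN. The helper below is a hypothetical sketch (not part of the SDK or asyncpg) that strips the driver suffix so the same `DATABASE_URL` can be reused for a direct connection test.

```python
def to_asyncpg_dsn(sqlalchemy_url: str) -> str:
    """Drop the '+driver' part of a SQLAlchemy URL scheme, keeping the rest intact."""
    scheme, sep, rest = sqlalchemy_url.partition("://")
    if not sep:
        raise ValueError("not a URL: missing '://'")
    dialect = scheme.split("+", 1)[0]  # "postgresql+asyncpg" -> "postgresql"
    return f"{dialect}://{rest}"


print(to_asyncpg_dsn("postgresql+asyncpg://user:pass@localhost/dbname"))
```

URLs that already use a plain `postgresql://` scheme pass through unchanged.
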
| 211 | + |
| 212 | +### Foreign Key Violations |
| 213 | + |
| 214 | +``` |
| 215 | +sqlalchemy.exc.IntegrityError: (asyncpg.exceptions.IntegrityConstraintViolationError) |
| 216 | +FOREIGN KEY violation |
| 217 | +``` |
| 218 | + |
| 219 | +**Solution**: Ensure tenant exists before seeding other data. Use `seed_all()` which handles ordering. |
| 220 | + |
| 221 | +### Already Exists Errors |
| 222 | + |
| 223 | +Seed functions check for existing records and skip them, so a second run should not raise an "already exists" error. |
| 224 | + |
| 225 | +## Summary |
| 226 | + |
| 227 | +| Seed Function | Records | Depends On | Purpose | |
| 228 | +|---|---|---|---| |
| 229 | +| `seed_default_tenant` | 1 tenant | - | Create default "ITL" tenant | |
| 230 | +| `seed_locations` | 24 locations | - | Populate available regions/zones | |
| 231 | +| `seed_default_management_groups` | 4 groups | Tenant | Create default MG hierarchy | |
| 232 | +| `seed_default_policies` | 4 policies | Tenant | Create baseline governance policies | |
| 233 | + |
| 234 | +Use `seed_all()` to run all seeds in correct order with proper dependency handling. |