|
| 1 | +# Tenant Lifecycle Automation |
| 2 | + |
| 3 | +Goal: automate tenant provisioning, activation, and health verification so new tenants are production-ready with minimal manual steps while preserving multi-tenant safety, auditing, and observability. |
| 4 | + |
| 5 | +## Scope (In) |
| 6 | +- Create/activate tenant triggers background provisioning workflow. |
| 7 | +- Per-tenant database creation (or schema), migrations, and seed data. |
| 8 | +- Default identity bootstrap (admin user/roles/permissions) tied to tenant. |
| 9 | +- Health verification and status reporting. |
| 10 | +- Idempotent, retryable orchestration with audit and telemetry. |
| 11 | +- Admin endpoints/UX to view workflow state and retry/re-run steps. |
| 12 | + |
| 13 | +## Non-Goals (Out) |
| 14 | +- Full-feature feature-flag platform. |
| 15 | +- Billing/usage metering. |
| 16 | +- Cross-cloud infrastructure automation (K8s, DNS, CDN). |
| 17 | + |
| 18 | +## Personas |
| 19 | +- Platform Admin: initiates tenant creation, monitors status, retries failed steps. |
| 20 | +- Tenant Admin: receives bootstrap credentials, validates app access post-provision. |
| 21 | +- SRE/DevOps: monitors health, investigates failed jobs, tunes resilience. |
| 22 | + |
| 23 | +## High-Level Flow |
| 24 | +1) Admin issues `CreateTenant` (or activates an existing tenant). |
| 25 | +2) System enqueues a provisioning job (Hangfire) keyed by TenantId + correlation. |
| 26 | +3) Workflow steps (all idempotent): |
| 27 | + - Validate tenant metadata (provider, connection string template, validity). |
| 28 | + - Create tenant database/schema (or ensure exists) using provider-specific strategy. |
| 29 | + - Apply EF Core migrations for each enabled module (Multitenancy, Identity, Auditing, etc.). |
| 30 | + - Seed baseline data (roles, permissions, admin user with reset token, root tenant data if applicable). |
| 31 | + - Warm caches if enabled (e.g., permissions). |
| 32 | + - Emit audit + telemetry events for each step. |
| 33 | +4) Mark tenant as `Active` when all steps succeed; surface status via API. |
| 34 | +5) On failure: capture error, mark status `Failed`, allow retry/resume from failed step. |
| 35 | + |
| 36 | +## Functional Requirements |
| 37 | +- Provisioning job: |
| 38 | + - Runs as Hangfire background job; supports manual trigger and automatic trigger on create/activate. |
| 39 | + - Stores per-step status, timestamps, and error messages (persisted per tenant). |
| 40 | + - Uses correlation/trace IDs; logs to OpenTelemetry. |
| 41 | + - Supports cancellation and exponential backoff retries. |
| 42 | +- Database orchestration: |
| 43 | + - Provider-aware strategies (PostgreSQL initial target; hooks for SQL Server). |
| 44 | + - Option to create database if missing; else validate connectivity. |
| 45 | + - Runs module migrations in deterministic order; stops on first failure. |
| 46 | +- Seeding: |
| 47 | + - Seeds Identity admin user, default roles/permissions, and tenant metadata. |
| 48 | + - Issues one-time admin credential or password reset token for Tenant Admin. |
| 49 | + - Seeds demo data optionally (flag). |
| 50 | +- Status surface: |
| 51 | + - API to fetch provisioning status history per tenant. |
| 52 | + - Health check should include tenant provisioning status (ready/degraded/failed). |
| 53 | +- Safety & idempotency: |
| 54 | + - All steps re-runnable without corrupting state (check-before-create). |
| 55 | + - Guard against concurrent provisioning for same tenant. |
| 56 | + - Respect tenant validity/activation flags. |
| 57 | + |
| 58 | +## Operational/Observability Requirements |
| 59 | +- Emit structured logs with TenantId, correlationId, step name, duration, outcome. |
| 60 | +- Create OpenTelemetry spans for each step (db create, migrate, seed, cache warm). |
| 61 | +- Publish audit events for lifecycle changes (Requested, Started, StepFailed, Completed). |
| 62 | +- Expose metrics: provision_duration_seconds, provision_step_failures_total, active_tenants. |
| 63 | + |
| 64 | +## Security Requirements |
| 65 | +- No secrets in logs/audits; hash/scrub credentials. |
| 66 | +- Bootstrap credentials delivered via secure channel (email with reset token or out-of-band). |
| 67 | +- Enforce tenant isolation during provisioning (context scopes, connection string guards). |
| 68 | +- Authorization: only platform admins can trigger or retry provisioning. |
| 69 | + |
| 70 | +## Acceptance Criteria (Happy Path) |
| 71 | +- Creating a tenant triggers a job that: |
| 72 | + - Creates/validates DB, applies migrations for all enabled modules, seeds identity/admin, warms caches. |
| 73 | + - Marks tenant Active and Ready; status endpoint shows completed steps with durations. |
| 74 | +- Audit trail shows Requested -> Started -> Completed with TenantId and correlationId. |
| 75 | +- Metrics and traces include the provisioning spans and surface in health checks. |
| 76 | + |
| 77 | +## Failure/Recovery Criteria |
| 78 | +- If migrations fail, status is Failed with error details; job can be retried and resumes idempotently. |
| 79 | +- Double-submit provisioning for same tenant does not run concurrent workflows (dedupe/lock). |
| 80 | +- Partial seeds are safe to re-run (no duplicate roles/users; admin user upsert). |
| 81 | +- Health check reports degraded for tenants with failed provisioning; improves after successful retry. |
| 82 | + |
| 83 | +## Progress Update (Current State) |
| 84 | +- Provisioning workflow implemented with persisted status/steps and 202 responses on tenant creation; retry endpoint available. |
| 85 | +- Background provisioning via Hangfire, with inline fallback when Hangfire/storage is unavailable (dev-friendly). |
| 86 | +- Startup hosted services: tenant catalog migrate/seed (root tenant) and optional auto-provision enqueue. |
| 87 | +- Provider-aware TenantDbContextFactory to select PostgreSQL via appsettings. |
| 88 | +- Audit pipeline fixed to stamp tenant/user on events; audit sink writes per-tenant batches. |
0 commit comments