diff --git a/gateway/spec/architecture/architecture.md b/gateway/spec/architecture/architecture.md
index 32195c092..a87318098 100644
--- a/gateway/spec/architecture/architecture.md
+++ b/gateway/spec/architecture/architecture.md
@@ -2,109 +2,562 @@
## Overview
-Envoy-based gateway system with Go xDS control plane for dynamic API configuration, policy enforcement, and traffic management. Supports both single-instance deployments with SQLite and scalable cloud deployments.
+The API Platform Gateway is an Envoy-based, AI-ready API gateway with a Go xDS control plane. Everything beyond basic routing — authentication, rate limiting, transformation, AI guardrails, MCP handling — is implemented as composable, versioned **policies**.
+
+Policies are not built into the runtime. They are compiled (Go) or installed (Python) into a Gateway Runtime image at build time by the **Gateway Builder**; their configuration (policy chains and related resources) is pushed to the runtime at deploy time over xDS. This means a Gateway Runtime image is always a self-contained, reproducible artifact: a fixed Envoy version, a fixed policy set, a fixed SDK version.
+
+### Control Plane vs Data Plane
+
+There are two layers in the wider API Platform. The Gateway as a whole is a **Data Plane** product — it terminates client traffic and forwards it to upstream services. The **Control Plane** is the WSO2 **Platform API**, which is a separate, optional central management surface that one or more independent Gateways can register with.
+
+Inside a single Gateway, the **Gateway Controller** acts as an internal control plane for its own **Gateway Runtime** instances (it pushes xDS to them) — but the Controller itself is part of the Data Plane deployment. A Gateway can run with or without a Platform API in front of it.
+
+## Top-Level Architecture
+
+```mermaid
+graph TB
+ subgraph CP["Control Plane (optional)"]
+ PlatformAPI["Platform API<br/>central management for<br/>1..N gateways"]
+ end
+
+ subgraph DP["Data Plane"]
+ subgraph GW1["Gateway A"]
+ C1["Gateway Controller<br/>REST :9090<br/>Envoy xDS :18000<br/>Policy xDS :18001"]
+ subgraph RT1["Gateway Runtime"]
+ R1["Router (Envoy)<br/>:8080 / :8443"]
+ PE1["Policy Engine (Go)"]
+ Py1["Python Executor<br/>(if Python policies)"]
+ end
+ end
+
+ subgraph GW2["Gateway B (distinct gateway, not a replica)"]
+ C2["Gateway Controller"]
+ RT2["Gateway Runtime"]
+ end
+
+ Builder["Gateway Builder<br/>(image build time only)"]
+ end
+
+ Backend[("Upstream<br/>Backends / LLMs<br/>MCP Servers")]
+ DB[("Shared PostgreSQL<br/>rows scoped by gateway_id")]
+ Redis[(Redis<br/>distributed rate-limit)]
+ Client["Client"]
+
+ C1 -. "REST: manifest + version<br/>WebSocket: events + heartbeat" .-> PlatformAPI
+ C2 -. "REST + WebSocket" .-> PlatformAPI
+ PlatformAPI -. "WebSocket: deploy / undeploy" .-> C1
+ PlatformAPI -. "WebSocket" .-> C2
+
+ Builder -- "produces<br/>controller + runtime images" --> GW1
+ Builder -- "produces images" --> GW2
+
+ C1 --- DB
+ C2 --- DB
+
+ C1 -- "Envoy xDS<br/>(LDS/RDS/CDS/EDS/SDS)" --> R1
+ C1 -- "Policy xDS +<br/>RouteConfig +<br/>APIKey +<br/>Subscription +<br/>LazyResource" --> PE1
+ R1 <-- "ext_proc<br/>over UDS" --> PE1
+ PE1 <-- "gRPC<br/>over UDS" --> Py1
+
+ Client --> R1
+ R1 --> Backend
+ PE1 -. "rate-limit state<br/>(optional)" .-> Redis
+
+ classDef cp fill:#e1f5ff,stroke:#01579b
+ classDef dp fill:#fff3e0,stroke:#e65100
+ classDef ext fill:#f3e5f5,stroke:#4a148c
+ class CP cp
+ class DP,GW1,GW2,RT1,RT2 dp
+ class Backend,DB,Redis,Client ext
+```
+
+A single Gateway is composed of two deployable units, released as a matched version pair — the controller's policy YAMLs and the runtime's compiled policy-engine binary come from the same builder run:
+
+| Unit | Contains |
+| ----------------------- | --------------------------------------------------------------------------------------- |
+| **Gateway Controller** | REST API, Envoy xDS server, Policy xDS server, policy definitions, persistence |
+| **Gateway Runtime** | Envoy + Policy Engine binary (with policies linked in) + Python Executor + Python deps |
+
+The **Gateway Builder** is a build-time tool that produces both images. End users do not run the builder unless they want to ship a custom policy set; the default WSO2-published images are pre-built. When custom images are needed, the **CLI** is the primary user-facing entry point — it wraps the builder in a Docker container, supplies the policy manifest, and produces both images locally.
+
+### Multi-Gateway Database Sharing
+
+Each Gateway is identified by a unique `gateway_id`. Multiple **distinct Gateways** (not just replicas of one Gateway) can point at the **same shared database** — every persistent row is scoped by `gateway_id`, so two gateways sharing a PostgreSQL instance see only their own APIs, subscriptions, API keys, and events. This is independent of the multi-replica EventHub sync described later (which is for replicas of the *same* gateway_id).
+
+```mermaid
+graph LR
+ subgraph DB["Shared PostgreSQL"]
+ T1["artifacts<br/>{gateway_id, uuid, ...}"]
+ T2["api_keys<br/>{gateway_id, ...}"]
+ T3["events<br/>{gateway_id, event_id, ...}"]
+ end
+
+ GA["Gateway A<br/>gateway_id=gw-a"]
+ GB["Gateway B<br/>gateway_id=gw-b"]
+
+ GA <--> DB
+ GB <--> DB
+```
+
+---
## Components
-### Gateway-Controller (Port 9090 REST, 18000 xDS)
-- REST API server accepting YAML/JSON API configurations using Gin router.
-- Validation layer providing field-level error messages with structured reporting.
-- xDS v3 server implementing State-of-the-World protocol for Envoy configuration.
-- SQLite database for persistent storage (`./data/gateway.db`) with WAL mode.
-- In-memory cache for fast configuration access with thread-safe operations.
-
-### Router (Envoy Proxy, Port 8080)
-- Envoy Proxy 1.35.3 routing HTTP traffic to backend services.
-- Bootstrap configuration connecting to Gateway-Controller xDS server.
-- JSON-formatted access logs to stdout for observability.
-- Zero-downtime configuration updates via xDS protocol.
-
-### Policy Engine (Standard Tier)
-- Authentication policies: API Key, OAuth, JWT validation.
-- Authorization policies: RBAC, scope validation.
-- Traffic management policies: Header modification, request/response transformation.
-
-### Rate Limiter (Standard Tier)
-- Distributed rate limiting with Redis backend.
-- Quota management and throttling.
-- Spike arrest and burst protection.
-
-### Database
-- SQLite database file (`./data/gateway.db`).
-- Schema with `deployments` table storing configurations as JSON TEXT.
-- Composite unique constraint on `(name, version)`.
-- Indexes on frequently queried fields: `name+version`, `status`, `context`, `kind`.
-- Migration path to PostgreSQL/MySQL for cloud deployments.
-
-## Container Structure
+### 1. Gateway Controller
+
+The Gateway's internal control plane — it manages and pushes configuration to its own Gateway Runtime(s). A single Go binary that:
+
+- Serves the **Management REST API** for create/read/update/delete of all gateway resources.
+- Serves an **Admin/debug API** — config dump, health, xDS sync status.
+- Runs an **Envoy xDS server** implementing the State-of-the-World v3 protocol (LDS, RDS, CDS, EDS, SDS).
+- Runs a separate **Policy xDS server** that pushes policy chains, route configs, API keys, subscriptions, and lazy resources to the Policy Engine.
+- Persists all state in **SQLite** (default, WAL mode) or **PostgreSQL** (HA deployments).
+- Optionally connects to the **Platform API** (REST for the manifest/version push, WebSocket for the live event channel) for centralized multi-gateway management.
+
+#### Resource Kinds
+
+The controller manages a typed set of API and policy resources, each with its own validator:
+
+| Kind | Purpose |
+| --------------------- | ---------------------------------------------------------------------- |
+| `RestApi` | HTTP/REST API definition (operations, upstream, policies) |
+| `WebSubApi` | Event-driven WebSub API (Kafka-backed, async; served by event-gateway) |
+| `LlmProviderTemplate` | Reusable template for an LLM vendor (OpenAI, Anthropic, Bedrock, …) |
+| `LlmProvider` | A configured LLM provider instance |
+| `LlmProxy` | A multi-provider AI gateway endpoint with model routing & guardrails |
+| `Mcp` | Model Context Protocol proxy |
+| `Certificate` | Trusted-cert and listener-cert management for upstream/downstream TLS |
+| `SubscriptionPlan` | Quota/rate plan definition |
+| `Subscription` | Plan binding to an Application; carries billing IDs for analytics |
+| `ApiKey` | Per-API key issuance; stored and pushed to runtime as SHA-256 hashes only |
+| `Application` | Logical consumer that owns API keys and subscriptions (synced from Platform API) |
+| `Secret` | Secret storage with AES-GCM encryption at rest |
+| `Policy` | Installed policy definitions compiled into the runtime image (read-only) |
+
+Each resource has its **own database table** keyed by an immutable **UUID** primary identifier. All cross-references (subscription → plan, API key → API, events, analytics) carry the UUID, so renames and re-deployments stay valid. Resources additionally carry a URL-friendly `handle` and a human-readable `displayName`, both unique per gateway and kind.
+
+Resource versions for `RestApi`, `WebSubApi`, `Mcp`, `LlmProvider`, and `LlmProxy` use a `vMAJOR.MINOR` form. Patch versions are intentionally not exposed — a backend bug fix should never force consumers to migrate. Policies follow a different scheme: their patch versions are visible to operators so security and bug fixes can be pinned at deploy time.
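
The `vMAJOR.MINOR` rule above can be checked with a small validator. This is a minimal sketch only: the function name `IsValidAPIVersion` and the exact regular expression are assumptions for illustration, not the controller's actual validator.

```go
package main

import (
	"fmt"
	"regexp"
)

// apiVersionPattern accepts only the vMAJOR.MINOR shape (for example v1.0).
// The pattern is an assumption; the real validator may be stricter or looser.
var apiVersionPattern = regexp.MustCompile(`^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$`)

// IsValidAPIVersion reports whether s is a vMAJOR.MINOR version string.
func IsValidAPIVersion(s string) bool {
	return apiVersionPattern.MatchString(s)
}

func main() {
	// A patch component ("v1.0.3") or a missing "v" prefix is rejected.
	for _, v := range []string{"v1.0", "v2.13", "v1.0.3", "1.0"} {
		fmt.Printf("%-7s valid=%t\n", v, IsValidAPIVersion(v))
	}
}
```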
+
+Operations on these kinds are converted at handler time to a kind-agnostic **`RuntimeDeployConfig`** before being snapshotted into xDS. This keeps the xDS translators and Policy Engine free of per-kind branching: a `RestApi`, an `LlmProvider`, and a `WebSubApi` all reach the runtime as the same intermediate shape.
+
+A REST/LLM API may declare a **main** upstream and an optional **sandbox** upstream, selected per request via header or path convention — both upstreams share the same policy chain. Resources may also carry arbitrary `metadata.labels` (string→string map) for analytics, routing, and operational metadata; labels are propagated into the runtime context and into emitted analytics events.
+
+#### Multi-Replica Sync (EventHub)
+
+A single Gateway can run multiple Controller replicas (same `gateway_id`, same DB) for HA. Replicas stay in sync through a DB-backed **event hub**: each mutation writes a row to an events table; every replica polls the table on a short interval and applies events to its in-memory caches and xDS snapshots. This avoids the need for a separate message broker.
+
+Events are also scoped by `gateway_id`, so a replica only consumes events for its own gateway — a different gateway sharing the same database is invisible to it.
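
The polling model can be sketched with an in-memory slice standing in for the events table. The types `Event` and `Replica`, the cursor field `LastSeen`, and the assumption that `event_id` increases monotonically are all hypothetical; the real EventHub schema and polling logic may differ.

```go
package main

import "fmt"

// Event is a hypothetical row of the events table.
type Event struct {
	ID        int64 // assumed to increase monotonically
	GatewayID string
	Payload   string
}

// EventsAfter stands in for a query that selects rows for one gateway_id
// with event_id greater than the caller's cursor, in table order.
func EventsAfter(table []Event, gatewayID string, lastSeen int64) []Event {
	var out []Event
	for _, e := range table {
		if e.GatewayID == gatewayID && e.ID > lastSeen {
			out = append(out, e)
		}
	}
	return out
}

// Replica models one controller replica: its gateway, cursor, and applied events.
type Replica struct {
	GatewayID string
	LastSeen  int64
	Applied   []string
}

// Poll applies every unseen event for this gateway and advances the cursor.
// It returns the number of events applied in this round.
func (r *Replica) Poll(table []Event) int {
	applied := 0
	for _, e := range EventsAfter(table, r.GatewayID, r.LastSeen) {
		r.Applied = append(r.Applied, e.Payload)
		r.LastSeen = e.ID
		applied++
	}
	return applied
}

func main() {
	table := []Event{
		{ID: 1, GatewayID: "gw-a", Payload: "deploy api-1"},
		{ID: 2, GatewayID: "gw-b", Payload: "deploy api-9"},
		{ID: 3, GatewayID: "gw-a", Payload: "undeploy api-1"},
	}
	r := &Replica{GatewayID: "gw-a"}
	fmt.Println(r.Poll(table), r.Applied) // first poll sees only gw-a events
	fmt.Println(r.Poll(table))            // second poll finds nothing new
}
```

Keeping a per-replica cursor makes each poll idempotent: re-reading the same table never re-applies an event.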
+
+### 2. Gateway Runtime
+
+A single OCI image that bundles three processes managed by a shared entrypoint:
+
+```mermaid
+graph LR
+ subgraph Container["Gateway Runtime Container"]
+ Entry["entrypoint<br/>(process manager)"]
+ Router["Envoy<br/>:8080 / :8443 / :9901"]
+ PE["Policy Engine (Go)<br/>ext_proc :9001"]
+ PyExec["Python Executor<br/>(only if Python policies exist)"]
+ UDS1[/"policy-engine.sock"/]
+ UDS2[/"python-executor.sock"/]
+ end
+
+ Entry -- "starts in order: 1 (if Python policies)" --> PyExec
+ Entry -- "starts in order: 2" --> PE
+ Entry -- "starts in order: 3" --> Router
+
+ Router <-- "ext_proc gRPC" --> UDS1
+ UDS1 --- PE
+ PE <-- "Execute gRPC" --> UDS2
+ UDS2 --- PyExec
+```
+
+The entrypoint starts the **Python Executor** first (only if any Python policies are present), then starts the **Policy Engine** and waits for it to come up, and finally starts **Envoy**. If any one process exits, the entrypoint terminates the rest and the container restarts.
+
+#### Router (Envoy)
+
+A standard upstream Envoy build. The bootstrap is minimal — an admin listener, ADS pointing at the controller's xDS port, and a placeholder cluster. All listeners, routes, clusters, endpoints, and TLS secrets are pushed dynamically by the controller. The Router speaks `ext_proc` to the Policy Engine over a UDS for every request/response on configured routes.
+
+Body-processing mode is decided **per request, per chain**: the Policy Engine sends back a `mode_override` that puts Envoy in `SKIP` mode when no policy in the chain needs the body, and in `BUFFERED` mode only when one does. This keeps headers-only policies (auth, header rewrite, routing) on the fast path while still allowing body-aware policies (transformation, guardrails) to opt into buffering.
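
The per-chain decision reduces to "buffer if any policy in the chain needs the body". A minimal sketch follows, assuming a hypothetical `Policy` struct with a `RequiresRequestBody` flag; the `BodyMode` constants mirror the names used in this document and are not Envoy API identifiers.

```go
package main

import "fmt"

// BodyMode mirrors the two modes named in this document.
type BodyMode string

const (
	BodyModeSkip     BodyMode = "SKIP"
	BodyModeBuffered BodyMode = "BUFFERED"
)

// Policy is a hypothetical minimal view of a policy in a chain.
type Policy struct {
	Name                string
	RequiresRequestBody bool
}

// RequestBodyMode returns BUFFERED if at least one policy needs the request
// body, and SKIP otherwise so headers-only chains stay on the fast path.
func RequestBodyMode(chain []Policy) BodyMode {
	for _, p := range chain {
		if p.RequiresRequestBody {
			return BodyModeBuffered
		}
	}
	return BodyModeSkip
}

func main() {
	headersOnly := []Policy{{Name: "api-key-auth"}, {Name: "set-headers"}}
	withBody := append(headersOnly, Policy{Name: "guardrail", RequiresRequestBody: true})
	fmt.Println(RequestBodyMode(headersOnly), RequestBodyMode(withBody))
}
```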
+
+#### Policy Engine (Go)
+
+The Policy Engine is the heart of the data plane. It:
+
+- Receives Envoy `ext_proc` streams on a UDS.
+- Maintains an in-memory map of **PolicyChains** keyed by route, kept in sync via xDS streams (`PolicyChainConfig`, `RouteConfig`, `APIKeyConfig`, `SubscriptionConfig`, `LazyResourceConfig`) from the controller.
+- For each request, looks up the chain (route key resolution is pluggable via `PolicyChainResolver`), builds an execution context, runs the **request** policies, then on the response path runs the **response** policies.
+- Translates per-policy `Action`s (header set/remove, immediate response, dynamic metadata, body replacement, host rewrite, …) into Envoy ext_proc responses.
+- Exposes Prometheus metrics on `:9003` and an admin/debug API on `:9002` (config dump with secret redaction, health).
+
+Policies are **compiled in** at image build time — the engine has zero built-in policies; the gateway-builder generates a `plugin_registry.go` that wires them into the binary. From the engine's runtime perspective, all policies (request-phase, response-phase, body-requiring or not) are uniform plugins implementing the SDK policy interfaces.
+
+#### Python Executor
+
+Optional gRPC sidecar process for Python policies. It is a Python 3 process that:
+
+- Listens on a Unix Domain Socket — or on TCP for local debugging.
+- Loads all installed Python policies from a builder-generated registry.
+- Serves `Execute` RPCs from the Go Policy Engine; the Go side translates each policy invocation into a gRPC request/response.
+- Uses a single event loop with bounded worker concurrency, configurable from the entrypoint.
+
+```mermaid
+sequenceDiagram
+ participant Envoy as Router (Envoy)
+ participant PE as Policy Engine (Go)
+ participant Py as Python Executor (Python)
+
+ Envoy->>PE: ext_proc HeadersRequest (UDS)
+ PE->>PE: lookup PolicyChain by route key
+ loop for each policy in chain
+ alt Go policy
+ PE->>PE: invoke in-process
+ else Python policy
+ PE->>Py: Execute(...) (UDS gRPC)
+ Py->>Py: load + run Python policy
+ Py-->>PE: ExecuteResponse(actions)
+ end
+ end
+ PE-->>Envoy: ext_proc Response (mode_override, headers/body actions)
+ Envoy->>Envoy: apply mutations
+ Envoy->>Backend: forward request
+```
+
+Python dependencies are installed into the runtime image at build time from a **locked requirements file** produced by the builder. The SDK ships from PyPI by default, or can be installed from the monorepo for local development.
+
+### 3. Gateway Builder
+
+A build-time Go tool that produces both the gateway-runtime and gateway-controller images. It is invoked from the gateway-runtime `Dockerfile` and runs a six-phase pipeline:
+
+```mermaid
+flowchart LR
+ A["build.yaml<br/>(policy manifest)"] --> P1
+ SP["system policies"] --> P1
+ P1["1. Discovery<br/>resolve policy refs"] --> P2
+ P2["2. Validation<br/>schema + ID checks"] --> P3
+ P3["3. Code generation"] --> P4
+ P4["4. Compilation<br/>go build policy-engine"] --> P5
+ P5["5. Dockerfile generation"] --> P6
+ P6["6. Manifest emission<br/>build info + policy YAMLs"]
+```
+
+The two output images use an **extend base image** pattern: the gateway-runtime image is built on an Envoy base plus the freshly compiled `policy-engine` binary plus Python dependencies; the gateway-controller image is the controller base plus the policy-definition YAMLs extracted from the builder output.
+
+The canonical policy set for the current gateway version is declared in [`gateway/build.yaml`](../../build.yaml), covering auth, rate limiting, AI guardrails, AI traffic management, MCP, mediation, and subscription policies. Refer to that file for the authoritative list and pinned versions.
+
+---
+
+## xDS Streams Between Controller and Runtime
+
+The controller drives the runtime through several independent xDS channels. Envoy and the Policy Engine connect to different gRPC ports on the controller:
+
+```mermaid
+graph LR
+ subgraph Controller["Gateway Controller"]
+ EnvoyXDS["Envoy xDS server<br/>(SotW, ADS)"]
+ PolicyXDS["Policy xDS server"]
+ end
+
+ subgraph Runtime["Gateway Runtime"]
+ Envoy["Envoy"]
+ PE["Policy Engine"]
+ end
+
+ EnvoyXDS -- "LDS / RDS / CDS / EDS / SDS" --> Envoy
+ PolicyXDS -- "policy chains" --> PE
+ PolicyXDS -- "per-route metadata" --> PE
+ PolicyXDS -- "API key state (atomic replace)" --> PE
+ PolicyXDS -- "subscription state" --> PE
+ PolicyXDS -- "lazy resources" --> PE
+```
+
+Notable properties of these streams:
+
+- **Envoy xDS** uses State-of-the-World: a full LDS/RDS/CDS/EDS/SDS snapshot is published per change. Each route carries only a stable `route_name` in metadata.
+- **Per-route metadata** (api name, version, kind, …) is delivered to the Policy Engine at deploy time as a separate stream, not parsed per request — this avoids per-request protobuf metadata unmarshaling in the data path.
+- **API keys** are indexed by **SHA-256 hash** of the raw key — the runtime never sees plaintext. Keys are swapped atomically per snapshot so auth never gaps during rotation.
+- **Subscriptions** carry active plan limits and billing IDs needed by analytics.
+- **Lazy resources** is a generic channel for resources that should be loaded on first use rather than at startup.
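
The API-key properties listed above (lookup by SHA-256 hash, atomic replacement per snapshot) can be illustrated as follows. This is a sketch under assumptions: `KeyStore`, `HashKey`, the hex encoding of the digest, and the hash-to-API-UUID map shape are hypothetical and not taken from the Policy Engine source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync/atomic"
)

// HashKey returns the hex-encoded SHA-256 digest of a raw API key.
// Hex encoding is an assumption made for this sketch.
func HashKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// KeyStore holds the current snapshot: key hash -> API UUID.
// Readers always see either the old or the new map, never a partial update.
type KeyStore struct {
	snapshot atomic.Pointer[map[string]string]
}

// Replace swaps in a complete new snapshot in one atomic store.
func (s *KeyStore) Replace(next map[string]string) {
	s.snapshot.Store(&next)
}

// Lookup hashes the presented key and checks it against the current snapshot.
func (s *KeyStore) Lookup(rawKey string) (string, bool) {
	m := s.snapshot.Load()
	if m == nil {
		return "", false
	}
	api, ok := (*m)[HashKey(rawKey)]
	return api, ok
}

func main() {
	var store KeyStore
	store.Replace(map[string]string{HashKey("old-key"): "api-uuid-1"})
	// Rotation: publish a snapshot that contains only the new key.
	store.Replace(map[string]string{HashKey("new-key"): "api-uuid-1"})
	_, oldOK := store.Lookup("old-key")
	api, newOK := store.Lookup("new-key")
	fmt.Println(oldOK, newOK, api)
}
```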
+
+The Policy Engine exposes its current xDS resource versions on the controller's admin API, so integration tests and operators can gate readiness on a known sync version.
+
+---
+
+## Request Lifecycle
+
+```mermaid
+sequenceDiagram
+ participant Client
+ participant Envoy as Router (Envoy)
+ participant PE as Policy Engine
+ participant Py as Python Executor
+ participant Upstream as Backend / LLM / MCP
+
+ Client->>Envoy: HTTP request
+ Envoy->>PE: ext_proc HeadersRequest (UDS)<br/>dynamic_metadata.route_name
+ PE->>PE: resolve route_name → RouteConfig + PolicyChain
+ PE->>PE: decide body mode (SKIP / BUFFERED)<br/>from chain RequiresRequestBody
+ alt body required
+ PE-->>Envoy: continue with BUFFERED
+ Envoy->>PE: BodyRequest (full body)
+ else headers only
+ PE-->>Envoy: continue with SKIP
+ end
+ loop request policies
+ PE->>PE: execute Go policy
+ opt policy is Python
+ PE->>Py: Execute (UDS gRPC)
+ Py-->>PE: ExecuteResponse
+ end
+ end
+ PE-->>Envoy: HeadersResponse (header/body/immediate-response actions)
+ Envoy->>Upstream: forward (possibly with host-rewrite, mutated headers/body)
+ Upstream-->>Envoy: response
+ Envoy->>PE: ext_proc ResponseHeaders / ResponseBody
+ loop response policies
+ PE->>PE: execute
+ end
+ PE-->>Envoy: response mutations
+ Envoy-->>Client: HTTP response
+ PE->>PE: publish analytics event (async)
+```
+
+Short-circuits are honoured at every step: an auth policy may emit an `ImmediateResponse` action and the chain ends without ever touching the upstream.
+
+---
+
+## Configuration Management
+
+### Layered Configuration
+
+All three runtime processes (controller, policy engine, python executor) share the same configuration model:
+
+```
+CLI flags > env vars > TOML config file > built-in defaults
+```
+
+A single TOML file covers every section needed across the three processes.
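
The precedence order can be sketched as a lookup that walks the layers from highest to lowest priority. The `Resolve` function and the map-per-layer representation are assumptions for illustration; the actual processes presumably use a configuration library rather than plain maps.

```go
package main

import "fmt"

// Resolve returns the value for key from the highest-precedence layer that
// sets it: CLI flags, then env vars, then the TOML file, then defaults.
func Resolve(key string, flags, env, file, defaults map[string]string) (string, bool) {
	for _, layer := range []map[string]string{flags, env, file, defaults} {
		if v, ok := layer[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	defaults := map[string]string{"log_level": "info", "port": "9090"}
	file := map[string]string{"log_level": "warn"}
	env := map[string]string{"log_level": "debug"}
	flags := map[string]string{} // nothing passed on the command line

	level, _ := Resolve("log_level", flags, env, file, defaults)
	port, _ := Resolve("port", flags, env, file, defaults)
	fmt.Println(level, port) // env overrides file; port falls through to defaults
}
```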
+
+### Artifact Templating
+
+Resource YAMLs (RestApi, LlmProvider, etc.) are rendered through **Go templates** before validation. The available helpers cover the things artifacts actually need: resolving a value from the gateway secret store, reading an environment variable, supplying a default, requiring a value to be present, and marking a value as sensitive so admin endpoints redact it.
+
+Example:
+
+```yaml
+spec:
+ upstream:
+ main:
+ url: '{{ env "BACKEND_URL" | default "https://api.example.com" }}'
+ auth:
+ type: bearer
+ token: '{{ secret "BACKEND_TOKEN" | redact }}'
+```
+
+Rendering errors are typed and surfaced as HTTP 400s by the management API.
+
+### Secrets
+
+Secrets are stored encrypted at rest in the controller database using **AES-GCM**. The `secret` template function resolves to the decrypted value at render time. Values marked sensitive are masked in downstream admin config-dump endpoints.
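
For readers unfamiliar with AES-GCM, the following sketch shows authenticated encryption and decryption with Go's standard library. The `Encrypt` and `Decrypt` helpers, the nonce-as-prefix storage layout, and the demo key are assumptions; the controller's actual key management and storage format are not described here.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Encrypt seals plaintext with AES-GCM and returns nonce || ciphertext || tag.
// key must be 16, 24, or 32 bytes long.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst makes Seal append the ciphertext after the nonce.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt splits the nonce prefix off and verifies the authentication tag.
func Decrypt(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // 32-byte demo key only
	sealed, err := Encrypt(key, []byte("backend-token"))
	if err != nil {
		panic(err)
	}
	plain, err := Decrypt(key, sealed)
	fmt.Println(string(plain), err)
}
```

Because GCM is authenticated, a modified ciphertext fails in `Decrypt` instead of yielding corrupted plaintext.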
+
+---
+
+## Deployment Modes
+
+### Mutable Mode (default)
+
+Configurations are managed at runtime through the Management REST API or through the Platform API (control-plane push). The gateway's database is the source of truth; changes are persisted, replicated to peer controllers via EventHub, and pushed to runtimes via xDS.
+
+### Immutable Mode
+
+For GitOps-style and Kubernetes-native deployments, the controller can run in **immutable mode**. When enabled:
+
+- On startup, the controller walks an artifacts directory and applies all YAML resources via the same service layer the REST handlers use, in dependency order. Any failure aborts startup.
+- The SQLite database file is **deleted on startup** to guarantee a fresh, reproducible state. PostgreSQL is rejected: immutable mode is SQLite-only.
+- All write methods on the management API return `405 Method Not Allowed`; read endpoints remain available.
+
+This mode is the recommended path for Kubernetes ConfigMap-based deployments and for baking a fully-formed gateway into a custom container image.
+
+### Standalone Distribution
+
+A `make` target produces a standalone zip containing the controller binary, the runtime image references, and a self-contained Docker Compose file for installation outside the monorepo.
+
+### Platform-API Control Plane Mode
+
+The Gateway can run standalone (configurations submitted directly to the Controller REST API) or it can register with a central **Platform API** — the system's actual Control Plane — using a combination of REST and WebSocket. The Platform API can manage **multiple, independent gateways** at once.
+
+#### Authentication
+
+Both channels authenticate with the same **gateway registration token**, sent as an HTTP header on:
+
+1. The WebSocket upgrade request — this also serves as the registration handshake (the WebSocket dial *is* the register call).
+2. Every REST request the gateway makes to the Platform API.
+
+A `401 Unauthorized` from either channel is treated as a **permanent failure** — the gateway exits rather than reconnecting. Other permanent statuses (forbidden, not-found, conflict, unprocessable) cause the same exit-on-failure behaviour so a misconfigured gateway doesn't loop forever against a control plane that will never accept it.
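
The exit-versus-reconnect decision can be expressed as a status classifier. This sketch assumes that "forbidden, not-found, conflict, unprocessable" correspond to HTTP 403, 404, 409, and 422; the function name `IsPermanentFailure` is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// IsPermanentFailure reports whether a Platform API response status should
// make the gateway exit instead of retrying. The status mapping is an
// assumption based on the names used in this document.
func IsPermanentFailure(status int) bool {
	switch status {
	case http.StatusUnauthorized, // 401
		http.StatusForbidden,           // 403
		http.StatusNotFound,            // 404
		http.StatusConflict,            // 409
		http.StatusUnprocessableEntity: // 422
		return true
	}
	return false
}

func main() {
	for _, s := range []int{401, 409, 500, 503} {
		fmt.Printf("%d permanent=%t\n", s, IsPermanentFailure(s))
	}
}
```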
+
+#### Channels
+
+| Channel | Direction | Used for |
+| -------------- | -------------------- | ---------------------------------------------------------------------------------------------- |
+| **REST** (HTTPS) | Gateway → Platform API | Well-known discovery; **manifest + version push** on every connect, carrying gateway version, functionality type, and the list of installed policy definitions |
+| **WebSocket** | bidirectional | Long-lived event channel — deploy / undeploy / API key / subscription events pushed down; heartbeat |
+
+The platform may reject the manifest if version or policy set is incompatible — also treated as a permanent failure.
+
+#### Custom Policy Sync
+
+The manifest push carries every policy installed in the gateway, each entry tagged with a `managedBy` field — `"wso2"` for built-in policies and `"customer"` for policies added via `ap gateway image build`. System policies (those whose name is prefixed `wso2_apip_sys_`) are filtered out at the gateway before the manifest is sent; they are an internal concern of the data plane and the Platform API has no need to know about them. Customer-managed entries include the policy's full `parameters` and `systemParameters` JSON-Schema blocks; for WSO2-managed entries those are dropped on the platform side because the schema is already known centrally.
+
+The Platform API persists the manifest into a `gateways.manifest` column on receipt, but does **not** automatically promote customer-managed policies into the catalogue the Console uses for attachment. That step is **Console-triggered** — the Console calls `POST /api/v1/gateway-custom-policies/sync` with the gateway, policy name, and version. The service looks up the stored manifest, verifies the entry's `managedBy == "customer"`, and writes the extracted definition into the org-scoped `gateway_custom_policies` table. Only after this Console sync is a custom policy attachable to APIs through the Console UI.
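
The two filters described in this section (dropping system policies before the manifest is sent, and accepting only customer-managed entries at sync time) can be sketched as below. `PolicyEntry`, `ManifestPolicies`, and `IsSyncable` are hypothetical names; only the `wso2_apip_sys_` prefix and the `managedBy` values come from this document.

```go
package main

import (
	"fmt"
	"strings"
)

// systemPolicyPrefix marks internal policies that never leave the gateway.
const systemPolicyPrefix = "wso2_apip_sys_"

// PolicyEntry is a hypothetical, trimmed-down manifest entry.
type PolicyEntry struct {
	Name      string
	Version   string
	ManagedBy string // "wso2" or "customer"
}

// ManifestPolicies returns the installed policies minus system policies,
// which is the gateway-side filter applied before the manifest push.
func ManifestPolicies(installed []PolicyEntry) []PolicyEntry {
	out := make([]PolicyEntry, 0, len(installed))
	for _, p := range installed {
		if strings.HasPrefix(p.Name, systemPolicyPrefix) {
			continue
		}
		out = append(out, p)
	}
	return out
}

// IsSyncable mirrors the platform-side check made during a Console sync.
func IsSyncable(p PolicyEntry) bool {
	return p.ManagedBy == "customer"
}

func main() {
	installed := []PolicyEntry{
		{Name: "wso2_apip_sys_internal", Version: "v1.0.0", ManagedBy: "wso2"},
		{Name: "api-key-auth", Version: "v1.0.0", ManagedBy: "wso2"},
		{Name: "my-custom-policy", Version: "v0.1.0", ManagedBy: "customer"},
	}
	for _, p := range ManifestPolicies(installed) {
		fmt.Println(p.Name, "syncable:", IsSyncable(p))
	}
}
```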
+
+#### Deployment Acknowledgement
+
+Deployments pushed from the Platform API are not fire-and-forget. After the gateway applies (or fails to apply) a deployment or undeployment, it sends an acknowledgement back over the same WebSocket carrying the originating deployment ID, the action, and a terminal status (`success`/`failed` with an optional error code). Acknowledgements are sent for every WebSocket-pushed resource type — REST APIs, LLM providers, LLM proxies, MCP proxies, and WebSub APIs.
+
+The Platform API drives its own internal in-flight state machine off these acks — that intermediate state is platform-side; the gateway only reports the terminal outcome.
+
+#### Startup Sync
+
+Because WebSocket events can be missed while a gateway is down, every gateway runs a **background reconciliation** with the Platform API on startup:
+
+1. The gateway fetches the platform's expected deployment set for its `gateway_id` over REST.
+2. It diffs that set against its own local state.
+3. Missing or stale deployments are pulled and applied; orphaned local deployments are removed.
+
+The diff is computed **gateway-side** — gateways scale out far more than the Platform API, so doing it server-side would create a fan-out bottleneck. The sync is **asynchronous**: the gateway begins serving traffic immediately and reconciles in the background. Any WebSocket event arriving mid-sync naturally wins via deployment-ID ordering — operations are idempotent.
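
The three reconciliation steps can be sketched as a set difference over deployment state. The `Plan` type, the `Diff` function, and the representation of each side as a map from artifact ID to deployment ID are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Plan lists what the gateway has to do to match the platform's view.
type Plan struct {
	Apply  []string // missing or stale locally: pull and apply
	Remove []string // present locally but not expected: remove
}

// Diff compares the expected set (from the Platform API) with local state.
// Both maps go from a hypothetical artifact ID to its deployment ID.
func Diff(expected, local map[string]string) Plan {
	var p Plan
	for id, deployment := range expected {
		if cur, ok := local[id]; !ok || cur != deployment {
			p.Apply = append(p.Apply, id)
		}
	}
	for id := range local {
		if _, ok := expected[id]; !ok {
			p.Remove = append(p.Remove, id)
		}
	}
	// Sort for deterministic output; map iteration order is random in Go.
	sort.Strings(p.Apply)
	sort.Strings(p.Remove)
	return p
}

func main() {
	expected := map[string]string{"api-a": "d1", "api-b": "d2"}
	local := map[string]string{"api-b": "d1", "api-c": "d9"}
	fmt.Printf("%+v\n", Diff(expected, local))
}
```

Running the diff on the gateway keeps the Platform API's work per gateway to a single list response, which matches the scaling argument above.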
+
+```mermaid
+graph TB
+ subgraph CP["Control Plane"]
+ Mgr["Platform API<br/>Deployment Manager"]
+ end
+
+ subgraph DP["Data Plane"]
+ subgraph GW1["Gateway A (gateway_id=gw-a)"]
+ C1["Controller"]
+ R1["Runtime"]
+ end
+
+ subgraph GW2["Gateway B (gateway_id=gw-b)"]
+ C2["Controller"]
+ R2["Runtime"]
+ end
+ end
+
+ C1 -- "REST<br/>POST /gateways/{id}/manifest<br/>(version, policies)" --> Mgr
+ Mgr <-- "WebSocket<br/>register · deploy · undeploy<br/>health · heartbeat" --> C1
+ C2 -- "REST" --> Mgr
+ Mgr <-- "WebSocket" --> C2
+ C1 --> R1
+ C2 --> R2
```
-+-------------------------------------------------------------+
-| Gateway-Controller (container) |
-| +-------------------+ +-------------------+ |
-| | REST API Server | -> | Validation Layer | |
-| | (Port 9090) | +-------------------+ |
-| +-------------------+ | |
-| | v |
-| | +----------------------+ |
-| | | SQLite + In-Memory | |
-| | | Cache | |
-| | +----------------------+ |
-| | | |
-| v v |
-| +-------------------+ +-------------------+ |
-| | xDS Translator | -> | xDS v3 Server | |
-| +-------------------+ | (Port 18000) | |
-| +-------------------+ |
-+-------------------------------------------------------------+
- |
- | xDS gRPC
- v
-+-------------------------------------------------------------+
-| Router (Envoy container) |
-| +-------------------+ +-------------------+ |
-| | Envoy Proxy | -> | Backend Services | |
-| | (Port 8080) | +-------------------+ |
-| +-------------------+ |
-+-------------------------------------------------------------+
+
+### Event Gateway
+
+For event-driven (`WebSubApi`) traffic the same Gateway Controller drives a separate **event-gateway runtime** instead of the Envoy-based runtime. The controller's REST API, persistence, xDS streams, and `RuntimeDeployConfig` translator are reused unchanged; only the data-plane runtime differs.
+
+> **TODO**: A dedicated architecture document for the event-gateway runtime does not yet exist. Add `event-gateway/spec/architecture/architecture.md` covering the WebSub subscription flow, Kafka delivery model, and runtime/controller interaction.
+
+### CLI
+
+The `ap` CLI is the local-user equivalent of the Platform API — it talks to the Gateway Controller's **management REST API** directly to deploy, list, update, and undeploy artifacts. There is no separate channel: every CLI operation maps 1:1 to a REST call against the management API, just like the Platform API and the Gateway Operator. It also offers a `kubectl apply`-style bulk apply of a directory of artifact YAMLs, and a wrapper that runs the gateway-builder in Docker to produce custom runtime + controller images locally.
+
+The CLI is therefore the same kind of REST-API client as the Platform API and the Gateway Operator — just driven interactively from a developer's machine.
+
+### Kubernetes Integration
+
+On Kubernetes the gateway is deployed and managed by the **Gateway Operator**. The operator is a *client* of the Gateway Controller's REST API — it does not bypass the controller — and supports two reconciliation flows side by side:
+
+| Flow | CRDs | Behavior |
+| ----------------------- | ----------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
+| **WSO2 CRD flow** | `ApiGateway`, `RestApi`, `LlmProvider`, `LlmProxy`, `Mcp`, `WebSubApi`, `ApiKey`, `Subscription`, `SubscriptionPlan`, `Certificate`, `Secret` | Operator deploys the gateway via Helm and POSTs each CR's spec to the controller's management REST API. CRs mirror controller resource kinds 1:1. |
+| **Kubernetes Gateway API flow** | `Gateway`, `HTTPRoute`, `APIPolicy` | Operator deploys the gateway from a `Gateway` CR and translates `HTTPRoute` + `APIPolicy` into the controller's `RestApi` shape via the same REST API. |
+
+Both flows converge on the same REST API of the same Gateway Controller — the Kubernetes layer is just another producer alongside CLI users, Platform API, and immutable-mode file artifacts.
+
+---
+
+## High Availability
+
+HA is configured **per gateway** (per `gateway_id`). Two HA gateways can still share the same physical PostgreSQL because every row is scoped by `gateway_id`.
+
+```mermaid
+graph TB
+ LB["Load Balancer"]
+
+ LB --> R1[Gateway Runtime 1]
+ LB --> R2[Gateway Runtime 2]
+ LB --> R3[Gateway Runtime 3]
+
+ subgraph Controllers["Gateway Controller replicas<br/>(same gateway_id)"]
+ C1[Controller 1] <-->|EventHub poll| DB
+ C2[Controller 2] <-->|EventHub poll| DB
+ end
+
+ R1 -. "Envoy xDS + Policy xDS" .-> C1
+ R2 -. "Envoy xDS + Policy xDS" .-> C2
+ R3 -. "Envoy xDS + Policy xDS" .-> C1
+
+ R1 --- Redis[(Redis<br/>shared rate-limit)]
+ R2 --- Redis
+ R3 --- Redis
+
+ DB[("PostgreSQL<br/>rows scoped by gateway_id")]
```
-## Integration Points
-
-- **API Developers** → Gateway-Controller: Submit API configurations via REST API.
-- **Router** ← Gateway-Controller: Receives xDS configuration updates via gRPC.
-- **Backend Services** ← Router: Forwards HTTP requests based on API configurations.
-- **Platform API** → Gateway: Orchestrates API deployments to gateways.
-- **Portals/CLI** → Platform API → Gateway: Indirect configuration management.
-
-## Deployment Tiers
-
-### Basic Gateway
-- Components: Gateway-Controller (memory-only), Router, Policy Engine.
-- No persistence (configurations lost on restart).
-- Basic rate limiting built into Router.
-- Use case: Development, testing, 14-day trial.
-
-### Standard Gateway
-- Components: All Basic + Rate Limiter + Redis + SQLite.
-- Persistent storage with SQLite (configurable to PostgreSQL/MySQL).
-- Advanced distributed rate limiting.
-- Use case: Production, enterprise deployments.
-
-## Data Flow
-
-### API Configuration Lifecycle
-1. User submits API config (YAML/JSON) to REST API (port 9090).
-2. Gateway-Controller validates configuration structure and fields.
-3. Configuration persisted to SQLite and cached in memory.
-4. xDS translator generates Envoy configuration from API config.
-5. xDS server pushes new snapshot to Router via gRPC (port 18000).
-6. Router applies configuration gracefully (zero downtime).
-
-### Runtime Request Flow
-1. HTTP request arrives at Router (port 8080).
-2. Router matches request to API configuration (method, path, context).
-3. Policy Engine evaluates policies (auth, rate limit, etc.).
-4. Request forwarded to backend service upstream URL.
-5. Response returned to client.
-6. Access log written to stdout in JSON format.
+- **Controller replicas** of one gateway share a PostgreSQL database and a `gateway_id`. They use the DB-backed EventHub to keep their in-memory caches and xDS snapshots in sync — no separate broker required.
+- **Runtime replicas** are stateless. Each connects to one controller's xDS streams and rebuilds its configuration entirely from xDS, so a restart is safe.
+- **Other gateways** with a different `gateway_id` can share the same PostgreSQL instance without interfering — their data, events, and xDS state are isolated by ID.
+- **Distributed rate limiting** uses Redis as the shared counter store for the `advanced-ratelimit` policy. Without Redis, each replica keeps its own counters, so limits are enforced per replica rather than across the gateway.
+- **Certificate rotation** needs no restart: the controller hot-reloads certificates and republishes them via SDS.
+
+---
+
+## Observability
+
+- **Metrics**: All three processes expose Prometheus metrics. The Policy Engine emits per-request, per-policy, and per-chain metrics, including request count, latency histograms, action counts, chain length, xDS connection state, snapshot version, and body mode distribution.
+
+- **Tracing**: OpenTelemetry tracing in both Envoy and the Policy Engine. The default exporter points at an `otel-collector` sidecar that fans out to Jaeger or any OTLP backend. The Policy Engine creates a child span per policy execution and links across the ext_proc boundary using a propagated request ID.
+
+- **Logging**: Structured logs from both Go and Python processes, with consistent per-process prefixes so a single `docker logs` stream stays readable.
+
+- **Analytics**: Per-request events are published to configurable sinks (Moesif, gRPC ALS) asynchronously.
+
+---
+
+## Key Architectural Decisions
+
+| Decision | Why |
+| ---------------------------------------------------------------- | ------------------------------------------------------------------------------ |
+| Policy Engine has **zero built-in policies**; all are linked at build time | Reproducibility, security review surface, custom-policy support without a plugin loader |
+| **Go templates** for artifact field interpolation | Composable; render errors are typed and carry clear messages |
+| **`RuntimeDeployConfig`** as kind-agnostic intermediate | Frees xDS translator and Policy Engine from per-kind branching |
+| **RouteConfig delivered via xDS** (not extracted from request metadata) | Avoids per-request protobuf metadata unmarshal in the data path |
+| **Per-chain body mode** with `mode_override` | Headers-only chains skip body buffering entirely |
+| **Atomic API-key replacement** on every xDS snapshot | No auth gap during xDS key rotation |
+| **UDS** between Router ↔ Policy Engine ↔ Python Executor | Lowest-latency local IPC; no port management; security via filesystem perms |
+| **Optional Python Executor**, started only if Python policies exist | Zero overhead for Go-only deployments |
+| **Single Dockerfile** for all three runtime processes | One artifact to scan, sign, and ship; matching Python versions guarantee C-ext compatibility |
+| **EventHub via DB polling** for controller multi-replica sync | Avoids adding Kafka/Redis/etc. as a hard dependency |
+| **`gateway_id` scoping on every persistent row** | Lets multiple distinct gateways share one PostgreSQL without interference |
+| **Immutable mode wipes SQLite on boot** | Guarantees the file artifacts are the single source of truth |
+| **Extend-base-image** custom builds via `gateway-builder` | Custom policy sets compose cleanly on top of WSO2-published base images |
+
+---
+
+## Versioning and Compatibility
+
+- The gateway and its Management/Admin REST APIs follow independent version tracks.
+- The Envoy version is pinned in the runtime Dockerfile.
+- Policies are pinned by minor version in the build manifest and resolved against the Go-module / pip-package references in the policy manifest. The Policy Engine and Controller key on the policy **major** version to allow forward-compatible minor upgrades without re-deployment.
+- The runtime reports its version on the Platform API connection; the platform can require a manifest version match or a verification flag before accepting a deployment.
+
+---
+
+## Document Status
+
+- **Document Version**: 2.0
+- **Last Updated**: 2026-05-20
+- **Applies To**: Gateway `1.2.0-SNAPSHOT`
+- **Status**: Active