# Plugin Manager Multi-Tenancy Architecture

## Overview

The plugin subsystem supports **context-scoped isolation** through a shared `TenantPluginManagerFactory`. Each resolved context gets its own `TenantPluginManager` instance with an independently merged plugin configuration, while the default `__global__` context continues to serve call sites that are not context-aware.

The factory is intentionally **context-agnostic**. The identifier passed to `get_manager()` can represent a virtual server, tenant, tool, user, or any other scoping key. The factory does not interpret the value; it only uses it to:
- look up an existing cached manager,
- fetch optional configuration overrides via `get_config_from_db(context_id)`,
- build and initialize a `TenantPluginManager`,
- cache the result for reuse.

In the current gateway wiring, the primary runtime usage is:
- `get_plugin_manager()` or `get_plugin_manager("__global__")` for shared/global plugin execution
- `get_plugin_manager(server_id)` for server-scoped execution in services such as tools, prompts, and resources

---

## Plugin Configuration Lifecycle

### Startup: YAML is the source of truth

Plugins must be **declared in the YAML configuration file at startup**. The YAML defines which plugins exist, their default `mode` and `priority`, and their plugin-specific `config` keys. `TenantPluginManagerFactory` loads this base configuration once, and it is **immutable at runtime**: changing it requires a restart.

```yaml
# plugins/config.yaml (abridged: other per-plugin fields and config keys omitted)
plugins:
  - {name: "plugin A", mode: enforce, priority: 10, config: {}}
  - {name: "plugin B", mode: permissive, priority: 20, config: {}}
```

Only plugins listed in the YAML participate in any context. There is no mechanism to introduce entirely new plugins at runtime.

### Runtime: per-context overrides via `PluginConfigOverride`

For each context (e.g. a virtual server), the factory may apply a list of `PluginConfigOverride` objects on top of the base YAML config. An override can:

- **change a plugin's `mode`** (e.g. promote it from `permissive` to `enforce` for a specific server)
- **change a plugin's `priority`** (reorder execution within the chain)
- **add or replace top-level keys in the plugin's `config` dict** (the merge is shallow, so a nested value is replaced as a whole rather than merged)

Overrides are **additive and selective**: only the fields explicitly set in an override are applied; everything else inherits the YAML base. A plugin not mentioned in the override list is used as-is.

```yaml
# A PluginConfigOverride list for one context, shown as YAML for illustration
- name: "plugin A"
  mode: permissive        # overrides the YAML value for this context
  config: {threshold: 5}  # shallow-merged on top of the base config
```

### Fetch hook: `get_config_from_db`

`get_config_from_db(context_id)` is the extension point that translates a context identifier into a list of `PluginConfigOverride` objects fetched from persistent storage.

The base implementation always returns `None` (no overrides). Subclasses override this method to query the database, or any other store, for the per-context plugin settings associated with `context_id`.

The returned overrides are passed directly to `_merge_tenant_config`, which walks the base config's plugin list and applies each override: per-plugin `config` dicts are shallow-merged (override keys win), and optional `mode` and `priority` fields replace the base values when present. The result is a new `Config` object used to construct an isolated `TenantPluginManager` for that context.

In summary: `get_config_from_db` is the seam between the factory and your persistence layer. Override it to make per-context plugin configuration dynamic.
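
As a sketch of that seam, the subclass below serves overrides from an in-memory dict standing in for a database table. Everything here is hypothetical: `BaseFactory` is a stand-in for `TenantPluginManagerFactory`, `PluginConfigOverride` is reduced to the four fields this document describes, and the hook is assumed to be a coroutine (drop `async` if the real hook is synchronous).

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PluginConfigOverride:
    # Hypothetical stand-in: only the fields described in this document.
    name: str
    mode: Optional[str] = None
    priority: Optional[int] = None
    config: Optional[dict[str, Any]] = None


class BaseFactory:
    # Stand-in for TenantPluginManagerFactory: the base hook returns no overrides.
    async def get_config_from_db(self, context_id: str) -> Optional[list[PluginConfigOverride]]:
        return None


class DbBackedFactory(BaseFactory):
    """Hypothetical subclass that reads per-context overrides from a store."""

    def __init__(self, store: dict[str, list[dict[str, Any]]]) -> None:
        # `store` stands in for a DB table keyed by context ID.
        self._store = store

    async def get_config_from_db(self, context_id: str) -> Optional[list[PluginConfigOverride]]:
        rows = self._store.get(context_id)
        if not rows:
            return None  # no overrides: this context uses the base YAML config as-is
        return [PluginConfigOverride(**row) for row in rows]


store = {"server-id-1": [{"name": "plugin A", "mode": "permissive", "config": {"threshold": 5}}]}
factory = DbBackedFactory(store)
overrides = asyncio.run(factory.get_config_from_db("server-id-1"))
```

A real subclass would replace the dict lookup with a query against your persistence layer; returning `None` keeps the base YAML config for that context.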

```mermaid
flowchart TD
    Y["YAML config file\n(loaded at startup, immutable)"]
    DB["Persistence layer\n(DB, config store, etc.)"]
    H["get_config_from_db(context_id)\n[override this in subclass]"]
    O["list[PluginConfigOverride]\n(mode / priority / config keys)"]
    M["_merge_tenant_config()"]
    C["Merged Config\n(per-context)"]
    T["TenantPluginManager\n(per-context instance)"]

    Y --> M
    DB --> H
    H --> O
    O --> M
    M --> C
    C --> T
```

---

## Architecture Summary

```mermaid
flowchart TD
    APP["Application lifespan\n(mcpgateway.main)"]
    F["TenantPluginManagerFactory\n(singleton, holds base YAML config)"]
    G["TenantPluginManager\ncontext = '__global__'\n(backward-compat global manager)"]
    S1["TenantPluginManager\ncontext = 'server-id-1'"]
    S2["TenantPluginManager\ncontext = 'server-id-2'"]
    DB["get_config_from_db(context_id)\n(fetch per-context overrides)"]
    YAML["plugins/config.yaml\n(base plugin config)"]

    APP -->|"init_plugin_manager_factory()"| F
    YAML -->|"loaded once at startup"| F
    F -->|"eager: get_manager()"| G
    F -->|"lazy: get_manager(server_id)"| S1
    F -->|"lazy: get_manager(server_id)"| S2
    F <-->|"override fetch"| DB
```

### Main components

- **`PluginManager`**: legacy Borg-style manager whose state is shared across all instances.
- **`TenantPluginManager`**: `PluginManager` subclass that disables the Borg behavior and keeps fully independent state per instance.
- **`TenantPluginManagerFactory`**: async-safe cache and factory for per-context managers.
- **`get_plugin_manager()`**: async global accessor in `mcpgateway.plugins.framework.__init__` that, when plugins are enabled, returns the `TenantPluginManager` for the requested context from the singleton factory.

---

## Current Runtime Behavior

### Startup

At startup, `mcpgateway.main.lifespan()`:

1. enables the plugin subsystem when configured,
2. initializes the global `TenantPluginManagerFactory` with:
   - the YAML config path,
   - the plugin timeout,
   - hook payload policies,
   - an optional observability provider,
3. calls `await get_plugin_manager()` to resolve the default `__global__` manager,
4. leaves additional context-specific managers to be created lazily on first use.

This means the factory and the `__global__` manager are initialized eagerly, while other tenant/server managers are initialized on demand.

### Request-time resolution

Services that support context scoping call `get_plugin_manager(server_id)` and receive either:
- a cached `TenantPluginManager`, or
- a newly built and initialized one for that context.

Call sites that do not provide a context ID continue to use the default global manager.

---

## Core Types

### `PluginManager`

`PluginManager` remains the base implementation and still uses the Borg pattern.

| Property | Current behavior |
| --- | --- |
| State model | Shared `__dict__` across instances |
| Primary role | Legacy/global compatibility |
| Initialization | Loads YAML config and shares registry/executor state |
| Reset path | `PluginManager.reset()` clears shared Borg state |

### `TenantPluginManager`

`TenantPluginManager` inherits the public API from `PluginManager` but bypasses the Borg initialization path.

| Property | Current behavior |
| --- | --- |
| State model | Independent per instance |
| Config source | Either a `Config` object or a YAML path |
| Registry | Dedicated `PluginInstanceRegistry` per manager |
| Executor | Dedicated `PluginExecutor` per manager |
| Locking | Own async init/shutdown lock per manager |

`enable_borg()` is overridden as a no-op, so tenant managers do not share state.
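
The difference between the two state models can be illustrated with the classic Python Borg idiom. The classes below are simplified, hypothetical stand-ins (`PluginManagerSketch`, `TenantPluginManagerSketch`), not the actual `PluginManager` and `TenantPluginManager` bodies, and the `config_path` constructor argument is an assumption made for illustration.

```python
class PluginManagerSketch:
    # Stand-in for PluginManager: every instance aliases one shared __dict__.
    _shared_state: dict = {}

    def __init__(self, config_path: str) -> None:
        self.enable_borg()
        self.config_path = config_path  # lands in the shared dict when Borg is on

    def enable_borg(self) -> None:
        self.__dict__ = self._shared_state

    @classmethod
    def reset(cls) -> None:
        # Analogue of PluginManager.reset(): clear the shared state.
        cls._shared_state.clear()


class TenantPluginManagerSketch(PluginManagerSketch):
    # Stand-in for TenantPluginManager: enable_borg() is a no-op,
    # so each instance keeps its own __dict__.
    def enable_borg(self) -> None:
        pass


a, b = PluginManagerSketch("a.yaml"), PluginManagerSketch("b.yaml")
print(a.config_path)  # b.yaml: the second constructor overwrote the shared value
t1, t2 = TenantPluginManagerSketch("a.yaml"), TenantPluginManagerSketch("b.yaml")
print(t1.config_path, t2.config_path)  # a.yaml b.yaml
```

Overriding a single hook keeps the rest of the inherited API intact while opting each tenant instance out of shared state.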

### `TenantPluginManagerFactory`

Defined in `mcpgateway/plugins/framework/manager.py`.

| Method | Current behavior |
| --- | --- |
| `get_manager(context_id=None)` | Returns the cached manager or creates one; defaults to `__global__` |
| `_build_manager(context_id)` | Fetches overrides, merges config, initializes the manager, and swaps the cache entry |
| `_merge_tenant_config(overrides)` | Applies per-plugin override values on top of the base YAML config |
| `reload_tenant(context_id)` | Evicts the cached manager, rebuilds it, and shuts down the old one |
| `shutdown()` | Cancels in-flight builds and shuts down all cached managers |
| `get_config_from_db(context_id)` | Extension hook; returns `None` in the base implementation. **Subclass to enable DB-backed overrides** |

---

## Accessor Layer

The public accessor lives in `mcpgateway/plugins/framework/__init__.py`.

| Function | Current behavior |
| --- | --- |
| `enable_plugins(toggle)` | Enables or disables the plugin subsystem globally |
| `init_plugin_manager_factory(...)` | Creates the singleton factory explicitly during startup |
| `get_plugin_manager(server_id="__global__")` | Returns the `TenantPluginManager` for that context when plugins are enabled and the factory exists |
| `shutdown_plugin_manager_factory()` | Shuts down the factory and clears the singleton reference |
| `reset_plugin_manager_factory()` | Clears the singleton reference (used by tests) |

### Important clarification

The accessor **does not lazy-initialize the factory**. If the factory was not initialized during startup, `get_plugin_manager()` returns `None` instead of creating one.
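
A minimal sketch of an accessor with that behavior, assuming module-level globals for the enabled flag and the singleton. The function names mirror the table above, but the bodies are hypothetical simplifications: the stand-in `Factory` returns a label string instead of a `TenantPluginManager`, and the real `init_plugin_manager_factory(...)` takes the startup arguments listed earlier. Returning `None` for disabled plugins is also an assumption of this sketch.

```python
import asyncio
from typing import Optional


class Factory:
    # Stand-in for TenantPluginManagerFactory; returns a label, not a manager.
    async def get_manager(self, context_id: Optional[str] = None) -> str:
        return f"manager:{context_id or '__global__'}"


_plugins_enabled = False
_factory: Optional[Factory] = None


def enable_plugins(toggle: bool) -> None:
    global _plugins_enabled
    _plugins_enabled = toggle


def init_plugin_manager_factory() -> Factory:
    # Explicit creation at startup (real arguments omitted in this sketch).
    global _factory
    _factory = Factory()
    return _factory


def reset_plugin_manager_factory() -> None:
    global _factory
    _factory = None


async def get_plugin_manager(server_id: str = "__global__") -> Optional[str]:
    # No lazy initialization: disabled plugins or a missing factory yield None.
    if not _plugins_enabled or _factory is None:
        return None
    return await _factory.get_manager(server_id)


enable_plugins(True)
print(asyncio.run(get_plugin_manager()))  # None: the factory was never initialized
init_plugin_manager_factory()
print(asyncio.run(get_plugin_manager("server-id-1")))  # manager:server-id-1
```

Because creation is explicit, callers of `get_plugin_manager()` should be prepared to receive `None`.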

---

## Configuration Merge Model

Each context starts from the base YAML plugin config and optionally applies a list of `PluginConfigOverride` objects returned by `get_config_from_db(context_id)`.

Only plugins already present in the base config participate in the merge. There is no mechanism to introduce new plugins at runtime; the YAML is the canonical plugin registry.

For each matching plugin:

- `config` is shallow-merged: `{**base.config, **override.config}` (override keys win)
- `mode` is replaced only if provided in the override
- `priority` is replaced only if provided in the override

Plugins not mentioned in the override list remain unchanged. Passing `None` as the overrides means the base config is used as-is.
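
The merge rules above can be sketched as a pure function. `PluginSpec`, the stand-in `PluginConfigOverride`, and `merge_tenant_config` are hypothetical names chosen for illustration; the real `_merge_tenant_config` produces a `Config` object rather than a plain list.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Optional


@dataclass(frozen=True)
class PluginSpec:
    # Hypothetical stand-in for one plugin entry in the base YAML config.
    name: str
    mode: str
    priority: int
    config: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class PluginConfigOverride:
    # Hypothetical stand-in; None means "inherit the base value".
    name: str
    mode: Optional[str] = None
    priority: Optional[int] = None
    config: Optional[dict[str, Any]] = None


def merge_tenant_config(
    base: list[PluginSpec], overrides: Optional[list[PluginConfigOverride]]
) -> list[PluginSpec]:
    if not overrides:
        return list(base)  # None (or empty): use the base config as-is
    by_name = {o.name: o for o in overrides}
    merged = []
    for plugin in base:  # only YAML plugins are visited, so unknown names are ignored
        o = by_name.get(plugin.name)
        if o is None:
            merged.append(plugin)  # not mentioned: unchanged
            continue
        merged.append(
            replace(
                plugin,
                config={**plugin.config, **(o.config or {})},  # shallow merge; override keys win
                mode=o.mode if o.mode is not None else plugin.mode,
                priority=o.priority if o.priority is not None else plugin.priority,
            )
        )
    return merged


base = [
    PluginSpec("plugin A", "enforce", 10, {"threshold": 1, "nested": {"x": 1}}),
    PluginSpec("plugin B", "permissive", 20),
]
overrides = [
    PluginConfigOverride("plugin A", mode="permissive", config={"nested": {"y": 2}}),
    PluginConfigOverride("plugin Z", mode="enforce"),  # not in the YAML: ignored
]
for plugin in merge_tenant_config(base, overrides):
    print(plugin)
```

For plugin A, `nested` comes out as `{"y": 2}`: because the merge is shallow, the override's nested dict replaces the base one instead of being combined with it.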

```mermaid
flowchart LR
    B["Base YAML config\n(plugin A, plugin B, plugin C)"]
    O["PluginConfigOverride list\nfrom get_config_from_db(context_id)"]
    M["_merge_tenant_config()"]
    R["Merged context Config\n(used to build TenantPluginManager)"]

    B -->|"all plugins"| M
    O -->|"selective overrides\nmode / priority / config keys"| M
    M --> R
```

---

## Concurrency Model

Manager creation is deduplicated per context through the `_inflight` task map.

When multiple coroutines request the manager for the same context concurrently:

1. the first caller acquires the lock and creates `_build_manager(context_id)` as an `asyncio.Task`,
2. the task is stored in `_inflight[context_id]` and the lock is released,
3. later callers acquire the lock, find the existing task, release the lock, and await the same task,
4. once the manager is initialized, `_build_manager` stores it in `_managers` under the lock,
5. after the await, `get_manager` re-checks `_managers` to pick up any replacement triggered by a concurrent `reload_tenant`,
6. the task is removed from `_inflight` in a `finally` block.

This ensures that only one initialization path runs per context at a time, and that concurrent callers share the result rather than racing to build duplicate managers.
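
A minimal asyncio sketch of that deduplication, assuming an `asyncio.Lock` and plain dicts for `_managers` and `_inflight`. `Manager` is a hypothetical stand-in for `TenantPluginManager`, the `builds` counter exists only to make the example observable, and override fetching and `reload_tenant` itself are left out.

```python
import asyncio
from typing import Optional


class Manager:
    # Hypothetical stand-in for TenantPluginManager.
    def __init__(self, context_id: str) -> None:
        self.context_id = context_id

    async def initialize(self) -> None:
        await asyncio.sleep(0.01)  # simulate slow plugin loading


class Factory:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._managers: dict[str, Manager] = {}
        self._inflight: dict[str, asyncio.Task] = {}
        self.builds = 0  # instrumentation for the example only

    async def _build_manager(self, context_id: str) -> Manager:
        try:
            self.builds += 1
            manager = Manager(context_id)  # real code: fetch overrides, merge config
            await manager.initialize()
            async with self._lock:
                self._managers[context_id] = manager
            return manager
        finally:
            # No await between check and delete, so this is atomic on the event loop;
            # the identity check leaves a newer task registered by a reload untouched.
            if self._inflight.get(context_id) is asyncio.current_task():
                del self._inflight[context_id]

    async def get_manager(self, context_id: Optional[str] = None) -> Manager:
        context_id = context_id or "__global__"
        async with self._lock:
            cached = self._managers.get(context_id)
            if cached is not None:
                return cached  # cache hit
            task = self._inflight.get(context_id)
            if task is None:
                task = asyncio.create_task(self._build_manager(context_id))
                self._inflight[context_id] = task
        # Lock released: every concurrent caller awaits the same task.
        manager = await task
        # Prefer a replacement stored by a concurrent reload, if any.
        return self._managers.get(context_id, manager)


async def main() -> None:
    factory = Factory()
    managers = await asyncio.gather(*(factory.get_manager("server-1") for _ in range(5)))
    print(factory.builds, all(m is managers[0] for m in managers))  # 1 True


asyncio.run(main())
```

Holding the lock only while inspecting and registering the task keeps slow plugin initialization outside the critical section, so builds for different contexts can proceed in parallel.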

```mermaid
sequenceDiagram
    participant C1 as Caller 1
    participant C2 as Caller 2
    participant F as Factory (lock)
    participant T as _build_manager task

    C1->>F: get_manager("server-1")
    F->>F: cache miss → create task
    F->>T: asyncio.create_task(_build_manager)
    F-->>C1: release lock, await task
    C2->>F: get_manager("server-1")
    F->>F: cache miss → inflight task found
    F-->>C2: release lock, await same task
    T-->>F: initialize manager, store in _managers
    T-->>C1: return manager
    T-->>C2: return same manager
```

---

## Reload and Shutdown Semantics

### Reload

`reload_tenant(context_id)`:

1. acquires the lock and removes the cached manager for the context,
2. cancels any existing in-flight build task for the same context,
3. creates a fresh `_build_manager` task and stores it in `_inflight`,
4. releases the lock and shuts down the old manager outside it,
5. awaits the new task and returns the rebuilt manager.

### Shutdown

`shutdown()`:

1. snapshots cached managers and in-flight tasks under the lock,
2. clears both caches atomically,
3. cancels all in-flight tasks,
4. awaits their completion (collecting exceptions),
5. shuts down each cached manager.

This keeps teardown orderly without leaving active manager instances behind.
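
Both sequences can be sketched with the same kind of stand-in factory. The attribute and method names mirror the document (`_managers`, `_inflight`, `reload_tenant`, `shutdown`), but the bodies are hypothetical simplifications: `Manager` stands in for `TenantPluginManager`, and override fetching is reduced to a single `await asyncio.sleep(0)`.

```python
import asyncio


class Manager:
    # Hypothetical stand-in for TenantPluginManager.
    def __init__(self, context_id: str, version: int) -> None:
        self.context_id, self.version, self.closed = context_id, version, False

    async def shutdown(self) -> None:
        self.closed = True


class Factory:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._managers: dict[str, Manager] = {}
        self._inflight: dict[str, asyncio.Task] = {}
        self._version = 0

    async def _build_manager(self, context_id: str) -> Manager:
        try:
            self._version += 1
            await asyncio.sleep(0)  # overrides would be fetched and merged here
            manager = Manager(context_id, self._version)
            async with self._lock:
                self._managers[context_id] = manager
            return manager
        finally:
            if self._inflight.get(context_id) is asyncio.current_task():
                del self._inflight[context_id]

    async def reload_tenant(self, context_id: str) -> Manager:
        async with self._lock:
            old = self._managers.pop(context_id, None)         # 1. evict cached manager
            stale = self._inflight.pop(context_id, None)
            if stale is not None:
                stale.cancel()                                 # 2. cancel in-flight build
            task = asyncio.create_task(self._build_manager(context_id))
            self._inflight[context_id] = task                  # 3. register fresh build
        if old is not None:
            await old.shutdown()                               # 4. shut down outside the lock
        return await task                                      # 5. await the rebuild

    async def shutdown(self) -> None:
        async with self._lock:
            managers = list(self._managers.values())           # 1. snapshot under the lock
            tasks = list(self._inflight.values())
            self._managers.clear()                             # 2. clear both caches
            self._inflight.clear()
        for t in tasks:
            t.cancel()                                         # 3. cancel in-flight builds
        await asyncio.gather(*tasks, return_exceptions=True)   # 4. await, collecting exceptions
        for m in managers:
            await m.shutdown()                                 # 5. shut down cached managers


async def main() -> None:
    factory = Factory()
    first = await factory.reload_tenant("server-1")  # nothing cached yet: plain build
    second = await factory.reload_tenant("server-1")
    print(first.closed, second.version)  # True 2
    await factory.shutdown()
    print(second.closed)  # True


asyncio.run(main())
```

Shutting managers down only after the lock is released keeps a slow plugin teardown from blocking other contexts' `get_manager` calls.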

---

## Backward Compatibility

The current design preserves compatibility in a few important ways:

- `PluginManager` still exists for Borg-based shared-state behavior.
- `TenantPluginManager` keeps the same public lifecycle and hook invocation API as `PluginManager`.
- `get_plugin_manager()` without arguments still resolves the global `__global__` manager.
- Call sites that are not context-aware continue to function against the global manager.

What changed is the wiring: the system now routes plugin access through the factory instead of a single shared manager instance.

---

## Recommended Mental Model

Use the following model when reasoning about the architecture:

- **one factory per process**: holds the base YAML config and the manager cache
- **one cached manager per context ID**: each with an independent registry and executor
- **plugins declared once in YAML at startup**: the YAML is the canonical plugin registry
- **per-context overrides fetched at manager-build time**: via `get_config_from_db`; subclass it to wire in your DB
- **one shared base config, optionally merged with per-context overrides**: override keys win; plugins unknown to the YAML are ignored

This model matches the architecture as currently implemented: request paths only supply a context ID and never need to know how plugin configuration is stored internally.