
Commit a74cea4

gandhipratik203, cafalchio, and jonpspri authored
feat(plugins): runtime plugin management — global toggle, per-plugin mode, cross-instance propagation (#4292)
* feat(plugins): runtime plugin management — global toggle, per-plugin mode, cross-instance propagation

  Adds runtime plugin management capabilities — global enable/disable, per-plugin mode changes, and cross-worker/cross-pod state propagation via Redis. Closes 14 multi-instance gaps identified in the plugin configuration system.

  Key changes:
  - PUT /admin/plugins — global enable/disable via Redis
  - PUT /admin/plugins/{name} — per-plugin mode change (enforce/permissive/disabled)
  - GET /admin/plugins — includes plugins_globally_enabled from runtime state
  - TTL-based cache refresh (30s default) for eventual consistency across instances
  - Wildcard binding invalidation fix — evicts all team contexts on * binding
  - DB error fallback — graceful degradation when Postgres is temporarily unavailable
  - MGET batched Redis reads for mode overrides
  - Structured audit logging for all plugin state changes
  - 43 tests (23 unit + 20 integration)

  Co-authored-by: cafalchio <mcafalchio@gmail.com>
  Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* feat(plugins): runtime plugin management — address review findings

  Consolidates the review-driven changes on top of the initial runtime plugin management commit:
  - Redis is authoritative for both the global toggle and per-plugin mode overrides; single-node deployments fall through to an in-process map with explicit ``redis_persisted`` signalling in the admin responses.
  - Redis-synced local overrides expire at the cluster's 24h TTL so workers don't keep applying overrides the cluster has already released; durable entries (Redis unavailable at write time) remain sticky.
  - Factory-init failures on nodes with plugins disabled are recorded and surface one ERROR the first time a shared-toggle request hits a degraded node, instead of being silently swallowed.
  - Runtime-state globals live in a leaf ``_state.py`` to break the ``framework → manager → framework`` import cycle; writers go through ``_state.set_local_mode_override`` and ``prune_expired_local_overrides`` so snapshot/prune semantics stay consistent.
  - Admin toggle handler refreshes ``app.state.plugin_manager`` and the ``PluginService`` singleton so freshly disabled nodes can serve the runtime-enabled subsystem without a restart; the inverse disable path clears them.
  - Admin plugin-view GETs run a best-effort self-heal that always mirrors ``framework.get_plugin_manager()`` (TTL-cached) into the admin caches, so remote disables take effect on this worker's next read and a swallowed toggle-sync failure cannot leave views stale.
  - ``update_plugin_mode`` validates against the configured plugin set instead of the live manager, so operators can pre-stage per-plugin overrides on a process that booted with plugins disabled.
  - Test suite updated and expanded: deny-path regressions for the admin cache sync, remote-disable self-heal, configured-name validation, expired-override pruning, and backing-dict identity.

  Signed-off-by: Jonathan Springer <jps@s390x.com>

* test(plugins): reset framework Redis provider between tests

  Lifespan-exercising tests in ``test_main_extended.py`` monkeypatch ``main.get_redis_client`` to an ``AsyncMock`` before triggering lifespan, which registers that mock as the plugin framework's shared Redis provider. ``set_shared_redis_provider`` is module-level state that ``monkeypatch`` doesn't roll back, so the mock bled into subsequent tests: the next call to ``_read_shared_enabled`` treated the mock's return value as a real Redis reply, decoded to ``False``, and made ``get_plugin_manager`` return ``None`` even after the test had set ``_PLUGINS_ENABLED = True``.

  Adds an autouse fixture in ``tests/unit/mcpgateway/conftest.py`` that clears the shared provider before and after each test. Plugin-suite tests re-install their dynamic provider after this runs, so behaviour there is unchanged.

  Signed-off-by: Jonathan Springer <jps@s390x.com>

* test(admin): drop caplog dependency from error-swallow regression pins

  Two error-swallow regression tests asserted on ``caplog.records`` to prove the warning path was hit. That capture is brittle under pytest-xdist: if any earlier test in the same worker triggered the app's lifespan, its ``LoggingService.initialize`` calls ``root_logger.handlers.clear()`` and wipes caplog's handler, so subsequent ``LOGGER.warning`` calls never reach the capture fixture.

  The tests now verify the observable behaviour — the operation returns normally instead of raising, and the failing sync step was actually exercised (via ``assert_called_once_with``/sentinel) rather than skipped. Equally strong guarantee, no log-capture dependency.

  Signed-off-by: Jonathan Springer <jps@s390x.com>

* test(admin): restore warning-path pins via logger-local handler

  The prior rewrite dropped caplog entirely, which lost the regression pin on the WARNING log a future refactor could accidentally remove. Replaces caplog with a small ``_capture_admin_logger_records`` context manager that attaches a handler directly to the ``mcpgateway.admin`` logger.

  Logger-local capture is immune to the xdist hazard that caused the CI failure (``LoggingService.initialize`` calls ``root_logger.handlers.clear()`` during lifespan, wiping caplog's root-attached handler) and also bypasses the root level gate — so the warning assertion remains reliable regardless of which tests ran earlier in the same worker.

  Signed-off-by: Jonathan Springer <jps@s390x.com>

* test(plugins): close coverage gaps on framework, manager, tool-bindings router, and lifespan

  Adds targeted regression pins for the remaining uncovered lines:
  - ``framework/__init__.py`` (86% → 100%): Redis-transport failure branches in ``_read_shared_enabled``, ``enable_plugins_shared``, ``_publish_invalidation``, ``publish_plugin_mode_change`` and ``get_plugin_mode_override``; the ``list_configured_plugin_names``/``get_plugin_manager_factory`` accessors; unknown-frame rejection and swallow-and-log paths in ``_handle_invalidation_message``; and the listener's polling-when-no-client and subscribe/dispatch/cancel branches.
  - ``framework/manager.py`` (95% → 98%): TTL-expired cache eviction, ``_apply_redis_mode_overrides`` client-factory failure + model_copy ValidationError, and the swallow-and-log semantics of ``invalidate_all`` / ``invalidate_team`` plus the ``iter_context_ids`` snapshot.
  - ``admin.py``: ``update_plugin_mode`` no longer 500s when ``invalidate_all_plugin_managers`` raises — WARNs instead.
  - ``tool_plugin_bindings.py`` (66.7% → 100%): wildcard ``tool_name="*"`` binding routes through ``factory.invalidate_team`` + team-scoped publish, and still broadcasts when the local factory is degraded.
  - ``main.py`` lifespan: plugin-factory init failure crashes loud when ``plugins.enabled=true`` and marks the node degraded when it's false; a ``stop_plugin_invalidation_listener`` shutdown failure is swallowed.

  Signed-off-by: Jonathan Springer <jps@s390x.com>

* test(admin): intercept LOGGER directly in admin warning-path pins

  Earlier iterations tried ``caplog`` (lost when lifespan clears root handlers) and a logger-local handler (still vulnerable to ``logger.disabled`` flips, filter additions, or LOG_LEVEL/effective-level gates depending on what other tests in the same xdist worker did). Both kept failing intermittently in CI.

  Replaces ``_capture_admin_logger_records`` with a direct ``patch.object`` on ``admin_module.LOGGER``. The spy records what the production code actually called; the standard logging chain is no longer in the test path at all, so worker ordering can't perturb the assertion.

  Signed-off-by: Jonathan Springer <jps@s390x.com>

---------

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: cafalchio <mcafalchio@gmail.com>
Co-authored-by: Jonathan Springer <jps@s390x.com>
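
The commit message says Redis-synced local overrides expire at the cluster's 24h TTL, while durable entries written when Redis was unavailable stay sticky. A minimal sketch of that rule follows. The class and method names are hypothetical and are not the real ``_state.py`` API, whose signatures are not shown in this diff.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

OVERRIDE_TTL_SECONDS = 24 * 60 * 60  # the 24h cluster TTL quoted in the commit message


@dataclass(frozen=True)
class Override:
    mode: str
    expires_at: Optional[float]  # None marks a durable (sticky) entry


class LocalOverrides:
    """Hypothetical in-process override map; not the real ``_state`` module."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Override] = {}

    def set(self, plugin: str, mode: str, redis_persisted: bool) -> None:
        # Redis-synced entries inherit the cluster TTL; entries written while
        # Redis was unavailable get no expiry and stay until overwritten.
        expires_at = self._clock() + OVERRIDE_TTL_SECONDS if redis_persisted else None
        self._entries[plugin] = Override(mode, expires_at)

    def prune_expired(self) -> None:
        now = self._clock()
        expired = [name for name, entry in self._entries.items() if entry.expires_at is not None and entry.expires_at <= now]
        for name in expired:
            del self._entries[name]

    def get(self, plugin: str) -> Optional[str]:
        # Prune before reading so an expired override is never applied.
        self.prune_expired()
        entry = self._entries.get(plugin)
        return entry.mode if entry else None
```

Injecting the clock keeps the expiry rule testable without waiting 24 hours; the production code may handle time differently.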
1 parent ed64a6a commit a74cea4

24 files changed

Lines changed: 5352 additions & 139 deletions

.secrets.baseline

Lines changed: 5 additions & 5 deletions
@@ -3,7 +3,7 @@
     "files": "(?x)( package-lock\\.json$ |Cargo\\.lock$ |uv\\.lock$ |go\\.sum$ |mcpgateway/sri_hashes\\.json$ )|^.secrets.baseline$",
     "lines": null
   },
-  "generated_at": "2026-04-19T08:51:02Z",
+  "generated_at": "2026-04-19T16:27:13Z",
   "plugins_used": [
     {
       "name": "AWSKeyDetector"
@@ -4170,31 +4170,31 @@
         "hashed_secret": "fa9beb99e4029ad5a6615399e7bbae21356086b3",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 4184,
+        "line_number": 4187,
         "type": "Secret Keyword",
         "verified_result": null
       },
       {
         "hashed_secret": "559b05f1b2863e725b76e216ac3dadecbf92e244",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 4785,
+        "line_number": 4788,
         "type": "Secret Keyword",
         "verified_result": null
       },
       {
         "hashed_secret": "a8af4759392d4f7496d613174f33afe2074a4b8d",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 4787,
+        "line_number": 4790,
         "type": "Secret Keyword",
         "verified_result": null
       },
       {
         "hashed_secret": "85b60d811d16ff56b3654587d4487f713bfa33b7",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 15113,
+        "line_number": 15116,
         "type": "Secret Keyword",
         "verified_result": null
       }

charts/mcp-stack/profiles/ocp/values-pgo.yaml

Lines changed: 5 additions & 5 deletions
@@ -58,7 +58,7 @@ mcpContextForge:
     include_user_info: false
   plugins:
     - name: "PIIFilterPlugin"
-      kind: "plugins.pii_filter.pii_filter.PIIFilterPlugin"
+      kind: "cpex_pii_filter.PIIFilterPlugin"
       description: "Detects and masks Personally Identifiable Information"
       version: "0.1.0"
       author: "Mihai Criveti"
@@ -85,7 +85,7 @@ mcpContextForge:
           - "test@example.com"
           - "555-555-5555"
     - name: "RateLimiterPlugin"
-      kind: "plugins.rate_limiter.rate_limiter.RateLimiterPlugin"
+      kind: "cpex_rate_limiter.RateLimiterPlugin"
       description: "Per-user/tenant/tool rate limits"
       version: "0.1.0"
       author: "Mihai Criveti"
@@ -103,7 +103,7 @@ mcpContextForge:
         by_tenant: "3000/m"
         by_tool: {}
     - name: "RetryWithBackoffPlugin"
-      kind: "plugins.retry_with_backoff.retry_with_backoff.RetryWithBackoffPlugin"
+      kind: "cpex_retry_with_backoff.RetryWithBackoffPlugin"
       description: "Detects transient failures and asks the gateway to re-invoke the tool after a jittered exponential backoff delay"
       version: "0.1.0"
       author: "Mihai Criveti"
@@ -136,7 +136,7 @@ mcpContextForge:
         strategy: "truncate"
         ellipsis: "…"
     - name: "SecretsDetection"
-      kind: "plugins.secrets_detection.secrets_detection.SecretsDetectionPlugin"
+      kind: "cpex_secrets_detection.SecretsDetectionPlugin"
       description: "Detects keys/tokens/secrets in inputs/outputs; optional redaction/blocking"
       version: "0.1.0"
       author: "ContextForge"
@@ -160,7 +160,7 @@ mcpContextForge:
         block_on_detection: true
         min_findings_to_block: 1
     - name: "EncodedExfilDetector"
-      kind: "plugins.encoded_exfil_detection.encoded_exfil_detector.EncodedExfilDetectorPlugin"
+      kind: "cpex_encoded_exfil_detection.EncodedExfilDetectorPlugin"
       description: "Detects suspicious encoded exfiltration patterns in prompt args and tool outputs"
       version: "0.1.0"
       author: "Mihai Criveti"

docs/docs/api/plugin-bindings-api.md

Lines changed: 1 addition & 1 deletion
@@ -579,7 +579,7 @@ curl -s -X POST \
 2. `GatewayTenantPluginManagerFactory.get_config_from_db()` fetches all DB bindings for the `(team_id, tool_name)` pair, including any wildcard `*` bindings.
 3. For each binding, the DB `mode` and `config` are merged over the global `config.yaml` values (`_merge_tenant_config`). DB values always win.
 4. A **`TenantPluginManager`** is instantiated with the merged config and cached in memory, keyed by context ID.
-5. On upsert or delete, the cache entry is invalidated immediately so the next call picks up the new config.
+5. On upsert or delete, the handling worker invalidates its local cache entry and broadcasts a `binding_change` frame on the `plugin:invalidation` Redis pub/sub channel, so peer workers evict within milliseconds. If pub/sub delivery fails, the cache TTL (default 30 seconds) bounds the worst-case drift. For wildcard bindings (`tool_name="*"`), every cached context for the team is evicted on the handling worker.
 
 ### Priority execution order
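
The updated step 5 combines three mechanisms: local eviction on write, a TTL that bounds staleness when the broadcast is lost, and team-wide eviction for wildcard bindings. The sketch below illustrates only the per-worker part under stated assumptions. `ManagerCache`, its `build` callback, and the `(team_id, tool_name)` key shape are hypothetical stand-ins, not the `GatewayTenantPluginManagerFactory` implementation, and the Redis pub/sub broadcast is deliberately left out.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 30.0  # default TTL quoted in the docs change

Key = Tuple[str, str]  # (team_id, tool_name); assumed key shape for illustration


class ManagerCache:
    """Hypothetical stand-in for a per-worker plugin-manager cache."""

    def __init__(self, build: Callable[[str, str], object], clock: Callable[[], float] = time.monotonic) -> None:
        self._build = build
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, object]] = {}

    def get(self, team_id: str, tool_name: str) -> object:
        key = (team_id, tool_name)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]
        # Miss or TTL expiry: rebuild, which bounds drift if a broadcast was lost.
        manager = self._build(team_id, tool_name)
        self._entries[key] = (now, manager)
        return manager

    def invalidate(self, team_id: str, tool_name: str) -> None:
        if tool_name == "*":
            # Wildcard binding: evict every cached context for the team.
            for key in [k for k in self._entries if k[0] == team_id]:
                del self._entries[key]
        else:
            self._entries.pop((team_id, tool_name), None)
```

In a multi-worker setup, a peer worker would call `invalidate` when it receives the broadcast frame; if that frame never arrives, the TTL check in `get` is what eventually refreshes the entry.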

mcpgateway/admin.py

Lines changed: 174 additions & 22 deletions
@@ -104,7 +104,11 @@
     PaginationMeta,
     PluginDetail,
     PluginListResponse,
+    PluginModeUpdateRequest,
+    PluginModeUpdateResponse,
     PluginStatsResponse,
+    PluginToggleRequest,
+    PluginToggleResponse,
     PromptCreate,
     PromptMetrics,
     PromptRead,
@@ -2028,11 +2032,10 @@ async def get_overview_partial(
         resources_total = db.query(func.count(DbResource.id)).scalar() or 0 # pylint: disable=not-callable
         resources_active = db.query(func.count(DbResource.id)).filter(DbResource.enabled.is_(True)).scalar() or 0 # pylint: disable=not-callable
 
-        # Plugin stats
+        # Plugin stats — self-heal the cache so the overview reflects the live
+        # shared toggle even on a process that booted with plugins disabled.
         overview_plugin_service = get_plugin_service()
-        plugin_manager = getattr(request.app.state, "plugin_manager", None)
-        if plugin_manager:
-            overview_plugin_service.set_plugin_manager(plugin_manager)
+        await _sync_plugin_service_from_runtime(request, overview_plugin_service)
         plugin_stats = await overview_plugin_service.get_plugin_statistics()
 
         # Infrastructure status (database, cache, uptime)
@@ -16535,6 +16538,41 @@ async def get_gateways_section(
 ####################
 
 
+async def _sync_plugin_service_from_runtime(request: Request, plugin_service) -> None:
+    """Self-heal the admin plugin cache from the live framework state.
+
+    The framework's ``get_plugin_manager`` is the single source of truth — it
+    reads the shared toggle (TTL-cached, so this is a cheap call) and returns
+    ``None`` when plugins are globally disabled, even when the disable came
+    from a *remote* node via the Redis toggle.
+
+    Every admin read mirrors that answer back into ``app.state.plugin_manager``
+    and the ``PluginService`` singleton. This closes three gaps:
+
+    1. Processes that booted with plugins disabled never had ``app.state`` set,
+       so admin views returned empty until restart even after the shared
+       toggle was flipped on.
+    2. If ``toggle_plugins_global`` swallowed an admin-cache sync failure, the
+       stale cache would persist forever — now the next GET repairs it.
+    3. A remote disable (``PUT /admin/plugins {"enabled": false}`` on another
+       worker) would leave this worker's ``app.state.plugin_manager``
+       populated from a prior enable, making admin views serve plugin
+       metadata the cluster had already turned off.
+
+    Best-effort: a failure logs a WARNING and leaves ``app.state`` alone. It
+    never raises, so it can't turn a read into a 500.
+    """
+    try:
+        # pylint: disable=import-outside-toplevel
+        from mcpgateway.plugins.framework import get_plugin_manager
+
+        plugin_manager = await get_plugin_manager()
+        request.app.state.plugin_manager = plugin_manager
+        plugin_service.set_plugin_manager(plugin_manager)
+    except Exception as sync_exc:
+        LOGGER.warning("Admin plugin-cache self-heal failed (%s) — view may render stale/empty", sync_exc)
+
+
 @admin_router.get("/plugins/partial")
 @require_permission("admin.plugins", allow_admin_bypass=False)
 async def get_plugins_partial(request: Request, db: Session = Depends(get_db), user=Depends(get_current_user_with_permissions)) -> HTMLResponse: # pylint: disable=unused-argument
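
The self-heal helper in this hunk follows a best-effort pattern: mirror the live answer (including `None`) into the cache, and on failure log a warning and leave the previous value in place. The standalone sketch below shows that behaviour. `sync_from_runtime`, the `get_manager` callback, and the `SimpleNamespace` stand-in for `app.state` are illustrative assumptions rather than the gateway's actual objects.

```python
import asyncio
import logging
from types import SimpleNamespace
from typing import Awaitable, Callable, Optional, Tuple

LOGGER = logging.getLogger("sketch.admin")  # hypothetical logger name


async def sync_from_runtime(app_state: SimpleNamespace, get_manager: Callable[[], Awaitable[Optional[object]]]) -> None:
    """Mirror the live manager (or None) into app_state; never raise."""
    try:
        app_state.plugin_manager = await get_manager()
    except Exception as exc:  # best-effort: keep the previous value on failure
        LOGGER.warning("self-heal failed (%s)", exc)


async def _demo() -> Tuple[Optional[object], Optional[object]]:
    state = SimpleNamespace(plugin_manager="stale-manager")

    async def remote_disabled() -> None:
        return None  # models a shared toggle that another worker switched off

    async def broken() -> None:
        raise RuntimeError("redis down")

    # A remote disable replaces the stale manager with None.
    await sync_from_runtime(state, remote_disabled)
    after_disable = state.plugin_manager

    # A failing lookup is swallowed and the cached value is left alone.
    state.plugin_manager = "manager"
    await sync_from_runtime(state, broken)
    after_failure = state.plugin_manager
    return after_disable, after_failure


after_disable, after_failure = asyncio.run(_demo())
```

Assigning `None` unconditionally is the important detail: the removed code only updated the service when a manager was present, so a remote disable could never clear it.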
@@ -16558,17 +16596,15 @@ async def get_plugins_partial(request: Request, db: Session = Depends(get_db), u
         # Get plugin service and check if plugins are enabled
         plugin_service = get_plugin_service()
 
-        # Check if plugin manager is available in app state
-        plugin_manager = getattr(request.app.state, "plugin_manager", None)
-        if plugin_manager:
-            plugin_service.set_plugin_manager(plugin_manager)
+        # Self-heal the cache so the partial reflects the live shared toggle.
+        await _sync_plugin_service_from_runtime(request, plugin_service)
 
         # Get plugin data
         plugins = plugin_service.get_all_plugins()
         stats = await plugin_service.get_plugin_statistics()
 
         # Prepare context for template
-        context = {"request": request, "plugins": plugins, "stats": stats, "plugins_enabled": plugin_manager is not None, "root_path": _resolve_root_path(request)}
+        context = {"request": request, "plugins": plugins, "stats": stats, "plugins_enabled": plugin_service.get_plugin_manager() is not None, "root_path": _resolve_root_path(request)}
 
         # Render the partial template
         return request.app.state.templates.TemplateResponse(request, "plugins_partial.html", context)
@@ -16620,10 +16656,8 @@ async def list_plugins(
         # Get plugin service
         plugin_service = get_plugin_service()
 
-        # Check if plugin manager is available
-        plugin_manager = getattr(request.app.state, "plugin_manager", None)
-        if plugin_manager:
-            plugin_service.set_plugin_manager(plugin_manager)
+        # Self-heal the cache from the live framework state.
+        await _sync_plugin_service_from_runtime(request, plugin_service)
 
         # Get filtered plugins
         if any([search, mode, hook, tag]):
@@ -16656,14 +16690,73 @@ async def list_plugins(
             },
         )
 
-        return PluginListResponse(plugins=plugins, total=len(plugins), enabled_count=enabled_count, disabled_count=disabled_count)
+        from mcpgateway.plugins.framework import are_plugins_enabled_shared # pylint: disable=import-outside-toplevel
+
+        return PluginListResponse(plugins_globally_enabled=await are_plugins_enabled_shared(), plugins=plugins, total=len(plugins), enabled_count=enabled_count, disabled_count=disabled_count)
 
     except Exception as e:
         LOGGER.error(f"Error listing plugins: {e}")
         structured_logger.error("Failed to list plugins in marketplace", user_id=get_user_id(user), user_email=get_user_email(user), error=e, component="plugin_marketplace", category="business_logic")
         raise HTTPException(status_code=500, detail=str(e))
 
 
+@admin_router.put("/plugins", response_model=PluginToggleResponse)
+@require_permission("admin.plugins", allow_admin_bypass=False)
+async def toggle_plugins_global(
+    payload: PluginToggleRequest,
+    request: Request,
+    user=Depends(get_current_user_with_permissions),
+) -> PluginToggleResponse:
+    """Enable or disable the plugin subsystem globally and broadcast the change."""
+    # pylint: disable=import-outside-toplevel
+    from mcpgateway.plugins.framework import are_plugins_enabled_shared, enable_plugins_shared, get_plugin_manager
+
+    redis_persisted = await enable_plugins_shared(payload.enabled)
+
+    # Sync the admin-side cache so ``GET /admin/plugins`` and
+    # ``GET /admin/plugins/{name}`` reflect the toggle on a process that
+    # started with plugins disabled (``app.state.plugin_manager`` stayed unset
+    # and ``PluginService`` was never wired). Without this, enabling plugins
+    # at runtime leaves the admin surfaces reading stale/empty metadata until
+    # the next restart; disabling likewise keeps the old manager visible.
+    #
+    # Treat the sync as best-effort: the shared toggle above already committed,
+    # so an exception here (e.g. factory build failure on a degraded node) must
+    # not turn into a 500 for a toggle that actually took effect. The admin
+    # surfaces will self-correct on the next request that re-reads the manager.
+    try:
+        plugin_service = get_plugin_service()
+        if payload.enabled:
+            plugin_manager = await get_plugin_manager()
+            if plugin_manager is not None:
+                plugin_service.set_plugin_manager(plugin_manager)
+                request.app.state.plugin_manager = plugin_manager
+        else:
+            plugin_service.set_plugin_manager(None)
+            request.app.state.plugin_manager = None
+    except Exception as sync_exc:
+        LOGGER.warning(
+            "Plugin global toggle applied (enabled=%s) but admin-cache sync failed (%s) — admin views will refresh on next request",
+            payload.enabled,
+            sync_exc,
+        )
+
+    LOGGER.info(f"Plugins globally {'enabled' if payload.enabled else 'disabled'} by {get_user_email(user)}")
+    structured_logger = get_structured_logger()
+    structured_logger.info(
+        f"Plugin subsystem globally {'enabled' if payload.enabled else 'disabled'}",
+        user_id=get_user_id(user),
+        user_email=get_user_email(user),
+        component="plugin_runtime",
+        category="security",
+        resource_type="plugin_global_toggle",
+        resource_action="update",
+        custom_fields={"enabled": payload.enabled, "redis_persisted": redis_persisted},
+    )
+
+    return PluginToggleResponse(plugins_enabled=await are_plugins_enabled_shared(), redis_persisted=redis_persisted)
+
+
 @admin_router.get("/plugins/stats", response_model=PluginStatsResponse)
 @require_permission("admin.plugins", allow_admin_bypass=False)
 async def get_plugin_stats(request: Request, db: Session = Depends(get_db), user=Depends(get_current_user_with_permissions)) -> PluginStatsResponse: # pylint: disable=unused-argument
@@ -16687,10 +16780,8 @@ async def get_plugin_stats(request: Request, db: Session = Depends(get_db), user
         # Get plugin service
         plugin_service = get_plugin_service()
 
-        # Check if plugin manager is available
-        plugin_manager = getattr(request.app.state, "plugin_manager", None)
-        if plugin_manager:
-            plugin_service.set_plugin_manager(plugin_manager)
+        # Self-heal the cache from the live framework state.
+        await _sync_plugin_service_from_runtime(request, plugin_service)
 
         # Get statistics
         stats = await plugin_service.get_plugin_statistics()
@@ -16749,10 +16840,8 @@ async def get_plugin_details(name: str, request: Request, db: Session = Depends(
         # Get plugin service
         plugin_service = get_plugin_service()
 
-        # Check if plugin manager is available
-        plugin_manager = getattr(request.app.state, "plugin_manager", None)
-        if plugin_manager:
-            plugin_service.set_plugin_manager(plugin_manager)
+        # Self-heal the cache from the live framework state.
+        await _sync_plugin_service_from_runtime(request, plugin_service)
 
         # Get plugin details
         plugin = plugin_service.get_plugin_by_name(name)
@@ -16806,6 +16895,69 @@ async def get_plugin_details(name: str, request: Request, db: Session = Depends(
         raise HTTPException(status_code=500, detail=str(e))
 
 
+@admin_router.put("/plugins/{name}", response_model=PluginModeUpdateResponse)
+@require_permission("admin.plugins", allow_admin_bypass=False)
+async def update_plugin_mode(
+    name: str,
+    payload: PluginModeUpdateRequest,
+    _db: Session = Depends(get_db), # required by rbac decorator's session lookup
+    user=Depends(get_current_user_with_permissions),
+) -> PluginModeUpdateResponse:
+    """Persist a per-plugin mode override in Redis and invalidate cached managers."""
+    # pylint: disable=import-outside-toplevel
+    from mcpgateway.plugins.framework import invalidate_all_plugin_managers, list_configured_plugin_names, publish_plugin_mode_change
+
+    mode = payload.mode
+
+    # Validate against the *configured* plugin set, not the live manager. On a
+    # process that booted with plugins globally disabled, no manager is wired
+    # and ``PluginService.get_all_plugins()`` returns ``[]``; without this the
+    # handler 404s for every valid name and blocks operators from pre-staging
+    # a per-plugin mode before turning the subsystem on.
+    plugin_names = list_configured_plugin_names()
+    if name not in plugin_names:
+        raise HTTPException(status_code=404, detail=f"Plugin '{name}' not found. Available: {', '.join(plugin_names[:10])}")
+
+    # publish_plugin_mode_change always updates the in-process override map
+    # (so single-node-no-Redis deployments still work) and additionally
+    # attempts a Redis SET + publish. The Redis outcome is surfaced back to
+    # the caller as redis_persisted so they know whether the change reached
+    # other workers.
+    redis_persisted = await publish_plugin_mode_change(name, mode)
+    # The override is already stored (in-process map and/or Redis) by the line
+    # above. Treat the cache sweep as best-effort — if it raises, a background
+    # TTL refresh still reaches the new mode, and the operator must not see a
+    # 500 for an override that actually took effect.
+    try:
+        await invalidate_all_plugin_managers()
+    except Exception as invalidate_exc:
+        LOGGER.warning(
+            "Plugin '%s' mode override stored but cache invalidation failed (%s) — fresh managers will rebuild on next TTL expiry",
+            name,
+            invalidate_exc,
+        )
+
+    if redis_persisted:
+        LOGGER.info(f"Plugin '{name}' mode changed to '{mode}' by {get_user_email(user)} (Redis persisted, 24h TTL)")
+    else:
+        LOGGER.warning(f"Plugin '{name}' mode changed to '{mode}' by {get_user_email(user)} (this worker only — Redis unavailable)")
+
+    structured_logger = get_structured_logger()
+    structured_logger.info(
+        f"Plugin '{name}' mode changed to '{mode}'",
+        user_id=get_user_id(user),
+        user_email=get_user_email(user),
+        component="plugin_runtime",
+        category="security",
+        resource_type="plugin_mode",
+        resource_id=name,
+        resource_action="update",
+        custom_fields={"plugin_name": name, "new_mode": mode, "redis_persisted": redis_persisted},
+    )
+
+    return PluginModeUpdateResponse(plugin=name, mode=mode, redis_persisted=redis_persisted)
+
+
 ##################################################
 # MCP Registry Endpoints
 ##################################################
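
For orientation, the two new endpoints take small JSON bodies: `{"enabled": <bool>}` for `PUT /admin/plugins` and `{"mode": <str>}` for `PUT /admin/plugins/{name}`, with `enforce`, `permissive`, and `disabled` as the modes named in the commit message. The sketch below only builds the requests and does not send them. The base URL, the bearer-token header, and the choice of `PIIFilterPlugin` as the target are assumptions for illustration; the diff does not show how the admin API is authenticated.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4444"  # assumption: adjust to your gateway address
TOKEN = "<admin-token>"  # placeholder; the auth scheme is an assumption


def build_put(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON PUT request for an admin endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {TOKEN}"},
    )


# Global toggle: the response model carries plugins_enabled and redis_persisted.
toggle_request = build_put("/admin/plugins", {"enabled": False})

# Per-plugin mode: the response model carries plugin, mode, and redis_persisted.
mode_request = build_put("/admin/plugins/PIIFilterPlugin", {"mode": "permissive"})

# To send against a running gateway:
# with urllib.request.urlopen(mode_request) as resp:
#     print(json.load(resp))
```

A caller would typically check `redis_persisted` in the response: per the handler's comments, `false` means the change was applied on the handling worker only.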
