* feat(plugins): runtime plugin management — global toggle, per-plugin mode, cross-instance propagation
Adds runtime plugin management capabilities — global enable/disable,
per-plugin mode changes, and cross-worker/cross-pod state propagation
via Redis. Closes 14 multi-instance gaps identified in the plugin
configuration system.
Key changes:
- PUT /admin/plugins — global enable/disable via Redis
- PUT /admin/plugins/{name} — per-plugin mode change (enforce/permissive/disabled)
- GET /admin/plugins — includes plugins_globally_enabled from runtime state
- TTL-based cache refresh (30s default) for eventual consistency across instances
- Wildcard binding invalidation fix — evicts all team contexts on * binding
- DB error fallback — graceful degradation when Postgres is temporarily unavailable
- MGET batched Redis reads for mode overrides
- Structured audit logging for all plugin state changes
- 43 tests (23 unit + 20 integration)
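A minimal sketch of the TTL-based refresh idea from the list above, assuming a per-worker cache that reloads an entry once it is older than the TTL. `TTLCache`, `get_or_load`, and the injectable `clock` are hypothetical names for illustration and are not taken from this change.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Hypothetical per-worker cache; entries are reloaded after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get_or_load(self, key: str, loader: Callable[[], V]) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        # Serve the cached value only while it is younger than the TTL.
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = loader()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        # Explicit eviction, e.g. when an invalidation message arrives.
        self._entries.pop(key, None)
```

With this shape, a worker that misses an invalidation message still converges on the shared state within one TTL window, which is the eventual-consistency bound the list describes.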
Co-authored-by: cafalchio <mcafalchio@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
* feat(plugins): runtime plugin management — address review findings
Consolidates the review-driven changes on top of the initial runtime plugin
management commit:
- Redis is authoritative for both the global toggle and per-plugin mode
overrides; single-node deployments fall through to an in-process map with
explicit ``redis_persisted`` signalling in the admin responses.
- Redis-synced local overrides expire at the cluster's 24h TTL so workers
don't keep applying overrides the cluster has already released; durable
entries (Redis unavailable at write time) remain sticky.
- Factory-init failures on nodes with plugins disabled are recorded and
surface one ERROR the first time a shared-toggle request hits a degraded
node, instead of being silently swallowed.
- Runtime-state globals live in a leaf ``_state.py`` to break the
``framework → manager → framework`` import cycle; writers go through
``_state.set_local_mode_override`` and ``prune_expired_local_overrides``
so snapshot/prune semantics stay consistent.
- Admin toggle handler refreshes ``app.state.plugin_manager`` and the
``PluginService`` singleton so freshly disabled nodes can serve the
runtime-enabled subsystem without a restart; the inverse disable path
clears them.
- Admin plugin-view GETs run a best-effort self-heal that always mirrors
``framework.get_plugin_manager()`` (TTL-cached) into the admin caches,
so remote disables take effect on this worker's next read and a
swallowed toggle-sync failure cannot leave views stale.
- ``update_plugin_mode`` validates against the configured plugin set
instead of the live manager, so operators can pre-stage per-plugin
overrides on a process that booted with plugins disabled.
- Test suite updated and expanded: deny-path regressions for the admin
cache sync, remote-disable self-heal, configured-name validation,
expired-override pruning, and backing-dict identity.
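The expiry rule for local overrides can be sketched as follows, assuming each override records whether it was persisted to Redis. `LocalOverrides`, `Override`, and their methods are hypothetical stand-ins; the real writers are `_state.set_local_mode_override` and `prune_expired_local_overrides`, whose signatures are not shown in this commit message.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

CLUSTER_TTL_SECONDS = 24 * 60 * 60  # the 24h cluster TTL described above


@dataclass(frozen=True)
class Override:
    mode: str
    expires_at: Optional[float]  # None marks a durable (sticky) entry


class LocalOverrides:
    """Hypothetical in-process map of per-plugin mode overrides."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._by_plugin: Dict[str, Override] = {}

    def set(self, plugin: str, mode: str, redis_persisted: bool) -> None:
        # Redis-synced entries expire with the cluster; entries written while
        # Redis was unavailable have no expiry and stay until overwritten.
        expires_at = self._clock() + CLUSTER_TTL_SECONDS if redis_persisted else None
        self._by_plugin[plugin] = Override(mode, expires_at)

    def prune_expired(self) -> None:
        now = self._clock()
        expired = [
            name
            for name, override in self._by_plugin.items()
            if override.expires_at is not None and override.expires_at <= now
        ]
        for name in expired:
            del self._by_plugin[name]

    def get(self, plugin: str) -> Optional[str]:
        self.prune_expired()
        override = self._by_plugin.get(plugin)
        return override.mode if override else None
```

Pruning on every read keeps the snapshot and the expiry check in one place, so a worker stops applying an override once the cluster-side TTL has lapsed.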
Signed-off-by: Jonathan Springer <jps@s390x.com>
* test(plugins): reset framework Redis provider between tests
Lifespan-exercising tests in ``test_main_extended.py`` monkeypatch
``main.get_redis_client`` to an ``AsyncMock`` before triggering lifespan,
which registers that mock as the plugin framework's shared Redis provider.
``set_shared_redis_provider`` is module-level state that ``monkeypatch``
doesn't roll back, so the mock bled into subsequent tests: the next call
to ``_read_shared_enabled`` treated the mock's return value as a real
Redis reply, decoded to ``False``, and made ``get_plugin_manager`` return
``None`` even after the test had set ``_PLUGINS_ENABLED = True``.
Adds an autouse fixture in ``tests/unit/mcpgateway/conftest.py`` that
clears the shared provider before and after each test. Plugin-suite tests
re-install their dynamic provider after this runs, so behaviour there is
unchanged.
Signed-off-by: Jonathan Springer <jps@s390x.com>
* test(admin): drop caplog dependency from error-swallow regression pins
Two error-swallow regression tests asserted on ``caplog.records`` to prove
the warning path was hit. That capture is brittle under pytest-xdist: if
any earlier test in the same worker triggered the app's lifespan, its
``LoggingService.initialize`` calls ``root_logger.handlers.clear()`` and
wipes caplog's handler, so subsequent ``LOGGER.warning`` calls never reach
the capture fixture.
The tests now verify the observable behaviour — the operation returns
normally instead of raising, and the failing sync step was actually
exercised (via ``assert_called_once_with``/sentinel) rather than skipped.
Equally strong guarantee, no log-capture dependency.
Signed-off-by: Jonathan Springer <jps@s390x.com>
* test(admin): restore warning-path pins via logger-local handler
The prior rewrite dropped caplog entirely, which lost the regression pin on
the WARNING log a future refactor could accidentally remove. Replaces caplog
with a small ``_capture_admin_logger_records`` context manager that attaches
a handler directly to the ``mcpgateway.admin`` logger.
Logger-local capture is immune to the xdist hazard that caused the CI
failure (``LoggingService.initialize`` calls ``root_logger.handlers.clear()``
during lifespan, wiping caplog's root-attached handler) and also bypasses
the root level gate — so the warning assertion remains reliable regardless
of which tests ran earlier in the same worker.
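A rough sketch of logger-local capture, assuming a context manager along the lines of ``_capture_admin_logger_records``. The helper below is a hypothetical reconstruction using only the standard ``logging`` module; the real helper's body is not shown in this commit.

```python
import logging
from contextlib import contextmanager
from typing import Iterator, List


class _ListHandler(logging.Handler):
    """Appends every record it receives to a plain list."""

    def __init__(self, records: List[logging.LogRecord]) -> None:
        super().__init__(level=logging.NOTSET)
        self.records = records

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


@contextmanager
def capture_logger_records(name: str, level: int = logging.WARNING) -> Iterator[List[logging.LogRecord]]:
    logger = logging.getLogger(name)
    records: List[logging.LogRecord] = []
    handler = _ListHandler(records)
    previous_level = logger.level
    # Attach to the named logger itself, so clearing the root logger's
    # handlers does not remove this capture.
    logger.addHandler(handler)
    # An explicit level on the logger means the root level is not consulted.
    logger.setLevel(level)
    try:
        yield records
    finally:
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
```

A test would wrap the call under test in ``with capture_logger_records("mcpgateway.admin") as records:`` and assert on ``records`` afterwards.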
Signed-off-by: Jonathan Springer <jps@s390x.com>
* test(plugins): close coverage gaps on framework, manager, tool-bindings router, and lifespan
Adds targeted regression pins for the remaining uncovered lines:
- ``framework/__init__.py`` (86% → 100%): Redis-transport failure branches in
``_read_shared_enabled``, ``enable_plugins_shared``, ``_publish_invalidation``,
``publish_plugin_mode_change`` and ``get_plugin_mode_override``; the
``list_configured_plugin_names``/``get_plugin_manager_factory`` accessors;
unknown-frame rejection and swallow-and-log paths in
``_handle_invalidation_message``; and the listener's polling-when-no-client
and subscribe/dispatch/cancel branches.
- ``framework/manager.py`` (95% → 98%): TTL-expired cache eviction,
``_apply_redis_mode_overrides`` client-factory failure + model_copy
ValidationError, and the swallow-and-log semantics of ``invalidate_all`` /
``invalidate_team`` plus the ``iter_context_ids`` snapshot.
- ``admin.py``: ``update_plugin_mode`` no longer 500s when
``invalidate_all_plugin_managers`` raises — WARNs instead.
- ``tool_plugin_bindings.py`` (66.7% → 100%): wildcard ``tool_name="*"``
binding routes through ``factory.invalidate_team`` + team-scoped publish,
and still broadcasts when the local factory is degraded.
- ``main.py`` lifespan: plugin-factory init failure crashes loud when
``plugins.enabled=true`` and marks the node degraded when it's false; a
``stop_plugin_invalidation_listener`` shutdown failure is swallowed.
Signed-off-by: Jonathan Springer <jps@s390x.com>
* test(admin): intercept LOGGER directly in admin warning-path pins
Earlier iterations tried ``caplog`` (lost when lifespan clears root handlers)
and a logger-local handler (still vulnerable to ``logger.disabled`` flips,
filter additions, or LOG_LEVEL/effective-level gates depending on what other
tests in the same xdist worker did). Both kept failing intermittently in CI.
Replaces ``_capture_admin_logger_records`` with a direct ``patch.object``
on ``admin_module.LOGGER``. The spy records what the production code
actually called; the standard logging chain is no longer in the test path
at all, so worker ordering can't perturb the assertion.
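The spy approach can be illustrated with a small stand-in, assuming a module-level ``LOGGER`` attribute like the one on ``admin_module``. Here ``admin_module`` is a hypothetical ``SimpleNamespace`` and ``update_mode`` is an invented function that mimics a swallow-and-warn path; neither is the production code.

```python
import logging
import types
from unittest.mock import patch

# Hypothetical stand-in for the real admin module and its LOGGER attribute.
admin_module = types.SimpleNamespace(LOGGER=logging.getLogger("demo.admin"))


def update_mode(invalidate):
    """Invented example: swallow an invalidation failure and warn instead."""
    try:
        invalidate()
    except Exception as exc:
        admin_module.LOGGER.warning("invalidation failed: %s", exc)
    return "ok"


def failing_invalidate():
    raise RuntimeError("redis down")


# Replace LOGGER with a MagicMock for the duration of the call. The assertion
# inspects what the code called, so handler, filter, and level state on the
# real logger cannot affect the result.
with patch.object(admin_module, "LOGGER") as spy:
    result = update_mode(failing_invalidate)

spy.warning.assert_called_once()
```

The trade-off is that the test no longer proves a record would reach any handler; it only pins that the warning call is made.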
Signed-off-by: Jonathan Springer <jps@s390x.com>
---------
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: cafalchio <mcafalchio@gmail.com>
Co-authored-by: Jonathan Springer <jps@s390x.com>
docs/docs/api/plugin-bindings-api.md (1 addition, 1 deletion)
@@ -579,7 +579,7 @@ curl -s -X POST \
 2. `GatewayTenantPluginManagerFactory.get_config_from_db()` fetches all DB bindings for the `(team_id, tool_name)` pair, including any wildcard `*` bindings.
 3. For each binding, the DB `mode` and `config` are merged over the global `config.yaml` values (`_merge_tenant_config`). DB values always win.
 4. A **`TenantPluginManager`** is instantiated with the merged config and cached in memory, keyed by context ID.
-5. On upsert or delete, the cache entry is invalidated immediately so the next call picks up the new config.
+5. On upsert or delete, the handling worker invalidates its local cache entry and broadcasts a `binding_change` frame on the `plugin:invalidation` Redis pub/sub channel, so peer workers evict within milliseconds. If pub/sub delivery fails, the cache TTL (default 30 seconds) bounds the worst-case drift. For wildcard bindings (`tool_name="*"`), every cached context for the team is evicted on the handling worker.
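The eviction rule in step 5 can be sketched as below, assuming the cache is keyed by a `(team_id, tool_name)` pair. That key shape and the `evict_for_binding` helper are assumptions for illustration; the documentation only says entries are keyed by context ID.

```python
from typing import Dict, Tuple

ContextKey = Tuple[str, str]  # assumed shape: (team_id, tool_name)


def evict_for_binding(cache: Dict[ContextKey, object], team_id: str, tool_name: str) -> int:
    """Evict cached managers affected by a binding change; return the count."""
    if tool_name == "*":
        # A wildcard binding can affect any tool, so drop every context
        # cached for this team.
        doomed = [key for key in cache if key[0] == team_id]
    else:
        doomed = [key for key in cache if key == (team_id, tool_name)]
    for key in doomed:
        del cache[key]
    return len(doomed)
```

Other teams' entries are left alone in both branches, so a wildcard change for one team does not force rebuilds elsewhere.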
structured_logger.error("Failed to list plugins in marketplace", user_id=get_user_id(user), user_email=get_user_email(user), error=e, component="plugin_marketplace", category="business_logic")