chore: merge main into release/0.6.0 #1709
Closed
sriaradhyula wants to merge 850 commits into
Production installs need Keycloak clients to use real configured secrets instead of silently keeping placeholder values. Reconcile managed clients through the init scripts and cover the strict-secret path so token exchange setup fails visibly when required material is missing. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Application routes should no longer collapse onto a broad supervisor invoke check when the OpenFGA model exposes narrower organization capabilities. Map protected route groups to explicit relations and make integration access-check oracles require read access before returning resource grant details. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
The legacy withAuth wrapper should produce auditable capability rows even for fallback routes. Map platform config reads to system_config#read and fall back to admin UI view/manage checks instead of the old supervisor umbrella capability. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Credential exchange should use the same envelope-store defaults across factory creation and persistence tests. Keep this separate from RBAC UI changes because it affects credential handling behavior rather than route or policy wiring. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
RAG authorization now depends on token-backed OpenFGA decisions instead of local group-role defaults. Stop propagating the legacy group fallback settings during setup and migration docs so local and Helm installs do not mask deny-by-default policy behavior. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Keep generated rag-stack chart reference changes separate from the RBAC implementation work so reviewers can treat them as documentation output. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Capture the intended Docker build-cache work before implementation so UI and supervisor image changes can be reviewed against explicit cache and runtime-preservation requirements. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
The RBAC model needs one canonical generated artifact and explicit organization capabilities for fine-grained application route gates. Keep the source FGA, chart JSON, cleanup of duplicate artifacts, and model coverage together so downstream PDP changes can rely on one model shape. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Record the explicit withAuth compatibility capability mapping so the canonical RBAC architecture doc stays aligned with the API middleware gate changes. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Add the newly mapped withAuth resources and scopes to the shared RBAC types and let chat route tests pass the compatibility chat capability before route-level sharing checks. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
feat(openfga): add route capabilities to canonical model
…rict-secrets feat(keycloak): reconcile strict client secrets
…mplate Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
…mplates Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
…yment env blocks Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
…hardcoded caipe Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
Merge the latest main branch into the UI RBAC gate PR so the branch is mergeable again while preserving the explicit withAuth capability typing from the PR. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
feat(ui): map withAuth routes to RBAC capabilities
Local and Helm deployments need a consistent way to route managed MCP servers through AgentGateway while preserving OpenFGA authorization. Add the compose config bridge, generated route template, and MCP-only compose profile wiring so local testing matches the gateway-backed path. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
GitHub and GitLab MCP targets need provider PAT placeholders to survive both static config rendering and Mongo-backed route reconciliation. Add secret-backed Helm environment wiring and regression coverage so the bridge does not drop backendAuth when it refreshes AgentGateway routes. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
The AgentGateway coarse ext_authz layer checks mcp_gateway caller access before route-level policy runs. Add that tuple to baseline member grants so admitted users can reach MCP routes while still relying on the finer resource checks behind the gateway. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Account for the rebased Knowledge Base MCP target and expanded baseline OpenFGA grant profile so the AgentGateway bridge tests match current main behavior. Assisted-by: Cursor:GPT-5.5 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
…fi-base fix(dynamic-agents): migrate image to wolfi-base
fix(rag): migrate rag-server image to wolfi-base
Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # pyproject.toml # uv.lock
chore(ui): upgrade caipe-ui base image to Node 24
…UX (#1696) * feat(slack-ui): config parity, channel-admin editing, and admin save UX Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com> * chore: bump version to 0.5.7-dev.2 * test(slack-ui): fix Webex panel selectors and dedupe platform-settings import Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com> * refactor(slack-ui): extract Slack channel detail into provider components Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com> * fix(slack-ui): hash Slack token for in-process cache key (#1700) * feat(slack-bot): authorize bot service account for BFF platform-config reads Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com> * fix(slack-ui): hash Slack token for in-process cache key instead of raw suffix token.slice(-12) stored a raw fragment of the Slack bot token as a Map key in module-level memory. Replace with a SHA-256 hash (first 16 hex chars) in both the emoji and users-lookup routes so the token is never exposed in process memory or debug dumps. Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> --------- Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com> Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Kevin Kantesaria <kkantesaria@splunk.com> --------- Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com> Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Kevin Kantesaria <kkantesaria@splunk.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Sri Aradhyula <sraradhy@cisco.com>
Let each user narrow which OAuth scopes their own provider connection requests via an "Advanced settings" panel on My Connections. The connector's allowed `scopes` is both the admin-managed upper bound and the default selection.

- connect route accepts an optional `?scopes=` selection
- startConnection validates via the pure boundScopes guard, rejecting out-of-bounds or empty selections with 400 and never minting a zero-scope token
- the choice is carried through the signed OAuth state cookie and persisted as requestedScopes/grantedScopes on provider_connections so relink pre-fills the prior choice and the grant is auditable

Existing connections without these fields and connects that do not open Advanced settings behave exactly as before (connector default).

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
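A rough sketch of the kind of pure guard this commit describes, in Python rather than the UI's TypeScript. The function name, the error type, and the de-duplication step are assumptions for illustration; the real boundScopes may differ in signature and behavior.

```python
from typing import List, Optional


class ScopeSelectionError(ValueError):
    """Raised for a selection the route would answer with HTTP 400."""


def bound_scopes(allowed: List[str], requested: Optional[List[str]]) -> List[str]:
    """Clamp a user's scope selection to the connector's allowed scopes.

    - No selection at all falls back to the connector default (all allowed).
    - An empty selection is rejected so a zero-scope token is never minted.
    - Any scope outside the admin-managed upper bound is rejected.
    """
    if requested is None:
        return list(allowed)
    # De-duplicate while preserving the caller's order (an assumption).
    selection = list(dict.fromkeys(requested))
    if not selection:
        raise ScopeSelectionError("at least one scope is required")
    out_of_bounds = [s for s in selection if s not in allowed]
    if out_of_bounds:
        raise ScopeSelectionError(f"scopes not allowed: {out_of_bounds}")
    return selection
```

Keeping the guard pure (no I/O, no request object) is what makes it easy to unit-test separately from the connect route.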
Docusaurus 3 (MDX) evaluates `{...}` as JS expressions, so the literal
scope-set notation `{A, B, C}` raised "ReferenceError: A is not defined"
and failed the static site build. Wrap the brace sets in backticks so
they render as literal text.
Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…org KB admin (#1703) * feat(rbac): fix RAG datasource access gap, add public datasources, reorg KB admin Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com> * chore: bump version to 0.5.7-dev.7 --------- Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com> Co-authored-by: Kevin Kantesaria <kkantesaria@splunk.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…ope-selection feat(credentials): per-user OAuth scope selection at connect time
…ce (#1702) * fix(mcp): provider-token auth, knowledge-base RAG, and authz resilience Bundles the entangled MCP authorization, provider-token, and platform fixes that share gateway config, dynamic-agents, and RBAC doc files. MCP authz resilience (spec 2026-06-02-mcp-authz-resilience): - raise the AgentGateway ext_authz timeout default to 10s across the static Helm path and the dev compose/config-bridge path, keeping the fail-closed posture (operator-tunable via global.agentgateway.extAuth.timeout) - bounded transient-retry with backoff when loading MCP tools so cold-start authz timeouts self-heal; classify transient/not-ready vs permanent failures and surface honest availability messaging Provider-token unification: - route all provider/identity tokens through the X-CAIPE-Provider-Token -> Authorization: Bearer gateway transformation - Jira (cloudId rewrite), PagerDuty (bearer or static Token fallback), GitHub/GitLab hybrid (per-user OAuth with org-PAT fallback) - knowledge-base/RAG: forward the caller user JWT, or mint a caipe-platform client-credentials service token in non-user contexts; probe path resolves the same credential so it no longer 401s config-bridge also prunes /mcp/<id> routes whose mcp_servers row was removed (reconciler is the single source of truth for /mcp/* routes). 
Also includes independent fixes that touched shared files: - webex bot: reuse the WDM device across reconnects and de-duplicate redelivered message ids - connector onboarding UX: smart team/agent fallback, skip blocked rows instead of stranding the batch, treat bare-gateway discovery rows as auto-migratable - credentials: CREDENTIAL_ALLOW_INSECURE_LOCAL_KEY_WRAP dev-only escape hatch so the local-cmk wrapper can run on the prod-parity UI image Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> * chore: bump version to 0.5.7-dev.7 * fix(docs): escape brace set in MCP authz resilience spec for MDX Docusaurus 3 (MDX) evaluates `{...}` as JS expressions, so the literal classification set `{transient, permanent, denied}` raised a ReferenceError and failed the static site build. Wrap it in backticks to render literally. Assisted-by: Claude:claude-opus-4.8 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> * fix(ui): reconcile admin onboarding with #1696; fix discovery test + lint #1696 ("slack-ui config parity") landed a refactor that replaced the ConnectorAdminPanel onboarding-defaults subsystem. Reconcile this branch with that design: - ConnectorAdminPanel.tsx: drop the saved-defaults-based smart team/agent fallback (incompatible with #1696's apply-time defaults model); take #1696's discovery behavior. Revert the two smart-fallback Webex tests. - Keep the compatible onboarding niceties: ConnectorOnboardingWizard skip-blocked-rows and the Slack adapter's ready-row import filter. - agentgateway-mcp-discovery.test.ts: drop "rag" from the expected dev MCP targets (the rag route was removed from the AgentGateway configs in this PR). - ChatPanel.tsx: prefer-const on a never-reassigned local that ESLint flagged as a hard error (unblocks the lint gate). 
Assisted-by: Claude:claude-opus-4.8 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> * chore(ui): remove accidentally committed node_modules symlink A local node_modules symlink (used for test tooling in an isolated worktree) slipped past .gitignore because the ignore pattern matches the directory form, not a symlink file. Remove it so it never reaches the repo. Assisted-by: Claude:claude-opus-4.8 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> * fix(jira): stop logging credential-derived request URL (CodeQL high) The Jira MCP client logged the full request URL at debug level. That URL is constructed from the credential-bearing `prerequisites` (token/email/url), so CodeQL flagged it as clear-text logging of sensitive information (1 high). Log only the HTTP method and request path instead — sufficient for debugging routing and never carrying the token. Assisted-by: Claude:claude-opus-4.8 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(dynamic-agents): narrow MCP load retry catch to Exception The per-server retry helper in get_tools_with_resilience caught BaseException, which also swallowed KeyboardInterrupt, SystemExit, and asyncio.CancelledError -- so a cancellation/shutdown could be classified and retried instead of propagating. Narrow the catch (and the related type hints / isinstance guard) to Exception. Transient failures we actually retry (timeouts, mid-stream disconnects) are all Exception subclasses, so retry and classification behavior is unchanged, while cancellation and interpreter shutdown now propagate correctly. The flake8-blind-except (BLE001) suppression is kept since a broad Exception catch is still intentional here. Addresses github-code-quality review on PR #1702. 
Assisted-by: Claude:claude-opus-4.8 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> --------- Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
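The retry behavior described in #1702 (bounded transient retry with backoff, with the catch narrowed from BaseException to Exception) can be sketched as follows. This is an illustrative reduction, not the code in get_tools_with_resilience; `load_with_retry`, `classify`, and the choice of transient exception types are assumptions.

```python
import asyncio

# Assumed set of transient failures: timeouts and dropped connections.
TRANSIENT = (TimeoutError, asyncio.TimeoutError, ConnectionError)


def classify(exc: Exception) -> str:
    """Label a failure as transient (retryable) or permanent."""
    return "transient" if isinstance(exc, TRANSIENT) else "permanent"


async def load_with_retry(load, attempts: int = 3, base_delay: float = 0.05):
    """Call `load()` up to `attempts` times with exponential backoff.

    Catching Exception (not BaseException) means asyncio.CancelledError,
    KeyboardInterrupt, and SystemExit propagate instead of being retried.
    """
    for attempt in range(attempts):
        try:
            return await load()
        except Exception as exc:  # noqa: BLE001 - broad catch is intentional
            last_attempt = attempt == attempts - 1
            if classify(exc) != "transient" or last_attempt:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, which is why the narrower `except Exception` lets cancellation and shutdown pass through untouched.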
* docs(agents): add spec readability rules Assisted-by: Codex:gpt-5 Signed-off-by: subbaksh <subbaksh@cisco.com> * docs(agents): consolidate agent guidance Assisted-by: Codex:gpt-5 Signed-off-by: subbaksh <subbaksh@cisco.com> * docs(agents): require explicit human signoff Assisted-by: Codex:gpt-5 Signed-off-by: subbaksh <subbaksh@cisco.com> * docs(agents): add assisted-by example Assisted-by: Codex:gpt-5 Signed-off-by: subbaksh <subbaksh@cisco.com> --------- Signed-off-by: subbaksh <subbaksh@cisco.com> Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ui): collapse consecutive identical tool chips in timeline

  Chatty agents (e.g. AWS aws_cli_execute) fire the same tool many times in a row, flooding the chat timeline with identical "tool -> tool done" chips. Coalesce consecutive identical tool calls into a single row with a xN count while keeping the dropdown header's true call total.

  Assisted-by: Claude:claude-opus-4.8
  Co-authored-by: Cursor <cursoragent@cursor.com>

* test(ui): cover consecutive tool-chip dedupe in timeline

  Assisted-by: Claude:claude-opus-4.8
  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
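The coalescing rule in this commit amounts to run-length grouping of adjacent equal tool names. A minimal sketch in Python (the UI itself is TypeScript; `collapse_consecutive` and `chip_labels` are hypothetical names and the label format is assumed):

```python
from itertools import groupby
from typing import List, Tuple


def collapse_consecutive(tool_calls: List[str]) -> List[Tuple[str, int]]:
    """Group adjacent identical tool names into (name, count) rows.

    Only consecutive repeats are merged; the same tool appearing again
    after a different tool starts a new row.
    """
    return [(name, sum(1 for _ in run)) for name, run in groupby(tool_calls)]


def chip_labels(rows: List[Tuple[str, int]]) -> List[str]:
    """Render one label per row, adding an xN suffix for repeated calls."""
    return [name if count == 1 else f"{name} x{count}" for name, count in rows]


calls = ["aws_cli_execute"] * 3 + ["search", "aws_cli_execute"]
rows = collapse_consecutive(calls)
print(chip_labels(rows))
# The header total still reflects every call, not the number of rows.
print(sum(count for _, count in rows))
```

Summing the counts recovers the true call total, which is how the dropdown header can stay accurate while the timeline shows fewer rows.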
* fix(ui): correct code editor overflow and dark theme Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com> * refactor(ui): clarify comments and simplify some boolean evaluation Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com> * fix(ui): missing UI file changes Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com> * test(ui): add tests for new cases and test file for RichCodeEditor Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com> * refactor(ui): coerce value to boolean for clarity Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com> --------- Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com>
) * feat(ui): one-click "Migrate all to latest" for schema migrations The admin Schema Migrations tab required ~4 steps (initialize-to-v1, select-all, preview, type-confirmation + apply). Collapse the happy path to a single primary "Migrate all to latest" action with one confirm dialog (inline preview, no typed phrase), and move the per-migration / bulk controls behind an "Advanced controls" toggle. Server: new applyAllMigrations() orchestrator + POST /api/admin/rebac/migrations/apply-all route. It bootstraps unversioned schema areas to v1, then applies every pending implemented migration in dependency order (topological sort over declared `dependencies`, fixing the prior release/area ordering that ignored dependencies), continuing on per-migration failure and returning an aggregated apply report. Single round-trip instead of the previous N+1 client-side calls. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> * chore: bump version to 0.5.4-dev.1 * test(ui): expand Advanced controls in MigrationTab tests The "Migrate all to latest" feature moved the per-migration list, select-all controls, and the completed-migrations toggle behind a collapsed "Advanced controls" disclosure. The existing MigrationTab tests assert on that content immediately after render, so all 9 of them broke. Add an openAdvancedControls() helper and call it after render in the affected tests so they exercise the per-migration UI through the new disclosure. Test-only change; the feature UX is unchanged. Assisted-by: Cursor:claude-opus-4.8 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> * chore: bump version to 0.5.7-dev.2 * chore: bump version to 0.5.7-dev.9 --------- Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
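The dependency-ordered apply described for applyAllMigrations() can be sketched with a topological sort. This Python version uses the standard-library `graphlib`; the function name, the shape of the report, and the decision to keep going after a failed dependency are assumptions, since the commit message only says the orchestrator continues on per-migration failure and returns an aggregated report.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def apply_all(
    dependencies: Dict[str, List[str]],
    apply: Callable[[str], None],
) -> Dict[str, str]:
    """Apply every migration after the migrations it depends on.

    `dependencies` maps a migration id to the ids it depends on.
    A failure is recorded and the loop continues, so one bad
    migration does not hide the outcome of the others.
    """
    order = TopologicalSorter(dependencies).static_order()
    report: Dict[str, str] = {}
    for migration_id in order:
        try:
            apply(migration_id)
            report[migration_id] = "applied"
        except Exception as exc:  # aggregate instead of aborting the batch
            report[migration_id] = f"failed: {exc}"
    return report
```

`TopologicalSorter` raises `graphlib.CycleError` on circular dependencies, which surfaces a malformed migration graph before anything is applied.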
…tology-wolfi-base Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> # Conflicts: # pyproject.toml # uv.lock
…fi-base fix(rag): migrate agent-ontology image to wolfi-base
…ning

The config-bridge reconciler assumed it owned every /mcp/<id> route and rendered them solely from the mcp_servers Mongo collection. The 14 built-in MCP servers are shipped statically in deploy/agentgateway/config.yaml and are never written to Mongo. With an empty (or freshly reset) mcp_servers collection, the reconciler classified every built-in route as stale and pruned it, leaving AgentGateway serving zero MCP routes. The UI "Sync with AgentGateway" then discovered nothing (it imports routes *from* the gateway), so the platform deadlocked with no MCP servers.

Treat the bootstrap config.yaml /mcp/<id> routes as a protected baseline: always re-render them from their authoritative definition (preserving per-route transformations) and never prune them for being absent from Mongo. Mongo-backed targets are layered on top as dynamic routes that may still be added/pruned; a dynamic target sharing an id with a built-in defers to the built-in.

- Add load_builtin_mcp_routes() to parse the shipped bootstrap config
- Thread builtin_routes through merge_agentgateway_mcp_routes / reconcile_once
- Add pyyaml to the config-bridge image for bootstrap parsing
- Add regression tests for empty-Mongo, baseline-loss, and id-collision cases

Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
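The layering rule in this commit (built-in routes are a protected baseline, Mongo-backed routes sit on top, and a built-in wins an id collision) can be reduced to a small merge. The sketch below is illustrative only: `merge_mcp_routes` and the route dictionaries are hypothetical and do not reproduce the signature of merge_agentgateway_mcp_routes.

```python
from typing import Dict

Route = Dict[str, object]


def merge_mcp_routes(
    builtin: Dict[str, Route],
    dynamic: Dict[str, Route],
) -> Dict[str, Route]:
    """Return the desired /mcp/<id> routes keyed by id.

    Dynamic (Mongo-backed) routes are added first, then built-in routes
    overwrite any entry with the same id. Because the result is rebuilt
    from both sources every time, an empty `dynamic` still yields every
    built-in route, and a dynamic route removed from Mongo disappears.
    """
    merged = {route_id: dict(route) for route_id, route in dynamic.items()}
    merged.update({route_id: dict(route) for route_id, route in builtin.items()})
    return merged
```

Copying each route (`dict(route)`) keeps the caller's definitions unmodified if the reconciler later annotates the merged result.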
…e-preserve-builtin-routes fix(agentgateway): protect built-in MCP routes from config-bridge pruning
Replaced by #1714, which merges main into release/0.6.0 via a dedicated branch (prebuild/chore/merge-main-into-release-0.6.0) with all 82 conflicts resolved. This PR used head=main directly, which prevented committing conflict resolutions.
Summary
Merge main into release/0.6.0 via merge commit
Test plan