chore: merge main into release/0.6.0 #1709

Closed
sriaradhyula wants to merge 850 commits into release/0.6.0 from main

Conversation

@sriaradhyula
Member

Summary

  • Brings 850 commits from main into release/0.6.0 via merge commit
  • Preserves the 92 release-specific commits (rc bumps, hotfixes, UI panel system, OAuth proxy fix)

Test plan

  • No regressions in existing release/0.6.0 functionality
  • CI passes after merge

caipe-ci-release Bot and others added 30 commits May 28, 2026 17:21
Production installs need Keycloak clients to use real configured secrets
instead of silently keeping placeholder values. Reconcile managed clients
through the init scripts and cover the strict-secret path so token
exchange setup fails visibly when required material is missing.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Application routes should no longer collapse onto a broad supervisor
invoke check when the OpenFGA model exposes narrower organization
capabilities. Map protected route groups to explicit relations and make
integration access-check oracles require read access before returning
resource grant details.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
The legacy withAuth wrapper should produce auditable capability rows even
for fallback routes. Map platform config reads to system_config#read and
fall back to admin UI view/manage checks instead of the old supervisor
umbrella capability.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Credential exchange should use the same envelope-store defaults across
factory creation and persistence tests. Keep this separate from RBAC UI
changes because it affects credential handling behavior rather than route
or policy wiring.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
RAG authorization now depends on token-backed OpenFGA decisions instead
of local group-role defaults. Stop propagating the legacy group fallback
settings during setup and migration docs so local and Helm installs do
not mask deny-by-default policy behavior.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Keep generated rag-stack chart reference changes separate from the RBAC
implementation work so reviewers can treat them as documentation output.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Capture the intended Docker build-cache work before implementation so UI
and supervisor image changes can be reviewed against explicit cache and
runtime-preservation requirements.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
The RBAC model needs one canonical generated artifact and explicit
organization capabilities for fine-grained application route gates. Keep
the source FGA, chart JSON, cleanup of duplicate artifacts, and model
coverage together so downstream PDP changes can rely on one model shape.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Record the explicit withAuth compatibility capability mapping so the canonical RBAC architecture doc stays aligned with the API middleware gate changes.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Add the newly mapped withAuth resources and scopes to the shared RBAC types and let chat route tests pass the compatibility chat capability before route-level sharing checks.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
feat(openfga): add route capabilities to canonical model
…rict-secrets

feat(keycloak): reconcile strict client secrets
…mplate

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
…mplates

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
…yment env blocks

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
…hardcoded caipe

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
Merge the latest main branch into the UI RBAC gate PR so the
branch is mergeable again while preserving the explicit withAuth
capability typing from the PR.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
feat(ui): map withAuth routes to RBAC capabilities
Local and Helm deployments need a consistent way to route managed MCP
servers through AgentGateway while preserving OpenFGA authorization. Add
the compose config bridge, generated route template, and MCP-only compose
profile wiring so local testing matches the gateway-backed path.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
GitHub and GitLab MCP targets need provider PAT placeholders to survive
both static config rendering and Mongo-backed route reconciliation. Add
secret-backed Helm environment wiring and regression coverage so the
bridge does not drop backendAuth when it refreshes AgentGateway routes.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
The AgentGateway coarse ext_authz layer checks mcp_gateway caller access
before route-level policy runs. Add that tuple to baseline member grants
so admitted users can reach MCP routes while still relying on the finer
resource checks behind the gateway.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Account for the rebased Knowledge Base MCP target and expanded baseline OpenFGA grant profile so the AgentGateway bridge tests match current main behavior.

Assisted-by: Cursor:GPT-5.5
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
caipe-ci-release Bot and others added 24 commits June 3, 2026 15:49
…fi-base

fix(dynamic-agents): migrate image to wolfi-base
fix(rag): migrate rag-server image to wolfi-base
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	pyproject.toml
#	uv.lock
chore(ui): upgrade caipe-ui base image to Node 24
…UX (#1696)

* feat(slack-ui): config parity, channel-admin editing, and admin save UX

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>

* chore: bump version to 0.5.7-dev.2

* test(slack-ui): fix Webex panel selectors and dedupe platform-settings import

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>

* refactor(slack-ui): extract Slack channel detail into provider components

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>

* fix(slack-ui): hash Slack token for in-process cache key (#1700)

* feat(slack-bot): authorize bot service account for BFF platform-config reads

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>

* fix(slack-ui): hash Slack token for in-process cache key instead of raw suffix

token.slice(-12) stored a raw fragment of the Slack bot token as a Map key
in module-level memory. Replace with a SHA-256 hash (first 16 hex chars) in
both the emoji and users-lookup routes so the token is never exposed in
process memory or debug dumps.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

---------

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Kevin Kantesaria <kkantesaria@splunk.com>

---------

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Kevin Kantesaria <kkantesaria@splunk.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sri Aradhyula <sraradhy@cisco.com>
Let each user narrow which OAuth scopes their own provider connection
requests via an "Advanced settings" panel on My Connections. The
connector's allowed `scopes` is both the admin-managed upper bound and
the default selection.

- connect route accepts an optional `?scopes=` selection
- startConnection validates via the pure boundScopes guard, rejecting
  out-of-bounds or empty selections with 400 and never minting a
  zero-scope token
- the choice is carried through the signed OAuth state cookie and
  persisted as requestedScopes/grantedScopes on provider_connections so
  relink pre-fills the prior choice and the grant is auditable

Existing connections without these fields and connects that do not open
Advanced settings behave exactly as before (connector default).

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
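
A minimal sketch of the scope-bounding rule described above, written in Python for illustration. The real `boundScopes` guard is not shown on this page, so the function signature, the `ScopeSelectionError` type, and the duplicate handling here are assumptions.

```python
from typing import List, Optional, Sequence


class ScopeSelectionError(ValueError):
    """Hypothetical error type; a route handler would map it to HTTP 400."""


def bound_scopes(allowed: Sequence[str], requested: Optional[Sequence[str]]) -> List[str]:
    """Return the scopes to request, bounded by the connector's allowed set."""
    if requested is None:
        # No explicit selection: fall back to the connector default.
        return list(allowed)
    # Drop duplicates while keeping the caller's order.
    selection = list(dict.fromkeys(requested))
    if not selection:
        # Never proceed to mint a zero-scope token.
        raise ScopeSelectionError("scope selection must not be empty")
    extra = [scope for scope in selection if scope not in allowed]
    if extra:
        raise ScopeSelectionError(f"scopes outside the allowed set: {extra}")
    return selection
```

Keeping the guard pure (no I/O, no session access) means the same check can run in the connect route and in tests without any setup.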
Docusaurus 3 (MDX) evaluates `{...}` as JS expressions, so the literal
scope-set notation `{A, B, C}` raised "ReferenceError: A is not defined"
and failed the static site build. Wrap the brace sets in backticks so
they render as literal text.

Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…org KB admin (#1703)

* feat(rbac): fix RAG datasource access gap, add public datasources, reorg KB admin

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>

* chore: bump version to 0.5.7-dev.7

---------

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
Co-authored-by: Kevin Kantesaria <kkantesaria@splunk.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…ope-selection

feat(credentials): per-user OAuth scope selection at connect time
…ce (#1702)

* fix(mcp): provider-token auth, knowledge-base RAG, and authz resilience

Bundles the entangled MCP authorization, provider-token, and platform
fixes that share gateway config, dynamic-agents, and RBAC doc files.

MCP authz resilience (spec 2026-06-02-mcp-authz-resilience):
- raise the AgentGateway ext_authz timeout default to 10s across the
  static Helm path and the dev compose/config-bridge path, keeping the
  fail-closed posture (operator-tunable via global.agentgateway.extAuth.timeout)
- bounded transient-retry with backoff when loading MCP tools so
  cold-start authz timeouts self-heal; classify transient/not-ready vs
  permanent failures and surface honest availability messaging

Provider-token unification:
- route all provider/identity tokens through the X-CAIPE-Provider-Token
  -> Authorization: Bearer gateway transformation
- Jira (cloudId rewrite), PagerDuty (bearer or static Token fallback),
  GitHub/GitLab hybrid (per-user OAuth with org-PAT fallback)
- knowledge-base/RAG: forward the caller user JWT, or mint a
  caipe-platform client-credentials service token in non-user contexts;
  probe path resolves the same credential so it no longer 401s

config-bridge also prunes /mcp/<id> routes whose mcp_servers row was
removed (reconciler is the single source of truth for /mcp/* routes).

Also includes independent fixes that touched shared files:
- webex bot: reuse the WDM device across reconnects and de-duplicate
  redelivered message ids
- connector onboarding UX: smart team/agent fallback, skip blocked rows
  instead of stranding the batch, treat bare-gateway discovery rows as
  auto-migratable
- credentials: CREDENTIAL_ALLOW_INSECURE_LOCAL_KEY_WRAP dev-only escape
  hatch so the local-cmk wrapper can run on the prod-parity UI image

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: bump version to 0.5.7-dev.7

* fix(docs): escape brace set in MCP authz resilience spec for MDX

Docusaurus 3 (MDX) evaluates `{...}` as JS expressions, so the literal
classification set `{transient, permanent, denied}` raised a ReferenceError
and failed the static site build. Wrap it in backticks to render literally.

Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ui): reconcile admin onboarding with #1696; fix discovery test + lint

#1696 ("slack-ui config parity") landed a refactor that replaced the
ConnectorAdminPanel onboarding-defaults subsystem. Reconcile this branch
with that design:

- ConnectorAdminPanel.tsx: drop the saved-defaults-based smart team/agent
  fallback (incompatible with #1696's apply-time defaults model); take
  #1696's discovery behavior. Revert the two smart-fallback Webex tests.
- Keep the compatible onboarding niceties: ConnectorOnboardingWizard
  skip-blocked-rows and the Slack adapter's ready-row import filter.
- agentgateway-mcp-discovery.test.ts: drop "rag" from the expected dev MCP
  targets (the rag route was removed from the AgentGateway configs in this PR).
- ChatPanel.tsx: prefer-const on a never-reassigned local that ESLint flagged
  as a hard error (unblocks the lint gate).

Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(ui): remove accidentally committed node_modules symlink

A local node_modules symlink (used for test tooling in an isolated
worktree) slipped past .gitignore because the ignore pattern matches the
directory form, not a symlink file. Remove it so it never reaches the repo.

Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(jira): stop logging credential-derived request URL (CodeQL high)

The Jira MCP client logged the full request URL at debug level. That URL is
constructed from the credential-bearing `prerequisites` (token/email/url),
so CodeQL flagged it as clear-text logging of sensitive information (1 high).
Log only the HTTP method and request path instead — sufficient for debugging
routing and never carrying the token.

Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
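
The logging change described above amounts to reducing a URL to its method and path before it reaches the logger. A hedged Python sketch follows; `safe_request_label`, `log_request`, and the logger name are hypothetical, and the example URL is invented for illustration.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("jira_mcp_sketch")  # hypothetical logger name


def safe_request_label(method: str, url: str) -> str:
    """Return 'METHOD /path', dropping scheme, host, userinfo, and query string."""
    path = urlsplit(url).path or "/"
    return f"{method.upper()} {path}"


def log_request(method: str, url: str) -> None:
    """Log only the non-sensitive parts of the outgoing request."""
    logger.debug("Jira request: %s", safe_request_label(method, url))
```

The path is usually enough to debug routing, while anything credential-derived in the host, userinfo, or query portion is never formatted into the log record.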

* refactor(dynamic-agents): narrow MCP load retry catch to Exception

The per-server retry helper in get_tools_with_resilience caught
BaseException, which also swallowed KeyboardInterrupt, SystemExit, and
asyncio.CancelledError -- so a cancellation/shutdown could be classified
and retried instead of propagating.

Narrow the catch (and the related type hints / isinstance guard) to
Exception. Transient failures we actually retry (timeouts, mid-stream
disconnects) are all Exception subclasses, so retry and classification
behavior is unchanged, while cancellation and interpreter shutdown now
propagate correctly. The flake8-blind-except (BLE001) suppression is kept
since a broad Exception catch is still intentional here.

Addresses github-code-quality review on PR #1702.

Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
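
The difference between catching `BaseException` and `Exception` in a retry loop can be sketched as below. This is not the body of `get_tools_with_resilience`; `load_with_retry`, its parameters, and the backoff schedule are assumptions chosen to keep the example short.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def load_with_retry(
    load: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.01,
) -> T:
    """Retry `load` on ordinary failures; let cancellation and shutdown propagate."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await load()
        except Exception as exc:  # noqa: BLE001 - broad catch is intentional
            # asyncio.CancelledError, KeyboardInterrupt, and SystemExit derive
            # from BaseException, so they are not caught here.
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff between attempts.
                await asyncio.sleep(base_delay * (2 ** attempt))
    assert last_error is not None
    raise last_error
```

Since Python 3.8, `asyncio.CancelledError` is a `BaseException` subclass, which is why narrowing the `except` clause is enough to stop a cancelled task from being retried.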

---------

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* docs(agents): add spec readability rules

Assisted-by: Codex:gpt-5
Signed-off-by: subbaksh <subbaksh@cisco.com>

* docs(agents): consolidate agent guidance

Assisted-by: Codex:gpt-5
Signed-off-by: subbaksh <subbaksh@cisco.com>

* docs(agents): require explicit human signoff

Assisted-by: Codex:gpt-5
Signed-off-by: subbaksh <subbaksh@cisco.com>

* docs(agents): add assisted-by example

Assisted-by: Codex:gpt-5
Signed-off-by: subbaksh <subbaksh@cisco.com>

---------

Signed-off-by: subbaksh <subbaksh@cisco.com>
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ui): collapse consecutive identical tool chips in timeline

Chatty agents (e.g. AWS aws_cli_execute) fire the same tool many times in
a row, flooding the chat timeline with identical "tool -> tool done" chips.
Coalesce consecutive identical tool calls into a single row with a xN count
while keeping the dropdown header's true call total.

Assisted-by: Claude:claude-opus-4.8
Co-authored-by: Cursor <cursoragent@cursor.com>
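
The coalescing rule described above (merge only consecutive identical calls, keep the true total) can be sketched in a few lines. This is a Python illustration of the idea, not the UI component's code; `collapse_tool_calls` is a hypothetical name and the timeline is reduced to a list of tool names.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def collapse_tool_calls(tool_names: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse runs of identical adjacent tool names into (name, count) rows."""
    # groupby only groups adjacent equal items, so a tool that reappears
    # after a different tool starts a new row.
    return [(name, sum(1 for _ in run)) for name, run in groupby(tool_names)]


def total_calls(rows: List[Tuple[str, int]]) -> int:
    """The header total stays the true number of calls, not the number of rows."""
    return sum(count for _, count in rows)
```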

* test(ui): cover consecutive tool-chip dedupe in timeline

Assisted-by: Claude:claude-opus-4.8
Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ui): correct code editor overflow and dark theme

Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com>

* refactor(ui): clarify comments and simplify some boolean evaluation

Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com>

* fix(ui): missing UI file changes

Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com>

* test(ui): add tests for new cases and test file for RichCodeEditor

Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com>

* refactor(ui): coerce value to boolean for clarity

Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com>

---------

Signed-off-by: Oliver Cowley <ocowley+cisco-dev@cisco.com>
)


* feat(ui): one-click "Migrate all to latest" for schema migrations

The admin Schema Migrations tab required ~4 steps (initialize-to-v1,
select-all, preview, type-confirmation + apply). Collapse the happy path
to a single primary "Migrate all to latest" action with one confirm
dialog (inline preview, no typed phrase), and move the per-migration /
bulk controls behind an "Advanced controls" toggle.

Server: new applyAllMigrations() orchestrator + POST
/api/admin/rebac/migrations/apply-all route. It bootstraps unversioned
schema areas to v1, then applies every pending implemented migration in
dependency order (topological sort over declared `dependencies`, fixing
the prior release/area ordering that ignored dependencies), continuing
on per-migration failure and returning an aggregated apply report.
Single round-trip instead of the previous N+1 client-side calls.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
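
Dependency-ordered application with continue-on-failure, as described above, can be sketched with Python's standard `graphlib`. The real `applyAllMigrations()` orchestrator is not shown on this page; `apply_all`, the report shape, and the choice to still attempt dependents of a failed migration are all assumptions of this sketch.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


def apply_all(
    dependencies: Dict[str, List[str]],
    apply: Callable[[str], None],
) -> Tuple[List[str], dict]:
    """Apply migrations so that each one runs after the migrations it depends on."""
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError if the declared dependencies contain a cycle.
    order = list(TopologicalSorter(dependencies).static_order())
    report = {"applied": [], "failed": {}}
    for migration_id in order:
        try:
            apply(migration_id)
            report["applied"].append(migration_id)
        except Exception as exc:  # keep going and record the failure
            report["failed"][migration_id] = str(exc)
    return order, report
```

Running the whole loop server-side and returning one aggregated report is what makes a single round-trip possible.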

* chore: bump version to 0.5.4-dev.1

* test(ui): expand Advanced controls in MigrationTab tests

The "Migrate all to latest" feature moved the per-migration list,
select-all controls, and the completed-migrations toggle behind a
collapsed "Advanced controls" disclosure. The existing MigrationTab
tests assert on that content immediately after render, so all 9 of
them broke.

Add an openAdvancedControls() helper and call it after render in the
affected tests so they exercise the per-migration UI through the new
disclosure. Test-only change; the feature UX is unchanged.

Assisted-by: Cursor:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: bump version to 0.5.7-dev.2

* chore: bump version to 0.5.7-dev.9

---------

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…tology-wolfi-base

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

# Conflicts:
#	pyproject.toml
#	uv.lock
…fi-base

fix(rag): migrate agent-ontology image to wolfi-base
…ning

The config-bridge reconciler assumed it owned every /mcp/<id> route and
rendered them solely from the mcp_servers Mongo collection. The 14 built-in
MCP servers are shipped statically in deploy/agentgateway/config.yaml and are
never written to Mongo. With an empty (or freshly reset) mcp_servers
collection, the reconciler classified every built-in route as stale and pruned
it, leaving AgentGateway serving zero MCP routes. The UI "Sync with
AgentGateway" then discovered nothing (it imports routes *from* the gateway),
so the platform deadlocked with no MCP servers.

Treat the bootstrap config.yaml /mcp/<id> routes as a protected baseline:
always re-render them from their authoritative definition (preserving per-route
transformations) and never prune them for being absent from Mongo. Mongo-backed
targets are layered on top as dynamic routes that may still be added/pruned; a
dynamic target sharing an id with a built-in defers to the built-in.

- Add load_builtin_mcp_routes() to parse the shipped bootstrap config
- Thread builtin_routes through merge_agentgateway_mcp_routes / reconcile_once
- Add pyyaml to the config-bridge image for bootstrap parsing
- Add regression tests for empty-Mongo, baseline-loss, and id-collision cases

Assisted-by: Claude:claude-opus-4.8
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
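
The protected-baseline rule described above can be sketched as a pure merge step. This is not the repository's `merge_agentgateway_mcp_routes`; `merge_routes_sketch`, its arguments, and the dictionary-based route shape are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Route = dict  # assumed shape: any route definition keyed by MCP server id


def merge_routes_sketch(
    builtin_routes: Dict[str, Route],
    mongo_routes: Dict[str, Route],
    current_ids: Iterable[str],
) -> Tuple[Dict[str, Route], List[str]]:
    """Return (desired routes, ids to prune) with built-ins always preserved."""
    # Start from the dynamic, Mongo-backed targets...
    desired = dict(mongo_routes)
    # ...then overlay the built-ins so they win on an id collision and are
    # present even when the Mongo collection is empty.
    desired.update(builtin_routes)
    # Only routes absent from both sources count as stale.
    pruned = sorted(set(current_ids) - set(desired))
    return desired, pruned
```

With an empty `mongo_routes`, the desired set equals the built-in baseline and nothing built-in is pruned, which is the failure mode the commit message describes.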
…e-preserve-builtin-routes

fix(agentgateway): protect built-in MCP routes from config-bridge pruning
@sriaradhyula
Member Author

Replaced by #1714, which merges main into release/0.6.0 via a dedicated branch (prebuild/chore/merge-main-into-release-0.6.0) with all 82 conflicts resolved. This PR used head=main directly, which prevented committing conflict resolutions.

Labels

None yet

8 participants