Commit 4259578

maskarb and claude authored
feat: add automated model discovery with feature-flag-gated availability (#772)
## Summary

Replaces hardcoded model arrays across 5 components with a single-source model manifest (`models.json`), feature-flag-gated availability via Unleash, and automated Vertex AI endpoint discovery.

**Before:** Adding a new model required changes in 5+ files across 3 components, a CI build, and a release.

**After:** Add a model ID to the manifest → discovery script probes Vertex AI → Unleash flag auto-created on deploy → admin enables via UI.

[RHOAIENG-50567](https://issues.redhat.com/browse/RHOAIENG-50567)

## What changed

### Model Manifest (single source of truth)

- **`components/manifests/base/models.json`**: Declarative list of all known models with Vertex AI mappings, provider, and availability. Deployed as a ConfigMap (`ambient-models`) mounted into backend and operator pods.
- **`components/manifests/base/kustomization.yaml`**: `configMapGenerator` with `disableNameSuffixHash: true` for stable ConfigMap name.

### Backend (Go)

- **`handlers/models.go`**: New `GET /api/projects/:projectName/models` endpoint. Reads manifest from ConfigMap volume, checks workspace-scoped feature flag overrides (ConfigMap), falls back to Unleash global state. Uses `atomic.Pointer` for thread-safe manifest caching.
- **`handlers/sessions.go`**: `isModelAvailable()` validation added to `CreateSession`; rejects unknown or disabled models with 400.
- **`handlers/featureflags_admin.go`**: Refactored: extracted `setFlagOverride()` shared helper (3 duplicate handlers → one-liners). Added `sanitizeParam()` for log injection prevention on all URL params. Added `errors.IsAlreadyExists` handling for ConfigMap creation race. Switched from `context.Background()` to `c.Request.Context()`.
- **`cmd/sync_model_flags.go`**: New `sync-model-flags` subcommand. Ensures every manifest model has a corresponding Unleash flag (disabled, `scope:workspace` tagged). Runs at server startup with 3 retries + exponential backoff. All HTTP calls use `context.Context` for cancellation.
- **`featureflags/featureflags.go`**: `initialized` changed to `atomic.Bool` (race fix). Added `IsModelEnabled()` with fail-open semantics (models available when Unleash is down).
- **`types/models.go`**: New types: `Model`, `ModelEntry`, `ModelManifest`, `ListModelsResponse`.

### Operator (Go)

- **`internal/models/models.go`**: `LoadManifest()` and `ResolveVertexID()` for manifest-driven Vertex AI model ID resolution. Parses `provider` field for future multi-provider support.
- **`internal/handlers/sessions.go`**: Injects `LLM_MODEL_VERTEX_ID` env var into runner pods from manifest resolution (replaces reliance on runner's static `VERTEX_MODEL_MAP`).

### Frontend (TypeScript)

- **`services/api/models.ts`** + **`services/queries/use-models.ts`**: React Query hook (`useModels`) with `staleTime: 60s` and `enabled` guard.
- **`components/create-session-dialog.tsx`**: Model dropdown fetches from API (lazy-loaded when dialog opens). Falls back to hardcoded list if API unavailable.
- **`components/workspace-sections/feature-flags-section.tsx`**: Replaced toggle switch with three-state segmented control (Default | On | Off). Invalidates `["models", projectName]` query after saving overrides.
- **`types/api/models.ts`**: `LLMModel` and `ListModelsResponse` types.

### Runner (Python)

- **`bridges/claude/auth.py`**: `setup_sdk_authentication()` now prefers `LLM_MODEL_VERTEX_ID` (operator-resolved from manifest) over static `VERTEX_MODEL_MAP`. Fallback chain: manifest → static map → default.

### CI/CD

- **`.github/workflows/model-discovery.yml`**: Daily GHA workflow probes Vertex AI endpoints via Workload Identity Federation, updates `models.json`, and opens a PR if changes detected.
- **`scripts/model-discovery.py`**: Discovery script with retry + exponential backoff on both version resolution and availability probing. Resilient to missing/malformed manifest. Never removes models.
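
The runner's fallback chain (manifest → static map → default) can be illustrated with a minimal sketch. This is not the code in `bridges/claude/auth.py`: the function name `resolve_vertex_id`, the map contents, and `DEFAULT_VERTEX_ID` are hypothetical placeholders, and the real default behaviour may differ. Only the `LLM_MODEL_VERTEX_ID` variable name and the order of precedence come from the description above.

```python
import os

# Hypothetical stand-ins for the runner's static map and default; the real
# values live in the runner and are not shown in this commit page.
VERTEX_MODEL_MAP = {"example-model": "example-model@20250101"}
DEFAULT_VERTEX_ID = "example-default@20250101"


def resolve_vertex_id(model, env=None):
    """Pick a Vertex AI model ID: manifest env var, then static map, then default."""
    env = os.environ if env is None else env

    # 1. Operator-resolved ID from the manifest, injected into the runner pod.
    manifest_id = env.get("LLM_MODEL_VERTEX_ID", "").strip()
    if manifest_id:
        return manifest_id

    # 2. Static map kept in the runner for backwards compatibility.
    if model in VERTEX_MODEL_MAP:
        return VERTEX_MODEL_MAP[model]

    # 3. Last resort (assumed here to be a fixed default ID).
    return DEFAULT_VERTEX_ID
```

Passing `env` explicitly keeps the sketch easy to exercise without touching the process environment, for example `resolve_vertex_id("example-model", {"LLM_MODEL_VERTEX_ID": "from-manifest@1"})`.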
## Model flow (end-to-end)

```
User selects model in UI
  → Frontend fetches GET /api/projects/:name/models
  → Backend reads manifest (ConfigMap volume)
  → Backend checks workspace override (ConfigMap) then Unleash
  → Returns filtered model list

User creates session with model
  → Backend validates model via isModelAvailable()
  → Backend creates AgenticSession CR
  → Operator resolves vertexId from manifest
  → Operator injects LLM_MODEL_VERTEX_ID into runner pod
  → Runner prefers manifest-resolved ID over static map
```

## Security

- All user-facing reads use `reqK8s` (user-scoped K8s client) for RBAC enforcement
- `sanitizeParam()` strips `\n`/`\r` from all URL params before logging
- `url.PathEscape()` on all Unleash Admin API URL path segments
- Generic error messages to users; detailed context in server logs only
- ConfigMap creation race handled with `errors.IsAlreadyExists` fallback
- GCP access token wrapped in `RuntimeError` to prevent leak in tracebacks

## Tests

| Component | Tests | Coverage |
|-----------|-------|----------|
| Backend handlers | 14 cases (`models_test.go`) | Auth, overrides, caching, fallback, unavailable models |
| Backend cmd | 8 cases (`sync_model_flags_test.go`) | Flag creation, conflict, skip, error, CLI args |
| Backend featureflags | 5 cases (`featureflags_test.go`) | Unconfigured state, fail-open for models |
| Operator models | 7 cases (`models_test.go`) | Load, resolve, missing, malformed, unavailable |
| Runner auth | 9 cases (`test_claude_auth.py`) | Vertex ID priority, static map, API key mode |

## Test plan

- [ ] `cd components/backend && go vet ./... && go test -tags test ./... && golangci-lint run`
- [ ] `cd components/operator && go vet ./... && go test ./...`
- [ ] Deploy to cluster: verify `GET /api/projects/:name/models` returns filtered list
- [ ] Verify model dropdown in create session dialog populates from API
- [ ] Toggle a model flag in Feature Flags admin → verify model list updates
- [ ] Create session with a disabled model → verify 400 rejection
- [ ] Verify `sync-model-flags` logs show flags created/skipped on startup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
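
The availability decision in the commit message (workspace override first, then Unleash global state, fail-open when Unleash cannot answer) can be sketched as follows. The backend is written in Go; this Python version only illustrates the order of precedence. The function signature, keying overrides by model ID, and using `None` to mean "Unleash unavailable" are all assumptions made for the example.

```python
from typing import Callable, Mapping, Optional, Set


def is_model_available(
    model_id: str,
    manifest_ids: Set[str],
    overrides: Mapping[str, bool],
    unleash_enabled: Callable[[str], Optional[bool]],
) -> bool:
    """Decide whether a model may be used in a workspace (illustrative only)."""
    # Models missing from the manifest are unknown and always rejected.
    if model_id not in manifest_ids:
        return False

    # A workspace-scoped override (On/Off) wins over the global flag.
    if model_id in overrides:
        return overrides[model_id]

    # Otherwise ask Unleash; None stands for "could not be evaluated".
    state = unleash_enabled(model_id)

    # Fail-open: a known model stays available when Unleash is down.
    return True if state is None else state
```

Under these assumptions, a session request for a model that returns `False` here is the case the commit message says is rejected with a 400.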
1 parent 56a196e commit 4259578

30 files changed

Lines changed: 2559 additions & 436 deletions

.github/workflows/model-discovery.yml

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+name: Model Discovery
+
+on:
+  schedule:
+    - cron: "0 6 * * *" # Daily at 06:00 UTC
+  workflow_dispatch: {}
+
+concurrency:
+  group: model-discovery
+  cancel-in-progress: false
+
+jobs:
+  discover:
+    runs-on: ubuntu-latest
+
+    permissions:
+      contents: write
+      pull-requests: write
+      id-token: write # For Workload Identity Federation
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v6
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: "3.11"
+
+      - name: Authenticate to Google Cloud
+        uses: google-github-actions/auth@v3
+        with:
+          project_id: ${{ secrets.GCP_PROJECT }}
+          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
+
+      - name: Set up gcloud CLI
+        uses: google-github-actions/setup-gcloud@v3
+
+      - name: Run model discovery
+        env:
+          GCP_REGION: ${{ secrets.GCP_REGION }}
+          GCP_PROJECT: ${{ secrets.GCP_PROJECT }}
+        run: python scripts/model-discovery.py
+
+      - name: Check for changes
+        id: diff
+        run: git diff --quiet && echo "changed=false" >> "$GITHUB_OUTPUT" || echo "changed=true" >> "$GITHUB_OUTPUT"
+
+      - name: Create or update PR
+        if: steps.diff.outputs.changed == 'true'
+        uses: peter-evans/create-pull-request@v8
+        with:
+          branch: automated/model-discovery
+          commit-message: "chore: update model manifest from Vertex AI discovery"
+          title: "chore: update model manifest"
+          body: |
+            Automated model discovery run.
+
+            This PR updates `components/manifests/base/models.json` based on
+            probing Vertex AI endpoints. New models are added with `available: true`.
+
+            Unleash flags are synced automatically on deploy by the
+            `sync-model-flags` Job. After merge, enable new models via the
+            Feature Flags admin UI.
+          labels: automated,models
+          delete-branch: true
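
The workflow above runs `scripts/model-discovery.py`, which is not shown on this page. The PR body says new models are added with `available: true`, and the commit message says the script never removes models. A minimal sketch of such an add-only merge is given below; the function name `merge_discovered`, the top-level `models` key, and the `id` / `vertexId` field names are assumptions about the manifest layout, not the real schema.

```python
def merge_discovered(manifest, discovered):
    """Add newly discovered models to the manifest in place.

    `discovered` maps a model ID to its Vertex AI ID. Existing entries are
    never removed or modified. Returns True if the manifest changed.
    """
    # Tolerate a missing list (for example, an empty or new manifest).
    models = manifest.setdefault("models", [])
    known = {entry.get("id") for entry in models}

    changed = False
    # Sort for a stable order so repeated runs produce identical files.
    for model_id, vertex_id in sorted(discovered.items()):
        if model_id in known:
            continue  # already listed: leave the existing entry untouched
        models.append({"id": model_id, "vertexId": vertex_id, "available": True})
        changed = True
    return changed
```

Because a second run with the same input returns `False` and leaves the data unchanged, a step such as `git diff --quiet` would report no changes and the PR step would be skipped.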

Makefile

Lines changed: 13 additions & 10 deletions
@@ -54,14 +54,17 @@ else
 QUIET_REDIRECT := >/dev/null 2>&1
 endif
 
-# Image tags
-FRONTEND_IMAGE ?= vteam_frontend:latest
-BACKEND_IMAGE ?= vteam_backend:latest
-OPERATOR_IMAGE ?= vteam_operator:latest
-RUNNER_IMAGE ?= vteam_claude_runner:latest
-STATE_SYNC_IMAGE ?= vteam_state_sync:latest
-PUBLIC_API_IMAGE ?= vteam_public_api:latest
-API_SERVER_IMAGE ?= vteam_api_server:latest
+# Image tag (override with: make build-all IMAGE_TAG=v1.2.3)
+IMAGE_TAG ?= latest
+
+# Image names
+FRONTEND_IMAGE ?= vteam_frontend:$(IMAGE_TAG)
+BACKEND_IMAGE ?= vteam_backend:$(IMAGE_TAG)
+OPERATOR_IMAGE ?= vteam_operator:$(IMAGE_TAG)
+RUNNER_IMAGE ?= vteam_claude_runner:$(IMAGE_TAG)
+STATE_SYNC_IMAGE ?= vteam_state_sync:$(IMAGE_TAG)
+PUBLIC_API_IMAGE ?= vteam_public_api:$(IMAGE_TAG)
+API_SERVER_IMAGE ?= vteam_api_server:$(IMAGE_TAG)
 
 # Podman prefixes image names with localhost/ — kind load needs to use the same
 # name so containerd can match the image reference used in the deployment spec
@@ -153,8 +156,8 @@ build-runner: ## Build Claude Code runner image
 build-state-sync: ## Build state-sync image for S3 persistence
 	@echo "$(COLOR_BLUE)$(COLOR_RESET) Building state-sync with $(CONTAINER_ENGINE)..."
 	@cd components/runners/state-sync && $(CONTAINER_ENGINE) build $(PLATFORM_FLAG) $(BUILD_FLAGS) \
-		-t vteam_state_sync:latest .
-	@echo "$(COLOR_GREEN)$(COLOR_RESET) State-sync built: vteam_state_sync:latest"
+		-t $(STATE_SYNC_IMAGE) .
+	@echo "$(COLOR_GREEN)$(COLOR_RESET) State-sync built: $(STATE_SYNC_IMAGE)"
 
 build-public-api: ## Build public API gateway image
 	@echo "$(COLOR_BLUE)$(COLOR_RESET) Building public-api with $(CONTAINER_ENGINE)..."
