feat(w11): expand capability catalog to 66 entries + SQL generator + safety guards #3317
Open
wuyuanfr wants to merge 28 commits into
…ggestion path raises
The connectivity check endpoint /model/temporary_healthcheck runs
_capacity_suggestion_for_model_request inline after a successful
verify_model_config_connectivity. Per W11 spec ("Suggestion failure
never changes connectivity success or failure"), an unexpected error
inside the suggestion path must not turn a successful connectivity
result into HTTP 500.
The prior code caught ValueError (covering the typed InvalidInput case
and Pydantic v2 ValidationError, which is a ValueError subclass), but
non-ValueError exceptions -- e.g. AttributeError/TypeError from a
malformed catalog profile entry, or future V2 provider-discovery HTTP
errors -- would propagate to the outer except Exception in
check_temporary_model_health and surface to operators as a misleading
"Failed to verify model connectivity" 500.
Restore the catch-all degrade-to-None branch and log at WARNING (not
DEBUG) so the real root cause is visible in default production log
streams without DEBUG enabled. Connectivity stays 200 with
capacity_suggestion: null; the per-row catalog issue surfaces in logs
where operators can act on it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
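The degrade-to-None behaviour described in this commit can be sketched roughly as below. This is a minimal illustration, not the actual endpoint code: `safe_capacity_suggestion`, `healthcheck_response`, the logger name, and the response shape are hypothetical stand-ins, and the log level used for the ValueError branch is an assumption.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("model_management")  # hypothetical logger name


def safe_capacity_suggestion(
    suggest: Callable[[dict], Any], request: dict
) -> Optional[Any]:
    """Run the suggestion step without letting it change the connectivity result."""
    try:
        return suggest(request)
    except ValueError as exc:
        # Typed invalid-input case; Pydantic v2 ValidationError is a ValueError subclass.
        # The level used here is an assumption.
        logger.warning("capacity suggestion rejected input: %s", exc)
        return None
    except Exception:
        # Catch-all branch: e.g. AttributeError/TypeError from a malformed catalog entry.
        # Logged at WARNING so the root cause shows up without DEBUG enabled.
        logger.warning("capacity suggestion failed unexpectedly", exc_info=True)
        return None


def healthcheck_response(
    connectivity_ok: bool, suggest: Callable[[dict], Any], request: dict
) -> dict:
    """Build the response; a suggestion failure never changes connectivity."""
    suggestion = safe_capacity_suggestion(suggest, request) if connectivity_ok else None
    return {"connectivity": connectivity_ok, "capacity_suggestion": suggestion}
```

With this shape, a suggestion callable that raises AttributeError still yields a successful connectivity result with `capacity_suggestion` set to None.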
The Add dialog had two ways to trigger a catalog suggestion: clicking
the bottom connectivity-validation button (which the backend extends
with capacity_suggestion in /temporary_healthcheck's response) and a
secondary "Check" button beside the toggle that called the standalone
/suggest-capacity endpoint. In V1 catalog-only mode the two paths
overlap on every realistic add flow -- the user must run connectivity
anyway because the Add button is gated on it -- so the standalone
button is UX noise without functional value. Collapse Add to a single
toggle whose state gates both the embedded suggestion result and the
explanatory hint.
The Edit dialog keeps its explicit Check button per spec ("show
'Suggestion available' after validation or explicit check") because
existing rows may need to refresh a suggestion without re-running
connectivity, but the long-form hint sentence is redundant: title +
toggle + a button labelled "Check" already names the feature and the
action. Removing the hint matches the spec's i18n key list, which
never listed model.dialog.capacity.suggestion.hint to begin with.
Add dialog changes:
- Drop checkingCapacitySuggestion state, canSuggestCapacity guard,
and handleSuggestCapacity handler.
- Drop the secondary Button and its wrapping shrink-0 flex container;
the Switch becomes a direct child of the outer justify-between row.
- Drop the suggestionLoading prop from ModelCapacityFields entirely.
It only controlled the spinner on the "Use suggestion" button inside
the suggestion-result panel, which only renders after a suggestion
is set -- at which point verifyingConnectivity is already false, so
binding it added no observable effect.
- Replace the shared "hint" copy with a new key "hintAdd" whose
wording reflects the actual trigger ("Suggested from the approved
catalog after connectivity passes."), and gate it on
capacitySuggestionEnabled so the toggle's off-state no longer
contradicts itself with copy that promises automatic behavior.
Edit dialog changes:
- Remove the hint <div> and its wrapping container; the title becomes
a direct flex child alongside the Switch+Check controls.
i18n:
- Drop the obsolete "model.dialog.capacity.suggestion.hint" key from
en and zh; add "hintAdd" used only by Add dialog.
No backend wire change. Edit dialog still calls /suggest-capacity
through its existing Check button for the bare-row repair flow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ge test
Phase 1.5 backend foundation per W11 spec L706-710 (SLO metrics),
L86-89/L944-948 (visibility env flag), and L312-322 (cross-tenant test).
No frontend change in this commit; V1.5 surfaces consume these signals in
follow-up frontend commits.
Metrics (4 instruments, each guarded behind try/except so a missing
OpenTelemetry runtime does not break the dispatch path):
1. model_capacity_suggestion_requests_total{match_kind, model_type,
provider} -- counter wrapping suggest_capacity. Drives the
"70% of new manual-add LLM rows produce match_kind != none" SLO.
2. model_capacity_suggestion_latency_ms{match_kind, provider} --
histogram around the same call. Used to verify V2 provider-discovery
p95 stays under the model-add latency budget.
3. model_capacity_suggestion_accept_total{match_kind, provider} --
counter emitted by the app layer when the operator save payload
carries accepted_suggestion_match_kind. Numerator for the
"95% accepted -> profile dispatch" SLO ratio.
4. model_capacity_suggestion_dispatch_profile_hit_total{provider} --
counter emitted in _resolve_input_budget when the resolved snapshot
carries a non-null capability_profile_version. Denominator for the
same SLO.
Accept signal pipe (audit-only):
- consts/model.py: ModelRequest gains accepted_suggestion_match_kind
and accepted_capability_profile_version. Both Optional[str], never
persisted to model_record_t.
- model_management_service.py: pop_capacity_accept_signal strips both
fields from save payloads and returns the popped values so the app
layer can label the counter.
- model_managment_app.py: /create and /update endpoints call
pop_capacity_accept_signal before invoking the service, then forward
the popped match_kind to _record_capacity_suggestion_accept after the
save returns. The dict the service sees no longer contains these
fields, preserving the "audit only -- not persisted" contract.
- The V1.5 frontend (next commit) will ship these fields on the wire;
until then the counter reads zero, which is the correct baseline.
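A rough sketch of the pop-before-persist contract described above. The function name comes from this commit, but the tuple return shape and the plain-dict payload are assumptions made for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def pop_capacity_accept_signal(
    payload: Dict[str, Any],
) -> Tuple[Optional[str], Optional[str]]:
    """Strip the audit-only fields from the payload in place and return them.

    The (match_kind, profile_version) return shape is an assumption.
    """
    match_kind = payload.pop("accepted_suggestion_match_kind", None)
    profile_version = payload.pop("accepted_capability_profile_version", None)
    return match_kind, profile_version


# Example: the dict handed to the service layer no longer carries the audit fields.
payload = {
    "model_name": "example-model",  # hypothetical row
    "accepted_suggestion_match_kind": "catalog_fuzzy",
    "accepted_capability_profile_version": "example/profile@1",  # hypothetical value
}
match_kind, profile_version = pop_capacity_accept_signal(payload)
```

Popping in place keeps the "audit only, not persisted" rule in one spot: whatever the app layer passes on after this call cannot contain the two fields.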
suggest_capacity refactor:
- Inner body extracted to _suggest_capacity_inner so the public
function can time end-to-end and emit requests_total + latency_ms
exactly once per completed call. ValueError paths still raise --
client-shape errors must not pollute SLO ratios so the recorder
fires only on terminal CapacitySuggestionResult returns.
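The "record exactly once, and only on a completed call" rule could look roughly like this. The in-memory lists stand in for the OpenTelemetry counter and histogram, and the dict-based request/result shapes are assumptions, not the real types.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical in-memory stand-ins for requests_total and latency_ms.
recorded_requests: List[Tuple[str, str]] = []
recorded_latency_ms: List[float] = []


def _record(match_kind: str, provider: str, elapsed_ms: float) -> None:
    recorded_requests.append((match_kind, provider))
    recorded_latency_ms.append(elapsed_ms)


def suggest_capacity(inner: Callable[[dict], dict], request: dict) -> dict:
    """Time the inner call end-to-end and record only when it returns a result."""
    start = time.perf_counter()
    result = inner(request)  # a ValueError propagates from here; nothing is recorded
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    provider = (request.get("provider") or "unknown").lower()
    _record(result["match_kind"], provider, elapsed_ms)
    return result
```

Because the recorder sits after the inner call rather than in a `finally` block, client-shape errors never reach the SLO counters.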
Visibility env flag (CAPACITY_VISIBILITY_ENABLED):
- Already declared in consts/const.py (default true) and consumed by
get_capacity_coverage. Confirmed wired end-to-end; no code change
needed here. The flag stays the developer-level rollback lever per
W11 spec; tenant_config_t overlay remains a follow-up.
Cross-tenant isolation test (spec L312-322):
- test_get_capacity_coverage_cross_tenant_isolation routes mocked
get_model_records by tenant_id and asserts each tenant only sees
its own bare rows in both bare_models[] and total_llm_vlm. Closes
the spec's required "tenant B row must not appear in tenant A's
response" coverage.
Test coverage added:
- Cross-tenant isolation for /capacity-coverage.
- pop_capacity_accept_signal extraction + dict mutation contract.
- accept_total OTel-optional no-op + label-cardinality (lower-cased
provider) wiring.
- suggest_capacity records requests_total + latency_ms on catalog
match, on "none" with provider fallback to "unknown", does NOT
record on ValueError, and runs cleanly when instruments are None.
- _resolve_input_budget records dispatch_profile_hit_total only when
capability_profile_version is non-null; recorder no-op when counter
is None.
Total: 8 files, +527 lines. All targeted unit suites pass
(test_model_capacity_suggestion_service 16/16,
test_model_management_service 70/70,
test_create_agent_info 174/174).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mark bare-capacity LLM/VLM rows in the Manage Models list with the existing yellow "缺容量" / "Missing capacity" tag. Keep the aggregation banner on the Models page as the entry-point signal, but rewrite its copy to hand off to the per-row tag instead of duplicating per-row UI. Auto-fire /suggest-capacity from inside ModelEditDialog whenever it opens on a bare-capacity row, regardless of how the dialog was opened. Expose preset selectors on the capacity panel and ship the model-management permission helper for V1.5 surfaces #2/#3.
Per spec line numbers cross-referenced inline:
#1 -- per-row tag as visual indicator (spec L143-167):
- Both badge sites in ModelDeleteDialog (provider-browser row L1507+ and added-model row L1652+) retain the existing yellow text tag (bg-yellow-100 border-yellow-200 text-yellow-700). We considered a warning-triangle icon and a separate click-target on the badge, then rolled both back: "缺容量"/"Missing capacity" reads as a status at the same glance an icon would, while the existing row onClick already opens the edit dialog -- so a button on the badge added complexity that ModelEditDialog now subsumes internally.
- ModelEditDialog derives `isBareCapacityModel` from the loaded model (context_window_tokens or max_output_tokens null) and a single useEffect auto-fires handleSuggestCapacity once on open when the model is bare, the suggestion switch is on, and the form fields needed for the call are present. Any entry path -- row click, future gear-icon shortcut, deep link -- gets the same affordance, so the operator never has to also click "Check" on a bare row.
- The deprecated model.dialog.capacityCoverage.{tag, warning, warningWithSuggestion} keys are dropped from en + zh in favour of a single spec-namespaced model.list.capacityWarning.tag key. No per-suggestion variants because the tag is purely a state label; the suggestion handoff happens inside the edit dialog where the green/info Alert carries that nuance instead.
#5 -- aggregation banner kept as entry-point signal, copy retuned:
- The summary Alert on the Models page (modelConfig.tsx) stays -- per-row tags live inside ModelDeleteDialog which is one click away. Without the banner, users on the Models page have no signal that any row needs attention.
- Description copy rewritten so the banner points at the new per-row flow: "Click Manage, then click the warning icon on each affected row to repair." Removes the redundant "edit a marked model" wording.
- Warning copy adds an "output token cap is not enforced" clause so the consequence (not just the symptom) is visible at a glance.
#4 -- permission helper (spec L167-178):
- frontend/lib/auth.ts gains canManageModels(role, isSpeedMode). Allowed roles: SU, ADMIN, DEV, SPEED. USER is excluded so regular agent authors see read-only notices rather than dead repair links. ASSET_OWNER is excluded -- model records are tenant scope, not asset-admin scope. Speed mode bypasses for the single-user dev experience, mirroring how other surfaces (chatHeader, etc.) treat it.
- The banner and tag in this commit both live on /models which is already route-gated for non-USER roles, so no in-place gate is needed yet. The helper exists so the V1.5 agent-edit-selector commit (#2) and the dashboard widget commit (#3) consume the same primitive instead of reinventing role parsing.
#8 -- preset selectors for context_window / output_reserve / max_output (spec L757-790):
- ModelCapacityFields.tsx gains two preset arrays mirroring spec L767-790 verbatim (9 context-window values 4K..1M, 7 output values 256..16K). The context-window list is identical to MAX_TOKEN_OPTIONS in ModelMaxTokensInput; kept as a local constant rather than cross-importing so the two surfaces stay independently editable.
- renderNumberInput gains an optional `presetOptions` parameter. When the field has no catalog suggestion yet (per spec L762-765 "when no suggestion exists ... render as preset-capable selector"), the input renders as AutoComplete with the preset list; otherwise it stays a plain numeric Input so an explicit catalog value doesn't get visually buried behind dropdown chrome.
- Wired for contextWindowTokens, maxOutputTokens, and defaultOutputReserveTokens. maxOutputTokens reuses the 256..16K list so operators see the same dropdown choices they already see for the reserve field; values above 16K (e.g. GPT-4.1's 32K cap, GLM-5.1's 131K cap) still work via free-text typing through AutoComplete. maxInputTokens keeps plain numeric input -- it is an explicit operator-side limit, not common-preset land.
- validateCapacityForm continues to enforce positive integers downstream.
i18n delta summary:
- DROPPED: model.dialog.capacityCoverage.tag, model.dialog.capacityCoverage.warning, model.dialog.capacityCoverage.warningWithSuggestion
- ADDED: model.list.capacityWarning.tag (single state label, no tooltip variants)
- REVISED (kept): modelConfig.capacityCoverage.warning + description with new entry-point copy; .manage button label unchanged.
Net: 6 files, +148/-77. Typecheck clean (only pre-existing .next/types/validator.ts noise from the unrelated left-nav rename). No backend wire change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…o-suggest population guard
Two paired bugs in the V1.5 auto-suggest path, both surfacing as "open glm-5 shows qwen3.7-max suggestion" after the operator cancels qwen and immediately clicks glm-5 in the Manage Models list:
1. Stale render. ModelEditDialog returns null when `model` is falsy (line ~559) but React does not unmount on null return -- it just commits null and keeps the component instance alive, useState intact. With React 18's automatic batching, the cancel and the subsequent row click coalesce into one commit; the [isOpen] reset effect I added in e442a55 saw isOpen=true on its single run and skipped the cleanup, so capacitySuggestion stayed as qwenResult for the first render with model=glm5. The user briefly saw the wrong suggestion before the [model] effect cleared it.
2. Stale API call. Even after the first render flickered to qwen and then to null, the auto-suggest effect fired with closure-captured form values that were still qwen's (form was a single useState instance, the [model] effect's setForm had not been flushed yet at the time the auto-suggest effect ran in the same commit cycle). modelService.suggestCapacity({ modelName: "qwen3.7-max", ... }) was sent to the backend, and /suggest-capacity dutifully returned qwen3.7-max@1. The request token from the earlier amend did not help here because the API call was not racing -- it was sending the wrong input.
Fixes in this commit:
a) ModelDeleteDialog passes `key={editModel?.displayName || "__none__"}` to ModelEditDialog. Each new editModel forces a full unmount + remount, which resets every useState/useRef to its initial value. That eliminates the stale-render path (1).
b) ModelEditDialog auto-suggest effect depends on `form.name` and `form.url` in addition to `[isOpen, isBareCapacityModel, capacitySuggestionEnabled]`. On a fresh mount, form starts empty (useState defaults); canSuggestCapacity() is false on the first pass so we do not fire. After the [model] effect's setForm re-renders, form.name and form.url change, the effect re-runs, canSuggestCapacity() now returns true with the correct values, and we send the API request scoped to the new model. That fixes the stale-input path (2).
c) `autoSuggestFiredRef = useRef(false)` guards against re-firing when the operator subsequently types into the name or url fields. We still want exactly one auto-suggest per dialog instance, and thanks to (a) one instance == one model.
Dead code removed:
- The [isOpen] reset effect from e442a55. Key-based remount supersedes it: the component is unmounted on close, so there is no state to reset.
- Its companion comments about "reset on close" semantics.
Retained:
- suggestionRequestRef token logic in handleSuggestCapacity. Covers a separate concern (rapid manual Check clicks on the same model with different inputs, where the older response must not overwrite the newer one). Key remount does not address this because there is no model swap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…gnal SLO wiring
Closes Week N+2/N+3 punch list for W11 V1.5.
UI surfaces (#2 + #3):
- Agent-edit model selector: bare-capacity subtitle on dropdown items and a non-blocking form Alert above Save when a bare model is picked. Admin/dev/su/speed see "fix in Model Management", others see "ask administrator". Permission gate via canManageModels().
- ModelCapacityCoverageWidget renders at top of resource-manage Models tab; hides on bare_count=0 or non-admin. Shared useCapacityCoverage hook backs both the widget and the agent-edit selector.
Legacy max_tokens hint (#7):
- Dual-target buttons (Fill into Context Window / Fill into Max Output) with heuristic ordering: values >= 16384 lead with Context Window, values < 16384 lead with Max Output. Each button hides once its target field is filled; the alert hides once both are filled. Old single-button "Apply as max_output_tokens" was reversed semantically: legacy max_tokens columns from the pre-W1 era were more often the provider context window, but at small values they really were the output cap -- the operator picks.
Constructor audit (ModelEngine-Group#16):
- test_model_consts pins ModelRequest and ModelCapacitySuggestionResponse field sets so a silent rename trips a test.
- test_prepare_model_dict_persists_operator_capacity now pins all 7 capacity fields + canonical model_factory/model_name in the ModelRequest constructor kwargs.
SLO data flow fix:
- Frontend was never sending the W11 accept signal, so model_capacity_suggestion_accept_total stayed at zero and the "95% accepted suggestions hit profile" SLO could not be computed. buildCapacityRequestBody now threads acceptedSuggestionMatchKind + acceptedCapabilityProfileVersion; ModelAddDialog and ModelEditDialog include them in save payloads when the operator clicked "Use suggestion".
- Two new app-layer integration tests pin: (1) accept signal present -> recorder fires with correct labels and audit fields are stripped from the service-layer payload; (2) plain save -> recorder does not fire (so accept_total stays aligned with dispatch_profile_hit_total as the SLO denominator).
i18n: full spec keyset present in both en/zh (model.list.capacityWarning.*, agent.modelSelector.bareCapacity.*, dashboard.capacityCoverage.*, model.dialog.capacity.suggestion.*, model.dialog.capacity.preset.*, model.dialog.capacity.legacyMaxTokens.*).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…vertical layout for legacy hint
- Agent model selector: replace inline yellow subtitle with TriangleAlert icon + hover tooltip to reduce visual clutter in dropdown options
- ModelCapacityFields: switch legacy max_tokens Alert from action prop (horizontal) to description prop (vertical) so hint text stacks above apply buttons within the same alert box
- Add i18n key agent.modelSelector.bareCapacity.tooltip (zh/en)
…st table + fuzzy canonicalization warning
Gap 1 -- Model Management list page badge:
- ModelList.tsx: add useCapacityCoverage hook + TriangleAlert badge in the Name column for bare-capacity LLM/VLM rows
- Badge shows yellow warning icon inline with model name
- Hover tooltip explains enforcement is off; click opens ModelEditDialog (which auto-fires capacity suggestion for bare models)
Gap 2 -- Fuzzy canonicalization warning:
- ModelCapacityFields.tsx: add acceptedSuggestion prop; render profileMissWarning text when catalog_fuzzy suggestion is shown but the user hasn't accepted the canonical model name
- ModelAddDialog.tsx + ModelEditDialog.tsx: pass acceptedCapacitySuggestion through to ModelCapacityFields
…ialog
buildCapacityPayload mirrors max_output_tokens into the legacy max_tokens column on every save, so a populated max_tokens is expected behavior, not a deprecation signal. The showDeprecatedMaxTokensWarning condition was always true for any model that went through the W11 save path, producing a misleading warning for every edit.
Remove: showDeprecatedMaxTokensWarning prop, rendering branch, and the deprecatedMaxTokens i18n keys from both locales.
The catalog backfill (v2.2.0_0617) only covers exact (model_factory, model_name) matches. Rows added via the manual-add path (model_factory = 'OpenAI-API-Compatible') or any model not in the approved catalog remain bare, disabling W2 output-token enforcement. This migration fills remaining bare LLM/VLM rows with save-time defaults: context_window=32768, max_output=4096, reserve=4096. Idempotent (only writes when NULL), scoped to LLM/VLM, and includes max_tokens alias reconciliation.
…verage widget text
…odels
Add 54 new catalog entries for models hosted on SiliconFlow:
- DeepSeek: V4-Pro, V4-Flash, V3.2, V3.1-Terminus, R1, V3, R1-0528-Qwen3-8B plus Pro/ tier variants (11 entries)
- Qwen: Qwen3.6, Qwen3.5 (7 sizes), Qwen3-VL (6 variants), Qwen3-Omni (3), Qwen3-Coder, Qwen3 dense (3), Qwen2.5 (5) (26 entries)
- GLM/Zhipu: GLM-4 (3), GLM-5.2, GLM-4.5V, GLM-4.5-Air, Pro/GLM-5.1 (7 entries)
- Other: Seed-OSS, Ling (2), MiniMax (2), Kimi-K2.7-Code, Nex-N2-Pro, Step-3.5-Flash, Hunyuan (2) (10 entries)
CATALOG_REVISION bumped to 2026-06-27.1. Migration script v2.2.2_0627_backfill_expanded_catalog.sql backfills matching bare rows for existing deployments.
Replace manual SQL migration scripts with automatic catalog-driven backfill that runs on nexent-config container startup. The capability_profiles.CATALOG is now the single source of truth.
New: backend/services/catalog_backfill_service.py
- Phase 1: match model_record_t rows against catalog entries, fill NULL capacity columns with catalog values
- Phase 2: fill remaining bare LLM/VLM rows with safe defaults (32K context, 4K output), enforcing max_output < context_window
- Phase 3: reconcile legacy max_tokens with max_output_tokens
Startup hook added to config_app.py.
Manual SQL scripts deleted:
- v2.2.2_0627_backfill_bare_capacity_defaults.sql
- v2.2.2_0627_backfill_expanded_catalog.sql
Verified: backfill runs on startup, idempotent (0 updates when all rows already populated).
…ignal wiring
Audit of yesterday's W11 V1.5 commits (f0e82d3..f65f859) surfaced three live bugs in the operator-accept SLO data flow. The crash one (#1) is what tripped the SiliconFlow batch_create report; the other two are observability holes that drop production signal silently.
#1 -- /provider/batch_create + /manage/batch_create crash on insert
Reported as "Failed to batch create models: Unconsumed column names: accepted_capability_profile_version, accepted_suggestion_match_kind". Root cause: f0e82d3 added the two audit-only fields to ModelRequest with the contract "app layer pops before service sees it", which holds for /create and /update -- but the batch path goes through prepare_model_dict, and that function rebuilds the dict via ModelRequest(...).model_dump(), which resurrects the two fields as None even if the app layer had popped them. The resurrected keys then fall through to create_model_record -> SQLAlchemy insert -> the table has no such columns -> raise. Worse, the /provider/batch_create app layer was not even popping in the first place.
Fix:
- prepare_model_dict: model_dump(exclude={...}) so the audit fields cannot resurface for any caller, present or future. Single defensive choke point.
- /provider/batch_create + /manage/batch_create: per-model pop_capacity_accept_signal + emit _record_capacity_suggestion_accept(provider) on success, so the batch path now also contributes to model_capacity_suggestion_accept_total.
#2 -- /manage/create + /manage/update silently drop the accept signal
The ManageTenantModelCreateRequest / ManageTenantModelUpdateRequest Pydantic schemas in f0e82d3 were not updated when ModelRequest gained the two accepted_* fields. With Pydantic's default extra="ignore", the frontend wire payload's accept_* fields were silently dropped at the schema boundary -- the service never saw them, the recorder never fired. accept_total under-reported every save coming from the SU / asset-owner surface (ModelEditDialog with tenantId, used by AssetOwnerResourcesComp and UserManageComp). In any deployment that leans on the centralized asset-owner model pool, this is the majority of accept events -- the SLO numerator was effectively half-blind.
Fix:
- Declare accepted_suggestion_match_kind + accepted_capability_profile_version on both manage schemas with the same audit-only contract.
- Both /manage/create and /manage/update now pop the signal off model_data before calling the service (otherwise the new fields would crash update_model_record / create_model_record the same way #1 did), then emit the recorder with provider=request.model_factory after the persist call succeeds.
#3 -- ModelList badge silently hides on vlm2/vlm3 rows
d6165cb added the bare-capacity TriangleAlert badge in ModelList.tsx with a redundant frontend type guard `record.type === 'llm' || record.type === 'vlm'`. Backend's CAPACITY_COVERAGE_MODEL_TYPES is {'llm','vlm','vlm2','vlm3'} -- bareModelIds from /capacity-coverage already filters by that set, but the frontend guard re-stated a smaller version that drifted. Bare vlm2 (image-gen) and vlm3 (video-und) rows never showed the warning icon or the click-to-fix entry point even though the backend marked them bare.
Fix: drop the frontend type guard entirely and trust the authoritative bareModelIds set. Eliminates the duplicated-truth that caused the drift, so future type additions (vlm4, etc.) do not silently re-create the same gap.
Regression tests:
- test_prepare_model_dict_excludes_w11_accept_signal_fields pins the exclude kwarg so a future "let's clean up the dump call" cannot re-open #1.
- test_provider_batch_create_strips_accept_signal_and_records covers the batch-app contract: per-model pop + recorder fires once per accepted row, labelled with provider.
- test_manage_create_model_records_accept_signal_when_present and test_manage_update_model_records_accept_signal_when_present cover #2: audit fields stripped from the service-layer payload, recorder fires with provider=model_factory.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the automatic Python backfill on container startup with a deterministic SQL generation approach. The capability_profiles.py catalog remains the single source of truth.
New: scripts/generate_backfill_sql.py
- Reads CATALOG from capability_profiles.py
- Emits idempotent SQL with COALESCE protection
- Enforces max_output < context_window via GREATEST/LEAST
- Three phases: catalog match, safe defaults, max_tokens reconcile
Generated: docker/sql/v2.2.2_0627_backfill_from_catalog.sql
- 66 catalog entries + safe defaults + max_tokens reconcile
- Operator runs manually during deployment
Removed: backend/services/catalog_backfill_service.py
Removed: startup hook from config_app.py
Developer workflow:
1. Edit capability_profiles.py (add/update models)
2. Run: python scripts/generate_backfill_sql.py > docker/sql/...
3. Commit both files
4. Operator runs SQL during deployment
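For readers unfamiliar with the pattern, a catalog-driven generator of this kind might emit one guarded UPDATE per entry, roughly as below. This is a sketch only: the single catalog entry, the profile field names, and the column names in the WHERE clause are illustrative assumptions, and the real script also splits the repo prefix from the model name.

```python
from typing import Dict, Tuple

# Hypothetical one-entry catalog; the real CATALOG lives in capability_profiles.py
# and its structure may differ.
CATALOG: Dict[Tuple[str, str], Dict[str, int]] = {
    ("example-provider", "example-model"): {
        "context_window_tokens": 32768,
        "max_output_tokens": 4096,
    },
}


def sql_literal(value: str) -> str:
    """Quote a string for SQL, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def emit_catalog_update(provider: str, model_name: str, profile: Dict[str, int]) -> str:
    """Emit an UPDATE that only fills NULL columns, so re-running it is a no-op."""
    ctx = int(profile["context_window_tokens"])
    out = int(profile["max_output_tokens"])
    return (
        "UPDATE model_record_t SET\n"
        f"  context_window_tokens = COALESCE(context_window_tokens, {ctx}),\n"
        # LEAST/GREATEST keep the filled max_output strictly below the effective window.
        f"  max_output_tokens = COALESCE(max_output_tokens, "
        f"LEAST({out}, GREATEST(COALESCE(context_window_tokens, {ctx}) - 1, 1)))\n"
        f"WHERE LOWER(model_factory) = {sql_literal(provider.lower())}\n"
        f"  AND model_name = {sql_literal(model_name)}\n"
        "  AND delete_flag = 'N'\n"
        "  AND (context_window_tokens IS NULL OR max_output_tokens IS NULL);"
    )


def generate() -> str:
    """Concatenate one statement per catalog entry in a stable order."""
    return "\n\n".join(
        emit_catalog_update(provider, name, profile)
        for (provider, name), profile in sorted(CATALOG.items())
    )
```

Sorting the catalog keys keeps the generated file byte-stable between runs, which makes the "commit both files" step in the workflow easy to review as a diff.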
Add Phase 4 to generated backfill SQL that clamps default_output_reserve_tokens to max_output_tokens when reserve exceeds max_output. This prevents RequestedOutputExceedsCap errors at runtime that silently disable W2 capacity enforcement. Also add LEAST guard to Phase 1 and Phase 2 so newly filled reserve values never exceed the actual max_output_tokens. Verified: Phase 4 fixed 1 existing row with reserve > max_output.
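The clamp itself is simple; a row-level model of what the phase does to a single record might look like this. The function name and the treatment of NULLs (rows with either value missing are left alone) are assumptions for illustration, not a transcription of the generated SQL.

```python
from typing import Optional


def clamp_reserve(reserve: Optional[int], max_output: Optional[int]) -> Optional[int]:
    """Return the reserve after clamping it to max_output.

    Rows where either value is missing are left untouched (assumed behaviour).
    """
    if reserve is None or max_output is None:
        return reserve
    return min(reserve, max_output)
```

For example, a row with reserve 4096 and max_output 2048 ends up with reserve 2048, while a row already within bounds is unchanged.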
Phase 2 fills bare rows with system defaults (32K/4K), not operator-confirmed values. Marking them as 'operator' was semantically wrong — it caused downstream code to treat these rows as operator-verified, skipping suggestion prompts and inflating SLO accuracy metrics. Changed to 'unknown' which accurately reflects that no one has reviewed these capacity values.
…rows
Add 'default' as a legitimate capacity_source value to distinguish rows filled by the backfill safe-defaults from truly unknown sources.
- SDK: CapacitySource Literal type, agent_model description, monitoring _dominant_capacity_source priority list
- Backend: create_agent_info priority list, db_models column doc
- Frontend: i18n keys for en/zh
- SQL generator: Phase 2 now uses 'default' instead of 'unknown'
The SDK ModelConfig validator already auto-syncs max_tokens and max_output_tokens in memory. The DB-level reconcile was redundant and could silently overwrite operator-intentional legacy max_tokens values (e.g. operator set max_tokens=16384 for longer output, but Phase 1a/2 would fill max_output_tokens from catalog/default, then Phase 3 would overwrite the operator's 16384 with the catalog value).
Phases now:
1a Catalog match -> fill bare rows
1b Catalog match -> tag already-filled rows
2 Safe defaults for remaining bare LLM/VLM rows
3 Clamp reserve to <= max_output_tokens
The validator was bidirectionally syncing max_tokens <-> max_output_tokens, but max_tokens is a legacy deprecated field. Writing max_output_tokens back into max_tokens on the Pydantic model risks propagating synthetic values to serialized/persisted configs, making legacy fields appear operator-set. Keep only the forward direction: max_tokens -> max_output_tokens (legacy migration path). The reverse alias in OpenAIModel.__init__ is safe because it is memory-only and needed for the OpenAI wire format (which uses max_tokens as the API field name).
…city-suggestion-v1.5
# Conflicts:
# deploy/sql/migrations/v2.2.2_0627_backfill_from_catalog.sql
v2.2.2_0627_backfill_from_catalog.sql is a strict superset:
- 66 catalog entries vs 10
- COALESCE + GREATEST/LEAST safety guards
- Phase 1b profile tagging + Phase 2 safe defaults + Phase 3 reserve clamp
- Removed the dangerous max_tokens reconcile that silently overwrote operator-intentional legacy values
Phase 1b previously only tagged rows with capability_profile_version when profile_version was NULL. Rows that already had the correct profile_version but stale capacity_source='default' were missed.
Updated condition to also match rows where:
- capability_profile_version already equals the catalog value
- capacity_source is still 'default'
This fixes the case where Phase 2 filled safe defaults (source='default'), then a subsequent run or manual edit aligned the values with catalog, but capacity_source was never upgraded to 'profile'.
Verified: 2 rows (Qwen2.5-32B, Qwen2.5-14B) correctly upgraded from 'default' to 'profile'.
Replace repeated string literals with CONSTANT declarations in each DO block to satisfy SonarQube rules R49/R50/R83:
- c_active_flag for 'N' (delete_flag)
- c_source_profile for 'profile' (capacity_source)
- c_source_default for 'default' (capacity_source)
Reduces literal duplication:
- 'N': 135 → 5 (only in comments)
- 'profile': 134 → 4 (only in comments + constant)
- 'default': 132 → not in top 20 (only in comments + constant)
Verified: SQL executes successfully with constants.
The test was asserting against payload['provider'] which is not a ModelRequest field. The app layer uses request.model_factory (default 'OpenAI-API-Compatible'), so the assertion failed. Fix: explicitly set model_factory in the payload and assert against it.
…models
The 11 DeepSeek models hosted on SiliconFlow were incorrectly using
'deepseek' as the catalog key provider. When operators add these models
via SiliconFlow provider browser, DB stores model_factory='silicon',
so migration SQL WHERE LOWER(model_factory)='deepseek' never matched.
Changed catalog key from ('deepseek', 'deepseek-ai/...') to
('silicon', 'deepseek-ai/...') for all 11 SiliconFlow-hosted entries.
Updated capability_profile_version prefix from 'deepseek/' to 'silicon/'.
Kept tokenizer_family='deepseek' (tokenizer identifier, not provider).
Original 4 DeepSeek official API entries (deepseek-chat, deepseek-reasoner,
deepseek-v4-flash, deepseek-v4-pro) remain unchanged with provider='deepseek'.
_split_repo_name used split('/', 1) which splits on the FIRST slash.
The backend model_name_utils.split_repo_name splits on the LAST slash
(rsplit equivalent). For 3-segment IDs like 'Pro/deepseek-ai/DeepSeek-V3.2':
Generator (broken): repo='Pro', name='deepseek-ai/DeepSeek-V3.2'
Backend (correct): repo='Pro/deepseek-ai', name='DeepSeek-V3.2'
This caused all 10 Pro/ prefixed catalog entries to never match in
Phase 1a/1b, falling through to Phase 2 safe defaults instead of
getting correct catalog values.
Fix: split('/', 1) -> rsplit('/', 1)
Verified: DeepSeek-V3.2 (Pro/deepseek-ai) now correctly backfilled
with catalog values (164K ctx, 8K output, profile source).
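The difference between the two splits is easy to reproduce in isolation. In the sketch below the helper names are illustrative (the real functions are _split_repo_name in the generator and split_repo_name in model_name_utils), and returning an empty repo for IDs without a slash is an assumption.

```python
from typing import Tuple


def split_on_first_slash(model_id: str) -> Tuple[str, str]:
    """Pre-fix generator behaviour: split('/', 1)."""
    parts = model_id.split("/", 1)
    return (parts[0], parts[1]) if len(parts) == 2 else ("", model_id)


def split_on_last_slash(model_id: str) -> Tuple[str, str]:
    """Backend-compatible behaviour: rsplit('/', 1)."""
    parts = model_id.rsplit("/", 1)
    return (parts[0], parts[1]) if len(parts) == 2 else ("", model_id)


three_segment = "Pro/deepseek-ai/DeepSeek-V3.2"
print(split_on_first_slash(three_segment))  # ('Pro', 'deepseek-ai/DeepSeek-V3.2')
print(split_on_last_slash(three_segment))   # ('Pro/deepseek-ai', 'DeepSeek-V3.2')
```

For two-segment IDs the two functions agree, which is why only the Pro/ prefixed entries were affected.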
One-line fix: split("/", 1) → rsplit("/", 1). Verified experimentally: DeepSeek-V3.2 is now correctly backfilled from NULL to the catalog values (164K context, 8K output, source=profile, version=silicon/deepseek-v3.2-pro@1), rather than the earlier incorrect Phase 2 defaults (32K/4K).
…idator
Pre-W1 models used max_tokens to mean 'total context window' (input + output). Post-W1 redefined max_tokens as max_output_tokens (output only). When validator copied large legacy values (e.g., 32768) directly to max_output_tokens, providers rejected requests with 'max_tokens exceeded max_seq_len' because there was no space left for input.
Added heuristic: if max_tokens >= 32768, assume it's the old 'total context window' semantics and use conservative default (4096) instead of copying. This prevents the semantic drift while still supporting legitimate small output limits (< 32768).
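The heuristic can be sketched as a plain function. The thresholds are the ones stated in this commit; the function name, constant names, and the rule that an explicit max_output_tokens always wins are assumptions made for the example.

```python
from typing import Optional

LEGACY_CONTEXT_THRESHOLD = 32768   # at or above this, treat max_tokens as a context window
CONSERVATIVE_MAX_OUTPUT = 4096     # fallback output cap for legacy context-window values


def derive_max_output_tokens(
    max_tokens: Optional[int], max_output_tokens: Optional[int]
) -> Optional[int]:
    """Forward-only migration from legacy max_tokens to max_output_tokens."""
    if max_output_tokens is not None:
        return max_output_tokens          # explicit value wins (assumed)
    if max_tokens is None:
        return None                       # nothing to migrate
    if max_tokens >= LEGACY_CONTEXT_THRESHOLD:
        return CONSERVATIVE_MAX_OUTPUT    # old "total context window" semantics
    return max_tokens                     # legitimate small output limit
```

Under these assumptions a legacy value of 32768 becomes 4096, while 16384 is copied through unchanged.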
Summary
Expand the W11 capability catalog from 12 to 66 entries (adding 54 SiliconFlow-hosted models) and replace manual SQL migrations with an auto-generated, idempotent backfill script.
Changes
Catalog Expansion (12 → 66 entries)
SQL Generator (scripts/generate_backfill_sql.py)
- Reads capability_profiles.CATALOG, emits idempotent backfill SQL
Backfill SQL Phases
Safety Guards
SDK Fix
Frontend
Removed
- v2.2.0_0617_context_management_capacity_data_fix.sql (superseded)
- catalog_backfill_service.py + startup hook (replaced by SQL generator)
Testing