feat(w11): expand capability catalog to 66 entries + SQL generator + safety guards #3317

Open
wuyuanfr wants to merge 28 commits into ModelEngine-Group:develop from liudfgoo:feature/w11-capacity-suggestion-v1.5

@wuyuanfr (Collaborator)

Summary

Expand the W11 capability catalog from 12 to 66 entries (adding 54 SiliconFlow-hosted models) and replace manual SQL migrations with an auto-generated, idempotent backfill script.

Changes

Catalog Expansion (12 → 66 entries)

  • New SiliconFlow models: Qwen3.x series (26), DeepSeek V3/V4/R1 (11), GLM-5.x/4.x (7), MiniMax, Kimi, Step, Hunyuan, Ling, Seed-OSS (10)
  • CATALOG_REVISION bumped to 2026-06-27.1

SQL Generator (scripts/generate_backfill_sql.py)

  • Reads capability_profiles.CATALOG, emits idempotent backfill SQL
  • Developer workflow: edit catalog → run generator → commit SQL
  • Eliminates manual SQL maintenance and catalog/SQL drift
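
For reviewers unfamiliar with the approach, the edit, generate, commit workflow above can be sketched roughly as follows. This is an illustration only, not the contents of scripts/generate_backfill_sql.py: `CatalogEntry`, `sql_literal`, and `emit_fill_sql` are hypothetical names and the example entry is made up. Only the table and column names come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical, simplified stand-in for one capability_profiles.CATALOG entry."""
    model_factory: str
    model_name: str
    context_window_tokens: int
    max_output_tokens: int


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def emit_fill_sql(entry: CatalogEntry) -> str:
    """Emit one UPDATE that fills only NULL capacity columns for a catalog match."""
    return (
        "UPDATE model_record_t\n"
        f"SET context_window_tokens = COALESCE(context_window_tokens, {entry.context_window_tokens}),\n"
        f"    max_output_tokens = COALESCE(max_output_tokens, {entry.max_output_tokens}),\n"
        "    capacity_source = 'profile'\n"
        f"WHERE model_factory = {sql_literal(entry.model_factory)}\n"
        f"  AND model_name = {sql_literal(entry.model_name)}\n"
        "  AND delete_flag = 'N'\n"
        "  AND (context_window_tokens IS NULL OR max_output_tokens IS NULL);"
    )


# Made-up entry, used only to show the shape of the emitted statement.
EXAMPLE_SQL = emit_fill_sql(
    CatalogEntry("example-factory", "example-model", 32768, 4096)
)
```

In this sketch a second run is a no-op: once both capacity columns are filled, the WHERE clause no longer matches the row.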

Backfill SQL Phases

  • Phase 1a: Fill bare rows matching catalog (capacity_source='profile')
  • Phase 1b: Tag already-filled rows with exact catalog match; upgrade capacity_source from 'default' to 'profile'
  • Phase 2: Safe defaults (32K/4K) for remaining bare LLM/VLM rows (capacity_source='default')
  • Phase 3: Clamp reserve ≤ max_output_tokens

Safety Guards

  • COALESCE protects existing non-NULL values
  • GREATEST/LEAST enforces max_output < context_window
  • Reserve clamped to ≤ max_output
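
The three guards can be read as the following row-level arithmetic. This is a Python model of the intended semantics, not the generated SQL. `fill_capacity` is a hypothetical helper, and capping a newly filled max_output at `context_window - 1` is an assumption about how the strict inequality is enforced.

```python
from typing import Optional, Tuple


def fill_capacity(
    existing_cw: Optional[int],
    existing_out: Optional[int],
    existing_reserve: Optional[int],
    proposed_cw: int,
    proposed_out: int,
    proposed_reserve: int,
) -> Tuple[int, int, int]:
    """Return (context_window, max_output, reserve) after applying the guards."""
    # COALESCE: an existing non-NULL value always wins over the proposed one.
    cw = existing_cw if existing_cw is not None else proposed_cw

    # GREATEST/LEAST: a newly filled max_output stays strictly below the window
    # (assumed here to mean "at most cw - 1, but never below 1").
    if existing_out is not None:
        out = existing_out
    else:
        out = min(proposed_out, max(cw - 1, 1))

    # Reserve clamp: applies to filled and pre-existing reserves alike.
    reserve = existing_reserve if existing_reserve is not None else proposed_reserve
    reserve = min(reserve, out)
    return cw, out, reserve
```

For a fully bare row with the 32K/4K defaults this yields (32768, 4096, 4096). A row that already has a 2048-token window gets its filled output and reserve capped at 2047.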

SDK Fix

  • ModelConfig validator: remove reverse max_tokens backfill (legacy field should not be written to)
  • Add clamp: max_output_tokens from legacy max_tokens cannot exceed context_window_tokens
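
A minimal sketch of the forward-only behaviour described above, using a plain dataclass as a stand-in for the Pydantic ModelConfig validator. The class name `ModelConfigSketch` is hypothetical, and the real validator may differ in details such as when it runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    """Stand-in for the SDK ModelConfig; only the three relevant fields."""
    max_tokens: Optional[int] = None  # legacy field: read, never written
    max_output_tokens: Optional[int] = None
    context_window_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        # Forward direction only: legacy max_tokens seeds max_output_tokens.
        if self.max_output_tokens is None and self.max_tokens is not None:
            value = self.max_tokens
            # Clamp: a value taken from the legacy field cannot exceed the window.
            if self.context_window_tokens is not None:
                value = min(value, self.context_window_tokens)
            self.max_output_tokens = value
        # No reverse branch: max_output_tokens is never copied into max_tokens.
```

With `max_tokens=16384` and `context_window_tokens=8192` the sketch resolves `max_output_tokens` to 8192, and a config that only sets `max_output_tokens` leaves `max_tokens` as None.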

Frontend

  • Bare-capacity badge in model management table (TriangleAlert icon + tooltip)
  • Fuzzy canonicalization warning in suggestion acceptance UI
  • Removed obsolete deprecatedMaxTokens warning

Removed

  • v2.2.0_0617_context_management_capacity_data_fix.sql (superseded)
  • catalog_backfill_service.py + startup hook (replaced by SQL generator)

Testing

  • SQL generator produces valid, idempotent SQL
  • Backfill executed on dev deployment: Phase 1b correctly upgrades capacity_source
  • All 5 containers start cleanly, frontend accessible

wuyuanfr and others added 23 commits June 25, 2026 19:15
…ggestion path raises

The connectivity check endpoint /model/temporary_healthcheck runs
_capacity_suggestion_for_model_request inline after a successful
verify_model_config_connectivity. Per W11 spec ("Suggestion failure
never changes connectivity success or failure"), an unexpected error
inside the suggestion path must not turn a successful connectivity
result into HTTP 500.

The prior code caught ValueError (covering the typed InvalidInput case
and Pydantic v2 ValidationError, which is a ValueError subclass), but
non-ValueError exceptions -- e.g. AttributeError/TypeError from a
malformed catalog profile entry, or future V2 provider-discovery HTTP
errors -- would propagate to the outer except Exception in
check_temporary_model_health and surface to operators as a misleading
"Failed to verify model connectivity" 500.

Restore the catch-all degrade-to-None branch and log at WARNING (not
DEBUG) so the real root cause is visible in default production log
streams without DEBUG enabled. Connectivity stays 200 with
capacity_suggestion: null; the per-row catalog issue surfaces in logs
where operators can act on it.
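
The degrade-to-None behaviour can be sketched as below. This is not the endpoint code: `suggestion_or_none` and `healthcheck_response` are hypothetical helpers, and the separate typed ValueError branch mentioned above is omitted for brevity.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("capacity_suggestion_sketch")


def suggestion_or_none(suggest: Callable[[], Any]) -> Optional[Any]:
    """Run the suggestion path; any failure degrades to None and is logged."""
    try:
        return suggest()
    except Exception:
        # WARNING (not DEBUG) so the root cause shows up in default log streams.
        logger.warning("capacity suggestion failed; returning null", exc_info=True)
        return None


def healthcheck_response(suggest: Callable[[], Any]) -> Dict[str, Any]:
    """Connectivity already succeeded; the suggestion can only add or be null."""
    return {"connectivity": True, "capacity_suggestion": suggestion_or_none(suggest)}
```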

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The Add dialog had two ways to trigger a catalog suggestion: clicking
the bottom connectivity-validation button (which the backend extends
with capacity_suggestion in /temporary_healthcheck's response) and a
secondary "Check" button beside the toggle that called the standalone
/suggest-capacity endpoint. In V1 catalog-only mode the two paths
overlap on every realistic add flow -- the user must run connectivity
anyway because the Add button is gated on it -- so the standalone
button is UX noise without functional value. Collapse Add to a single
toggle whose state gates both the embedded suggestion result and the
explanatory hint.

The Edit dialog keeps its explicit Check button per spec ("show
'Suggestion available' after validation or explicit check") because
existing rows may need to refresh a suggestion without re-running
connectivity, but the long-form hint sentence is redundant: title +
toggle + a button labelled "Check" already names the feature and the
action. Removing the hint matches the spec's i18n key list, which
never listed model.dialog.capacity.suggestion.hint to begin with.

Add dialog changes:
- Drop checkingCapacitySuggestion state, canSuggestCapacity guard,
  and handleSuggestCapacity handler.
- Drop the secondary Button and its wrapping shrink-0 flex container;
  the Switch becomes a direct child of the outer justify-between row.
- Drop the suggestionLoading prop from ModelCapacityFields entirely.
  It only controlled the spinner on the "Use suggestion" button inside
  the suggestion-result panel, which only renders after a suggestion
  is set -- at which point verifyingConnectivity is already false, so
  binding it added no observable effect.
- Replace the shared "hint" copy with a new key "hintAdd" whose
  wording reflects the actual trigger ("Suggested from the approved
  catalog after connectivity passes."), and gate it on
  capacitySuggestionEnabled so the toggle's off-state no longer
  contradicts itself with copy that promises automatic behavior.

Edit dialog changes:
- Remove the hint <div> and its wrapping container; the title becomes
  a direct flex child alongside the Switch+Check controls.

i18n:
- Drop the obsolete "model.dialog.capacity.suggestion.hint" key from
  en and zh; add "hintAdd" used only by Add dialog.

No backend wire change. Edit dialog still calls /suggest-capacity
through its existing Check button for the bare-row repair flow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ge test

Phase 1.5 backend foundation per W11 spec L706-710 (SLO metrics),
L86-89/L944-948 (visibility env flag), and L312-322 (cross-tenant test).
No frontend change in this commit; V1.5 surfaces consume these signals in
follow-up frontend commits.

Metrics (4 instruments, each guarded behind try/except so a missing
OpenTelemetry runtime does not break the dispatch path):

1. model_capacity_suggestion_requests_total{match_kind, model_type,
   provider} -- counter wrapping suggest_capacity. Drives the
   "70% of new manual-add LLM rows produce match_kind != none" SLO.
2. model_capacity_suggestion_latency_ms{match_kind, provider} --
   histogram around the same call. Used to verify V2 provider-discovery
   p95 stays under the model-add latency budget.
3. model_capacity_suggestion_accept_total{match_kind, provider} --
   counter emitted by the app layer when the operator save payload
   carries accepted_suggestion_match_kind. Numerator for the
   "95% accepted -> profile dispatch" SLO ratio.
4. model_capacity_suggestion_dispatch_profile_hit_total{provider} --
   counter emitted in _resolve_input_budget when the resolved snapshot
   carries a non-null capability_profile_version. Denominator for the
   same SLO.
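
A rough sketch of the "guarded instrument" pattern for the first counter. The OpenTelemetry calls used (`metrics.get_meter`, `create_counter`, `Counter.add`) are the standard Python metrics API. The helper name `record_request`, the meter name, and lower-casing the provider label on this particular counter are assumptions made for illustration.

```python
from typing import Any, Optional

try:
    from opentelemetry import metrics

    _meter = metrics.get_meter("capacity_suggestion_sketch")
    REQUESTS_TOTAL: Optional[Any] = _meter.create_counter(
        "model_capacity_suggestion_requests_total"
    )
except Exception:  # OpenTelemetry missing or misconfigured: run without metrics
    REQUESTS_TOTAL = None


def record_request(
    counter: Optional[Any],
    match_kind: str,
    model_type: str,
    provider: Optional[str],
) -> bool:
    """Increment the counter if it exists; never raise into the dispatch path."""
    if counter is None:
        return False
    labels = {
        "match_kind": match_kind,
        "model_type": model_type,
        "provider": (provider or "unknown").lower(),
    }
    try:
        counter.add(1, labels)
    except Exception:
        return False
    return True
```

A caller would pass the module-level instrument, for example `record_request(REQUESTS_TOTAL, "none", "llm", None)`, which is a no-op when OpenTelemetry is absent.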

Accept signal pipe (audit-only):
- consts/model.py: ModelRequest gains accepted_suggestion_match_kind
  and accepted_capability_profile_version. Both Optional[str], never
  persisted to model_record_t.
- model_management_service.py: pop_capacity_accept_signal strips both
  fields from save payloads and returns the popped values so the app
  layer can label the counter.
- model_managment_app.py: /create and /update endpoints call
  pop_capacity_accept_signal before invoking the service, then forward
  the popped match_kind to _record_capacity_suggestion_accept after the
  save returns. The dict the service sees no longer contains these
  fields, preserving the "audit only -- not persisted" contract.
- The V1.5 frontend (next commit) will ship these fields on the wire;
  until then the counter reads zero, which is the correct baseline.
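
The pop-then-forward contract can be illustrated as follows. The function name matches the one in this commit, but the body and the tuple return shape are a guess at the simplest implementation that satisfies the description, not the actual code.

```python
from typing import Any, Dict, Optional, Tuple


def pop_capacity_accept_signal(
    model_data: Dict[str, Any],
) -> Tuple[Optional[str], Optional[str]]:
    """Strip the audit-only fields in place and return their values."""
    match_kind = model_data.pop("accepted_suggestion_match_kind", None)
    profile_version = model_data.pop("accepted_capability_profile_version", None)
    return match_kind, profile_version


# Example save payload; the values are illustrative.
payload = {
    "model_name": "qwen3.7-max",
    "accepted_suggestion_match_kind": "catalog_fuzzy",
    "accepted_capability_profile_version": "qwen3.7-max@1",
}
signal = pop_capacity_accept_signal(payload)
```

After the call, `payload` contains only `model_name`, so the service layer never sees the two fields, while `signal` carries the values the app layer needs to label the counter.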

suggest_capacity refactor:
- Inner body extracted to _suggest_capacity_inner so the public
  function can time end-to-end and emit requests_total + latency_ms
  exactly once per completed call. ValueError paths still raise --
  client-shape errors must not pollute SLO ratios so the recorder
  fires only on terminal CapacitySuggestionResult returns.
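
The "record exactly once, and only on a completed call" rule might look like this in outline. `timed_suggest` and its callback signature are hypothetical. The real function wraps `_suggest_capacity_inner` and emits to the OpenTelemetry instruments rather than to a callback.

```python
import time
from typing import Any, Callable, Dict


def timed_suggest(
    inner: Callable[[], Dict[str, Any]],
    record: Callable[[str, float], None],
) -> Dict[str, Any]:
    """Time the inner call and record once, only if it returns a result."""
    start = time.perf_counter()
    # A ValueError (client-shape error) propagates from here, so the
    # recorder below is never reached and SLO ratios stay unpolluted.
    result = inner()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    record(result["match_kind"], elapsed_ms)
    return result
```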

Visibility env flag (CAPACITY_VISIBILITY_ENABLED):
- Already declared in consts/const.py (default true) and consumed by
  get_capacity_coverage. Confirmed wired end-to-end; no code change
  needed here. The flag stays the developer-level rollback lever per
  W11 spec; tenant_config_t overlay remains a follow-up.

Cross-tenant isolation test (spec L312-322):
- test_get_capacity_coverage_cross_tenant_isolation routes mocked
  get_model_records by tenant_id and asserts each tenant only sees
  its own bare rows in both bare_models[] and total_llm_vlm. Closes
  the spec's required "tenant B row must not appear in tenant A's
  response" coverage.

Test coverage added:
- Cross-tenant isolation for /capacity-coverage.
- pop_capacity_accept_signal extraction + dict mutation contract.
- accept_total OTel-optional no-op + label-cardinality (lower-cased
  provider) wiring.
- suggest_capacity records requests_total + latency_ms on catalog
  match, on "none" with provider fallback to "unknown", does NOT
  record on ValueError, and runs cleanly when instruments are None.
- _resolve_input_budget records dispatch_profile_hit_total only when
  capability_profile_version is non-null; recorder no-op when counter
  is None.

Total: 8 files, +527 lines. All targeted unit suites pass
(test_model_capacity_suggestion_service 16/16,
test_model_management_service 70/70,
test_create_agent_info 174/174).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Mark bare-capacity LLM/VLM rows in the Manage Models list with the
existing yellow "缺容量" / "Missing capacity" tag. Keep the
aggregation banner on the Models page as the entry-point signal, but
rewrite its copy to hand off to the per-row tag instead of duplicating
per-row UI. Auto-fire /suggest-capacity from inside ModelEditDialog
whenever it opens on a bare-capacity row, regardless of how the dialog
was opened. Expose preset selectors on the capacity panel and ship the
model-management permission helper for V1.5 surfaces #2/#3.

Per spec line numbers cross-referenced inline:

#1 -- per-row tag as visual indicator (spec L143-167):
- Both badge sites in ModelDeleteDialog (provider-browser row L1507+
  and added-model row L1652+) retain the existing yellow text tag
  (bg-yellow-100 border-yellow-200 text-yellow-700). We considered a
  warning-triangle icon and a separate click-target on the badge,
  then rolled both back: "缺容量"/"Missing capacity" reads as a
  status at the same glance an icon would, while the existing row
  onClick already opens the edit dialog -- so a button on the badge
  added complexity that ModelEditDialog now subsumes internally.
- ModelEditDialog derives `isBareCapacityModel` from the loaded model
  (context_window_tokens or max_output_tokens null) and a single
  useEffect auto-fires handleSuggestCapacity once on open when the
  model is bare, the suggestion switch is on, and the form fields
  needed for the call are present. Any entry path -- row click,
  future gear-icon shortcut, deep link -- gets the same affordance,
  so the operator never has to also click "Check" on a bare row.
- The deprecated model.dialog.capacityCoverage.{tag, warning,
  warningWithSuggestion} keys are dropped from en + zh in favour of
  a single spec-namespaced model.list.capacityWarning.tag key. No
  per-suggestion variants because the tag is purely a state label;
  the suggestion handoff happens inside the edit dialog where the
  green/info Alert carries that nuance instead.

#5 -- aggregation banner kept as entry-point signal, copy retuned:
- The summary Alert on the Models page (modelConfig.tsx) stays --
  per-row tags live inside ModelDeleteDialog which is one click
  away. Without the banner, users on the Models page have no signal
  that any row needs attention.
- Description copy rewritten so the banner points at the new per-row
  flow: "Click Manage, then click the warning icon on each affected
  row to repair." Removes the redundant "edit a marked model"
  wording.
- Warning copy adds an "output token cap is not enforced" clause so
  the consequence (not just the symptom) is visible at a glance.

#4 -- permission helper (spec L167-178):
- frontend/lib/auth.ts gains canManageModels(role, isSpeedMode).
  Allowed roles: SU, ADMIN, DEV, SPEED. USER is excluded so regular
  agent authors see read-only notices rather than dead repair links.
  ASSET_OWNER is excluded -- model records are tenant scope, not
  asset-admin scope. Speed mode bypasses for the single-user dev
  experience, mirroring how other surfaces (chatHeader, etc.) treat it.
- The banner and tag in this commit both live on /models which is
  already route-gated for non-USER roles, so no in-place gate is
  needed yet. The helper exists so the V1.5 agent-edit-selector
  commit (#2) and the dashboard widget commit (#3) consume the same
  primitive instead of reinventing role parsing.

#8 -- preset selectors for context_window / output_reserve /
max_output (spec L757-790):
- ModelCapacityFields.tsx gains two preset arrays mirroring spec
  L767-790 verbatim (9 context-window values 4K..1M, 7 output
  values 256..16K). The context-window list is identical to
  MAX_TOKEN_OPTIONS in ModelMaxTokensInput; kept as a local
  constant rather than cross-importing so the two surfaces stay
  independently editable.
- renderNumberInput gains an optional `presetOptions` parameter.
  When the field has no catalog suggestion yet (per spec L762-765
  "when no suggestion exists ... render as preset-capable selector"),
  the input renders as AutoComplete with the preset list; otherwise
  it stays a plain numeric Input so an explicit catalog value
  doesn't get visually buried behind dropdown chrome.
- Wired for contextWindowTokens, maxOutputTokens, and
  defaultOutputReserveTokens. maxOutputTokens reuses the 256..16K
  list so operators see the same dropdown choices they already see
  for the reserve field; values above 16K (e.g. GPT-4.1's 32K cap,
  GLM-5.1's 131K cap) still work via free-text typing through
  AutoComplete. maxInputTokens keeps plain numeric input -- it is
  an explicit operator-side limit with no common preset values.
- validateCapacityForm continues to enforce positive integers
  downstream.

i18n delta summary:
- DROPPED: model.dialog.capacityCoverage.tag,
  model.dialog.capacityCoverage.warning,
  model.dialog.capacityCoverage.warningWithSuggestion
- ADDED: model.list.capacityWarning.tag (single state label, no
  tooltip variants)
- REVISED (kept): modelConfig.capacityCoverage.warning + description
  with new entry-point copy; .manage button label unchanged.

Net: 6 files, +148/-77. Typecheck clean (only pre-existing
.next/types/validator.ts noise from the unrelated left-nav rename).
No backend wire change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…o-suggest population guard

Two paired bugs in the V1.5 auto-suggest path, both surfacing as
"open glm-5 shows qwen3.7-max suggestion" after the operator cancels
qwen and immediately clicks glm-5 in the Manage Models list:

1. Stale render. ModelEditDialog returns null when `model` is falsy
   (line ~559) but React does not unmount on null return -- it just
   commits null and keeps the component instance alive, useState
   intact. With React 18's automatic batching, the cancel and the
   subsequent row click coalesce into one commit; the [isOpen] reset
   effect I added in e442a55 saw isOpen=true on its single run and
   skipped the cleanup, so capacitySuggestion stayed as qwenResult
   for the first render with model=glm5. The user briefly saw the
   wrong suggestion before the [model] effect cleared it.

2. Stale API call. Even after the first render flickered to qwen and
   then to null, the auto-suggest effect fired with closure-captured
   form values that were still qwen's (form was a single useState
   instance, the [model] effect's setForm had not been flushed yet at
   the time the auto-suggest effect ran in the same commit cycle).
   modelService.suggestCapacity({ modelName: "qwen3.7-max", ... })
   was sent to the backend, and /suggest-capacity dutifully returned
   qwen3.7-max@1. The request token from the earlier amend did not
   help here because the API call was not racing -- it was sending
   the wrong input.

Fixes in this commit:

a) ModelDeleteDialog passes `key={editModel?.displayName || "__none__"}`
   to ModelEditDialog. Each new editModel forces a full unmount +
   remount, which resets every useState/useRef to its initial value.
   That eliminates the stale-render path (1).

b) ModelEditDialog auto-suggest effect depends on `form.name` and
   `form.url` in addition to `[isOpen, isBareCapacityModel,
   capacitySuggestionEnabled]`. On a fresh mount, form starts empty
   (useState defaults); canSuggestCapacity() is false on the first
   pass so we do not fire. After the [model] effect's setForm
   re-renders, form.name and form.url change, the effect re-runs,
   canSuggestCapacity() now returns true with the correct values,
   and we send the API request scoped to the new model. That fixes
   the stale-input path (2).

c) `autoSuggestFiredRef = useRef(false)` guards against re-firing
   when the operator subsequently types into the name or url fields.
   We still want exactly one auto-suggest per dialog instance, and
   thanks to (a) one instance == one model.

Dead code removed:
- The [isOpen] reset effect from e442a55. Key-based remount
  supersedes it: the component is unmounted on close, so there is
  no state to reset.
- Its companion comments about "reset on close" semantics.

Retained:
- suggestionRequestRef token logic in handleSuggestCapacity. Covers
  a separate concern (rapid manual Check clicks on the same model
  with different inputs, where the older response must not overwrite
  the newer one). Key remount does not address this because there
  is no model swap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…gnal SLO wiring

Closes Week N+2/N+3 punch list for W11 V1.5.

UI surfaces (#2 + #3):
- Agent-edit model selector: bare-capacity subtitle on dropdown items
  and a non-blocking form Alert above Save when a bare model is picked.
  Admin/dev/su/speed see "fix in Model Management", others see
  "ask administrator". Permission gate via canManageModels().
- ModelCapacityCoverageWidget renders at top of resource-manage Models
  tab; hides on bare_count=0 or non-admin. Shared useCapacityCoverage
  hook backs both the widget and the agent-edit selector.

Legacy max_tokens hint (#7):
- Dual-target buttons (Fill into Context Window / Fill into Max Output)
  with heuristic ordering: values >= 16384 lead with Context Window,
  values < 16384 lead with Max Output. Each button hides once its
  target field is filled; the alert hides once both are filled. Old
  single-button "Apply as max_output_tokens" was reversed semantically:
  legacy max_tokens columns from the pre-W1 era were more often the
  provider context window, but at small values they really were the
  output cap -- the operator picks.
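
The button-ordering heuristic is language-agnostic, so it is sketched here in Python rather than in the dialog's TypeScript. `legacy_fill_targets` and the target labels are hypothetical names. The 16384 threshold and the hide-when-filled rules are taken from the description above.

```python
from typing import List, Optional

THRESHOLD = 16384  # values at or above this lead with Context Window


def legacy_fill_targets(
    max_tokens: int,
    context_window_tokens: Optional[int],
    max_output_tokens: Optional[int],
) -> List[str]:
    """Return the fill buttons to show, in display order."""
    if max_tokens >= THRESHOLD:
        order = ["context_window", "max_output"]
    else:
        order = ["max_output", "context_window"]
    filled = {
        "context_window": context_window_tokens is not None,
        "max_output": max_output_tokens is not None,
    }
    # A button hides once its target is filled; an empty list hides the alert.
    return [target for target in order if not filled[target]]
```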

Constructor audit (ModelEngine-Group#16):
- test_model_consts pins ModelRequest and ModelCapacitySuggestionResponse
  field sets so a silent rename trips a test.
- test_prepare_model_dict_persists_operator_capacity now pins all 7
  capacity fields + canonical model_factory/model_name in the
  ModelRequest constructor kwargs.

SLO data flow fix:
- Frontend was never sending the W11 accept signal, so
  model_capacity_suggestion_accept_total stayed at zero and the
  "95% accepted suggestions hit profile" SLO could not be computed.
  buildCapacityRequestBody now threads acceptedSuggestionMatchKind +
  acceptedCapabilityProfileVersion; ModelAddDialog and ModelEditDialog
  include them in save payloads when the operator clicked "Use suggestion".
- Two new app-layer integration tests pin: (1) accept signal present
  -> recorder fires with correct labels and audit fields are stripped
  from the service-layer payload; (2) plain save -> recorder does not
  fire (so accept_total stays aligned with dispatch_profile_hit_total
  as the SLO denominator).

i18n: full spec keyset present in both en/zh
(model.list.capacityWarning.*, agent.modelSelector.bareCapacity.*,
dashboard.capacityCoverage.*, model.dialog.capacity.suggestion.*,
model.dialog.capacity.preset.*, model.dialog.capacity.legacyMaxTokens.*).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…vertical layout for legacy hint

- Agent model selector: replace inline yellow subtitle with TriangleAlert
  icon + hover tooltip to reduce visual clutter in dropdown options
- ModelCapacityFields: switch legacy max_tokens Alert from action prop
  (horizontal) to description prop (vertical) so hint text stacks above
  apply buttons within the same alert box
- Add i18n key agent.modelSelector.bareCapacity.tooltip (zh/en)
…st table + fuzzy canonicalization warning

Gap 1 — Model Management list page badge:
- ModelList.tsx: add useCapacityCoverage hook + TriangleAlert badge in
  the Name column for bare-capacity LLM/VLM rows
- Badge shows yellow warning icon inline with model name
- Hover tooltip explains enforcement is off; click opens ModelEditDialog
  (which auto-fires capacity suggestion for bare models)

Gap 2 — Fuzzy canonicalization warning:
- ModelCapacityFields.tsx: add acceptedSuggestion prop; render
  profileMissWarning text when catalog_fuzzy suggestion is shown but
  the user hasn't accepted the canonical model name
- ModelAddDialog.tsx + ModelEditDialog.tsx: pass acceptedCapacitySuggestion
  through to ModelCapacityFields
…ialog

buildCapacityPayload mirrors max_output_tokens into the legacy max_tokens
column on every save, so a populated max_tokens is expected behavior, not
a deprecation signal. The showDeprecatedMaxTokensWarning condition was
always true for any model that went through the W11 save path, producing
a misleading warning for every edit.

Remove: showDeprecatedMaxTokensWarning prop, rendering branch, and the
deprecatedMaxTokens i18n keys from both locales.

The catalog backfill (v2.2.0_0617) only covers exact (model_factory,
model_name) matches. Rows added via the manual-add path (model_factory
= 'OpenAI-API-Compatible') or any model not in the approved catalog
remain bare, disabling W2 output-token enforcement.

This migration fills remaining bare LLM/VLM rows with save-time
defaults: context_window=32768, max_output=4096, reserve=4096.
Idempotent (only writes when NULL), scoped to LLM/VLM, and includes
max_tokens alias reconciliation.
…odels

Add 54 new catalog entries for models hosted on SiliconFlow:
- DeepSeek: V4-Pro, V4-Flash, V3.2, V3.1-Terminus, R1, V3, R1-0528-Qwen3-8B
  plus Pro/ tier variants (11 entries)
- Qwen: Qwen3.6, Qwen3.5 (7 sizes), Qwen3-VL (6 variants), Qwen3-Omni (3),
  Qwen3-Coder, Qwen3 dense (3), Qwen2.5 (5) (26 entries)
- GLM/Zhipu: GLM-4 (3), GLM-5.2, GLM-4.5V, GLM-4.5-Air, Pro/GLM-5.1 (7 entries)
- Other: Seed-OSS, Ling (2), MiniMax (2), Kimi-K2.7-Code, Nex-N2-Pro,
  Step-3.5-Flash, Hunyuan (2) (10 entries)

CATALOG_REVISION bumped to 2026-06-27.1.

Migration script v2.2.2_0627_backfill_expanded_catalog.sql backfills
matching bare rows for existing deployments.

Replace manual SQL migration scripts with automatic catalog-driven
backfill that runs on nexent-config container startup. The
capability_profiles.CATALOG is now the single source of truth.

New: backend/services/catalog_backfill_service.py
- Phase 1: match model_record_t rows against catalog entries, fill
  NULL capacity columns with catalog values
- Phase 2: fill remaining bare LLM/VLM rows with safe defaults
  (32K context, 4K output), enforcing max_output < context_window
- Phase 3: reconcile legacy max_tokens with max_output_tokens

Startup hook added to config_app.py. Manual SQL scripts deleted:
- v2.2.2_0627_backfill_bare_capacity_defaults.sql
- v2.2.2_0627_backfill_expanded_catalog.sql

Verified: backfill runs on startup, idempotent (0 updates when all
rows already populated).
…ignal wiring

Audit of yesterday's W11 V1.5 commits (f0e82d3..f65f859) surfaced
three live bugs in the operator-accept SLO data flow. The crash (#1)
is what tripped the SiliconFlow batch_create report; the other two
are observability holes that silently drop production signal.

#1 -- /provider/batch_create + /manage/batch_create crash on insert
    Reported as "Failed to batch create models: Unconsumed column
    names: accepted_capability_profile_version, accepted_suggestion_
    match_kind". Root cause: f0e82d3 added the two audit-only fields
    to ModelRequest with the contract "app layer pops before service
    sees it", which holds for /create and /update -- but the batch
    path goes through prepare_model_dict, and that function rebuilds
    the dict via ModelRequest(...).model_dump(), which resurrects the
    two fields as None even if the app layer had popped them. The
    resurrected keys then fall through to create_model_record ->
    SQLAlchemy insert -> the table has no such columns -> raise.
    Worse, the /provider/batch_create app layer was not even popping
    in the first place.
    Fix:
    - prepare_model_dict: model_dump(exclude={...}) so the audit
      fields cannot resurface for any caller, present or future.
      Single defensive choke point.
    - /provider/batch_create + /manage/batch_create: per-model
      pop_capacity_accept_signal + emit _record_capacity_suggestion_
      accept(provider) on success, so the batch path now also
      contributes to model_capacity_suggestion_accept_total.
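
The resurrection mechanism in #1 can be reproduced without Pydantic. The sketch below uses a dataclass and `asdict` as a stand-in for `ModelRequest(...).model_dump()`, and a dict filter as a stand-in for `model_dump(exclude={...})`. All names ending in `Sketch` or `_sketch` are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

AUDIT_ONLY_FIELDS = frozenset(
    {"accepted_suggestion_match_kind", "accepted_capability_profile_version"}
)


@dataclass
class ModelRequestSketch:
    """Stand-in for ModelRequest with the two audit-only fields declared."""
    model_name: str
    model_factory: str = "OpenAI-API-Compatible"
    accepted_suggestion_match_kind: Optional[str] = None
    accepted_capability_profile_version: Optional[str] = None


def prepare_model_dict_sketch(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the dict through the schema, then drop audit-only keys."""
    request = ModelRequestSketch(**payload)
    # Like model_dump(): every declared field reappears, even when it was
    # popped from the payload upstream, because it defaults to None.
    dumped = asdict(request)
    # Equivalent in spirit to model_dump(exclude=AUDIT_ONLY_FIELDS).
    return {key: value for key, value in dumped.items() if key not in AUDIT_ONLY_FIELDS}
```

Without the final filter, a payload of `{"model_name": "glm-5"}` would come back with both `accepted_*` keys set to None, which is the shape that tripped the insert.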

#2 -- /manage/create + /manage/update silently drop the accept signal
    The ManageTenantModelCreateRequest / ManageTenantModelUpdateRequest
    Pydantic schemas in f0e82d3 were not updated when ModelRequest
    gained the two accepted_* fields. With Pydantic's default
    extra="ignore", the frontend wire payload's accept_* fields were
    silently dropped at the schema boundary -- the service never saw
    them, the recorder never fired. accept_total under-reported every
    save coming from the SU / asset-owner surface (ModelEditDialog
    with tenantId, used by AssetOwnerResourcesComp and UserManageComp).
    In any deployment that leans on the centralized asset-owner model
    pool, this is the majority of accept events -- the SLO numerator
    was effectively half-blind.
    Fix:
    - Declare accepted_suggestion_match_kind + accepted_capability_
      profile_version on both manage schemas with the same audit-only
      contract.
    - Both /manage/create and /manage/update now pop the signal off
      model_data before calling the service (otherwise the new fields
      would crash update_model_record / create_model_record the same
      way #1 did), then emit the recorder with provider=request.
      model_factory after the persist call succeeds.

#3 -- ModelList badge silently hides on vlm2/vlm3 rows
    d6165cb added the bare-capacity TriangleAlert badge in
    ModelList.tsx with a redundant frontend type guard
    `record.type === 'llm' || record.type === 'vlm'`. Backend's
    CAPACITY_COVERAGE_MODEL_TYPES is {'llm','vlm','vlm2','vlm3'} --
    bareModelIds from /capacity-coverage already filters by that
    set, but the frontend guard re-stated a smaller version that
    drifted. Bare vlm2 (image-gen) and vlm3 (video-und) rows never
    showed the warning icon or the click-to-fix entry point even
    though the backend marked them bare.
    Fix: drop the frontend type guard entirely and trust the
    authoritative bareModelIds set. This removes the duplicated source
    of truth that caused the drift, so future type additions (vlm4,
    etc.) do not silently re-create the same gap.

Regression tests:
- test_prepare_model_dict_excludes_w11_accept_signal_fields pins the
  exclude kwarg so a future "let's clean up the dump call" cannot
  re-open #1.
- test_provider_batch_create_strips_accept_signal_and_records covers
  the batch-app contract: per-model pop + recorder fires once per
  accepted row, labelled with provider.
- test_manage_create_model_records_accept_signal_when_present and
  test_manage_update_model_records_accept_signal_when_present cover
  #2: audit fields stripped from the service-layer payload, recorder
  fires with provider=model_factory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replace the automatic Python backfill on container startup with a
deterministic SQL generation approach. The capability_profiles.py
catalog remains the single source of truth.

New: scripts/generate_backfill_sql.py
- Reads CATALOG from capability_profiles.py
- Emits idempotent SQL with COALESCE protection
- Enforces max_output < context_window via GREATEST/LEAST
- Three phases: catalog match, safe defaults, max_tokens reconcile

Generated: docker/sql/v2.2.2_0627_backfill_from_catalog.sql
- 66 catalog entries + safe defaults + max_tokens reconcile
- Operator runs manually during deployment

Removed: backend/services/catalog_backfill_service.py
Removed: startup hook from config_app.py

Developer workflow:
1. Edit capability_profiles.py (add/update models)
2. Run: python scripts/generate_backfill_sql.py > docker/sql/...
3. Commit both files
4. Operator runs SQL during deployment

Add Phase 4 to generated backfill SQL that clamps
default_output_reserve_tokens to max_output_tokens when reserve
exceeds max_output. This prevents RequestedOutputExceedsCap errors
at runtime that silently disable W2 capacity enforcement.

Also add LEAST guard to Phase 1 and Phase 2 so newly filled reserve
values never exceed the actual max_output_tokens.

Verified: Phase 4 fixed 1 existing row with reserve > max_output.

Phase 2 fills bare rows with system defaults (32K/4K), not
operator-confirmed values. Marking them as 'operator' was
semantically wrong — it caused downstream code to treat these
rows as operator-verified, skipping suggestion prompts and
inflating SLO accuracy metrics.

Changed to 'unknown' which accurately reflects that no one has
reviewed these capacity values.
…rows

Add 'default' as a legitimate capacity_source value to distinguish
rows filled by the backfill safe-defaults from truly unknown sources.

- SDK: CapacitySource Literal type, agent_model description,
  monitoring _dominant_capacity_source priority list
- Backend: create_agent_info priority list, db_models column doc
- Frontend: i18n keys for en/zh
- SQL generator: Phase 2 now uses 'default' instead of 'unknown'

The SDK ModelConfig validator already auto-syncs max_tokens and
max_output_tokens in memory. The DB-level reconcile was redundant
and could silently overwrite operator-intentional legacy max_tokens
values (e.g. operator set max_tokens=16384 for longer output, but
Phase 1a/2 would fill max_output_tokens from catalog/default, then
Phase 3 would overwrite the operator's 16384 with the catalog value).

Phases now:
  1a  Catalog match -> fill bare rows
  1b  Catalog match -> tag already-filled rows
  2   Safe defaults for remaining bare LLM/VLM rows
  3   Clamp reserve to <= max_output_tokens

The validator was bidirectionally syncing max_tokens <-> max_output_tokens,
but max_tokens is a legacy deprecated field. Writing max_output_tokens back
into max_tokens on the Pydantic model risks propagating synthetic values
to serialized/persisted configs, making legacy fields appear operator-set.

Keep only the forward direction: max_tokens -> max_output_tokens (legacy
migration path). The reverse alias in OpenAIModel.__init__ is safe because
it is memory-only and needed for the OpenAI wire format (which uses
max_tokens as the API field name).
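
As a rough illustration of the forward-only rule, the sketch below uses a plain dataclass rather than the actual Pydantic `ModelConfig`; `CapacityFields` and `sync_forward` are hypothetical names. It also applies the clamp against `context_window_tokens` described in the PR summary.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CapacityFields:
    """Hypothetical stand-in for the capacity fields on ModelConfig."""
    max_tokens: Optional[int] = None             # legacy, deprecated
    max_output_tokens: Optional[int] = None
    context_window_tokens: Optional[int] = None


def sync_forward(cfg: CapacityFields) -> CapacityFields:
    """Fill max_output_tokens from legacy max_tokens; never write max_tokens."""
    if cfg.max_output_tokens is not None or cfg.max_tokens is None:
        # Either nothing to migrate, or the new field is already set.
        return cfg
    value = cfg.max_tokens
    if cfg.context_window_tokens is not None:
        # Clamp: output budget cannot exceed the context window.
        value = min(value, cfg.context_window_tokens)
    return replace(cfg, max_output_tokens=value)
```

Because the function only ever assigns `max_output_tokens`, a config that sets just the new field keeps `max_tokens` as `None`, so serialized configs do not gain a synthetic legacy value.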
…city-suggestion-v1.5

# Conflicts:
#	deploy/sql/migrations/v2.2.2_0627_backfill_from_catalog.sql
v2.2.2_0627_backfill_from_catalog.sql is a strict superset:
- 66 catalog entries vs 10
- COALESCE + GREATEST/LEAST safety guards
- Phase 1b profile tagging + Phase 2 safe defaults + Phase 3 reserve clamp
- Removed the dangerous max_tokens reconcile that silently overwrote
  operator-intentional legacy values
Phase 1b previously only tagged rows with capability_profile_version
when profile_version was NULL. Rows that already had the correct
profile_version but stale capacity_source='default' were missed.

Updated condition to also match rows where:
- capability_profile_version already equals the catalog value
- capacity_source is still 'default'

This fixes the case where Phase 2 filled safe defaults (source='default'),
then a subsequent run or manual edit aligned the values with catalog,
but capacity_source was never upgraded to 'profile'.

Verified: 2 rows (Qwen2.5-32B, Qwen2.5-14B) correctly upgraded
from 'default' to 'profile'.
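
The widened Phase 1b condition can be pictured as the predicate below. This is a sketch, not the generated SQL: `phase_1b_should_tag` is a hypothetical name, and it assumes the row has already passed the exact-match check on its capacity values.

```python
from typing import Optional


def phase_1b_should_tag(
    row_version: Optional[str],
    row_source: Optional[str],
    catalog_version: str,
) -> bool:
    """Decide whether Phase 1b should tag a row as source='profile'."""
    if row_version is None:
        # Original condition: the row has no profile version yet.
        return True
    # Added condition: the version already matches the catalog,
    # but capacity_source is still the stale 'default'.
    return row_version == catalog_version and row_source == "default"
```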
@wuyuanfr wuyuanfr requested review from Dallas98 and WMC001 as code owners June 27, 2026 06:59
wuyuanfr added 4 commits June 27, 2026 15:10
Replace repeated string literals with CONSTANT declarations in each
DO block to satisfy SonarQube rules R49/R50/R83:
- c_active_flag for 'N' (delete_flag)
- c_source_profile for 'profile' (capacity_source)
- c_source_default for 'default' (capacity_source)

Reduces literal duplication:
- 'N': 135 → 5 (only in comments)
- 'profile': 134 → 4 (only in comments + constant)
- 'default': 132 → not in top 20 (only in comments + constant)

Verified: SQL executes successfully with constants.
The test was asserting against payload['provider'] which is not a
ModelRequest field. The app layer uses request.model_factory (default
'OpenAI-API-Compatible'), so the assertion failed.

Fix: explicitly set model_factory in the payload and assert against it.
…models

The 11 DeepSeek models hosted on SiliconFlow were incorrectly using
'deepseek' as the catalog key provider. When operators add these models
via the SiliconFlow provider browser, the DB stores model_factory='silicon',
so the migration SQL's WHERE LOWER(model_factory)='deepseek' never matched.

Changed catalog key from ('deepseek', 'deepseek-ai/...') to
('silicon', 'deepseek-ai/...') for all 11 SiliconFlow-hosted entries.
Updated capability_profile_version prefix from 'deepseek/' to 'silicon/'.

Kept tokenizer_family='deepseek' (tokenizer identifier, not provider).

Original 4 DeepSeek official API entries (deepseek-chat, deepseek-reasoner,
deepseek-v4-flash, deepseek-v4-pro) remain unchanged with provider='deepseek'.
_split_repo_name used split('/', 1) which splits on the FIRST slash.
The backend model_name_utils.split_repo_name splits on the LAST slash
(rsplit equivalent). For 3-segment IDs like 'Pro/deepseek-ai/DeepSeek-V3.2':

  Generator (broken): repo='Pro', name='deepseek-ai/DeepSeek-V3.2'
  Backend (correct):  repo='Pro/deepseek-ai', name='DeepSeek-V3.2'

This caused all 10 Pro/ prefixed catalog entries to never match in
Phase 1a/1b, falling through to Phase 2 safe defaults instead of
getting correct catalog values.

Fix: split('/', 1) -> rsplit('/', 1)

Verified: DeepSeek-V3.2 (Pro/deepseek-ai) now correctly backfilled
with catalog values (164K ctx, 8K output, profile source).
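
The difference between the two splits is easy to reproduce in isolation. The helper names below are hypothetical, and returning an empty repo for IDs without a slash is an assumption, not necessarily what `_split_repo_name` does.

```python
from typing import Tuple


def split_first_slash(model_id: str) -> Tuple[str, str]:
    """Old generator behaviour: split on the FIRST slash."""
    parts = model_id.split("/", 1)
    return (parts[0], parts[1]) if len(parts) == 2 else ("", parts[0])


def split_last_slash(model_id: str) -> Tuple[str, str]:
    """Fixed behaviour: split on the LAST slash, like the backend."""
    parts = model_id.rsplit("/", 1)
    return (parts[0], parts[1]) if len(parts) == 2 else ("", parts[0])


three_segment = "Pro/deepseek-ai/DeepSeek-V3.2"
print(split_first_slash(three_segment))  # ('Pro', 'deepseek-ai/DeepSeek-V3.2')
print(split_last_slash(three_segment))   # ('Pro/deepseek-ai', 'DeepSeek-V3.2')
```

Two-segment IDs such as 'deepseek-ai/DeepSeek-V3' give the same result either way, which is why only the Pro/ entries were affected.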
@wuyuanfr

Collaborator Author

1c3f21a

One-line fix: split("/", 1) → rsplit("/", 1)
Scope: all 10 three-segment catalog entries with the Pro/ prefix are fixed, including:
Pro/deepseek-ai/DeepSeek-V3.2
Pro/deepseek-ai/DeepSeek-V3.1-Terminus
Pro/deepseek-ai/DeepSeek-R1
Pro/deepseek-ai/DeepSeek-V3
Pro/zai-org/GLM-5.1
Pro/MiniMaxAI/MiniMax-M2.5
Pro/moonshotai/Kimi-K2.6, and others

Verified experimentally: DeepSeek-V3.2 is now correctly backfilled from NULL to the catalog values (164K context, 8K output, source=profile, version=silicon/deepseek-v3.2-pro@1) instead of the earlier, incorrect Phase 2 defaults (32K/4K).

…idator

Pre-W1 models used max_tokens to mean 'total context window' (input + output).
Post-W1 redefined max_tokens as max_output_tokens (output only).

When the validator copied large legacy values (e.g., 32768) directly to
max_output_tokens, providers rejected requests with 'max_tokens exceeded
max_seq_len' because no room was left for input.

Added heuristic: if max_tokens >= 32768, assume it carries the old 'total
context window' semantics and use a conservative default (4096) instead of
copying it. This prevents the semantic drift while still supporting
legitimate smaller output limits (< 32768).
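
The heuristic amounts to a single threshold check. A minimal sketch follows; the constant and function names are hypothetical, and only the 32768 threshold and the 4096 default come from the commit message.

```python
from typing import Optional

LEGACY_CONTEXT_THRESHOLD = 32768    # at or above this, treat as old "total context" semantics
CONSERVATIVE_OUTPUT_DEFAULT = 4096  # fallback output budget


def output_tokens_from_legacy(max_tokens: Optional[int]) -> Optional[int]:
    """Derive max_output_tokens from a legacy max_tokens value."""
    if max_tokens is None:
        return None
    if max_tokens >= LEGACY_CONTEXT_THRESHOLD:
        # Likely a pre-W1 context-window value: do not copy it.
        return CONSERVATIVE_OUTPUT_DEFAULT
    # Small values are taken as genuine output limits.
    return max_tokens
```

The trade-off is that a model with a genuine output limit of 32768 or more also falls back to 4096 until an operator or catalog entry sets max_output_tokens explicitly.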
@wuyuanfr

wuyuanfr commented Jun 27, 2026

Collaborator Author

Catalog match tests, before and after running the SQL:

1. Pro/deepseek-ai/DeepSeek-V3.2
image

2. Qwen2.5-32B-Instruct
image

3. Phase 2 fallback: no catalog match, so the default values are used
image

@wuyuanfr

Collaborator Author

When values are entered manually, the dropdown offers suggested values.
image

image

@wuyuanfr

Collaborator Author

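Removed the "Check" button for capacity suggestions.
When adding a single model, clicking the connectivity check now automatically detects whether the model matches a catalog entry.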
image
