✨ Feat: model capacity foundation — context management upgrade by wuyuanfr · Pull Request #3293 · ModelEngine-Group/nexent

wuyuanfr · 2026-06-24T03:24:18Z

Overview

Delivers the first three workstreams of the context-management production plan (W1, W2, W11).
Replaces the conflated max_tokens field with explicit context/input/output semantics, enforces the resolved budget at the LLM dispatch boundary, and gives operators a one-click "Suggest" path to populate capacity from an approved catalog.

136 commits, 77 files changed, ~+8.6K / -0.6K LOC. Working design notes are intentionally excluded from the diff (kept in doc/working/ locally for collaboration).

What changes

W1 — Correct token-capacity configuration

Splits the legacy max_tokens into five typed fields on model_record_t: context_window_tokens, max_input_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family (+ capacity_source, capability_profile_version provenance).
New ModelCapacityResolver (sdk/nexent/core/models/capacity_resolver.py) produces a ModelCapacitySnapshot with provider_input_limit_tokens derived from min(max_input_tokens, context_window_tokens - requested_output_tokens).
Approved 12-entry CATALOG in backend/consts/capability_profiles.py covers OpenAI / DashScope / Silicon / DeepSeek production deployments.
Legacy max_tokens retained as a deprecated alias of max_output_tokens for migration compatibility; never used as a context threshold after this PR.
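The derivation of provider_input_limit_tokens described above can be sketched as follows. This is a minimal illustration, not the resolver's actual code: the function name `derive_provider_input_limit` and the use of `ValueError` are assumptions made for the example (the PR's resolver raises its own typed errors).

```python
from typing import Optional


def derive_provider_input_limit(
    context_window_tokens: Optional[int],
    max_input_tokens: Optional[int],
    requested_output_tokens: int,
) -> int:
    """Return min(max_input_tokens, context_window_tokens - requested_output_tokens),
    considering only the limits that are actually defined."""
    candidates = []
    if max_input_tokens is not None:
        candidates.append(max_input_tokens)
    if context_window_tokens is not None:
        # A combined window must leave room for the requested output.
        candidates.append(context_window_tokens - requested_output_tokens)
    if not candidates:
        # Hypothetical stand-in for the resolver's "no hard capacity" failure.
        raise ValueError("no hard capacity is known for this model")
    limit = min(candidates)
    if limit <= 0:
        raise ValueError("derived input limit must be positive")
    return limit
```

With a 128,000-token combined window and 4,096 requested output tokens, the sketch yields 123,904; adding a separate 100,000-token input limit lowers the result to 100,000.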

W2 — Output and safety capacity reserve

New SafeInputBudgetCalculator (sdk/nexent/core/models/capacity_budget.py) emits SafeInputBudgetSnapshot { hard_input_budget, soft_input_budget, uncertainty_reserve, requested_output } per dispatch.
10% uncertainty reserve (CM-016) when tokenizer / reasoning-window / provider-overhead behavior is unknown.
soft_limit_ratio defaults to 0.8 (CM-027); per-tenant override via tenant_config_t.config_key = 'context.soft_limit_ratio'.
Per-agent (ag_tenant_agent_t.requested_output_tokens) and per-request (AgentRequest.requested_output_tokens) output-reserve overrides (CM-028) with validation against the model's max_output_tokens.
Dispatch enforcement (CM-030) at sdk/nexent/core/models/openai_llm.py:391-412: rejects caller-supplied max_tokens that does not match the W2 snapshot's requested_output_tokens and pins the snapshot value before the provider call. This is the trusted server-side boundary required by the production plan.
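A rough sketch of how the W2 budget fields could relate to one another. The PR text does not spell out the arithmetic, so this assumes the 10% reserve is taken from the provider input limit and that the soft budget is the hard budget scaled by soft_limit_ratio; the names `SafeInputBudgetSketch` and `compute_safe_input_budget` are illustrative only.

```python
from dataclasses import dataclass

UNCERTAINTY_RESERVE_PERCENT = 10  # CM-016
DEFAULT_SOFT_LIMIT_RATIO = 0.8    # CM-027


@dataclass(frozen=True)
class SafeInputBudgetSketch:
    hard_input_budget: int
    soft_input_budget: int
    uncertainty_reserve: int
    requested_output: int


def compute_safe_input_budget(
    provider_input_limit_tokens: int,
    requested_output: int,
    has_unknown_capabilities: bool,
    soft_limit_ratio: float = DEFAULT_SOFT_LIMIT_RATIO,
) -> SafeInputBudgetSketch:
    if not 0 < soft_limit_ratio <= 1:
        raise ValueError("soft_limit_ratio must be in (0, 1]")
    # Assumption: the reserve is 10% of the provider input limit, rounded down,
    # and applies only when some capability (e.g. the tokenizer) is unknown.
    reserve = (
        provider_input_limit_tokens * UNCERTAINTY_RESERVE_PERCENT // 100
        if has_unknown_capabilities
        else 0
    )
    hard = provider_input_limit_tokens - reserve
    # Assumption: the soft budget is the hard budget scaled by the ratio.
    soft = int(hard * soft_limit_ratio)
    return SafeInputBudgetSketch(hard, soft, reserve, requested_output)
```

Under these assumptions a 100,000-token provider limit with an unknown tokenizer gives a 10,000-token reserve, a 90,000-token hard budget, and a 72,000-token soft budget.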

W11 — Capacity suggestion on model add (post-acceptance follow-up to W1)

POST /model/suggest-capacity with catalog-exact / normalized / fuzzy matching, and base-url → provider inference mirroring the frontend PROVIDER_HINTS map (10 substring patterns).
GET /model/capacity-coverage surfaces "bare" LLM/VLM rows (used by the inline banner in the model management page and the provider management dialog).
Frontend: "Suggest" button in single-add / single-edit; capacity-coverage warning banners; legacy max_tokens migration prompt with explicit Apply button (no more silent promotion); Tokenizer Family input hidden on all four model-config surfaces (catalog hits still write the value silently; the field is consumed by tokenizer_registry which has no registered adapters today, so forcing operators to type it has no runtime effect).
context_window_tokens and max_output_tokens are no longer required in the UI. Empty input shows a gray placeholder (32_768 / 4_096, matching the SDK fallback constants _TOKEN_THRESHOLD_LEGACY_FALLBACK and _DEFAULT_REQUESTED_OUTPUT_TOKENS). On Save, defaults are substituted into the wire payload so the bare-capacity badge clears automatically. Verified across all six write surfaces (single add/edit, batch top-defaults, batch per-row gear, provider per-row gear, provider bulk-apply broadcast); the bulk-apply path preserves "empty = do not broadcast" semantics.
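The exact / normalized / fuzzy lookup order behind the suggestion endpoint might look roughly like the sketch below. Everything here is hypothetical: the catalog keys and capacity numbers are made up, the normalization rule and the 0.85 cutoff are assumptions, and `difflib.get_close_matches` is a standard-library stand-in for whatever fuzzy matcher the service actually uses.

```python
import re
from difflib import get_close_matches
from typing import Optional, Tuple

# Hypothetical catalog; keys and values are illustrative, not the approved CATALOG.
DEMO_CATALOG = {
    "acme-chat-large": {"context_window_tokens": 65_536, "max_output_tokens": 8_192},
    "acme-chat-small": {"context_window_tokens": 16_384, "max_output_tokens": 4_096},
}


def normalize(name: str) -> str:
    """Lowercase, trim, and collapse whitespace/underscores into hyphens."""
    return re.sub(r"[\s_]+", "-", name.strip().lower())


def suggest_capacity(name: str, catalog: dict = DEMO_CATALOG) -> Optional[Tuple[str, str]]:
    """Return (match_kind, catalog_key), trying the strictest match first."""
    if name in catalog:
        return ("exact", name)
    index = {normalize(key): key for key in catalog}
    norm = normalize(name)
    if norm in index:
        return ("normalized", index[norm])
    close = get_close_matches(norm, list(index), n=1, cutoff=0.85)
    if close:
        return ("fuzzy", index[close[0]])
    return None  # no suggestion; the operator fills capacity manually
```

Trying the strictest match first means a fuzzy hit can never shadow an exact catalog entry, and reporting the match kind lets the caller decide how prominently to flag a fuzzy suggestion.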

Schema migrations

Two idempotent SQL files under docker/sql/:

1. Required — run before deploying W1/W2 code

v2.2.0_0615_context_management_capacity_schema.sql
-- Migration kind: REQUIRED_SCHEMA
-- Required for: all upgraded deployments before running W1/W2 context-management code.
Merges four prior ALTER TABLE ADD COLUMN migrations:

W1 capacity fields on model_record_t (context_window_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family, plus the dependent provider_input_limit_tokens)
W1 capacity_snapshot JSON column on model_monitoring_record_t
W2 requested_output_tokens override on ag_tenant_agent_t
W2 safe_input_budget_snapshot JSON column on model_monitoring_record_t

Skipping this migration surfaces as a "column does not exist" error once the new code runs.

2. Recommended — data fix, safe to skip on fresh deployments

v2.2.0_0617_context_management_capacity_data_fix.sql
-- Safe to skip when: fresh deployment, or operators will manually fill capacity fields.
Runs two passes in strict order:

Backfill W1 capacity fields on existing rows from the catalog (writes max_output_tokens among other fields).
Reconcile legacy max_tokens with the freshly populated max_output_tokens so both aliases agree.

Reversing the order would clobber the backfilled values.

All statements are IF NOT EXISTS / ON CONFLICT DO NOTHING. Existing rows whose capacity stays NULL continue to work through the SDK fallback until an operator edits them.

Backward compatibility

Existing model_record_t rows with max_tokens populated and the new W1 columns NULL keep working; SDK promotes the legacy column as max_output_tokens at resolve time.
Direct chat.completions.create callers that previously passed max_tokens are now rejected unless the value matches the W2 snapshot. No other production call sites changed in this PR; the broader dispatch hardening across remaining bypasses is the W10 follow-up.
Feature flag CAPACITY_SUGGESTION_ENABLED gates the W11 endpoints; turning it off restores pre-W11 behavior with no UI surface for catalog suggestion.
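The dispatch rule that rejects a mismatched max_tokens and pins the snapshot value can be illustrated with a small sketch. The function `pin_output_cap` and the exception `OutputCapMismatch` are hypothetical names for this example; the real check lives in openai_llm.py and may differ in detail.

```python
class OutputCapMismatch(ValueError):
    """Hypothetical error for a caller-supplied max_tokens that disagrees with the snapshot."""


def pin_output_cap(call_kwargs: dict, requested_output_tokens: int) -> dict:
    """Reject a conflicting max_tokens, then pin the snapshot value on a copy of the kwargs."""
    supplied = call_kwargs.get("max_tokens")
    if supplied is not None and supplied != requested_output_tokens:
        raise OutputCapMismatch(
            f"max_tokens={supplied} does not match the budget snapshot "
            f"({requested_output_tokens})"
        )
    pinned = dict(call_kwargs)  # leave the caller's dict untouched
    pinned["max_tokens"] = requested_output_tokens
    return pinned
```

Callers that omit max_tokens, or pass the same value as the snapshot, go through unchanged apart from the pinned field; any other value fails before the provider is contacted.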

Test coverage

SDK: test_capacity_resolver.py, test_capacity_budget.py, test_openai_llm.py (dispatch enforcement), test_monitoring.py (snapshot fields).
Backend: test_model_capacity_suggestion_service.py, test_model_management_service.py, test_config_utils.py, provider tests for the four batch-add adapters.
Frontend: tsc --noEmit clean; six-surface matrix manually verified.

Notes for reviewers

The tokenizer_registry is intentionally empty in this PR: tokenizer_family is persisted from the catalog and consumed downstream, but resolve() returns (FallbackEstimator, "estimated") for every value today. The 10% uncertainty reserve fires uniformly as a consequence — this is the documented W1 ADR conservative path until verified adapters land.
The frontend "Suggest" button never sends a provider_hint from the single-add dialog (the hidden default form.provider="modelengine" would otherwise pin catalog lookup to ModelEngine for every operator). Hint is only sent in batch mode where the dropdown is user-controlled.
Capacity suggestions never overwrite the user-chosen model_factory: the catalog's suggested_provider namespace (deepseek, openai, jina, ...) is a superset of the frontend dropdown's allowed values, and writing an unknown one back made models vanish from the active list / edit dropdown (root cause of the per-row reclassification bug we fixed during testing).

…rks to do.

…city-and-request-safety Add context management upgrade design documents: - Context management production plan (EN/CN) - Memory improvement analysis and architecture - 16 workstreams for context management upgrade

…view Add review documents and update workstreams: - Phase 1-5 review documents - Findings registry and impact analysis - Updated 16 workstreams with detailed specs - Context management weekly design summary (CN)

…egistry Introduces the contract surface for W1 (Correct Model Token-Capacity Configuration) so W2/W3 development can begin against stable types. No runtime behaviour change — resolver/registry implementations land in the follow-up PR. New modules: - sdk/nexent/core/models/capacity_resolver.py: CapabilityProfile and ModelCapacitySnapshot (Pydantic v2, frozen), typed ResolverError hierarchy, compute_fingerprint() implementing the SHA-256/canonical-JSON contract from W1 ADR Decision 3, RESOLVER_VERSION constant, and a resolve_capacity() stub. - sdk/nexent/core/models/tokenizer_registry.py: TokenizerAdapter Protocol, empty REGISTRY, FallbackEstimator (char/4 heuristic that always returns counting_mode='estimated'), and resolve() function. Family-name validation pattern enforces the naming convention fixed in the ADR. - backend/consts/capability_profiles.py: CATALOG with eight approved day-one entries (openai/gpt-4o, openai/gpt-4.1, dashscope/qwen-plus, qwen-turbo, glm-5.1, silicon DeepSeek-V4-Flash, Qwen3.6-27B, Kimi-K2.6) plus CATALOG_REVISION. Design reference: doc/working/context-management-workstreams/ W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (locally hosted; team sharing channel separate from this repo per doc/.gitignore policy). Smoke-tested: fingerprint is deterministic and order-independent across unknown_capabilities and field_sources; ModelCapacitySnapshot rejects mutation; tokenizer resolve() falls back to estimated for unknown families; resolve_capacity stub raises NotImplementedError; CATALOG imports cleanly with all 8 entries. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(W1): add type skeleton for ModelCapacityResolver and tokenizer registry
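A minimal sketch of a SHA-256 over canonical JSON and of a char/4 fallback estimator, in the spirit of the commit above. The exact canonicalization rules of W1 ADR Decision 3 are not reproduced in this PR, so sorting keys, sorting unknown_capabilities, and rounding the estimate up are assumptions; `fingerprint_sketch` and `estimate_tokens` are illustrative names.

```python
import hashlib
import json
from typing import Tuple


def fingerprint_sketch(contract: dict) -> str:
    """Hash a resolved contract so that key order and list order do not matter."""
    canonical = dict(contract)
    # Assumption: unknown_capabilities is an unordered set of strings.
    canonical["unknown_capabilities"] = sorted(contract.get("unknown_capabilities", []))
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def estimate_tokens(text: str) -> Tuple[int, str]:
    """Char/4 heuristic; always reports counting_mode 'estimated'."""
    # Assumption: round up so a short non-empty string never counts as zero tokens.
    return (-(-len(text) // 4), "estimated")
```

Because `sort_keys=True` also applies to nested dictionaries, a mapping such as field_sources hashes identically regardless of insertion order.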

Adds seven nullable capacity fields to model_record_t so the ModelCapacityResolver can read operator overrides per W1 ADR: - context_window_tokens - max_input_tokens - max_output_tokens - default_output_reserve_tokens - tokenizer_family - capacity_source - capability_profile_version All columns are nullable, no defaults that change semantics. Legacy max_tokens is left untouched and continues to behave as a deprecated output-cap alias until consumers migrate (separate follow-up). Touchpoints: - docker/sql/v2.2.0_0615_add_capacity_fields_to_model_record_t.sql: idempotent upgrade with ALTER TABLE ... ADD COLUMN IF NOT EXISTS + COMMENT ON COLUMN. - docker/init.sql: fresh-install CREATE TABLE inline plus COMMENT ON COLUMN. - k8s/helm/nexent/charts/nexent-common/files/init.sql: same for k8s deploys. - backend/database/db_models.py: ModelRecord ORM columns. - backend/consts/model.py: ModelRequest Pydantic schema fields so CRUD round-trips the new values. Design reference: doc/working/context-management-workstreams/ W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (Decision 1, schema). Verification: - ORM exposes all 7 columns - Pydantic ModelRequest exposes all 7 fields - All three SQL files contain 14 occurrences (column + COMMENT per field) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Move W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from context-management-workstreams to context-management-workstream/ADRs for better organization. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

feat(W1): add capacity columns to model_record_t (additive migration)

Replaces the resolve_capacity NotImplementedError stub with the real ModelCapacityResolver per W1 ADR. The resolver: - Looks up the (provider, model_name) entry in the capability profile catalog passed by the caller. - Merges operator overrides over the profile (operator wins). - Validates that hard capacity is known and not impossible (output cap cannot exceed combined window; capacities must be positive). - Defaults requested_output_tokens to the profile's default_output_reserve_tokens; rejects requests that exceed max_output_tokens. - Derives provider_input_limit_tokens as min(max_input_tokens, context_window_tokens - requested_output_tokens) using only the limits that are defined. - Asks tokenizer_registry for (adapter, counting_mode); records capability gaps in unknown_capabilities. - Computes the deterministic SHA-256/canonical-JSON fingerprint from the resolved contract and builds an immutable ModelCapacitySnapshot. The resolver stays pure: the SDK never reads DB or env; backend callers supply the capability_profiles dict and operator_overrides. This matches CLAUDE.md's SDK layer rules. 
Typed failures raised on invalid input: - ProviderCapabilityUnknown (no hard capacity) - InvalidCapacityConfiguration (non-positive values, output > window, derived input limit non-positive) - RequestedOutputExceedsCap (request above max_output_tokens) Tests (15, all passing): - Catalog lookup + override precedence - Uncataloged with operator-supplied capacity - Rejection: missing capacity, impossible values, negative values, requested-output overflow - Default requested_output behavior - Separate-input-limit path (synthetic, no day-one model uses it) - Combined window + separate input limit takes minimum - Snapshot immutability (Pydantic ValidationError on mutation) - Fingerprint determinism and sensitivity to request changes - Tokenizer estimated-mode flag appears in unknown_capabilities Design reference: doc/working/context-management-workstreams/ W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(W1): implement resolve_capacity with catalog + operator override

…LLM output cap ModelConfig (sdk/nexent/core/agents/agent_model.py): - Add max_output_tokens as the preferred name per W1 ADR. - Keep max_tokens as a deprecated alias; a model_validator backfills the unset side so old and new callers both work during migration. - Add the remaining capacity-snapshot fields so a ModelConfig can carry the resolved values from backend service down to the SDK: context_window_tokens, max_input_tokens, default_output_reserve_tokens, tokenizer_family, capacity_source, capability_profile_version. OpenAIModel (sdk/nexent/core/models/openai_llm.py): - Accept max_output_tokens (preferred) and max_tokens (deprecated). If only the legacy name is passed, log a debug and remap to max_output_tokens. - Internal attribute renamed to self.max_output_tokens; self.max_tokens is kept as an alias for any reader. - chat.completions.create still receives wire field max_tokens; only the internal name changed. NexentAgent.create_model (sdk/nexent/core/agents/nexent_agent.py): - Construct OpenAIModel with max_output_tokens=model_config.max_output_tokens so the new name flows through end-to-end. Backward compatibility: - Existing callers that set ModelConfig.max_tokens see no behavior change (validator copies it into max_output_tokens; the wire payload is identical). - Existing callers reading OpenAIModel.max_tokens see no behavior change (alias attribute returns the same value). Verified by table-driven smoke test of all four (max_tokens, max_output_tokens) combinations on ModelConfig. Design reference: doc/working/context-management-workstreams/W1_*.md and W1 ADR. Provider adapters (step 3) and create_agent_info (step 6) follow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
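The alias backfill described in this commit can be approximated with a plain dataclass. The real ModelConfig uses a Pydantic model_validator; a `__post_init__` hook is swapped in here so the sketch needs only the standard library. The class name is hypothetical, and the case where both names are set to different values is left untouched because the commit does not say how it is resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    max_output_tokens: Optional[int] = None  # preferred name
    max_tokens: Optional[int] = None         # deprecated alias

    def __post_init__(self) -> None:
        # Backfill whichever side the caller left unset.
        if self.max_output_tokens is None and self.max_tokens is not None:
            self.max_output_tokens = self.max_tokens
        elif self.max_tokens is None and self.max_output_tokens is not None:
            self.max_tokens = self.max_output_tokens


# Table-driven check over the four (max_tokens, max_output_tokens) combinations.
CASES = [
    (None, None, (None, None)),
    (1024, None, (1024, 1024)),
    (None, 2048, (2048, 2048)),
    (1024, 1024, (1024, 1024)),
]
for legacy, preferred, expected in CASES:
    cfg = ModelConfigSketch(max_output_tokens=preferred, max_tokens=legacy)
    assert (cfg.max_tokens, cfg.max_output_tokens) == expected
```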

…p legacy max_tokens Replaces the long-standing bug where `model_info['max_tokens']` (a deprecated output cap, semantically wrong) was assigned to ContextManagerConfig.token_threshold (an input/context budget). The fix wires ModelCapacityResolver into the runtime path so the context manager receives a real input budget derived from the capacity snapshot. Changes in backend/agents/create_agent_info.py: - Add _resolve_input_budget(model_info): pulls operator overrides from the new model_record_t capacity columns, calls resolve_capacity(...) with the CATALOG from backend.consts.capability_profiles, and returns snapshot.provider_input_limit_tokens. - On ProviderCapabilityUnknown (uncataloged model with no operator-supplied hard capacity), falls back to a safe constant _TOKEN_THRESHOLD_LEGACY_FALLBACK (8192) so the migration window doesn't break existing setups. Logged prominently so admins know to backfill. - create_agent_config: stops reading model_info['max_tokens'] and passes the resolved input_budget into ContextManagerConfig.token_threshold. - create_model_config_list: passes all seven new capacity columns (context_window_tokens, max_input_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family, capacity_source, capability_profile_version) through to the SDK ModelConfig so end-to-end capacity flow works. This is the end of the legacy max_tokens-as-context-threshold confusion. ModelConfig.max_tokens stays as a deprecated alias per W1 step 4; this commit removes its only known misuse from the runtime path. The fallback constant is intentionally conservative — it kicks compression early for unmigrated models so behavior degrades gracefully rather than overflowing provider context. W2 will subtract its 10% uncertainty reserve on top of the resolver's output once enforcement phase begins. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…neering methodology and recommendations for Nexent's evolution

…nexent into doc/context-management-upgrade

Restore W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from doc/context-management-upgrade branch to context-management-workstreams/ADRs directory. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Persist resolved model capacity snapshot metadata on model monitoring records so per-request telemetry can report total window, output reserve, safe input budget, source, tokenizer mode, unknown capabilities, and fingerprint. - add nullable monitoring columns to ORM, fresh-install SQL, and idempotent upgrade migration - bind resolved capacity snapshots from agent creation into SDK monitoring context - enrich LLM, client-level, and record_model_call monitoring rows with snapshot fields - cover enqueue and ORM payload behavior in SDK monitoring tests Verification: - env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/monitor/test_monitoring.py - env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/core/models/test_capacity_resolver.py - env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/agents/create_agent_info.py backend/database/db_models.py sdk/nexent/core/agents/agent_model.py sdk/nexent/core/agents/run_agent.py sdk/nexent/monitor/monitoring.py sdk/nexent/monitor/__init__.py Co-Authored-By: Codex <codex@openai.com>

Expose provider-supplied token-capacity metadata as advisory candidate fields in discovery responses without promoting them into persisted model records. - add shared candidate extraction for common context, output, input, reserve, and tokenizer aliases - wire SiliconFlow, DashScope, TokenPony, and ModelEngine adapters to attach provider_candidate hints when present - keep prepare_model_dict from persisting provider_candidate fields automatically - cover positive and no-hint paths for provider discovery Verification: - env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/backend/services/providers/test_silicon_provider.py /home/feiran/nexent/test/backend/services/providers/test_dashscope_provider.py /home/feiran/nexent/test/backend/services/providers/test_tokenpony_provider.py /home/feiran/nexent/test/backend/services/providers/test_modelengine_provider.py /home/feiran/nexent/test/backend/services/test_model_provider_service.py::test_prepare_model_dict_does_not_persist_provider_capacity_candidates - env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/services/providers/base.py backend/services/providers/silicon_provider.py backend/services/providers/dashscope_provider.py backend/services/providers/tokenpony_provider.py backend/services/providers/modelengine_provider.py Co-Authored-By: Codex <codex@openai.com>

Add explicit model-capacity controls to model management so operators can promote known capacity values through the existing model create and update flows. - extend frontend model types and service request/response mappings for capacity fields - add shared capacity form controls with tokenizer autocomplete, source badge, profile version text, and legacy max_tokens warning - wire capacity validation and operator payloads into Add/Edit Model dialogs - localize labels, tooltips, source names, and validation messages in en/zh Verification: - npm run type-check - node -e "const fs=require('fs'); for (const f of ['frontend/public/locales/en/common.json','frontend/public/locales/zh/common.json']) { JSON.parse(fs.readFileSync(f,'utf8').replace(/^\uFEFF/,'')); } console.log('locale json ok')" Co-Authored-By: Codex <codex@openai.com>

Review and accept decisions for 5 findings: - CM-018: structural validation blocks commit, semantic quality routes to W15 SLO - CM-021: source lineage + mandatory presence validation blocks, semantic coverage to W15 - CM-024: use claim-scoped production readiness terminology - CM-017: finite initial conflict set with explicit unresolved failure - CM-025: subagent as independent agent with parent_session_id, async tool delegation, no recursion Updated: finding-review-decisions.md, findings-registry.md (20/26 complete), W4, W6, W10, W11, W12, W13, parent plan. Added: pending-findings-decision-sheet.md for decision tracking. Remaining 6 findings (CM-009, CM-010, CM-014, CM-015, CM-022, CM-026) pending individual discussion.

…lease 1 gates Remove multimodal testing from Release 1 SLO gates. W15 covers text modality only; add modality contracts when specific product requirements emerge. Updated: finding-review-decisions.md, findings-registry.md (21/26 complete), W15, W3, pending-findings-decision-sheet.md.

…ents Architectural simplification: checkpoints are no longer an independent subsystem (W7). Compression results are stored as compression.snapshot events within the W5 execution event log. Recovery finds the latest compression.snapshot event and replays subsequent events. Eliminates: - Independent checkpoint table and CAS concurrency control - Redis checkpoint cache layer - W8 checkpoint-specific validation - CM-014 checkpoint schema migration (covered by CM-005) - W7 publication outbox for cross-system consistency Updated: W5 (compression.snapshot event type, recovery flow, dirty-state flush), W6, W8, W9, W13, W14, W15, parent plan, README, review artifacts. Deleted: W7_Durable_Multi_Worker_Context_State.md. CM-014 marked N/A (22/26 findings complete).

…plementation measurement Do not pre-define workload envelopes. After W1-W16 implementation, use W15 measurement infrastructure to collect real performance data and define envelopes based on observed data. No production-scale claim until envelopes are defined. Aligns with CM-004 (measure before optimizing) and CM-011 (evidence-based gates). Progress: 23/26 findings complete.

…mentation measurement Do not pre-define numeric availability, RPO, RTO, rebuild time, queue lag, or storage capacity targets. After W1-W16 implementation, use W15 measurement infrastructure to collect real recovery/availability data per topology and define targets based on observed data. No production-scale claim until targets are defined. Aligns with CM-009 (measure before defining envelopes) and CM-011 (evidence-based gates). Progress: 24/26 findings complete.

…ata validation W7 retirement eliminates the primary O(history) hashing consumer. Replace content hashing with metadata-based validation at three points: 1. compression.snapshot: partial_after_erasure + version fields 2. W6 materialized cache: snapshot validity + event count + version fields 3. Physical erasure: one-time partial_after_erasure flag No Merkle trees or segmented hashing needed. Storage-layer integrity handled by database checksums, not W8. Progress: 25/26 findings complete.

…ed OpenTelemetry spec Consolidate all decision trace requirements (W5, W6, W10, W15) into a single unified telemetry/observability specification (low priority, post-core). Use OpenTelemetry-style spans/attributes/events collected by external observability infrastructure, not product-internal persistence. Updated: W15 (replace decision trace persistence with OTel output), parent plan (replace decision trace references with unified telemetry spec), finding-review-decisions.md, findings-registry.md (26/26 complete), pending-findings-decision-sheet.md. All 26 findings now reviewed and decided.

Step 7 added capacity controls to ModelEditDialog (the OpenAI-API-Compatible "custom model" edit path) but missed ProviderConfigEditDialog, the dialog opened by the per-model gear icon under provider-categorized sections (SiliconFlow / DashScope / TokenPony / ModelEngine). For any model whose model_factory matches a recognized provider (including the W1 catalog keys 'dashscope' / 'silicon' / 'tokenpony'), that gear icon was the only edit path, leaving operators no way to set context_window_tokens et al. Changes: - ProviderConfigEditDialog: accept optional initialCapacity and hideCapacityFields props; render ModelCapacityFields when supported; include capacity payload in onSave callback shape. - modelService.updateBatchModel: accept and forward the 6 capacity fields (context_window_tokens, max_input_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family, capacity_source) to the existing batch_update_models endpoint, which already passes arbitrary update_data through per backend/services/model_management_service.py line 347. - ModelDeleteDialog single-model gear path: pass current capacity values from selectedSingleModel as initialCapacity, and forward saved capacity fields into the updateBatchModel call. - ModelDeleteDialog provider-level "Edit Config" path: pass hideCapacityFields={true} since handleProviderConfigSave applies settings batch-wise to all models from one provider and per-model capacity is not a batch concept. No behavior change for callers that don't pass initialCapacity (backward compatible). Verified with npm run type-check. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…odules pollution Two tests (test_get_models_llm_success, test_get_models_embedding_success) failed intermittently when test_model_provider_service.py ran after test_capacity_resolver.py or test_silicon_provider.py. Root cause: silicon_provider is loaded under two distinct sys.modules keys — `services.providers.silicon_provider` (the path production code uses) and `backend.services.providers.silicon_provider` (the path some test files use). Each binding gets its own `SILICON_GET_URL` attribute because `silicon_provider.py` does `from consts.provider import SILICON_GET_URL`, which copies the value into the importing module's namespace. When both keys are present, mock.patch targeting only the `backend.` path silently fails to override the value used by the production code path that SiliconModelProvider.get_models executes. Fix: introduce _patch_provider_module_constant context manager that patches the named attribute on every loaded copy of the module. Apply to all four SILICON_GET_URL mock.patch sites in this file. Verification: - 289 tests pass under the previously-failing combined order: test/sdk/core/models/test_capacity_resolver.py + test/sdk/monitor/test_monitoring.py + test/backend/services/providers/ + test/backend/services/test_model_provider_service.py The helper is order-independent and safe even when one of the two sys.modules paths is absent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
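The idea behind the helper, patching a constant on every loaded copy of a module, can be sketched as below. This is not the PR's `_patch_provider_module_constant`; the name `patch_module_constant` and its signature are assumptions for illustration.

```python
import sys
from contextlib import contextmanager

_MISSING = object()


@contextmanager
def patch_module_constant(module_names, attr, value):
    """Set `attr` to `value` on every listed module present in sys.modules.

    Names that are not loaded are skipped, so the helper works regardless of
    which import path earlier tests happened to use.
    """
    saved = []
    try:
        for name in module_names:
            module = sys.modules.get(name)
            if module is None:
                continue
            saved.append((module, getattr(module, attr, _MISSING)))
            setattr(module, attr, value)
        yield
    finally:
        # Restore in reverse order so each module gets its original value back.
        for module, old in reversed(saved):
            if old is _MISSING:
                delattr(module, attr)
            else:
                setattr(module, attr, old)
```

A test would wrap the call under test in `with patch_module_constant([...both module paths...], "SILICON_GET_URL", fake_url):` so that whichever copy the production code resolved sees the patched value.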

…ement-upgrade-no-working-docs # Conflicts: # backend/agents/create_agent_info.py # test/sdk/core/models/test_openai_llm.py

…rfaces The Tokenizer Family input was rendered on Add, Edit, batch Add, and the provider-level "bulk modify config" surfaces. Per the W1 ADR the value is consumed only by `sdk/nexent/core/models/tokenizer_registry.resolve`, which today has no registered adapters and unconditionally returns `(FallbackEstimator, "estimated")` -- so the input never affects runtime behavior and forcing operators to type/choose it surfaces an irrelevant implementation detail. Hidden, not removed: the field stays in form state, payload builders, batch row mapping, and DB. W11 catalog suggestions still write it silently, existing DB values are still preserved through edits, and any future adapter registration becomes a one-line change with no UI work. Backend/SDK fully decoupled: - backend `consts/model.py` request schemas keep `tokenizer_family` - catalog entries in `consts/capability_profiles.py` still set it - SDK consumes it via `tokenizer_registry.resolve` and W2's `_UNKNOWN_CAPABILITIES_REQUIRING_RESERVE` continues to trigger the 10% reserve when counting_mode is estimated Changes in this commit: - ModelCapacityFields.tsx: drop the AutoComplete input block + the `TOKENIZER_FAMILY_OPTIONS` constant + the `AutoComplete` import + the `hideTokenizer` prop (interface + destructure) - ModelEditDialog.tsx: drop the `hideTokenizer` prop from the bulk-apply call site and the now-stale "Tokenizer hidden" comment - zh/en common.json: drop the two unused locale keys Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…aults Both fields are no longer required at any of the six capacity write surfaces. An empty input renders a gray placeholder showing what value would land if the user saves without typing; the form state stays "" so nothing is silently mutated client-side. At save time, the wire-payload builder substitutes the default into the API call only when the operator truly left the field empty -- otherwise the typed value (or existing DB value loaded into the form) is sent unchanged. Defaults chosen to mirror the existing SDK fallbacks so observed runtime behavior does not change when defaults land: - DEFAULT_CONTEXT_WINDOW_TOKENS = 32_768 (matches `_TOKEN_THRESHOLD_LEGACY_FALLBACK` in capacity_resolver.py) - DEFAULT_MAX_OUTPUT_TOKENS = 4_096 (matches `_DEFAULT_REQUESTED_OUTPUT_TOKENS` in capacity_resolver.py) Constants exported from ModelCapacityFields.tsx so the snake_case mirror in ModelAddDialog stays in sync. Six-surface contract -- single-row write paths apply defaults; the bulk-apply broadcast preserves "empty means do not broadcast": - 1) ModelAddDialog single-add form -> capacityFormToSnakePayload applies defaults - 2) ModelEditDialog single-edit form -> buildCapacityPayload (applyDefaults=true default) - 3) ModelAddDialog batch-import top-defaults panel -> capacityFormToSnakePayload(form) for batchDefaults; per-row `model.X ?? batchDefaults.X` now never falls through to undefined in the gate at isFormValid (the gate becomes defense-in-depth, comment updated) - 4) ModelAddDialog batch per-row gear (Settings Modal) -> capacityFormToSnakePayload(modelCapacity); preload-from-row-or- batch-default means "no-op save" already carries non-empty input and goes through toInt unchanged. 
Only "row=NULL plus batch-empty" materializes the defaults - 5) ProviderConfigEditDialog per-row gear (hideCapacityFields=false) -> buildCapacityPayload(capacityForm) - 6) ProviderConfigEditDialog "modify config" bulk-apply (hideCapacityFields=true) -> buildCapacityPayload(form, { applyDefaults: false }); `applyDefaultsOnEmpty={false}` on the panel suppresses the gray placeholder so operators do not read "empty means 32K/4K will be broadcast" requiredFields stripped from every validateCapacityForm call site and every ModelCapacityFields prop usage. validateCapacityForm still enforces the data-shape checks (positive integers, output <= window, reserve <= output) -- those are not affected by removing the "must be non-empty" requirement. Backend and SDK unchanged: the wire payload still ships the same snake_case keys; the only difference is that on save, those keys are guaranteed to carry a number (not null) for single-row writes, which makes the `_is_bare_capacity_model` badge and the W11 catalog-coverage banner clear themselves automatically for new rows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Four failure clusters reported by CI after merging upstream/develop into this PR branch:

1) test_prepare_agent_run -- assert_called_once_with(...) on create_agent_run_info was missing `tool_params=None`. Production code at agent_service.py:2245 now passes `tool_params=agent_request.tool_params` and AgentRequest defaults `tool_params` to None when the fixture does not set it. Add the kwarg to the expected call.

2) update_agent_info_impl_* (14 tests) -- W2 added `_validate_requested_output_tokens_for_agent(request, tenant_id)` at agent_service.py:1164. The validator reads `request.requested_output_tokens` and compares it against the model's `max_output_tokens`. The existing tests build their request via `MagicMock(spec=AgentInfoRequest)` and never set `requested_output_tokens`, so:
- either the spec exposes the field as a fresh MagicMock and the `> max_output_tokens` comparison fails with TypeError,
- or Pydantic-v2 field introspection through dir() omits the name and the access raises AttributeError.
Both branches are unrelated to what these tests cover, so this commit adds a module-level autouse fixture that stubs the validator to a no-op. Tests that want to exercise the validator in the future can still patch it locally; module-level autouse loses to per-test patches.

3) test_import_agent_by_agent_id_publish_version_error -- import_agent_by_agent_id reads `import_agent_info.requested_output_tokens` directly at agent_service.py:1874 (no validator involved), so the autouse fixture from (2) does not help. Set `mock_agent_info.requested_output_tokens = None` on the existing `MagicMock(spec=ExportAndImportAgentInfo)` so the access returns a defined value instead of raising AttributeError.

4) test_create_model_success / test_create_model_deep_thinking_success (test_nexent_agent.py) -- W1 renamed the SDK's OpenAIModel kwarg from `max_tokens` to `max_output_tokens`. The two `assert_called_once_with` blocks still asserted on the old name. Updated to `max_output_tokens`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
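The two mock failure modes described in cluster 2, and the explicit-attribute fix from cluster 3, can be reproduced with `unittest.mock` alone. The request classes and the `validate` function below are hypothetical stand-ins, not the project's AgentInfoRequest or its validator; the 4096 limit is illustrative.

```python
from unittest.mock import MagicMock


class RequestWithField:
    """Hypothetical request type whose field name is visible to dir()."""
    requested_output_tokens = None


class RequestWithoutField:
    """Hypothetical request type that does not expose the field name."""
    agent_id = None


def validate(request, max_output_tokens=4096):
    """Stand-in validator: reject an over-limit output reserve."""
    value = request.requested_output_tokens
    if value is not None and value > max_output_tokens:
        raise ValueError("requested_output_tokens exceeds max_output_tokens")


def outcome(request):
    """Return the name of the exception raised by validate(), or 'ok'."""
    try:
        validate(request)
    except (TypeError, AttributeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


# The spec exposes the name, so the attribute is a fresh MagicMock and `>` fails.
auto_attribute = outcome(MagicMock(spec=RequestWithField))
# The spec does not contain the name, so the attribute access itself fails.
missing_attribute = outcome(MagicMock(spec=RequestWithoutField))

# Setting the attribute explicitly gives the validator a defined value.
explicit = MagicMock(spec=RequestWithField)
explicit.requested_output_tokens = None
explicit_result = outcome(explicit)
```

Either failure surfaces before the code path a test actually targets, which is why the commit stubs the validator for the whole module and sets the attribute by hand where no validator is involved.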

…ponse shape

The production response shape at agent_service.py:1112 now includes `requested_output_tokens` (added by W2). The mocked `search_agent_info` payload does not include the key, so the function returns `None` for it via `.get(...)`. Add the key to expected_result to match.

test_import_agent_by_agent_id_publish_version_error still fails for an unrelated reason: `create_agent`'s `mock.return_value` is configured to `{"agent_id": 100}` but the test result shows `create_agent(...)` returning the auto-MagicMock instead of the dict. Static analysis of the patch wiring shows nothing wrong; needs a local repro to inspect the mock state. Saving the partial progress first.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…lish_version_error

The test claimed to verify "import_agent_by_agent_id swallows publish_version_impl exceptions and still returns the new agent id", but the three lines that actually configure the patched mocks were missing from the body:

    mock_query_tools.return_value = []
    mock_create.return_value = {"agent_id": 100}
    mock_publish.side_effect = Exception("Publish error")

Without them every patched mock returned the default auto-MagicMock, so `create_agent(...)` returned a MagicMock instead of the dict, `new_agent["agent_id"]` returned `MagicMock.__getitem__()`, publish_version_impl never raised, and `assert result == 100` failed against the MagicMock return value.

Likely lost during the upstream/develop merge that introduced `requested_output_tokens` to the import flow (the missing-attribute error surfaced first, masking the deeper issue). Adding the three configuration lines back lets the test exercise the actual code path it was designed to cover.

Verified locally: full test_agent_service.py passes 217/217.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
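A reduced sketch of why the unconfigured mocks made the assertion fail. `import_agent` here is a hypothetical miniature of the import flow, not the real import_agent_by_agent_id; only the mock configuration mirrors the lines quoted in the commit message.

```python
from unittest.mock import MagicMock


def import_agent(create_agent, publish_version):
    """Hypothetical miniature: create, try to publish, swallow publish errors."""
    new_agent = create_agent()
    try:
        publish_version(new_agent["agent_id"])
    except Exception:
        pass  # a publish failure must not prevent returning the new id
    return new_agent["agent_id"]


# Unconfigured mocks: create_agent() returns a MagicMock, and indexing it
# yields another MagicMock, so the result can never equal 100.
unconfigured = import_agent(MagicMock(), MagicMock())

# Configured as in the restored test body.
mock_create = MagicMock(return_value={"agent_id": 100})
mock_publish = MagicMock(side_effect=Exception("Publish error"))
configured = import_agent(mock_create, mock_publish)
```

With the configuration in place the publish mock really raises, so the test covers the swallow-and-return path instead of passing or failing by accident.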

YehongPan · 2026-06-24T04:48:32Z

🔍 Code Review Comments

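1. [Security/Vulnerability] _CAPACITY_WARNING_EMITTED thread-safety issue
_CAPACITY_WARNING_EMITTED is a module-level set() with no lock protection, so it is exposed to a race condition in multi-threaded or multi-coroutine environments. Use a threading.Lock, or confirm that the set operations involved are atomic.

2. [Logic flaw] _resolve_input_budget silently falls back on an empty provider
When model_info.get("model_factory") returns None, provider becomes an empty string, but resolve_capacity may not accept an empty provider. The result is a silent fallback to the legacy threshold that masks the configuration error. Recommend logging an explicit WARNING when the provider is empty.

3. [Code style] Inconsistent parameter indentation in create_agent_config
The request_requested_output_tokens parameter in the function signature is missing its leading whitespace, which violates the PEP 8 parameter alignment rule.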
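One way to make the empty-provider case from item 2 visible is to log before falling back. The sketch below is an assumption about shape only: `resolve_provider`, the logger name, and the `model_name` key are hypothetical, and the real _resolve_input_budget may be structured differently.

```python
import logging
from typing import Optional

logger = logging.getLogger("capacity")  # hypothetical logger name


def resolve_provider(model_info: dict) -> Optional[str]:
    """Return the provider name, or None (after a WARNING) when it is missing."""
    provider = (model_info.get("model_factory") or "").strip()
    if not provider:
        # Surface the misconfiguration instead of silently using the legacy threshold.
        logger.warning(
            "model %r has no model_factory; capacity falls back to the legacy threshold",
            model_info.get("model_name"),
        )
        return None
    return provider


missing = resolve_provider({"model_name": "demo", "model_factory": None})
present = resolve_provider({"model_name": "demo", "model_factory": "openai"})
```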

YehongPan

Code Review

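[Security/Vulnerability] _CAPACITY_WARNING_EMITTED is a module-level set() with no lock protection, so it is exposed to a race condition in multi-threaded or multi-coroutine environments. Use a threading.Lock, or confirm that the set operations involved are atomic.
[Logic flaw] In _resolve_input_budget, when model_info.get("model_factory") returns None, provider becomes an empty string, and resolve_capacity may not accept an empty provider. The result is a silent fallback to the legacy threshold that masks the configuration error. Recommend logging an explicit WARNING when the provider is empty.
[Code style] The request_requested_output_tokens parameter in the create_agent_config function signature is missing its leading whitespace, which violates the PEP 8 parameter alignment rule.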

…edup with a lock

Two small fixes reported during review:

1) `request_requested_output_tokens` in the `create_agent_config` signature was flush-left (zero indent) while every other parameter sits at four-space indent. Python's parser tolerates this inside parentheses, but linters and humans both stumble on it. Re-indent to align with the rest of the signature.

2) `_CAPACITY_WARNING_EMITTED` is a per-process dedup set for the "model has no W1/W2 capacity configured" operator warning. The `if dedup_key in S: return; S.add(dedup_key)` pattern was a check-then-add race: two threads on the same model could both pass the membership test before either added, leading to duplicate WARNING lines that defeat the per-process dedup contract. Wrap the test-and-set in a `threading.Lock`. The lock is released before `logger.warning(...)` so warning I/O is not serialised across paths; only the dedup decision is.

Verified locally: test/backend/agents/test_create_agent_info.py 171/171 passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
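The lock-guarded test-and-set described in fix 2 can be sketched as follows. The names `warn_once`, `_WARNING_EMITTED`, and `_WARNING_LOCK` are illustrative, not the identifiers used in create_agent_info.py.

```python
import logging
import threading

logger = logging.getLogger("capacity")  # hypothetical logger name

_WARNING_EMITTED = set()
_WARNING_LOCK = threading.Lock()


def warn_once(dedup_key: str, message: str) -> bool:
    """Log message at most once per dedup_key; return True if this call logged it."""
    with _WARNING_LOCK:
        if dedup_key in _WARNING_EMITTED:
            return False
        _WARNING_EMITTED.add(dedup_key)
    # The lock is already released here: only the dedup decision is serialised.
    logger.warning(message)
    return True


# Eight threads race on the same key; exactly one of them should win.
results = []
threads = [
    threading.Thread(
        target=lambda: results.append(warn_once("model-a", "model has no capacity configured"))
    )
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Holding the lock only around the membership test and the add keeps the critical section tiny, so slow log handlers cannot stall other request paths.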

wuyuanfr · 2026-06-24T06:44:45Z

🔍 Code Review Comments

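1. [Security/Vulnerability] _CAPACITY_WARNING_EMITTED thread-safety issue: _CAPACITY_WARNING_EMITTED is a module-level set() with no lock protection, so it is exposed to a race condition in multi-threaded or multi-coroutine environments. Use a threading.Lock, or confirm that the set operations involved are atomic.

2. [Logic flaw] _resolve_input_budget silently falls back on an empty provider: when model_info.get("model_factory") returns None, provider becomes an empty string, but resolve_capacity may not accept an empty provider. The result is a silent fallback to the legacy threshold that masks the configuration error. Recommend logging an explicit WARNING when the provider is empty.

3. [Code style] Inconsistent parameter indentation in create_agent_config: the request_requested_output_tokens parameter in the function signature is missing its leading whitespace, which violates the PEP 8 parameter alignment rule.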

Items 1 and 3 are the same as what @JasonW404 mentioned; they were fixed in commit https://github.com/ModelEngine-Group/nexent/commit/72e378eaafab2eabf8555357984ca3e6436094c2.

Item 2 is fixed in 10a41ca.

wuyuanfr · 2026-06-24T06:56:31Z

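1. In the model configuration UI, when adding a single model, the original "Max Tokens" field is deprecated (it blurred the two concepts of "context window" and "max output tokens"). 2. Four capacity-related settings are added for the model administrator to fill in: "Context Window", "Max Input Tokens", "Max Output Tokens", and "Output Reserve Tokens". A model can still be added with these left blank; default values are then written to the database.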

wuyuanfr · 2026-06-24T06:58:15Z

After clicking "Use suggestion", the matched verified values are filled into the input fields.

WMC001

Observation: comprehensive capacity management refactor

The model capacity foundation changes are extensive and well-structured. The separation of W1 (provider capacity profiles) and W2 (per-agent requested_output_tokens overrides) is clearly documented. The _coerce_legacy_max_tokens_alias defense-in-depth pattern and the _capacity_suggestion_coverage_errors_total OpenTelemetry counter for silent failures are particularly thoughtful.

No bugs found in the backend Python layer. The implementation is robust with proper error handling, null checks, and fallback strategies throughout.

WMC001

Bug 1 (CRITICAL): Wrong tuple element order in _resolve_input_budget — production crash

backend/agents/create_agent_info.py — _resolve_input_budget returns a 3-tuple with the 2nd and 3rd elements in the wrong order relative to the caller's unpacking.

The function returns:

return (
    snapshot.provider_input_limit_tokens,         # [0] → int        (correct)
    _capacity_snapshot_for_monitoring(snapshot), # [1] → dict       (monitoring)
    snapshot,                                    # [2] → ModelCapacitySnapshot
)

But the call site unpacks as:

input_budget, capacity_snapshot, resolved_capacity_snapshot = _resolve_input_budget(model_info)

So capacity_snapshot receives the monitoring dict and resolved_capacity_snapshot receives the ModelCapacitySnapshot. Then _resolve_safe_input_budget is called with capacity_snapshot=resolved_capacity_snapshot — but _resolve_safe_input_budget internally passes capacity_snapshot (the dict) to SafeInputBudgetCalculator.calculate_safe_input_budget(), which accesses typed Pydantic attributes (snapshot.provider_input_limit_tokens, etc.). A plain dict has no such attributes — this raises AttributeError at runtime for every agent that uses the new W2 context management path.

Fix: swap the 2nd and 3rd return values in _resolve_input_budget:

return (
    snapshot.provider_input_limit_tokens,
    snapshot,                                        # ModelCapacitySnapshot goes 2nd
    _capacity_snapshot_for_monitoring(snapshot),     # monitoring dict goes 3rd
)

WMC001

Bug 2 (MEDIUM): None.get() crash in _validate_requested_output_tokens_for_agent

backend/services/agent_service.py — if get_model_by_model_id returns None, then model_info.get("max_output_tokens") raises AttributeError (None has no .get()). The existing if model_info else None guard is correct, but if the model record is an empty dict {}, max_output_tokens becomes None and the validation is silently skipped. Additionally, if max_output_tokens = 0 is stored in the DB (falsy but not None), any positive requested_output_tokens passes validation incorrectly.

Fix: add explicit type check and > 0 guard:

model_info = get_model_by_model_id(model_id, tenant_id=tenant_id)
if not isinstance(model_info, dict):
    return  # or log
max_output_tokens = model_info.get("max_output_tokens")
if max_output_tokens is not None and max_output_tokens <= 0:
    return
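The snippet above stops before the actual comparison. A fuller sketch of a validator with these guards might look like the following, assuming the function receives the already-loaded model record; the function name and the ValueError are illustrative choices, and skipping on a non-positive stored limit follows the reviewer's suggestion rather than a confirmed project policy.

```python
from typing import Optional


def validate_requested_output_tokens(requested: Optional[int], model_info: object) -> None:
    """Raise ValueError when requested exceeds a usable max_output_tokens limit."""
    if requested is None:
        return  # nothing to validate
    if not isinstance(model_info, dict):
        return  # model record missing; a caller may prefer to log here
    max_output_tokens = model_info.get("max_output_tokens")
    if max_output_tokens is None or max_output_tokens <= 0:
        return  # no usable limit stored, so the skip is explicit rather than accidental
    if requested > max_output_tokens:
        raise ValueError(
            f"requested_output_tokens={requested} exceeds max_output_tokens={max_output_tokens}"
        )


def is_rejected(requested, model_info) -> bool:
    """Helper for the examples: True when validation raises."""
    try:
        validate_requested_output_tokens(requested, model_info)
    except ValueError:
        return True
    return False
```

Comparing against None and against zero separately avoids the truthiness trap the comment describes, where a stored 0 is treated the same as a missing value.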

wuyuanfr · 2026-06-24T09:15:41Z

Bug 1 (CRITICAL): Wrong tuple element order in _resolve_input_budget — production crash

backend/agents/create_agent_info.py — _resolve_input_budget returns a 3-tuple with the 2nd and 3rd elements in the wrong order relative to the caller's unpacking.

The function returns:
return (
    snapshot.provider_input_limit_tokens,         # [0] → int        (correct)
    _capacity_snapshot_for_monitoring(snapshot), # [1] → dict       (monitoring)
    snapshot,                                    # [2] → ModelCapacitySnapshot
)
But the call site unpacks as:
input_budget, capacity_snapshot, resolved_capacity_snapshot = _resolve_input_budget(model_info)
So capacity_snapshot receives the monitoring dict and resolved_capacity_snapshot receives the ModelCapacitySnapshot. Then _resolve_safe_input_budget is called with capacity_snapshot=resolved_capacity_snapshot — but _resolve_safe_input_budget internally passes capacity_snapshot (the dict) to SafeInputBudgetCalculator.calculate_safe_input_budget(), which accesses typed Pydantic attributes (snapshot.provider_input_limit_tokens, etc.). A plain dict has no such attributes — this raises AttributeError at runtime for every agent that uses the new W2 context management path.

Fix: swap the 2nd and 3rd return values in _resolve_input_budget:
return (
    snapshot.provider_input_limit_tokens,
    snapshot,                                        # ModelCapacitySnapshot goes 2nd
    _capacity_snapshot_for_monitoring(snapshot),     # monitoring dict goes 3rd
)

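The tuple order matches the unpacking. The keyword argument passed to W2, capacity_snapshot=resolved_capacity_snapshot, binds the typed variable on the right-hand side, not the dict at position [1]. The dict at position [1] only flows into AgentConfig.capacity_snapshot (for monitoring and serialization). For extra safety the variables could be renamed to capacity_snapshot_dict / capacity_snapshot to reduce ambiguity when reading, but this is not a bug.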
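The binding described in this reply can be checked with a reduced sketch. `Snapshot` is a hypothetical stand-in for ModelCapacitySnapshot, the two functions are simplified versions of the ones named in the thread, and the 28_672 value is illustrative.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple


@dataclass
class Snapshot:
    """Hypothetical stand-in for ModelCapacitySnapshot."""
    provider_input_limit_tokens: int


def resolve_input_budget() -> Tuple[int, Dict[str, Any], Snapshot]:
    """Return (int budget, monitoring dict, typed snapshot), in that order."""
    snapshot = Snapshot(provider_input_limit_tokens=28_672)
    return snapshot.provider_input_limit_tokens, asdict(snapshot), snapshot


def resolve_safe_input_budget(capacity_snapshot: Snapshot) -> int:
    # Attribute access works only on the typed object, not on the dict.
    return capacity_snapshot.provider_input_limit_tokens


# Position [1] (dict) lands in capacity_snapshot; position [2] (typed) lands in
# resolved_capacity_snapshot, which is what the keyword argument passes on.
input_budget, capacity_snapshot, resolved_capacity_snapshot = resolve_input_budget()
safe_budget = resolve_safe_input_budget(capacity_snapshot=resolved_capacity_snapshot)
```

The keyword name on the left of `=` belongs to the callee's parameter list, so it says nothing about which caller variable is passed; the renaming suggested in the reply would remove that reading hazard.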

JasonW404 and others added 30 commits June 11, 2026 16:10

Doc: Add design for upgrading context management in nexent with 16 works to do.

0ee0bb3

docs: complete context management production review

7dc2d61

Merge pull request #1 from liudfgoo/feature/w1-capacity-skeleton

d19937f

feat(W1): add type skeleton for ModelCapacityResolver and tokenizer registry

Merge pull request #2 from liudfgoo/feature/w1-capacity-db-migration

690ca7d

feat(W1): add capacity columns to model_record_t (additive migration)

Merge pull request #3 from liudfgoo/feature/w1-capacity-resolver-impl

1ef4823

feat(W1): implement resolve_capacity with catalog + operator override

feat(loop-engineering): add comprehensive insight report on Loop Engineering methodology and recommendations for Nexent's evolution

c8e9582

Merge branch 'doc/context-management-upgrade' of github.com:liudfgoo/nexent into doc/context-management-upgrade

fce7753

fix(W1): clarify optional capacity fields

76c1f7b

fix(web): bind production server to all interfaces

88d849d

wuyuanfr and others added 5 commits June 23, 2026 16:29

chore: exclude working docs from PR

1899172

Merge remote-tracking branch 'upstream/develop' into pr/context-management-upgrade-no-working-docs
# Conflicts:
# backend/agents/create_agent_info.py
# test/sdk/core/models/test_openai_llm.py

fdbf948

test: update create_agent_info stubs for capacity modules

1055165

wuyuanfr requested review from Dallas98 and WMC001 as code owners June 24, 2026 03:24

JasonW404 mentioned this pull request Jun 24, 2026

feat: add LiteLLM as unified LLM provider #3182

Open

JasonW404 reviewed Jun 24, 2026

Comment thread backend/agents/create_agent_info.py Outdated

Comment thread backend/agents/create_agent_info.py

YehongPan reviewed Jun 24, 2026

Comment thread backend/apps/model_managment_app.py Outdated

YehongPan reviewed Jun 24, 2026

Comment thread backend/apps/model_managment_app.py

YehongPan reviewed Jun 24, 2026

Comment thread backend/apps/model_managment_app.py

wuyuanfr and others added 2 commits June 24, 2026 14:11

fix: tighten capacity suggestion error handling

10a41ca

fix: remove stale deepseek capacity backfill

f88eead

WMC001 reviewed Jun 24, 2026

Comment thread docker/sql/v2.2.2_0622_update_left_nav_menu.sql

chore: consolidate capacity migration sql

611ae4a

WMC001 reviewed Jun 24, 2026

wuyuanfr requested a review from WMC001 June 24, 2026 09:22
