
✨ Feat: model capacity foundation — context management upgrade #3293

Open
wuyuanfr wants to merge 143 commits into ModelEngine-Group:develop from liudfgoo:pr/context-management-upgrade-no-working-docs

@wuyuanfr wuyuanfr commented Jun 24, 2026

Collaborator

Overview

Delivers the first three workstreams of the context-management production plan (W1, W2, W11).
Replaces the conflated max_tokens field with explicit context/input/output semantics, enforces the resolved budget at the LLM dispatch boundary, and gives operators a one-click "Suggest" path to populate capacity from an approved catalog.

136 commits, 77 files changed, ~+8.6K / -0.6K LOC. Working design notes are intentionally excluded from the diff (kept in doc/working/ locally for collaboration).

What changes

W1 — Correct token-capacity configuration

  • Splits the legacy max_tokens into five typed fields on model_record_t: context_window_tokens, max_input_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family (+ capacity_source, capability_profile_version provenance).
  • New ModelCapacityResolver (sdk/nexent/core/models/capacity_resolver.py) produces a ModelCapacitySnapshot with provider_input_limit_tokens derived from min(max_input_tokens, context_window_tokens - requested_output_tokens).
  • Approved 12-entry CATALOG in backend/consts/capability_profiles.py covers OpenAI / DashScope / Silicon / DeepSeek production deployments.
  • Legacy max_tokens retained as a deprecated alias of max_output_tokens for migration compatibility; never used as a context threshold after this PR.
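
A minimal sketch of the input-limit derivation described in the W1 bullets, not the resolver's actual code. It assumes that an undefined limit is skipped rather than treated as zero; the function name and the use of `ValueError` are illustrative stand-ins.

```python
from typing import Optional


def provider_input_limit(
    context_window_tokens: Optional[int],
    max_input_tokens: Optional[int],
    requested_output_tokens: int,
) -> int:
    """Derive the provider-side input limit from whichever hard limits are known."""
    candidates = []
    if max_input_tokens is not None:
        # A separate input cap applies directly.
        candidates.append(max_input_tokens)
    if context_window_tokens is not None:
        # A combined window must also hold the requested output.
        candidates.append(context_window_tokens - requested_output_tokens)
    if not candidates:
        raise ValueError("no hard capacity known")  # stand-in for a typed error
    limit = min(candidates)
    if limit <= 0:
        raise ValueError("derived input limit is not positive")
    return limit
```

With a 32,768-token window and a 4,096-token output request, the derived limit is 28,672 input tokens; a tighter separate input cap would win the `min`.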

W2 — Output and safety capacity reserve

  • New SafeInputBudgetCalculator (sdk/nexent/core/models/capacity_budget.py) emits SafeInputBudgetSnapshot { hard_input_budget, soft_input_budget, uncertainty_reserve, requested_output } per dispatch.
  • 10% uncertainty reserve (CM-016) when tokenizer / reasoning-window / provider-overhead behavior is unknown.
  • soft_limit_ratio defaults to 0.8 (CM-027); per-tenant override via tenant_config_t.config_key = 'context.soft_limit_ratio'.
  • Per-agent (ag_tenant_agent_t.requested_output_tokens) and per-request (AgentRequest.requested_output_tokens) output-reserve overrides (CM-028) with validation against the model's max_output_tokens.
  • Dispatch enforcement (CM-030) at sdk/nexent/core/models/openai_llm.py:391-412: rejects caller-supplied max_tokens that does not match the W2 snapshot's requested_output_tokens and pins the snapshot value before the provider call. This is the trusted server-side boundary required by the production plan.
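
The W2 budget fields can be illustrated with a small sketch. The PR text gives the 10% reserve and the 0.8 soft ratio but not the exact arithmetic, so the formula below (reserve taken from the provider input limit, soft budget taken as a fraction of the hard budget, both rounded down) is an assumption, and `BudgetSketch` / `safe_input_budget` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetSketch:
    hard_input_budget: int
    soft_input_budget: int
    uncertainty_reserve: int
    requested_output: int


def safe_input_budget(
    provider_input_limit_tokens: int,
    requested_output_tokens: int,
    has_unknown_capabilities: bool,
    soft_limit_ratio: float = 0.8,    # default quoted in the PR (CM-027)
    uncertainty_ratio: float = 0.10,  # reserve quoted in the PR (CM-016)
) -> BudgetSketch:
    # Assumed: the reserve only applies when some capability is unknown.
    reserve = (
        int(provider_input_limit_tokens * uncertainty_ratio)
        if has_unknown_capabilities
        else 0
    )
    hard = provider_input_limit_tokens - reserve
    # Assumed: the soft budget is a fraction of the hard budget.
    soft = int(hard * soft_limit_ratio)
    return BudgetSketch(hard, soft, reserve, requested_output_tokens)
```

Under these assumptions, a 100,000-token input limit with no unknown capabilities yields a hard budget of 100,000 and a soft budget of 80,000.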

W11 — Capacity suggestion on model add (post-acceptance follow-up to W1)

  • POST /model/suggest-capacity with catalog-exact / normalized / fuzzy matching, and base-url → provider inference mirroring the frontend PROVIDER_HINTS map (10 substring patterns).
  • GET /model/capacity-coverage surfaces "bare" LLM/VLM rows (used by the inline banner in the model management page and the provider management dialog).
  • Frontend: "Suggest" button in single-add / single-edit; capacity-coverage warning banners; legacy max_tokens migration prompt with explicit Apply button (no more silent promotion); Tokenizer Family input hidden on all four model-config surfaces (catalog hits still write the value silently; the field is consumed by tokenizer_registry which has no registered adapters today, so forcing operators to type it has no runtime effect).
  • context_window_tokens and max_output_tokens are no longer required in the UI. Empty input shows a gray placeholder (32_768 / 4_096, matching the SDK fallback constants _TOKEN_THRESHOLD_LEGACY_FALLBACK and _DEFAULT_REQUESTED_OUTPUT_TOKENS). On Save, defaults are substituted into the wire payload so the bare-capacity badge clears automatically. Verified across all six write surfaces (single add/edit, batch top-defaults, batch per-row gear, provider per-row gear, provider bulk-apply broadcast); the bulk-apply path preserves "empty = do not broadcast" semantics.
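
The three matching tiers named for POST /model/suggest-capacity (exact, normalized, fuzzy) might look roughly like the sketch below. The PR does not spell out the normalization or fuzzy rules, so the prefix stripping, the separator removal, the 0.8 cutoff, and the four catalog keys are all assumptions made for illustration.

```python
import difflib
import re
from typing import Optional, Tuple

# Hypothetical catalog keys, for illustration only.
CATALOG_KEYS = ["gpt-4o", "gpt-4.1", "qwen-plus", "qwen-turbo"]


def _normalize(name: str) -> str:
    # Assumed rule: lowercase, drop a "vendor/" prefix, strip separators.
    name = name.lower().split("/")[-1]
    return re.sub(r"[^a-z0-9]", "", name)


def match_catalog(model_name: str) -> Optional[Tuple[str, str]]:
    """Return (catalog_key, match_kind), or None when nothing is close enough."""
    if model_name in CATALOG_KEYS:
        return model_name, "exact"
    normalized = {_normalize(key): key for key in CATALOG_KEYS}
    candidate = _normalize(model_name)
    if candidate in normalized:
        return normalized[candidate], "normalized"
    # Assumed fuzzy tier: stdlib similarity ratio with a fixed cutoff.
    close = difflib.get_close_matches(candidate, list(normalized), n=1, cutoff=0.8)
    if close:
        return normalized[close[0]], "fuzzy"
    return None
```

Returning the match kind alongside the key lets a caller decide how much to trust the suggestion, for example by asking the operator to confirm a fuzzy hit.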

Schema migrations

Two idempotent SQL files under docker/sql/:

1. Required — run before deploying W1/W2 code

v2.2.0_0615_context_management_capacity_schema.sql
-- Migration kind: REQUIRED_SCHEMA
-- Required for: all upgraded deployments before running W1/W2 context-management code.
Merges four prior ALTER TABLE ADD COLUMN migrations:

  • W1 capacity fields on model_record_t (context_window_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family, plus the dependent provider_input_limit_tokens)
  • W1 capacity_snapshot JSON column on model_monitoring_record_t
  • W2 requested_output_tokens override on ag_tenant_agent_t
  • W2 safe_input_budget_snapshot JSON column on model_monitoring_record_t

Skipping this migration surfaces as "column does not exist" errors once the new code runs.

2. Recommended — data fix, safe to skip on fresh deployments

v2.2.0_0617_context_management_capacity_data_fix.sql
-- Safe to skip when: fresh deployment, or operators will manually fill capacity fields.
Runs two passes in strict order:

  1. Backfill W1 capacity fields on existing rows from the catalog (writes max_output_tokens among other fields).
  2. Reconcile legacy max_tokens with the freshly populated max_output_tokens so both aliases agree.

Reversing the order would clobber the backfilled values.

All statements use IF NOT EXISTS / ON CONFLICT DO NOTHING. Existing rows whose capacity stays NULL continue to work through the SDK fallback until an operator edits them.

Backward compatibility

  • Existing model_record_t rows with max_tokens populated and the new W1 columns NULL keep working; the SDK promotes the legacy column to max_output_tokens at resolve time.
  • Direct chat.completions.create callers that previously passed max_tokens are now rejected unless the value matches the W2 snapshot. No other production call sites changed in this PR; the broader dispatch hardening across remaining bypasses is the W10 follow-up.
  • Feature flag CAPACITY_SUGGESTION_ENABLED gates the W11 endpoints; turning it off restores pre-W11 behavior with no UI surface for catalog suggestion.
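
The rejection of mismatched max_tokens described above can be sketched as a small guard. This is an illustration, not the code at openai_llm.py: the exception type, the function name, and the pass-through behavior when no snapshot is bound are all assumptions.

```python
from typing import Any, Dict, Optional


class OutputBudgetMismatch(ValueError):
    """Hypothetical error type; the PR does not name the exception it raises."""


def enforce_output_budget(
    request_kwargs: Dict[str, Any],
    snapshot_requested_output_tokens: Optional[int],
) -> Dict[str, Any]:
    """Reject a conflicting max_tokens and pin the snapshot value."""
    if snapshot_requested_output_tokens is None:
        # Assumed: without a bound snapshot the request is left untouched.
        return dict(request_kwargs)
    caller_value = request_kwargs.get("max_tokens")
    if caller_value is not None and caller_value != snapshot_requested_output_tokens:
        raise OutputBudgetMismatch(
            f"max_tokens={caller_value} does not match the resolved "
            f"requested_output_tokens={snapshot_requested_output_tokens}"
        )
    pinned = dict(request_kwargs)
    # The wire field stays max_tokens; only its value is pinned.
    pinned["max_tokens"] = snapshot_requested_output_tokens
    return pinned
```

Pinning the value even when the caller omitted it means the provider call always carries the same output cap that the input budget was computed against.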

Test coverage

  • SDK: test_capacity_resolver.py, test_capacity_budget.py, test_openai_llm.py (dispatch enforcement), test_monitoring.py (snapshot fields).
  • Backend: test_model_capacity_suggestion_service.py, test_model_management_service.py, test_config_utils.py, provider tests for the four batch-add adapters.
  • Frontend: tsc --noEmit clean; six-surface matrix manually verified.

Notes for reviewers

  • The tokenizer_registry is intentionally empty in this PR: tokenizer_family is persisted from the catalog and consumed downstream, but resolve() returns (FallbackEstimator, "estimated") for every value today. The 10% uncertainty reserve fires uniformly as a consequence — this is the documented W1 ADR conservative path until verified adapters land.
  • The frontend "Suggest" button never sends a provider_hint from the single-add dialog (the hidden default form.provider="modelengine" would otherwise pin catalog lookup to ModelEngine for every operator). Hint is only sent in batch mode where the dropdown is user-controlled.
  • Capacity suggestions never overwrite the user-chosen model_factory: the catalog's suggested_provider namespace (deepseek, openai, jina, ...) is a superset of the frontend dropdown's allowed values, and writing an unknown one back made models vanish from the active list / edit dropdown (root cause of the per-row reclassification bug we fixed during testing).

JasonW404 and others added 30 commits June 11, 2026 16:10
…city-and-request-safety

Add context management upgrade design documents:
- Context management production plan (EN/CN)
- Memory improvement analysis and architecture
- 16 workstreams for context management upgrade
…view

Add review documents and update workstreams:
- Phase 1-5 review documents
- Findings registry and impact analysis
- Updated 16 workstreams with detailed specs
- Context management weekly design summary (CN)
…egistry

Introduces the contract surface for W1 (Correct Model Token-Capacity
Configuration) so W2/W3 development can begin against stable types. No
runtime behaviour change — resolver/registry implementations land in the
follow-up PR.

New modules:
- sdk/nexent/core/models/capacity_resolver.py: CapabilityProfile and
  ModelCapacitySnapshot (Pydantic v2, frozen), typed ResolverError
  hierarchy, compute_fingerprint() implementing the SHA-256/canonical-JSON
  contract from W1 ADR Decision 3, RESOLVER_VERSION constant, and a
  resolve_capacity() stub.
- sdk/nexent/core/models/tokenizer_registry.py: TokenizerAdapter Protocol,
  empty REGISTRY, FallbackEstimator (char/4 heuristic that always returns
  counting_mode='estimated'), and resolve() function. Family-name
  validation pattern enforces the naming convention fixed in the ADR.
- backend/consts/capability_profiles.py: CATALOG with eight approved
  day-one entries (openai/gpt-4o, openai/gpt-4.1, dashscope/qwen-plus,
  qwen-turbo, glm-5.1, silicon DeepSeek-V4-Flash, Qwen3.6-27B,
  Kimi-K2.6) plus CATALOG_REVISION.

Design reference: doc/working/context-management-workstreams/
W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (locally hosted; team
sharing channel separate from this repo per doc/.gitignore policy).

Smoke-tested: fingerprint is deterministic and order-independent across
unknown_capabilities and field_sources; ModelCapacitySnapshot rejects
mutation; tokenizer resolve() falls back to estimated for unknown
families; resolve_capacity stub raises NotImplementedError; CATALOG
imports cleanly with all 8 entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(W1): add type skeleton for ModelCapacityResolver and tokenizer registry
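
A rough sketch of the two helpers this commit describes: a SHA-256 fingerprint over canonical JSON, and the char/4 fallback estimator. The ADR itself is not part of the diff, so the canonicalization choices (sorted keys, compact separators, sorted unknown_capabilities) and the round-up in the estimator are assumptions, and both function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def compute_fingerprint_sketch(contract: Dict[str, Any]) -> str:
    """Hash a resolved contract so that key and list order do not matter."""
    canonical = dict(contract)
    if "unknown_capabilities" in canonical:
        # Assumed: this list is a set in spirit, so its order is normalized.
        canonical["unknown_capabilities"] = sorted(canonical["unknown_capabilities"])
    # sort_keys also orders nested dicts such as field_sources.
    payload = json.dumps(
        canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def estimate_tokens(text: str) -> int:
    """char/4 heuristic; rounding up is an assumption."""
    return (len(text) + 3) // 4
```

Two contracts that differ only in dictionary or list ordering hash to the same value, while changing any resolved number (for example requested_output_tokens) changes the fingerprint.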
Adds seven nullable capacity fields to model_record_t so the
ModelCapacityResolver can read operator overrides per W1 ADR:
- context_window_tokens
- max_input_tokens
- max_output_tokens
- default_output_reserve_tokens
- tokenizer_family
- capacity_source
- capability_profile_version

All columns are nullable, no defaults that change semantics. Legacy
max_tokens is left untouched and continues to behave as a deprecated
output-cap alias until consumers migrate (separate follow-up).

Touchpoints:
- docker/sql/v2.2.0_0615_add_capacity_fields_to_model_record_t.sql: idempotent
  upgrade with ALTER TABLE ... ADD COLUMN IF NOT EXISTS + COMMENT ON COLUMN.
- docker/init.sql: fresh-install CREATE TABLE inline plus COMMENT ON COLUMN.
- k8s/helm/nexent/charts/nexent-common/files/init.sql: same for k8s deploys.
- backend/database/db_models.py: ModelRecord ORM columns.
- backend/consts/model.py: ModelRequest Pydantic schema fields so CRUD
  round-trips the new values.

Design reference: doc/working/context-management-workstreams/
W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (Decision 1, schema).

Verification:
- ORM exposes all 7 columns
- Pydantic ModelRequest exposes all 7 fields
- All three SQL files contain 14 occurrences (column + COMMENT per field)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from context-management-workstreams to context-management-workstream/ADRs for better organization.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
feat(W1): add capacity columns to model_record_t (additive migration)
Replaces the resolve_capacity NotImplementedError stub with the real
ModelCapacityResolver per W1 ADR. The resolver:

- Looks up the (provider, model_name) entry in the capability profile
  catalog passed by the caller.
- Merges operator overrides over the profile (operator wins).
- Validates that hard capacity is known and not impossible (output cap
  cannot exceed combined window; capacities must be positive).
- Defaults requested_output_tokens to the profile's
  default_output_reserve_tokens; rejects requests that exceed
  max_output_tokens.
- Derives provider_input_limit_tokens as min(max_input_tokens,
  context_window_tokens - requested_output_tokens) using only the limits
  that are defined.
- Asks tokenizer_registry for (adapter, counting_mode); records
  capability gaps in unknown_capabilities.
- Computes the deterministic SHA-256/canonical-JSON fingerprint from the
  resolved contract and builds an immutable ModelCapacitySnapshot.

The resolver stays pure: the SDK never reads DB or env; backend callers
supply the capability_profiles dict and operator_overrides. This matches
CLAUDE.md's SDK layer rules.

Typed failures raised on invalid input:
- ProviderCapabilityUnknown (no hard capacity)
- InvalidCapacityConfiguration (non-positive values, output > window,
  derived input limit non-positive)
- RequestedOutputExceedsCap (request above max_output_tokens)

Tests (15, all passing):
- Catalog lookup + override precedence
- Uncataloged with operator-supplied capacity
- Rejection: missing capacity, impossible values, negative values,
  requested-output overflow
- Default requested_output behavior
- Separate-input-limit path (synthetic, no day-one model uses it)
- Combined window + separate input limit takes minimum
- Snapshot immutability (Pydantic ValidationError on mutation)
- Fingerprint determinism and sensitivity to request changes
- Tokenizer estimated-mode flag appears in unknown_capabilities

Design reference: doc/working/context-management-workstreams/
W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(W1): implement resolve_capacity with catalog + operator override
…LLM output cap

ModelConfig (sdk/nexent/core/agents/agent_model.py):
- Add max_output_tokens as the preferred name per W1 ADR.
- Keep max_tokens as a deprecated alias; a model_validator backfills the
  unset side so old and new callers both work during migration.
- Add the remaining capacity-snapshot fields so a ModelConfig can carry
  the resolved values from backend service down to the SDK: context_window_tokens,
  max_input_tokens, default_output_reserve_tokens, tokenizer_family,
  capacity_source, capability_profile_version.

OpenAIModel (sdk/nexent/core/models/openai_llm.py):
- Accept max_output_tokens (preferred) and max_tokens (deprecated). If only
  the legacy name is passed, log a debug message and remap to max_output_tokens.
- Internal attribute renamed to self.max_output_tokens; self.max_tokens is
  kept as an alias for any reader.
- chat.completions.create still receives wire field max_tokens; only the
  internal name changed.

NexentAgent.create_model (sdk/nexent/core/agents/nexent_agent.py):
- Construct OpenAIModel with max_output_tokens=model_config.max_output_tokens
  so the new name flows through end-to-end.

Backward compatibility:
- Existing callers that set ModelConfig.max_tokens see no behavior change
  (validator copies it into max_output_tokens; the wire payload is identical).
- Existing callers reading OpenAIModel.max_tokens see no behavior change
  (alias attribute returns the same value).

Verified by table-driven smoke test of all four (max_tokens, max_output_tokens)
combinations on ModelConfig.

Design reference: doc/working/context-management-workstreams/W1_*.md and
W1 ADR. Provider adapters (step 3) and create_agent_info (step 6) follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
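
The alias backfill on ModelConfig can be illustrated without Pydantic. The sketch below uses a plain dataclass with `__post_init__` as a stand-in for the model_validator mentioned in the commit; the class name is hypothetical, and leaving both values untouched when both are set is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    max_output_tokens: Optional[int] = None  # preferred name
    max_tokens: Optional[int] = None         # deprecated alias

    def __post_init__(self) -> None:
        # Backfill whichever side the caller left unset.
        if self.max_output_tokens is None and self.max_tokens is not None:
            self.max_output_tokens = self.max_tokens
        elif self.max_tokens is None and self.max_output_tokens is not None:
            self.max_tokens = self.max_output_tokens
```

The four combinations (neither set, only legacy, only new, both set) are the same table the commit's smoke test is described as covering.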
…p legacy max_tokens

Replaces the long-standing bug where `model_info['max_tokens']` (a deprecated
output cap, semantically wrong) was assigned to ContextManagerConfig.token_threshold
(an input/context budget). The fix wires ModelCapacityResolver into the
runtime path so the context manager receives a real input budget derived from
the capacity snapshot.

Changes in backend/agents/create_agent_info.py:

- Add _resolve_input_budget(model_info): pulls operator overrides from the
  new model_record_t capacity columns, calls resolve_capacity(...) with the
  CATALOG from backend.consts.capability_profiles, and returns
  snapshot.provider_input_limit_tokens.
- On ProviderCapabilityUnknown (uncataloged model with no operator-supplied
  hard capacity), falls back to a safe constant _TOKEN_THRESHOLD_LEGACY_FALLBACK
  (8192) so the migration window doesn't break existing setups. Logged
  prominently so admins know to backfill.
- create_agent_config: stops reading model_info['max_tokens'] and passes
  the resolved input_budget into ContextManagerConfig.token_threshold.
- create_model_config_list: passes all seven new capacity columns
  (context_window_tokens, max_input_tokens, max_output_tokens,
  default_output_reserve_tokens, tokenizer_family, capacity_source,
  capability_profile_version) through to the SDK ModelConfig so end-to-end
  capacity flow works.

This is the end of the legacy max_tokens-as-context-threshold confusion.
ModelConfig.max_tokens stays as a deprecated alias per W1 step 4; this commit
removes its only known misuse from the runtime path.

The fallback constant is intentionally conservative — it kicks compression
early for unmigrated models so behavior degrades gracefully rather than
overflowing provider context. W2 will subtract its 10% uncertainty reserve
on top of the resolver's output once enforcement phase begins.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
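
The fallback path in this commit amounts to a guarded call, sketched below. The 8192 value is the one quoted in the commit message; the `resolve` callable, the local exception class, and the log wording are stand-ins for the real resolver wiring.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

TOKEN_THRESHOLD_LEGACY_FALLBACK = 8192  # value quoted in the commit message


class ProviderCapabilityUnknown(Exception):
    """Local stand-in for the SDK's typed resolver error."""


def resolve_input_budget(
    model_info: Dict[str, Any],
    resolve: Callable[[Dict[str, Any]], int],
) -> int:
    """Return the resolved input budget, or the conservative fallback."""
    try:
        return resolve(model_info)
    except ProviderCapabilityUnknown:
        # Logged loudly so admins know the row needs capacity backfilled.
        logger.warning(
            "No hard capacity for model %r; using fallback threshold %d",
            model_info.get("model_name"),
            TOKEN_THRESHOLD_LEGACY_FALLBACK,
        )
        return TOKEN_THRESHOLD_LEGACY_FALLBACK
```

Only the "capacity unknown" error is caught; configuration errors such as an output cap larger than the window would still propagate in this sketch.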
…neering methodology and recommendations for Nexent's evolution
Restore W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from doc/context-management-upgrade branch to context-management-workstreams/ADRs directory.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Persist resolved model capacity snapshot metadata on model monitoring records so per-request telemetry can report total window, output reserve, safe input budget, source, tokenizer mode, unknown capabilities, and fingerprint.

- add nullable monitoring columns to ORM, fresh-install SQL, and idempotent upgrade migration
- bind resolved capacity snapshots from agent creation into SDK monitoring context
- enrich LLM, client-level, and record_model_call monitoring rows with snapshot fields
- cover enqueue and ORM payload behavior in SDK monitoring tests

Verification:
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/monitor/test_monitoring.py
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/core/models/test_capacity_resolver.py
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/agents/create_agent_info.py backend/database/db_models.py sdk/nexent/core/agents/agent_model.py sdk/nexent/core/agents/run_agent.py sdk/nexent/monitor/monitoring.py sdk/nexent/monitor/__init__.py

Co-Authored-By: Codex <codex@openai.com>
Expose provider-supplied token-capacity metadata as advisory candidate fields in discovery responses without promoting them into persisted model records.

- add shared candidate extraction for common context, output, input, reserve, and tokenizer aliases
- wire SiliconFlow, DashScope, TokenPony, and ModelEngine adapters to attach provider_candidate hints when present
- keep prepare_model_dict from persisting provider_candidate fields automatically
- cover positive and no-hint paths for provider discovery

Verification:
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/backend/services/providers/test_silicon_provider.py /home/feiran/nexent/test/backend/services/providers/test_dashscope_provider.py /home/feiran/nexent/test/backend/services/providers/test_tokenpony_provider.py /home/feiran/nexent/test/backend/services/providers/test_modelengine_provider.py /home/feiran/nexent/test/backend/services/test_model_provider_service.py::test_prepare_model_dict_does_not_persist_provider_capacity_candidates
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/services/providers/base.py backend/services/providers/silicon_provider.py backend/services/providers/dashscope_provider.py backend/services/providers/tokenpony_provider.py backend/services/providers/modelengine_provider.py

Co-Authored-By: Codex <codex@openai.com>
Add explicit model-capacity controls to model management so operators can promote known capacity values through the existing model create and update flows.

- extend frontend model types and service request/response mappings for capacity fields
- add shared capacity form controls with tokenizer autocomplete, source badge, profile version text, and legacy max_tokens warning
- wire capacity validation and operator payloads into Add/Edit Model dialogs
- localize labels, tooltips, source names, and validation messages in en/zh

Verification:
- npm run type-check
- node -e "const fs=require('fs'); for (const f of ['frontend/public/locales/en/common.json','frontend/public/locales/zh/common.json']) { JSON.parse(fs.readFileSync(f,'utf8').replace(/^\uFEFF/,'')); } console.log('locale json ok')"

Co-Authored-By: Codex <codex@openai.com>
Review and accept decisions for 5 findings:
- CM-018: structural validation blocks commit, semantic quality routes to W15 SLO
- CM-021: source lineage + mandatory presence validation blocks, semantic coverage to W15
- CM-024: use claim-scoped production readiness terminology
- CM-017: finite initial conflict set with explicit unresolved failure
- CM-025: subagent as independent agent with parent_session_id, async tool delegation, no recursion

Updated: finding-review-decisions.md, findings-registry.md (20/26 complete),
W4, W6, W10, W11, W12, W13, parent plan.
Added: pending-findings-decision-sheet.md for decision tracking.

Remaining 6 findings (CM-009, CM-010, CM-014, CM-015, CM-022, CM-026)
pending individual discussion.
…lease 1 gates

Remove multimodal testing from Release 1 SLO gates. W15 covers text modality
only; add modality contracts when specific product requirements emerge.

Updated: finding-review-decisions.md, findings-registry.md (21/26 complete),
W15, W3, pending-findings-decision-sheet.md.
…ents

Architectural simplification: checkpoints are no longer an independent
subsystem (W7). Compression results are stored as compression.snapshot
events within the W5 execution event log. Recovery finds the latest
compression.snapshot event and replays subsequent events.

Eliminates:
- Independent checkpoint table and CAS concurrency control
- Redis checkpoint cache layer
- W8 checkpoint-specific validation
- CM-014 checkpoint schema migration (covered by CM-005)
- W7 publication outbox for cross-system consistency

Updated: W5 (compression.snapshot event type, recovery flow, dirty-state
flush), W6, W8, W9, W13, W14, W15, parent plan, README, review artifacts.
Deleted: W7_Durable_Multi_Worker_Context_State.md.
CM-014 marked N/A (22/26 findings complete).
…plementation measurement

Do not pre-define workload envelopes. After W1-W16 implementation, use W15
measurement infrastructure to collect real performance data and define
envelopes based on observed data. No production-scale claim until envelopes
are defined. Aligns with CM-004 (measure before optimizing) and CM-011
(evidence-based gates).

Progress: 23/26 findings complete.
…mentation measurement

Do not pre-define numeric availability, RPO, RTO, rebuild time, queue lag,
or storage capacity targets. After W1-W16 implementation, use W15
measurement infrastructure to collect real recovery/availability data per
topology and define targets based on observed data. No production-scale
claim until targets are defined. Aligns with CM-009 (measure before
defining envelopes) and CM-011 (evidence-based gates).

Progress: 24/26 findings complete.
…ata validation

W7 retirement eliminates the primary O(history) hashing consumer. Replace
content hashing with metadata-based validation at three points:
1. compression.snapshot: partial_after_erasure + version fields
2. W6 materialized cache: snapshot validity + event count + version fields
3. Physical erasure: one-time partial_after_erasure flag

No Merkle trees or segmented hashing needed. Storage-layer integrity handled
by database checksums, not W8.

Progress: 25/26 findings complete.
…ed OpenTelemetry spec

Consolidate all decision trace requirements (W5, W6, W10, W15) into a single
unified telemetry/observability specification (low priority, post-core).
Use OpenTelemetry-style spans/attributes/events collected by external
observability infrastructure, not product-internal persistence.

Updated: W15 (replace decision trace persistence with OTel output),
parent plan (replace decision trace references with unified telemetry spec),
finding-review-decisions.md, findings-registry.md (26/26 complete),
pending-findings-decision-sheet.md.

All 26 findings now reviewed and decided.
Step 7 added capacity controls to ModelEditDialog (the OpenAI-API-Compatible
"custom model" edit path) but missed ProviderConfigEditDialog, the dialog
opened by the per-model gear icon under provider-categorized sections
(SiliconFlow / DashScope / TokenPony / ModelEngine). For any model whose
model_factory matches a recognized provider — including the W1 catalog
keys 'dashscope' / 'silicon' / 'tokenpony' — that gear icon was the only
edit path, leaving operators no way to set context_window_tokens et al.

Changes:
- ProviderConfigEditDialog: accept optional initialCapacity and
  hideCapacityFields props; render ModelCapacityFields when supported;
  include capacity payload in onSave callback shape.
- modelService.updateBatchModel: accept and forward the 6 capacity
  fields (context_window_tokens, max_input_tokens, max_output_tokens,
  default_output_reserve_tokens, tokenizer_family, capacity_source) to
  the existing batch_update_models endpoint, which already passes through
  arbitrary update_data per backend/services/model_management_service.py
  line 347.
- ModelDeleteDialog single-model gear path: pass current capacity values
  from selectedSingleModel as initialCapacity, and forward saved capacity
  fields into the updateBatchModel call.
- ModelDeleteDialog provider-level "Edit Config" path: pass
  hideCapacityFields={true} since handleProviderConfigSave applies
  settings batch-wise to all models from one provider and per-model
  capacity is not a batch concept.

No behavior change for callers that don't pass initialCapacity (backward
compatible). Verified with npm run type-check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…odules pollution

Two tests (test_get_models_llm_success, test_get_models_embedding_success)
failed intermittently when test_model_provider_service.py ran after
test_capacity_resolver.py or test_silicon_provider.py. Root cause:
silicon_provider is loaded under two distinct sys.modules keys —
`services.providers.silicon_provider` (the path production code uses) and
`backend.services.providers.silicon_provider` (the path some test files
use). Each binding gets its own `SILICON_GET_URL` attribute because
`silicon_provider.py` does `from consts.provider import SILICON_GET_URL`,
which copies the value into the importing module's namespace.

When both keys are present, mock.patch targeting only the `backend.` path
silently fails to override the value used by the production code path
that SiliconModelProvider.get_models executes.

Fix: introduce _patch_provider_module_constant context manager that
patches the named attribute on every loaded copy of the module. Apply to
all four SILICON_GET_URL mock.patch sites in this file.

Verification:
- 289 tests pass under the previously-failing combined order:
  test/sdk/core/models/test_capacity_resolver.py +
  test/sdk/monitor/test_monitoring.py +
  test/backend/services/providers/ +
  test/backend/services/test_model_provider_service.py

The helper is order-independent and safe even when one of the two sys.modules
paths is absent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
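
One way to write a helper in the spirit of _patch_provider_module_constant is sketched below. The real helper's signature is not shown in the commit, so the name, the suffix-based module matching, and the parameters here are assumptions.

```python
import sys
from contextlib import contextmanager
from typing import Any, Iterator

_MISSING = object()


@contextmanager
def patch_module_constant(module_suffix: str, attr: str, value: Any) -> Iterator[None]:
    """Patch `attr` on every loaded module named `module_suffix` or `*.module_suffix`."""
    targets = [
        mod
        for name, mod in list(sys.modules.items())
        if mod is not None
        and (name == module_suffix or name.endswith("." + module_suffix))
    ]
    # Capture originals before patching so each copy is restored exactly.
    originals = [(mod, getattr(mod, attr, _MISSING)) for mod in targets]
    for mod in targets:
        setattr(mod, attr, value)
    try:
        yield
    finally:
        for mod, original in originals:
            if original is _MISSING:
                delattr(mod, attr)
            else:
                setattr(mod, attr, original)
```

Called with the suffix `services.providers.silicon_provider`, this would reach both the bare and the `backend.`-prefixed copies, and it is a no-op for whichever copy is not loaded.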
wuyuanfr and others added 5 commits June 23, 2026 16:29
…ement-upgrade-no-working-docs

# Conflicts:
#	backend/agents/create_agent_info.py
#	test/sdk/core/models/test_openai_llm.py
…rfaces

The Tokenizer Family input was rendered on Add, Edit, batch Add, and the
provider-level "bulk modify config" surfaces. Per the W1 ADR the value
is consumed only by `sdk/nexent/core/models/tokenizer_registry.resolve`,
which today has no registered adapters and unconditionally returns
`(FallbackEstimator, "estimated")` -- so the input never affects runtime
behavior and forcing operators to type/choose it surfaces an irrelevant
implementation detail.

Hidden, not removed: the field stays in form state, payload builders,
batch row mapping, and DB. W11 catalog suggestions still write it
silently, existing DB values are still preserved through edits, and any
future adapter registration becomes a one-line change with no UI work.

Backend/SDK fully decoupled:
- backend `consts/model.py` request schemas keep `tokenizer_family`
- catalog entries in `consts/capability_profiles.py` still set it
- SDK consumes it via `tokenizer_registry.resolve` and W2's
  `_UNKNOWN_CAPABILITIES_REQUIRING_RESERVE` continues to trigger the
  10% reserve when counting_mode is estimated

Changes in this commit:
- ModelCapacityFields.tsx: drop the AutoComplete input block + the
  `TOKENIZER_FAMILY_OPTIONS` constant + the `AutoComplete` import +
  the `hideTokenizer` prop (interface + destructure)
- ModelEditDialog.tsx: drop the `hideTokenizer` prop from the bulk-apply
  call site and the now-stale "Tokenizer hidden" comment
- zh/en common.json: drop the two unused locale keys

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…aults

Both fields are no longer required at any of the six capacity write
surfaces. An empty input renders a gray placeholder showing what value
would land if the user saves without typing; the form state stays "" so
nothing is silently mutated client-side. At save time, the wire-payload
builder substitutes the default into the API call only when the operator
truly left the field empty -- otherwise the typed value (or existing DB
value loaded into the form) is sent unchanged.

Defaults chosen to mirror the existing SDK fallbacks so observed runtime
behavior does not change when defaults land:
- DEFAULT_CONTEXT_WINDOW_TOKENS = 32_768
  (matches `_TOKEN_THRESHOLD_LEGACY_FALLBACK` in capacity_resolver.py)
- DEFAULT_MAX_OUTPUT_TOKENS = 4_096
  (matches `_DEFAULT_REQUESTED_OUTPUT_TOKENS` in capacity_resolver.py)

Constants exported from ModelCapacityFields.tsx so the snake_case mirror
in ModelAddDialog stays in sync.

Six-surface contract -- single-row write paths apply defaults; the
bulk-apply broadcast preserves "empty means do not broadcast":
- 1) ModelAddDialog single-add form -> capacityFormToSnakePayload
     applies defaults
- 2) ModelEditDialog single-edit form -> buildCapacityPayload
     (applyDefaults=true default)
- 3) ModelAddDialog batch-import top-defaults panel ->
     capacityFormToSnakePayload(form) for batchDefaults; per-row
     `model.X ?? batchDefaults.X` now never falls through to undefined
     in the gate at isFormValid (the gate becomes defense-in-depth,
     comment updated)
- 4) ModelAddDialog batch per-row gear (Settings Modal) ->
     capacityFormToSnakePayload(modelCapacity); preload-from-row-or-
     batch-default means "no-op save" already carries non-empty input
     and goes through toInt unchanged. Only "row=NULL plus batch-empty"
     materializes the defaults
- 5) ProviderConfigEditDialog per-row gear
     (hideCapacityFields=false) -> buildCapacityPayload(capacityForm)
- 6) ProviderConfigEditDialog "modify config" bulk-apply
     (hideCapacityFields=true) -> buildCapacityPayload(form,
     { applyDefaults: false }); `applyDefaultsOnEmpty={false}` on the
     panel suppresses the gray placeholder so operators do not read
     "empty means 32K/4K will be broadcast"
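
The default-on-empty rule behind the six surfaces can be sketched in Python as below. This is only an illustration of the contract described in this commit message, not the TypeScript code in the PR: `build_capacity_payload` here is a hypothetical Python mirror, and the form is assumed to be a flat mapping of snake_case keys to raw input strings.

```python
from typing import Dict

# Default values quoted in the commit message.
DEFAULT_CONTEXT_WINDOW_TOKENS = 32_768
DEFAULT_MAX_OUTPUT_TOKENS = 4_096

_DEFAULTS: Dict[str, int] = {
    "context_window_tokens": DEFAULT_CONTEXT_WINDOW_TOKENS,
    "max_output_tokens": DEFAULT_MAX_OUTPUT_TOKENS,
}


def build_capacity_payload(
    form: Dict[str, str], apply_defaults: bool = True
) -> Dict[str, int]:
    """Hypothetical mirror of the wire-payload builder.

    The form dict itself is never mutated: empty inputs stay "" in the form,
    and defaults appear only in the returned payload.
    """
    payload: Dict[str, int] = {}
    for key, raw in form.items():
        text = raw.strip()
        if text:
            # A typed (or preloaded) value is sent unchanged.
            payload[key] = int(text)
        elif apply_defaults and key in _DEFAULTS:
            # Single-row write paths: an empty field materializes the default.
            payload[key] = _DEFAULTS[key]
        # Bulk-apply (apply_defaults=False): an empty field is omitted,
        # which preserves "empty means do not broadcast".
    return payload
```

With `apply_defaults=False` an all-empty form yields an empty payload, which corresponds to surface 6 above.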

requiredFields stripped from every validateCapacityForm call site
and every ModelCapacityFields prop usage. validateCapacityForm still
enforces the data-shape checks (positive integers, output <= window,
reserve <= output) -- those are not affected by removing the
"must be non-empty" requirement.
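
A minimal sketch of the data-shape checks listed above, written in Python for illustration. The function name `validate_capacity` and the error strings are hypothetical, and "reserve" is assumed to mean `default_output_reserve_tokens`; the real `validateCapacityForm` lives in the frontend and may differ.

```python
from typing import List, Optional


def validate_capacity(
    context_window: Optional[int],
    max_output: Optional[int],
    output_reserve: Optional[int],
) -> List[str]:
    """Return a list of shape errors; None means the field was left empty."""
    errors: List[str] = []
    values = {
        "context_window_tokens": context_window,
        "max_output_tokens": max_output,
        "default_output_reserve_tokens": output_reserve,
    }
    # Empty fields are allowed; filled fields must be positive integers.
    for name, value in values.items():
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            errors.append(f"{name} must be a positive integer")
    if errors:
        return errors
    # Cross-field checks only run when both sides are present.
    if context_window is not None and max_output is not None:
        if max_output > context_window:
            errors.append("max_output_tokens must not exceed context_window_tokens")
    if max_output is not None and output_reserve is not None:
        if output_reserve > max_output:
            errors.append(
                "default_output_reserve_tokens must not exceed max_output_tokens"
            )
    return errors
```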

Backend and SDK unchanged: the wire payload still ships the same
snake_case keys; the only difference is that on save, those keys are
guaranteed to carry a number (not null) for single-row writes, which
makes the `_is_bare_capacity_model` badge and the W11 catalog-coverage
banner clear themselves automatically for new rows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wuyuanfr wuyuanfr requested review from Dallas98 and WMC001 as code owners June 24, 2026 03:24
Four failure clusters reported by CI after merging upstream/develop
into this PR branch:

1) test_prepare_agent_run -- assert_called_once_with(...) on
   create_agent_run_info was missing `tool_params=None`. Production
   code at agent_service.py:2245 now passes
   `tool_params=agent_request.tool_params` and AgentRequest defaults
   `tool_params` to None when the fixture does not set it. Add the
   kwarg to the expected call.

2) update_agent_info_impl_* (14 tests) -- W2 added
   `_validate_requested_output_tokens_for_agent(request, tenant_id)`
   at agent_service.py:1164. The validator reads
   `request.requested_output_tokens` and compares it against the
   model's `max_output_tokens`. The existing tests build their
   request via `MagicMock(spec=AgentInfoRequest)` and never set
   `requested_output_tokens`, so:
   - either the spec exposes the field as a fresh MagicMock and the
     `> max_output_tokens` comparison fails with TypeError,
   - or Pydantic-v2 field introspection through dir() omits the
     name and the access AttributeErrors.
   Both branches are unrelated to what these tests cover, so this
   commit adds a module-level autouse fixture that stubs the
   validator to a no-op. Tests that want to exercise the validator
   in the future can still patch it locally; module-level autouse
   loses to per-test patches.

3) test_import_agent_by_agent_id_publish_version_error --
   import_agent_by_agent_id reads `import_agent_info.requested_output_tokens`
   directly at agent_service.py:1874 (no validator involved), so the
   autouse fixture from (2) does not help. Set
   `mock_agent_info.requested_output_tokens = None` on the existing
   `MagicMock(spec=ExportAndImportAgentInfo)` so the access returns a
   defined value instead of AttributeErroring.

4) test_create_model_success / test_create_model_deep_thinking_success
   (test_nexent_agent.py) -- W1 renamed the SDK's OpenAIModel kwarg
   from `max_tokens` to `max_output_tokens`. The two `assert_called_once_with`
   blocks still asserted on the old name. Updated to `max_output_tokens`.
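
The two failure modes named in (2) can be reproduced with plain classes standing in for the request model. This is a hedged sketch: `RequestWithDefault`, `RequestAnnotationOnly` and `exceeds_limit` are hypothetical stand-ins, not the real `AgentInfoRequest` or validator.

```python
from unittest.mock import MagicMock


class RequestWithDefault:
    # Class attribute: visible through dir(), so the spec exposes it.
    requested_output_tokens = None


class RequestAnnotationOnly:
    # Annotation only: not an attribute, so the spec does not expose it.
    requested_output_tokens: int


def exceeds_limit(request, max_output_tokens):
    """Hypothetical stand-in for the comparison inside the validator."""
    return request.requested_output_tokens > max_output_tokens


def probe(spec_cls):
    """Return the name of the exception raised for a bare spec'd mock."""
    request = MagicMock(spec=spec_cls)
    try:
        exceeds_limit(request, 4096)
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return "no error"


def probe_fixed(spec_cls):
    """Setting the attribute explicitly, as in (3), gives a defined value."""
    request = MagicMock(spec=spec_cls)
    request.requested_output_tokens = None
    return request.requested_output_tokens
```

`probe(RequestWithDefault)` hits the comparison against a child MagicMock, while `probe(RequestAnnotationOnly)` fails earlier on attribute access; `probe_fixed` returns `None` for both classes because `spec` (unlike `spec_set`) does not restrict assignment.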

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ponse shape

The production response shape at agent_service.py:1112 now includes
`requested_output_tokens` (added by W2). The mocked
`search_agent_info` payload does not include the key, so the function
returns `None` for it via `.get(...)`. Add the key to expected_result
to match.

test_import_agent_by_agent_id_publish_version_error still fails for an
unrelated reason: `create_agent`'s `mock.return_value` is configured to
`{"agent_id": 100}` but the test result shows `create_agent(...)`
returning the auto-MagicMock instead of the dict. Static analysis of
the patch wiring shows nothing wrong; needs a local repro to inspect
the mock state. Saving the partial progress first.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread backend/agents/create_agent_info.py Outdated
Comment thread backend/agents/create_agent_info.py
…lish_version_error

The test claimed to verify "import_agent_by_agent_id swallows
publish_version_impl exceptions and still returns the new agent id",
but the three lines that actually configure the patched mocks were
missing from the body:

    mock_query_tools.return_value = []
    mock_create.return_value = {"agent_id": 100}
    mock_publish.side_effect = Exception("Publish error")

Without them every patched mock returned the default auto-MagicMock,
so `create_agent(...)` returned a MagicMock instead of the dict,
`new_agent["agent_id"]` returned `MagicMock.__getitem__()`,
publish_version_impl never raised, and `assert result == 100` failed
against the MagicMock return value.

Likely lost during the upstream/develop merge that introduced
`requested_output_tokens` to the import flow (the missing-attribute
error surfaced first, masking the deeper issue). Adding the three
configuration lines back lets the test exercise the actual code path
it was designed to cover.

Verified locally: full test_agent_service.py passes 217/217.
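
To illustrate why the missing configuration lines mattered, here is a small self-contained sketch. `import_agent` is a hypothetical simplification of the import flow (create, then publish with the exception swallowed); it is not the code in agent_service.py.

```python
from unittest.mock import MagicMock


def import_agent(create_agent, publish_version):
    """Hypothetical simplification: create the agent, try to publish it,
    swallow publish errors, and return the new agent id."""
    new_agent = create_agent()
    agent_id = new_agent["agent_id"]
    try:
        publish_version(agent_id)
    except Exception:
        pass  # publish failures must not block the import
    return agent_id


def run_unconfigured():
    # No return_value / side_effect: every call yields an auto-MagicMock.
    return import_agent(MagicMock(), MagicMock())


def run_configured():
    create = MagicMock(return_value={"agent_id": 100})
    publish = MagicMock(side_effect=Exception("Publish error"))
    result = import_agent(create, publish)
    return result, publish.call_count
```

In the unconfigured run the result is itself a MagicMock (from `MagicMock.__getitem__`), so an `assert result == 100` cannot pass; in the configured run the publish mock raises once and the id is still returned.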

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@YehongPan

Contributor

🔍 Code Review Comments

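1. [Security/Vulnerability] Thread safety of _CAPACITY_WARNING_EMITTED
_CAPACITY_WARNING_EMITTED is a module-level set() with no lock protecting it in multi-threaded or multi-coroutine environments, so a race condition exists. Use a threading.Lock, or confirm that the set operations involved are atomic.

2. [Logic flaw] _resolve_input_budget silently falls back on an empty provider
When model_info.get("model_factory") returns None, provider becomes an empty string, but resolve_capacity may not accept an empty provider. The result is a silent fallback to the legacy threshold, which masks the configuration error. Consider logging an explicit WARNING for an empty provider.

3. [Code style] Inconsistent parameter indentation in create_agent_config
The request_requested_output_tokens parameter in the function signature is missing its leading whitespace, which violates the PEP 8 parameter-alignment rule.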

@YehongPan YehongPan left a comment

Contributor

Code Review

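  • [Security/Vulnerability] _CAPACITY_WARNING_EMITTED is a module-level set() with no lock protecting it in multi-threaded or multi-coroutine environments, so a race condition exists. Use a threading.Lock, or confirm that the set operations involved are atomic.
  • [Logic flaw] In _resolve_input_budget, when model_info.get("model_factory") returns None, provider becomes an empty string, and resolve_capacity may not accept an empty provider. The result is a silent fallback to the legacy threshold, which masks the configuration error. Consider logging an explicit WARNING for an empty provider.
  • [Code style] The request_requested_output_tokens parameter in the create_agent_config function signature is missing its leading whitespace, which violates the PEP 8 parameter-alignment rule.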

Comment thread backend/apps/model_managment_app.py Outdated
Comment thread backend/apps/model_managment_app.py
Comment thread backend/apps/model_managment_app.py
wuyuanfr and others added 2 commits June 24, 2026 14:11
…edup with a lock

Two small fixes reported during review:

1) `request_requested_output_tokens` in the `create_agent_config`
   signature was flush-left (zero indent) while every other parameter
   sits at four-space indent. Python's parser tolerates this inside
   parentheses, but linters and humans both stumble on it. Re-indent
   to align with the rest of the signature.

2) `_CAPACITY_WARNING_EMITTED` is a per-process dedup set for the
   "model has no W1/W2 capacity configured" operator warning. The
   `if dedup_key in S: return; S.add(dedup_key)` pattern was a
   check-then-add race: two threads on the same model could both pass
   the membership test before either added, leading to duplicate
   WARNING lines that defeat the per-process dedup contract.

   Wrap the test-and-set in a `threading.Lock`. The lock is released
   before `logger.warning(...)` so warning I/O is not serialised
   across paths; only the dedup decision is.
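
The locked test-and-set described in (2) might look roughly like the sketch below. The lock name, the helper names and the log message are assumptions for illustration; only `_CAPACITY_WARNING_EMITTED` and the use of `threading.Lock` come from the commit message.

```python
import logging
import threading

logger = logging.getLogger("capacity_sketch")

_CAPACITY_WARNING_EMITTED = set()
_CAPACITY_WARNING_LOCK = threading.Lock()  # hypothetical name


def _claim_first_emission(dedup_key: str) -> bool:
    """Atomically check and record the key; True only for the first caller."""
    with _CAPACITY_WARNING_LOCK:
        if dedup_key in _CAPACITY_WARNING_EMITTED:
            return False
        _CAPACITY_WARNING_EMITTED.add(dedup_key)
        return True


def warn_missing_capacity(dedup_key: str) -> bool:
    """Emit the operator warning at most once per process for each key."""
    if not _claim_first_emission(dedup_key):
        return False
    # The lock is already released here, so logging I/O is not serialised.
    logger.warning("model %s has no capacity configured", dedup_key)
    return True
```

Because the membership test and the `add` happen under the same lock, two threads racing on the same key cannot both see "not yet emitted".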

Verified locally: test/backend/agents/test_create_agent_info.py
171/171 passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wuyuanfr

Collaborator Author

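🔍 Code Review Comments

1. [Security/Vulnerability] Thread safety of _CAPACITY_WARNING_EMITTED: _CAPACITY_WARNING_EMITTED is a module-level set() with no lock protecting it in multi-threaded or multi-coroutine environments, so a race condition exists. Use a threading.Lock, or confirm that the set operations involved are atomic.

2. [Logic flaw] _resolve_input_budget silently falls back on an empty provider: when model_info.get("model_factory") returns None, provider becomes an empty string, but resolve_capacity may not accept an empty provider. The result is a silent fallback to the legacy threshold, which masks the configuration error. Consider logging an explicit WARNING for an empty provider.

3. [Code style] Inconsistent parameter indentation in create_agent_config: the request_requested_output_tokens parameter in the function signature is missing its leading whitespace, which violates the PEP 8 parameter-alignment rule.

Items 1 and 3 are the same as what @JasonW404 mentioned; they were fixed in commit https://github.com/ModelEngine-Group/nexent/commit/72e378eaafab2eabf8555357984ca3e6436094c2.

Item 2 is fixed in 10a41ca.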

@wuyuanfr

Collaborator Author
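image
1. In the model configuration UI, when adding a single model, the original "Max Tokens" field is deprecated (it blurred the two concepts of "context window" and "max output tokens").
2. Four capacity-related fields are added for the model administrator to fill in: "Context Window", "Max Input Tokens", "Max Output Tokens", and "Output Reserve Tokens". A model can still be added with these left empty, in which case default values are written to the database.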

@wuyuanfr

Collaborator Author
image image

After clicking "Use suggestion", the matched verified values are filled into the input fields.

Comment thread docker/sql/v2.2.2_0622_update_left_nav_menu.sql

@WMC001 WMC001 left a comment

Contributor

Observation: comprehensive capacity management refactor

The model capacity foundation changes are extensive and well-structured. The separation of W1 (provider capacity profiles) and W2 (per-agent requested_output_tokens overrides) is clearly documented. The _coerce_legacy_max_tokens_alias defense-in-depth pattern and the _capacity_suggestion_coverage_errors_total OpenTelemetry counter for silent failures are particularly thoughtful.

No bugs found in the backend Python layer. The implementation is robust with proper error handling, null checks, and fallback strategies throughout.

@WMC001 WMC001 left a comment

Contributor

Bug 1 (CRITICAL): Wrong tuple element order in _resolve_input_budget — production crash

backend/agents/create_agent_info.py: _resolve_input_budget returns a 3-tuple with the 2nd and 3rd elements in the wrong order relative to the caller's unpacking.

The function returns:

return (
    snapshot.provider_input_limit_tokens,         # [0] → int        (correct)
    _capacity_snapshot_for_monitoring(snapshot), # [1] → dict       (monitoring)
    snapshot,                                    # [2] → ModelCapacitySnapshot
)

But the call site unpacks as:

input_budget, capacity_snapshot, resolved_capacity_snapshot = _resolve_input_budget(model_info)

So capacity_snapshot receives the monitoring dict and resolved_capacity_snapshot receives the ModelCapacitySnapshot. Then _resolve_safe_input_budget is called with capacity_snapshot=resolved_capacity_snapshot — but _resolve_safe_input_budget internally passes capacity_snapshot (the dict) to SafeInputBudgetCalculator.calculate_safe_input_budget(), which accesses typed Pydantic attributes (snapshot.provider_input_limit_tokens, etc.). A plain dict has no such attributes — this raises AttributeError at runtime for every agent that uses the new W2 context management path.

Fix: swap the 2nd and 3rd return values in _resolve_input_budget:

return (
    snapshot.provider_input_limit_tokens,
    snapshot,                                        # ModelCapacitySnapshot goes 2nd
    _capacity_snapshot_for_monitoring(snapshot),     # monitoring dict goes 3rd
)

@WMC001 WMC001 left a comment

Contributor

Bug 2 (MEDIUM): None.get() crash in _validate_requested_output_tokens_for_agent

backend/services/agent_service.py: if get_model_by_model_id returns None, model_info.get("max_output_tokens") would raise AttributeError (None has no .get()); the existing if model_info else None guard handles that case correctly. However, if the model record is an empty dict {}, max_output_tokens becomes None and the validation is silently skipped. Additionally, if max_output_tokens = 0 is stored in the DB (falsy but not None), any positive requested_output_tokens incorrectly passes validation.

Fix: add explicit type check and > 0 guard:

model_info = get_model_by_model_id(model_id, tenant_id=tenant_id)
if not isinstance(model_info, dict):
    return  # or log
max_output_tokens = model_info.get("max_output_tokens")
if max_output_tokens is not None and max_output_tokens <= 0:
    return
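
One possible complete shape for such a guard is sketched below. It is not the PR's validator: the function name and error messages are hypothetical, and the sketch assumes a non-positive stored limit should be rejected as a misconfiguration rather than skipped, which is a design choice the thread does not settle.

```python
from typing import Any, Optional


def validate_requested_output_tokens(
    requested: Optional[int], model_info: Any
) -> None:
    """Raise ValueError when the per-agent request exceeds the model limit."""
    if requested is None:
        return  # the agent does not override the output budget
    if not isinstance(model_info, dict):
        return  # no usable model record: nothing to compare against
    max_output_tokens = model_info.get("max_output_tokens")
    if max_output_tokens is None:
        return  # limit not configured: validation is skipped
    if max_output_tokens <= 0:
        # Assumption: treat 0 or negative limits as a configuration error
        # instead of letting any positive request pass.
        raise ValueError("model record has a non-positive max_output_tokens")
    if requested > max_output_tokens:
        raise ValueError(
            f"requested_output_tokens={requested} exceeds "
            f"max_output_tokens={max_output_tokens}"
        )
```

Using `is None` checks instead of truthiness is what keeps a stored `0` from being treated the same as "not configured".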

@wuyuanfr

Collaborator Author

Bug 1 (CRITICAL): Wrong tuple element order in _resolve_input_budget (production crash)

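The tuple order matches the unpacking. The keyword argument passed to W2, capacity_snapshot=resolved_capacity_snapshot, binds the typed variable on the right-hand side, not the dict at position [1]. The dict at position [1] only flows into AgentConfig.capacity_snapshot (for monitoring and serialization). For extra safety the variables could be renamed to capacity_snapshot_dict / capacity_snapshot to reduce ambiguity when reading, but this is not a bug.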
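
The binding argued in this reply can be illustrated with a stripped-down sketch. Everything here is a hypothetical stand-in (`Snapshot` for ModelCapacitySnapshot, simplified function bodies, an arbitrary token value); it shows how positional unpacking and keyword binding interact in Python, not how the real functions are implemented.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for ModelCapacitySnapshot."""
    provider_input_limit_tokens: int


def resolve_input_budget():
    snapshot = Snapshot(provider_input_limit_tokens=28_672)
    return (
        snapshot.provider_input_limit_tokens,  # [0] int
        asdict(snapshot),                      # [1] dict, monitoring use
        snapshot,                              # [2] typed snapshot
    )


def resolve_safe_input_budget(*, capacity_snapshot):
    # Attribute access: this would fail if a plain dict were passed in.
    return capacity_snapshot.provider_input_limit_tokens


def build():
    input_budget, capacity_snapshot, resolved_capacity_snapshot = (
        resolve_input_budget()
    )
    # The parameter named capacity_snapshot receives the typed object;
    # the local variable capacity_snapshot (the dict) is not passed here.
    safe = resolve_safe_input_budget(capacity_snapshot=resolved_capacity_snapshot)
    return input_budget, capacity_snapshot, safe
```

The parameter name on the left of `=` and the local variable of the same name are unrelated bindings, which is the source of the reading ambiguity the rename suggestion addresses.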

@wuyuanfr wuyuanfr requested a review from WMC001 June 24, 2026 09:22