Commit 4becd69

wuyuanfr, JasonW404, claude, sisyphus-dev-ai, and codex authored
✨ Feat: model capacity foundation — context management upgrade (#3293)
* Doc: Add design for upgrading context management in nexent, covering 16 workstreams.

* docs: complete context management production review

* feat(W1): add type skeleton for ModelCapacityResolver and tokenizer registry

Introduces the contract surface for W1 (Correct Model Token-Capacity Configuration) so W2/W3 development can begin against stable types. No runtime behaviour change; resolver/registry implementations land in the follow-up PR.

New modules:
- sdk/nexent/core/models/capacity_resolver.py: CapabilityProfile and ModelCapacitySnapshot (Pydantic v2, frozen), typed ResolverError hierarchy, compute_fingerprint() implementing the SHA-256/canonical-JSON contract from W1 ADR Decision 3, RESOLVER_VERSION constant, and a resolve_capacity() stub.
- sdk/nexent/core/models/tokenizer_registry.py: TokenizerAdapter Protocol, empty REGISTRY, FallbackEstimator (char/4 heuristic that always returns counting_mode='estimated'), and resolve() function. Family-name validation pattern enforces the naming convention fixed in the ADR.
- backend/consts/capability_profiles.py: CATALOG with eight approved day-one entries (openai/gpt-4o, openai/gpt-4.1, dashscope/qwen-plus, qwen-turbo, glm-5.1, silicon DeepSeek-V4-Flash, Qwen3.6-27B, Kimi-K2.6) plus CATALOG_REVISION.

Design reference: doc/working/context-management-workstreams/W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (locally hosted; team sharing channel separate from this repo per doc/.gitignore policy).

Smoke-tested:
- fingerprint is deterministic and order-independent across unknown_capabilities and field_sources
- ModelCapacitySnapshot rejects mutation
- tokenizer resolve() falls back to estimated for unknown families
- resolve_capacity stub raises NotImplementedError
- CATALOG imports cleanly with all 8 entries
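The fingerprint contract and the char/4 fallback described above can be sketched roughly as follows. This is an illustration only, not the code in capacity_resolver.py or tokenizer_registry.py: the names fingerprint_sketch and estimate_tokens_sketch, the dict-shaped contract, the sorting of list-valued fields, and the round-up rule are all assumptions.

```python
import hashlib
import json


def fingerprint_sketch(contract: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON rendering of `contract`."""
    normalized = {}
    for key, value in contract.items():
        # Assumption: list-valued fields (e.g. unknown_capabilities) behave like
        # sets, so they are sorted before hashing.
        normalized[key] = sorted(value) if isinstance(value, list) else value
    # sort_keys removes key-ordering differences (e.g. inside field_sources);
    # compact separators remove whitespace differences.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def estimate_tokens_sketch(text: str) -> tuple:
    """char/4 heuristic; the count is always labelled 'estimated'."""
    # Assumption: round up so a short non-empty string counts as one token.
    return (len(text) + 3) // 4, "estimated"
```

Two contracts that differ only in key order or in the order of unknown_capabilities hash to the same digest under this sketch, which is the property the smoke test above checks.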
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(W1): add capacity columns to model_record_t (additive migration)

Adds seven nullable capacity fields to model_record_t so the ModelCapacityResolver can read operator overrides per W1 ADR:
- context_window_tokens
- max_input_tokens
- max_output_tokens
- default_output_reserve_tokens
- tokenizer_family
- capacity_source
- capability_profile_version

All columns are nullable, with no defaults that change semantics. Legacy max_tokens is left untouched and continues to behave as a deprecated output-cap alias until consumers migrate (separate follow-up).

Touchpoints:
- docker/sql/v2.2.0_0615_add_capacity_fields_to_model_record_t.sql: idempotent upgrade with ALTER TABLE ... ADD COLUMN IF NOT EXISTS + COMMENT ON COLUMN.
- docker/init.sql: fresh-install CREATE TABLE inline plus COMMENT ON COLUMN.
- k8s/helm/nexent/charts/nexent-common/files/init.sql: same for k8s deploys.
- backend/database/db_models.py: ModelRecord ORM columns.
- backend/consts/model.py: ModelRequest Pydantic schema fields so CRUD round-trips the new values.

Design reference: doc/working/context-management-workstreams/W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (Decision 1, schema).

Verification:
- ORM exposes all 7 columns
- Pydantic ModelRequest exposes all 7 fields
- All three SQL files contain 14 occurrences (column + COMMENT per field)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: move W1 ADR to dedicated ADRs directory

Move W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from context-management-workstreams to context-management-workstreams/ADRs for better organization.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(W1): implement resolve_capacity with catalog + operator override

Replaces the resolve_capacity NotImplementedError stub with the real ModelCapacityResolver per W1 ADR.
The resolver:
- Looks up the (provider, model_name) entry in the capability profile catalog passed by the caller.
- Merges operator overrides over the profile (operator wins).
- Validates that hard capacity is known and not impossible (output cap cannot exceed combined window; capacities must be positive).
- Defaults requested_output_tokens to the profile's default_output_reserve_tokens; rejects requests that exceed max_output_tokens.
- Derives provider_input_limit_tokens as min(max_input_tokens, context_window_tokens - requested_output_tokens) using only the limits that are defined.
- Asks tokenizer_registry for (adapter, counting_mode); records capability gaps in unknown_capabilities.
- Computes the deterministic SHA-256/canonical-JSON fingerprint from the resolved contract and builds an immutable ModelCapacitySnapshot.

The resolver stays pure: the SDK never reads DB or env; backend callers supply the capability_profiles dict and operator_overrides. This matches CLAUDE.md's SDK layer rules.

Typed failures raised on invalid input:
- ProviderCapabilityUnknown (no hard capacity)
- InvalidCapacityConfiguration (non-positive values, output > window, derived input limit non-positive)
- RequestedOutputExceedsCap (request above max_output_tokens)

Tests (15, all passing):
- Catalog lookup + override precedence
- Uncataloged with operator-supplied capacity
- Rejection: missing capacity, impossible values, negative values, requested-output overflow
- Default requested_output behavior
- Separate-input-limit path (synthetic, no day-one model uses it)
- Combined window + separate input limit takes minimum
- Snapshot immutability (Pydantic ValidationError on mutation)
- Fingerprint determinism and sensitivity to request changes
- Tokenizer estimated-mode flag appears in unknown_capabilities

Design reference: doc/working/context-management-workstreams/W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md.
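The validation and derivation rules listed above can be sketched as a single function. This is a hedged illustration, not the actual resolve_capacity: the function name derive_input_limit_sketch, its flat parameter list, and the choice to treat a missing output reserve as 0 are assumptions; the error class names are taken from the commit message but are redefined here as stand-ins.

```python
from typing import Optional


class ResolverError(Exception):
    """Stand-in base class for the typed failures named above."""


class ProviderCapabilityUnknown(ResolverError):
    pass


class InvalidCapacityConfiguration(ResolverError):
    pass


class RequestedOutputExceedsCap(ResolverError):
    pass


def derive_input_limit_sketch(
    context_window_tokens: Optional[int],
    max_input_tokens: Optional[int],
    max_output_tokens: Optional[int],
    default_output_reserve_tokens: Optional[int],
    requested_output_tokens: Optional[int] = None,
) -> int:
    # Hard capacity must be known from at least one limit.
    if context_window_tokens is None and max_input_tokens is None:
        raise ProviderCapabilityUnknown("no hard capacity")
    # Every defined capacity must be positive.
    for value in (context_window_tokens, max_input_tokens, max_output_tokens):
        if value is not None and value <= 0:
            raise InvalidCapacityConfiguration("capacities must be positive")
    # The output cap cannot exceed the combined window.
    if (context_window_tokens is not None and max_output_tokens is not None
            and max_output_tokens > context_window_tokens):
        raise InvalidCapacityConfiguration("output cap exceeds window")
    # Default the requested output to the profile's reserve.
    requested = requested_output_tokens
    if requested is None:
        requested = default_output_reserve_tokens
    if requested is None:
        requested = 0  # assumption: no reserve configured means nothing is held back
    if max_output_tokens is not None and requested > max_output_tokens:
        raise RequestedOutputExceedsCap("request above max_output_tokens")
    # Take the minimum over the limits that are actually defined.
    candidates = []
    if max_input_tokens is not None:
        candidates.append(max_input_tokens)
    if context_window_tokens is not None:
        candidates.append(context_window_tokens - requested)
    limit = min(candidates)
    if limit <= 0:
        raise InvalidCapacityConfiguration("derived input limit non-positive")
    return limit
```

For example, a 128000-token window with a 4096-token reserve and no separate input limit yields 128000 - 4096 = 123904; adding a separate 100000-token input limit lowers the result to 100000.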
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(W1 step 4): extend SDK ModelConfig with capacity fields, rename LLM output cap

ModelConfig (sdk/nexent/core/agents/agent_model.py):
- Add max_output_tokens as the preferred name per W1 ADR.
- Keep max_tokens as a deprecated alias; a model_validator backfills the unset side so old and new callers both work during migration.
- Add the remaining capacity-snapshot fields so a ModelConfig can carry the resolved values from backend service down to the SDK: context_window_tokens, max_input_tokens, default_output_reserve_tokens, tokenizer_family, capacity_source, capability_profile_version.

OpenAIModel (sdk/nexent/core/models/openai_llm.py):
- Accept max_output_tokens (preferred) and max_tokens (deprecated). If only the legacy name is passed, log a debug message and remap to max_output_tokens.
- Internal attribute renamed to self.max_output_tokens; self.max_tokens is kept as an alias for any reader.
- chat.completions.create still receives wire field max_tokens; only the internal name changed.

NexentAgent.create_model (sdk/nexent/core/agents/nexent_agent.py):
- Construct OpenAIModel with max_output_tokens=model_config.max_output_tokens so the new name flows through end-to-end.

Backward compatibility:
- Existing callers that set ModelConfig.max_tokens see no behavior change (validator copies it into max_output_tokens; the wire payload is identical).
- Existing callers reading OpenAIModel.max_tokens see no behavior change (alias attribute returns the same value).

Verified by table-driven smoke test of all four (max_tokens, max_output_tokens) combinations on ModelConfig.

Design reference: doc/working/context-management-workstreams/W1_*.md and W1 ADR.

Provider adapters (step 3) and create_agent_info (step 6) follow.
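The alias backfill can be illustrated without Pydantic. The commit uses a Pydantic model_validator on ModelConfig; the sketch below swaps in a plain dataclass with __post_init__ to stay dependency-free. The class name ModelConfigSketch is hypothetical, and leaving both values untouched when both are set is an assumption, since the commit only says the unset side is backfilled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    max_output_tokens: Optional[int] = None  # preferred name
    max_tokens: Optional[int] = None         # deprecated alias

    def __post_init__(self) -> None:
        # Backfill whichever side is unset so old and new callers agree.
        if self.max_output_tokens is None and self.max_tokens is not None:
            self.max_output_tokens = self.max_tokens
        elif self.max_tokens is None and self.max_output_tokens is not None:
            self.max_tokens = self.max_output_tokens
        # Assumption: when both are set (or both unset) nothing is changed.
```

The four (max_tokens, max_output_tokens) combinations mentioned above map onto this as: neither set (both stay None), legacy only (copied forward), new only (copied back), and both set (left as given).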
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(W1 step 6): wire ModelCapacityResolver in create_agent_info, drop legacy max_tokens

Replaces the long-standing bug where `model_info['max_tokens']` (a deprecated output cap, semantically wrong) was assigned to ContextManagerConfig.token_threshold (an input/context budget). The fix wires ModelCapacityResolver into the runtime path so the context manager receives a real input budget derived from the capacity snapshot.

Changes in backend/agents/create_agent_info.py:
- Add _resolve_input_budget(model_info): pulls operator overrides from the new model_record_t capacity columns, calls resolve_capacity(...) with the CATALOG from backend.consts.capability_profiles, and returns snapshot.provider_input_limit_tokens.
- On ProviderCapabilityUnknown (uncataloged model with no operator-supplied hard capacity), falls back to a safe constant _TOKEN_THRESHOLD_LEGACY_FALLBACK (8192) so the migration window doesn't break existing setups. Logged prominently so admins know to backfill.
- create_agent_config: stops reading model_info['max_tokens'] and passes the resolved input_budget into ContextManagerConfig.token_threshold.
- create_model_config_list: passes all seven new capacity columns (context_window_tokens, max_input_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family, capacity_source, capability_profile_version) through to the SDK ModelConfig so end-to-end capacity flow works.

This is the end of the legacy max_tokens-as-context-threshold confusion. ModelConfig.max_tokens stays as a deprecated alias per W1 step 4; this commit removes its only known misuse from the runtime path.

The fallback constant is intentionally conservative: it triggers compression early for unmigrated models so behavior degrades gracefully rather than overflowing provider context. W2 will subtract its 10% uncertainty reserve from the resolver's output once the enforcement phase begins.
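The fallback behaviour can be sketched as below. This is not the body of _resolve_input_budget: the function name resolve_input_budget_sketch, the injected `resolver` callable (a hypothetical stand-in that takes the overrides dict and returns an input limit), and the choice to treat every non-null capacity column as an override are assumptions. The 8192 constant and the column names come from the commit message.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)

# Value stated in the commit message.
_TOKEN_THRESHOLD_LEGACY_FALLBACK = 8192

# The seven capacity columns added to model_record_t.
CAPACITY_COLUMNS = (
    "context_window_tokens",
    "max_input_tokens",
    "max_output_tokens",
    "default_output_reserve_tokens",
    "tokenizer_family",
    "capacity_source",
    "capability_profile_version",
)


class ProviderCapabilityUnknown(Exception):
    """Stand-in for the SDK error of the same name."""


def resolve_input_budget_sketch(model_info: Dict, resolver: Callable[[Dict], int]) -> int:
    # Assumption: any non-null capacity column counts as an operator override.
    overrides = {
        column: model_info[column]
        for column in CAPACITY_COLUMNS
        if model_info.get(column) is not None
    }
    try:
        return resolver(overrides)
    except ProviderCapabilityUnknown:
        # Log loudly so admins know the row still needs capacity values.
        logger.warning(
            "No hard capacity for model %r; falling back to %d tokens",
            model_info.get("model_name"),
            _TOKEN_THRESHOLD_LEGACY_FALLBACK,
        )
        return _TOKEN_THRESHOLD_LEGACY_FALLBACK
```

Note that the legacy max_tokens key is never read here, which mirrors the point of the fix: an output cap is not an input budget.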
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(loop-engineering): add comprehensive insight report on Loop Engineering methodology and recommendations for Nexent's evolution

* docs: add W1 ADR to ADRs directory

Restore W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from doc/context-management-upgrade branch to context-management-workstreams/ADRs directory.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(W1 step 8): emit capacity snapshot fields in monitoring

Persist resolved model capacity snapshot metadata on model monitoring records so per-request telemetry can report total window, output reserve, safe input budget, source, tokenizer mode, unknown capabilities, and fingerprint.

- add nullable monitoring columns to ORM, fresh-install SQL, and idempotent upgrade migration
- bind resolved capacity snapshots from agent creation into SDK monitoring context
- enrich LLM, client-level, and record_model_call monitoring rows with snapshot fields
- cover enqueue and ORM payload behavior in SDK monitoring tests

Verification:
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/monitor/test_monitoring.py
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/core/models/test_capacity_resolver.py
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/agents/create_agent_info.py backend/database/db_models.py sdk/nexent/core/agents/agent_model.py sdk/nexent/core/agents/run_agent.py sdk/nexent/monitor/monitoring.py sdk/nexent/monitor/__init__.py

Co-Authored-By: Codex <codex@openai.com>

* feat(W1 step 3): surface provider-discovery capacity hints as candidates

Expose provider-supplied token-capacity metadata as advisory candidate fields in discovery responses without promoting them into persisted model records.

- add shared candidate extraction for common context, output, input, reserve, and tokenizer aliases
- wire SiliconFlow, DashScope, TokenPony, and ModelEngine adapters to attach provider_candidate hints when present
- keep prepare_model_dict from persisting provider_candidate fields automatically
- cover positive and no-hint paths for provider discovery

Verification:
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/backend/services/providers/test_silicon_provider.py /home/feiran/nexent/test/backend/services/providers/test_dashscope_provider.py /home/feiran/nexent/test/backend/services/providers/test_tokenpony_provider.py /home/feiran/nexent/test/backend/services/providers/test_modelengine_provider.py /home/feiran/nexent/test/backend/services/test_model_provider_service.py::test_prepare_model_dict_does_not_persist_provider_capacity_candidates
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/services/providers/base.py backend/services/providers/silicon_provider.py backend/services/providers/dashscope_provider.py backend/services/providers/tokenpony_provider.py backend/services/providers/modelengine_provider.py

Co-Authored-By: Codex <codex@openai.com>

* feat(W1 step 7): expose capacity fields in Add/Edit Model forms

Add explicit model-capacity controls to model management so operators can promote known capacity values through the existing model create and update flows.
- extend frontend model types and service request/response mappings for capacity fields
- add shared capacity form controls with tokenizer autocomplete, source badge, profile version text, and legacy max_tokens warning
- wire capacity validation and operator payloads into Add/Edit Model dialogs
- localize labels, tooltips, source names, and validation messages in en/zh

Verification:
- npm run type-check
- node -e "const fs=require('fs'); for (const f of ['frontend/public/locales/en/common.json','frontend/public/locales/zh/common.json']) { JSON.parse(fs.readFileSync(f,'utf8').replace(/^\uFEFF/,'')); } console.log('locale json ok')"

Co-Authored-By: Codex <codex@openai.com>

* docs: review 5 findings (CM-017, CM-018, CM-021, CM-024, CM-025)

Review and accept decisions for 5 findings:
- CM-018: structural validation blocks commit, semantic quality routes to W15 SLO
- CM-021: source lineage + mandatory presence validation blocks, semantic coverage to W15
- CM-024: use claim-scoped production readiness terminology
- CM-017: finite initial conflict set with explicit unresolved failure
- CM-025: subagent as independent agent with parent_session_id, async tool delegation, no recursion

Updated: finding-review-decisions.md, findings-registry.md (20/26 complete), W4, W6, W10, W11, W12, W13, parent plan.
Added: pending-findings-decision-sheet.md for decision tracking.

Remaining 6 findings (CM-009, CM-010, CM-014, CM-015, CM-022, CM-026) pending individual discussion.

* docs: accept CM-026 decision - exclude unsupported modalities from Release 1 gates

Remove multimodal testing from Release 1 SLO gates. W15 covers text modality only; add modality contracts when specific product requirements emerge.

Updated: finding-review-decisions.md, findings-registry.md (21/26 complete), W15, W3, pending-findings-decision-sheet.md.

* docs: retire W7, merge checkpoints into W5 as compression.snapshot events

Architectural simplification: checkpoints are no longer an independent subsystem (W7).
Compression results are stored as compression.snapshot events within the W5 execution event log. Recovery finds the latest compression.snapshot event and replays subsequent events.

Eliminates:
- Independent checkpoint table and CAS concurrency control
- Redis checkpoint cache layer
- W8 checkpoint-specific validation
- CM-014 checkpoint schema migration (covered by CM-005)
- W7 publication outbox for cross-system consistency

Updated: W5 (compression.snapshot event type, recovery flow, dirty-state flush), W6, W8, W9, W13, W14, W15, parent plan, README, review artifacts.
Deleted: W7_Durable_Multi_Worker_Context_State.md.

CM-014 marked N/A (22/26 findings complete).

* fix(W1): clarify optional capacity fields

* docs: accept CM-009 decision - defer workload envelopes until post-implementation measurement

Do not pre-define workload envelopes. After W1-W16 implementation, use W15 measurement infrastructure to collect real performance data and define envelopes based on observed data. No production-scale claim until envelopes are defined. Aligns with CM-004 (measure before optimizing) and CM-011 (evidence-based gates).

Progress: 23/26 findings complete.

* docs: accept CM-010 decision - defer numeric targets until post-implementation measurement

Do not pre-define numeric availability, RPO, RTO, rebuild time, queue lag, or storage capacity targets. After W1-W16 implementation, use W15 measurement infrastructure to collect real recovery/availability data per topology and define targets based on observed data. No production-scale claim until targets are defined. Aligns with CM-009 (measure before defining envelopes) and CM-011 (evidence-based gates).

Progress: 24/26 findings complete.

* docs: accept CM-015 decision - remove content hashing, use O(1) metadata validation

W7 retirement eliminates the primary O(history) hashing consumer. Replace content hashing with metadata-based validation at three points:
1. compression.snapshot: partial_after_erasure + version fields
2. W6 materialized cache: snapshot validity + event count + version fields
3. Physical erasure: one-time partial_after_erasure flag

No Merkle trees or segmented hashing needed. Storage-layer integrity handled by database checksums, not W8.

Progress: 25/26 findings complete.

* fix(web): bind production server to all interfaces

* docs: accept CM-022 decision - consolidate decision traces into unified OpenTelemetry spec

Consolidate all decision trace requirements (W5, W6, W10, W15) into a single unified telemetry/observability specification (low priority, post-core). Use OpenTelemetry-style spans/attributes/events collected by external observability infrastructure, not product-internal persistence.

Updated: W15 (replace decision trace persistence with OTel output), parent plan (replace decision trace references with unified telemetry spec), finding-review-decisions.md, findings-registry.md (26/26 complete), pending-findings-decision-sheet.md.

All 26 findings now reviewed and decided.

* fix(W1 step 7): expose capacity fields in ProviderConfigEditDialog

Step 7 added capacity controls to ModelEditDialog (the OpenAI-API-Compatible "custom model" edit path) but missed ProviderConfigEditDialog, the dialog opened by the per-model gear icon under provider-categorized sections (SiliconFlow / DashScope / TokenPony / ModelEngine). For any model whose model_factory matches a recognized provider (including the W1 catalog keys 'dashscope' / 'silicon' / 'tokenpony'), that gear icon was the only edit path, leaving operators no way to set context_window_tokens et al.

Changes:
- ProviderConfigEditDialog: accept optional initialCapacity and hideCapacityFields props; render ModelCapacityFields when supported; include capacity payload in onSave callback shape.
- modelService.updateBatchModel: accept and forward the 6 capacity fields (context_window_tokens, max_input_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family, capacity_source) to the existing batch_update_models endpoint, which already passes through arbitrary update_data per backend/services/model_management_service.py line 347.
- ModelDeleteDialog single-model gear path: pass current capacity values from selectedSingleModel as initialCapacity, and forward saved capacity fields into the updateBatchModel call.
- ModelDeleteDialog provider-level "Edit Config" path: pass hideCapacityFields={true} since handleProviderConfigSave applies settings batch-wise to all models from one provider and per-model capacity is not a batch concept.

No behavior change for callers that don't pass initialCapacity (backward compatible).

Verified with npm run type-check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: stabilize test_model_provider_service against dual-import sys.modules pollution

Two tests (test_get_models_llm_success, test_get_models_embedding_success) failed intermittently when test_model_provider_service.py ran after test_capacity_resolver.py or test_silicon_provider.py.

Root cause: silicon_provider is loaded under two distinct sys.modules keys: `services.providers.silicon_provider` (the path production code uses) and `backend.services.providers.silicon_provider` (the path some test files use). Each binding gets its own `SILICON_GET_URL` attribute because `silicon_provider.py` does `from consts.provider import SILICON_GET_URL`, which copies the value into the importing module's namespace. When both keys are present, mock.patch targeting only the `backend.` path silently fails to override the value used by the production code path that SiliconModelProvider.get_models executes.

Fix: introduce _patch_provider_module_constant context manager that patches the named attribute on every loaded copy of the module.
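One way such a helper could look is sketched below. This is not the _patch_provider_module_constant from the test file, whose signature is not shown here: the name patch_module_constant_sketch, its parameters, and the suffix-matching rule for finding module copies are assumptions. It relies only on the standard contextlib and unittest.mock APIs.

```python
import sys
from contextlib import ExitStack, contextmanager
from unittest import mock


@contextmanager
def patch_module_constant_sketch(module_suffix: str, attr: str, value):
    """Patch `attr` on every loaded module whose name is, or ends with, `module_suffix`."""
    with ExitStack() as stack:
        # Snapshot the mapping: other imports may mutate sys.modules while we iterate.
        for name, module in list(sys.modules.items()):
            if module is None:
                continue
            # Matches both 'services.providers.x' and 'backend.services.providers.x'.
            if name == module_suffix or name.endswith("." + module_suffix):
                if hasattr(module, attr):
                    stack.enter_context(mock.patch.object(module, attr, value))
        # All patches are undone together when the caller's block exits.
        yield
```

Because the loop only patches copies that are actually loaded, the sketch behaves the same whether one or both sys.modules keys are present.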
Apply to all four SILICON_GET_URL mock.patch sites in this file.

Verification:
- 289 tests pass under the previously-failing combined order: test/sdk/core/models/test_capacity_resolver.py + test/sdk/monitor/test_monitoring.py + test/backend/services/providers/ + test/backend/services/test_model_provider_service.py

The helper is order-independent and safe even when one of the two sys.modules paths is absent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(W1): record post-acceptance known limitations and open W17 for capacity-suggestion UX

W1 ADR additions:
- KL-1: catalog miss for default model_factory='OpenAI-API-Compatible'. Manual-add LLM rows skip the embedding-only _infer_model_factory path, fall through to ProviderCapabilityUnknown, and lose catalog values. Documented with the end-to-end workaround verified on 2026-06-15 for glm-5.1 (catalog hit confirmed via direct SQL UPDATE).
- KL-2: provider-level batch Edit Config dialog hides capacity controls because they are per-model. Per-model gear icon path exposes them (fix landed 2026-06-16).

New W17 workstream proposal:
- POST /api/v1/models/suggest-capacity endpoint and frontend wiring.
- Catalog fuzzy match + provider discovery, returns placeholders for the capacity form. Operator accepts → saved with capacity_source='operator'.
- Subsumes the LLM gap in _infer_model_factory by replacing it with a shared host-to-provider map.
- Phased rollout behind a feature flag, with SLO target of >=70% match rate on new manual-add LLM rows.

Workstream README updated to index W17 under Model Capacity and Request Safety, with a dependency note linking to KL-1.

The ADR remains Accepted. KL-1/KL-2 are post-acceptance discoveries that trigger the new workstream rather than reopen the ADR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update W3 with dispatch path analysis and bypass elimination plan

Add current dispatch path analysis: 1 chokepoint (openai_llm.py:186), 9 trusted paths, 2 production bypasses (B1: llm_utils.py, B2: conversation_management_service.py).

Split step 9 into sub-steps:
- 9a: Fix B1 (system prompt generation bypass)
- 9b: Fix B2 (title generation bypass)
- 9c: Credential isolation (architecture layer)

Add bypass files to repository touchpoints. Add bypass elimination tests.

* docs(W17): integrate post-acceptance workstream into both production plans

Per classification decision (Option A): W17 sits in the existing "Model Capacity and Request Safety" module, with the same owners as W1-W3, but is marked Medium / post-acceptance to distinguish it from the Blocker-level original freeze. This avoids creating a new module table for a single workstream while keeping the design-freeze boundary intact.

Both plans:
- §1.2 (en) / §1.1 (zh) per-workstream table: add W17 row labeled "Medium (post-acceptance)" / "中 (落地后增加)" linking to its spec.
- New §1.4 (en) / §1.3 (zh) "Post-Acceptance Additions" section: explain that W17 was opened after the 2026-06-12 design freeze, triggered by KL-1 surfaced during the glm-5.1 end-to-end test. Document the KL- vs CM- finding prefix convention.
- §2.3.1 module section: add a full W17 entry after W3 with status, problem, solution, proof, acceptance criteria, and the "post-acceptance, unscheduled" schedule note.
- §3 Phase plan table: add a sixth row "Post-acceptance follow-ups" / "落地后增加" decoupled from Phase 0-5, with a clarifying paragraph that W17 and future KL-triggered work do not move the August 7 milestone.
Frozen design-phase documents are NOT modified to avoid rewriting history:
- context-management-weekly-design-summary-zh.md (2026-06-08 to 06-12 status)
- review/findings-registry.md (26 CM- findings closed)
- review/over-engineering-secondary-review.md ("no new unconditional workstream"; W17 is conditional on observed KL-1)
- All review/phase*-review.md per-W reviews
- W1_HANDOFF_remaining_steps_3_7_8.md (historical handoff, steps closed)

The over-engineering guardrail still applies: W17 is conditional on the specific named limitation KL-1, not a new unconditional workstream.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(W1 step 7): unify max_tokens with capacity panel and migrate legacy on edit

Frontend UX corrections discovered during W1 end-to-end testing:

1. Add Model dialog (single model)

The standalone "Max Tokens *" field has the same semantic meaning as max_output_tokens in the capacity panel (W1 step 4 makes them aliases on the SDK side). Showing both is confusing and forced operators to type the same number twice. For LLM/VLM types the legacy field is now removed:
- ModelCapacityFields gains a `formMode` prop. In 'add' mode the panel renders as a flat labelled section (no Collapse, no "empty hint" alert) and hides defaultOutputReserveTokens; required fields render a red asterisk and are enforced through validateCapacityForm.
- ModelAddDialog passes formMode='add' with requiredFields=['contextWindowTokens', 'maxInputTokens']. The legacy Max Tokens input renders only when supportsCapacityFields is false (voice/rerank types still use it).
- isFormValid drops isValidMaxTokens(form.maxTokens) when supportsCapacityFields is true; capacity validation is the source of truth.
- The connectivity-verify config now reads form.maxOutputTokens for LLM/VLM (with parseMaxTokens fallback) since the standalone field is gone.
- buildCapacityPayload mirrors maxOutputTokens into the deprecated maxTokens column so legacy readers that haven't been migrated yet still see the value, removing an implicit dependency on the SDK Pydantic alias firing on every backend code path.

2. Edit Model dialog yellow deprecation warning

The warning "max_tokens 已废弃,请使用 max_output_tokens" ("max_tokens is deprecated, please use max_output_tokens") fired even after the user typed a new max_output_tokens value, because the trigger read model.maxTokens / model.maxOutputTokens props instead of the live form state. capacityFormFromModel now auto-promotes a legacy model.maxTokens value into the form's maxOutputTokens on load so the operator sees the value pre-populated, and the warning condition adds a "&& !form.maxOutputTokens" check so it disappears as soon as the form has a value. Saving from there writes to the max_output_tokens column, which permanently clears the warning next time the row is loaded.

Both invocations of ModelCapacityFields in ModelEditDialog (ModelEditDialog and ProviderConfigEditDialog) got the same correction. ProviderConfigInitialCapacity now exposes maxTokens so the helper can auto-migrate from the per-model gear path too; ModelDeleteDialog forwards selectedSingleModel.max_tokens.

Locale strings added:
- model.dialog.capacity.error.requiredMissing (en/zh)

Verified: npm run type-check passes; locale JSON parses.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(W1 step 7): Add panel description gone; tokenizer shares row; Edit drops legacy max_tokens

Two more UX corrections from W1 end-to-end testing:

1. Add Model panel cosmetic

The "Optional Capacity Settings - used to override or confirm model capacity; leaving it empty will not block adding the model" header text sat above the capacity inputs in add mode. In 'add' mode, however, the fields are part of the required form, so the "optional" framing was misleading and the body label/description duplicated info already on each input. Drop the header block in add mode; render content directly.
Layout had four numeric inputs in a 2-column grid, then a full-width tokenizer field underneath. That made row 1 = (context, input), row 2 = (output, ___), row 3 = tokenizer alone, leaving an awkward orphan slot in row 2. In add mode the tokenizer now slots into the grid next to maxOutputTokens (no defaultOutputReserveTokens shown here), giving two tidy rows. Edit mode is unchanged: defaultOutputReserveTokens takes the fourth slot and tokenizer renders full-width below.

2. Edit Custom Model still showed both max_output_tokens and max_tokens

Step 7 only stopped rendering the legacy maxTokens field in Add Dialog. The Edit Dialog continued to render it alongside the capacity panel's maxOutputTokens, defeating the merge the Add fix made. ModelEditDialog now hides the standalone maxTokens field when supportsCapacityFields is true, drops the corresponding isValidMaxTokens validation from isFormValid, and falls back to form.maxOutputTokens for the connectivity-probe maxTokens parameter (with parseMaxTokens(form.maxTokens) fallback so any pre-existing legacy value still works).

Verified npm run type-check; locale untouched this commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: clarify W4 step 4 and step 6 implementation details

Step 4: Clarify that W4 verifies W5 schemas include identity columns rather than adding them (W5 owns the schema definition).
Step 6: Keep deprecated APIs with deprecation notice for next version removal, rather than immediate removal.

* fix(W1 step 7): required = context_window + max_output; drop Collapse; consistent across Add/Edit

Corrections after the previous round's UX review:

1. Required fields were wrong.

Previous commit required (contextWindowTokens, maxInputTokens). The correct W1 requirement is (contextWindowTokens, maxOutputTokens): the two values that bound the request budget end-to-end.
max_input_tokens stays optional because almost no real provider exposes a distinct hard input limit; the resolver falls back to context_window - requested_output when it's null. Updated three call sites:
- ModelAddDialog: requiredFields and validateCapacityForm both ['contextWindowTokens', 'maxOutputTokens'].
- ModelEditDialog inner panel: same requiredFields + same validation set.
- ProviderConfigEditDialog inner panel: same.

2. Edit dialogs no longer Collapse the capacity panel.

With context_window and max_output now required for both add and edit, hiding the inputs behind a Collapse hides the red asterisks until the user clicks the title. ModelCapacityFields drops the Collapse entirely and renders flat in both modes. The 'add' vs 'edit' formMode prop now only differentiates whether default_output_reserve_tokens is shown (it stays in edit, hidden in add) and where the tokenizer field sits (beside max_output in add, full-width in edit).

3. Empty-state hint suppressed when requiredFields is non-empty.

The locale string `capacity.emptyHint` advised "you can fill these later", which contradicts required asterisks. Hide it whenever any requiredFields are passed; show only for the legacy advisory case.

Verified npm run type-check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: refine W5 implementation plan with sub-steps and clarifications

- Split step 1 into 3 ADR sub-steps (taxonomy/schema, ordering/idempotency, evolution)
- Split step 3 into 4 code path sub-steps (agent loop, tool execution, error/cancel, answer)
- Add 4-phase migration plan to step 7 (shadow, read switch, write switch, remove direct writes)
- Clarify new event-log database module responsibilities in Repository Touchpoints
- Add performance baseline test requirement

* docs(W17): close three self-review gaps before implementation

Applied the W1 retrospective checklist to W17 (which I wrote after the retrospective and which still hit the same lessons). Three corrections:

1. Repository touchpoints missed sibling frontend components. The original list named ModelAddDialog, ModelEditDialog, and ModelCapacityFields but omitted ProviderConfigEditDialog (the per-model gear icon dialog) and ModelDeleteDialog (the provider browser). Both are valid model-add entry points and the suggestion logic must reach them, or W17 reproduces W1 step 7's "only ModelEditDialog got the new fields" miss.

2. Frontend implementation plan was 3 items hiding 7 concerns. Expanded into 7 numbered items grouped by concern: service layer (4), form state machine with suggested/operator distinction (5), debounce trigger and no-match graceful fallback (6), match_explanation Alert rendering (7), coverage of all three add paths including provider browser (8), error-mode contract (9), and locale strings (10).

3. No operational dependencies section. Added a table covering which containers need rebuilding (nexent-runtime + nexent-northbound + nexent-config + nexent-mcp for backend; nexent-web for frontend; nexent-postgresql untouched), new env var CAPACITY_SUGGESTION_ENABLED, optional per-tenant flag in tenant_config_t for staged rollout, monitoring dashboards to add, rollout sequence (staging → one internal tenant → paid → all), and rollback procedure (env var off → no schema cleanup needed).

These three corrections come from the W1 spec review checklist that this commit was the trigger to formalize.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(W2 review): formalize six-item checklist from W1 retrospective; apply to W2

Two new documents:

SPEC_REVIEW_CHECKLIST.md: the reusable artifact. Codifies the W1 post-acceptance retrospective's six lessons as a checklist with concrete sub-questions per item:
1. User Journey: who sees what change end to end
2. Frontend Step Decomposition: ≥3 sub-items covering state / visual / service / validation / migration / siblings
3. End-to-End Demo Script in Acceptance: concrete, copy-pasteable, with negative path
4. Operational Dependencies: containers / migrations / env vars / flags / runbook / monitoring
5. Sibling Components Enumerated: every dialog / function / column / module-key sibling named or explicitly out of scope
6. Reverse-Test "Can the user actually use this": operator can know feature is active, can reach values from UI, can observe fallback

W2_REVIEW.md: applies the checklist to W2 + the four reader-surfaced issues the user spotted independently:
- Item 1: User Journey: 🔴 missing Operator-Visible Effects section
- Item 2: Frontend Decomposition: 🔴 no decision on UI for soft_limit_ratio / per-agent override
- Item 3: End-to-End Demo: 🟡 abstract, demo script proposed
- Item 4: Operational Dependencies: 🟡 nothing-to-do but unstated
- Item 5: Sibling Components: 🔴 six current local-reserve sites in agent_context.py not enumerated; W2→compaction handoff missing
- Item 6: Reverse Test: 🟡 no operator-visible activity indicator
- Issue A: soft_limit_ratio default unspecified; recommend 0.8
- Issue B: requested_output_tokens override location undefined; per-agent (DB column + agent-edit UI) vs per-request (API body) are two distinct contracts buried in one sentence
- Issue C: W2 ↔ W13 compaction-model relationship undefined; each model call needs its own W1→W2 chain; W2 spec must say snapshots are per-model, not shared (same defect class as the W1 catalog problem)
- Issue D: Step 5 "consistent" semantics ambiguous; clarify it's the CM-013 trusted-dispatch enforcement contract, not a rename

Verdict: W2 spec is not Ready to Implement; 7 of 10 items need updates. None invalidate the architecture; they are under-specifications that would reproduce W1-style post-acceptance surprises if shipped to implementation as-is.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(review): convert W2 post-acceptance review to CM-NNN format under review/ Removed W2_REVIEW.md from the workstreams folder — wrong location and wrong format, did not follow the established phase2-w*-review.md convention (concise per-W file + central findings-registry.md). Re-published in the correct shape: - review/findings-registry.md: added CM-027 through CM-030 with Severity / Delivery classification / Affected documents / Description / Minimum non-over-engineered response columns matching the existing 26 design-phase entries. Severity Summary updated (was 4/10/7/5 = 26, now 4/12/9/5 = 30). - review/phase6-w2-review.md: new file in the same concise format as phase2-w*-review.md. Phase 6 is defined here as the post-acceptance review track opened after the W1 retrospective, distinct from Phase 2 (design-phase per-W reviews) — same numbering convention, different trigger. The four findings translate the W1 retrospective lessons + user-surfaced W2 issues into CM-style entries: CM-027 Medium — soft_limit_ratio default unspecified; min response set default 0.8 with per-tenant override path. CM-028 Medium — per-agent vs per-request override are two contracts in one sentence; min response specify both and decide W2 scope. CM-029 High — per-model snapshot rule unstated; W13 compaction call needs its own W1->W2 chain (same defect class as W1 KL-1). CM-030 High — Step 5 "consistently" is the CM-013 trusted-dispatch enforcement contract, not a rename; min response add server-side assertion + negative test. The W17 follow-up workstream's KL-1/KL-2 references in W1 ADR and the production plans remain in the KL- namespace for now; migrating those to CM- can happen in a separate consistency pass if desired. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: refine W6 with projection priority, ContextItem scope, and implementation clarifications - Add projection implementation priority (Release 1 required/optional/deferred) - Clarify which projections produce full ContextItem vs simple records - Define 'zero semantic mismatch' criteria for chat shadow comparison - Clarify W8 validation call pattern in Phase 3 step 3 - Add performance baseline test requirement in Phase 4 - Clarify backend projection registry responsibilities * docs: update W8 to align with CM-015 decision (remove content hashing) Replace content-based hashing with O(1) metadata-based validation: - compression.snapshot: partial_after_erasure flag + version field comparison - W6 materialized projections: snapshot validity + event count + version fields - Physical erasure: one-time partial_after_erasure flag propagation Updates: - Validity Contract: remove content hash, add metadata validation inputs - Implementation Plan step 2: replace streaming hashing with metadata validation - Implementation Plan step 4: use DerivedStateValidator (not CheckpointValidator) - Implementation Plan step 7: 'derived state' instead of 'checkpoint' - Validation and Invalidation Delivery: remove canonical serialization/hash algorithm - Add CM-015 finding reference * docs: unify finding namespace (KL-* → CM-*), close 9 review decisions, fix W13 dep stale W7 Three coordinated cleanups in one commit: 1. KL-* → CM-* migration (consistency with established review namespace) The KL- prefix was a one-off I introduced earlier to mark post-acceptance findings as distinct from the 26 design-phase CM- findings. Per the established review-folder convention (see review/findings-registry.md + review/finding-review-decisions.md), all findings should share one CM-NNN namespace regardless of when they were discovered. 
Renames: KL-1 → CM-031 (catalog miss for default model_factory) KL-2 → CM-032 (provider-level batch dialog cannot host per-model capacity) Updated references in: W1 ADR (Known Limitations section, kept the "formerly KL-1/KL-2" parenthetical as an audit trail), W17 spec, context-management-production-plan.md and -zh.md (§1.4 / §1.3), README workstream index W17 row, SPEC_REVIEW_CHECKLIST.md, and review/phase6-w2-review.md. Removed the "落地后局限使用 KL-N 前缀" explanation from both production plans since the namespace is now unified. 2. CM-027 through CM-032 added to review/finding-review-decisions.md Six new finding-decision sections written in the same format the team established for CM-001 through CM-026: Decision / Approved minimum / Rationale / Explicitly out of scope / Updated documents. Covers: CM-027 W2 soft_limit_ratio default = 0.8 CM-028 requested_output_tokens override = per-agent column + per-request API field, two distinct contracts CM-029 Per-model snapshot rule for secondary model dispatch (W13) CM-030 W2 Step 5 = CM-013 trusted-dispatch enforcement, not rename CM-031 catalog miss for default model_factory (formerly KL-1) CM-032 provider-level batch dialog cannot host per-model capacity (formerly KL-2) 3. README W13 dependency W7 → W5 After the team's W7 retirement merge, README line 49 still listed W13's dependencies as "W2, W3, W7". Updated to "W2, W3, W5" since W7's checkpoint/snapshot responsibilities are now W5 compression.snapshot events. 4. findings-registry.md Severity Summary updated Was 4/12/9/5 = 30 after merge. After adding CM-031 (Medium) and CM-032 (Low), now 4/12/10/6 = 32. 5. English production-plan W7 residuals checked The four W7 mentions remaining in context-management-production-plan.md (workstream-table row, w7 anchor, retired heading, retirement-context bullet listing what is NOT being adopted from W7) are intentional historical markers in the W7 retirement section and were left in place. 
Net change: ~20 lines across 9 files, no code, no migration. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: update W9 with terminology fixes, resolve_ambiguous_effect, and subagent conflict check - Replace 'checkpoint' with 'compression.snapshot' throughout - Add resolve_ambiguous_effect to implementation order (step 4) - Add subagent conflict check: reject mutating lifecycle operations when parent session has pending subagent sessions, even after parent run's active_run_id is cleared (async subagent scenario) - Add subagent conflict test - Add subagent session query to repository touchpoints * docs: refine W10 with deprecation notice, subagent policy independence, and performance tests - Step 7: Mark bypass paths as deprecated (not immediate removal) - Add Subagent Policy Independence section: subagents resolve their own W10 policy; parent policy governs subagent result integration - Add performance baseline test requirement for policy resolution and context selection latency * docs: refine W11 with subagent reducer independence and step 3 clarification - Step 3: Clarify deterministic reducers (structured, pointer) generate on demand; semantic reducers (compressed) cache at creation/update since regeneration involves LLM calls - Add Subagent Reducer Independence section: subagents use their own reducer chain; parent reducers do not apply to subagent internal context - Add performance baseline tests to tests section (lower priority, after functional implementation is stable) * docs: refine W12 with offload threshold clarification, subagent artifact isolation, and performance tests - Step 6: Replace 'observation limits' with 'offload thresholds' — outputs exceeding threshold are stored as artifacts with pointers (full content preserved), not truncated. Context space decisions remain with W10/W3. - Add Subagent Artifact Isolation section: subagent artifacts scoped to subagent session; parent cannot directly access subagent artifacts. 
- Add performance baseline tests (lower priority, after functional implementation is stable). * docs: update W13 with current state gap analysis and implementation refinements - Add Current State and Gap Analysis section: maps current agent_context.py implementation against W13 requirements, identifies 21 gaps (16 critical) and 5 existing strengths - Add Compression Trigger Conditions: W2 soft_limit_ratio as primary trigger, two-phase thresholds as implementation details - Add Fallback Model Selection Strategy: primary → fallback → W11 hard reduction cascade - Step 4: Add measurable progress criteria (compressed tokens < source tokens, reject with no_progress if not) - Add Subagent Compression Independence section: subagent sessions use own CompactionPolicy independently - Add performance baseline tests (lower priority, after functional implementation is stable) * docs: refine W14 with deprecation notice, subagent governance, and performance tests - Step 9: Mark raw/direct write paths as deprecated (not immediate removal) - Add Subagent Governance section: subagent sessions apply W14 internally using their own agent configuration; subagent final answer is already governed output; parent W10 policy governs integration; W14 does not re-redact already-redacted content - Add performance baseline tests for redaction latency and deletion propagation latency (lower priority, after functional implementation) * docs: clarify W15 step 1 baseline timing and performance coordination - Step 1: Clarify that baseline measurements should be established before W1-W14 implementation starts (required to quantify improvement) - Required Deliverables: Add note that W15 coordinates performance baseline tests across W5, W6, W10, W11, W12, W13, and W14 (lower priority but W15 defines measurement standards and targets) * docs: add W16 subagent cache optimization and performance baseline priority - Add Subagent Cache Optimization section: subagent sessions apply W16 independently using 
their own agent configuration; cache partition plan scoped to subagent session - Add note that repeated-turn performance baseline tests are lower priority (after functional implementation is stable) * docs: renumber W-IDs to match new development sequence Renumbered all W-ID documents to follow the optimized development order: Original → New mapping: - W1 (Capacity Config) → W1 (unchanged) - W2 (Safety Reserve) → W2 (unchanged) - W4 (Tenant Isolation) → W3 - W5 (Event Log) → W4 - W6 (History Separation) → W5 - W8 (Cache Validation) → W6 - W9 (Lifecycle APIs) → W7 - W10 (Unified Policy) → W8 - W11 (Progressive Reduction) → W9 - W12 (Output Control) → W10 - W14 (Trust/Redaction) → W11 - W13 (Reliable Compaction) → W12 - W15 (Quality SLOs) → W13 - W16 (Cache-Aware Assembly) → W14 - W3 (Guaranteed Fit) → W15 This reordering ensures: - No forward dependencies (each W-ID only depends on earlier W-IDs) - W15 (Guaranteed Fit) comes after W14 (Cache-Aware Assembly) which it consumes - W12 (Reliable Compaction) comes after W11 (Trust/Redaction) which it depends on - W3 (Tenant Isolation) comes before W15 (Guaranteed Fit) which needs it Updated all internal W-ID references across all documents. 
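A renumbering like the one above is easy to get wrong if the replacements are applied one after another, since W4 → W3 followed by W3 → W15 would turn old W4 references into W15. The following is a minimal sketch of a single-pass rewrite, assuming the mapping listed in this commit message; the `RENUMBER` table and `renumber_wids` helper are hypothetical names for illustration, not code from the repository.

```python
import re

# Old -> new W-ID numbers, copied from the mapping in the commit message.
RENUMBER = {
    1: 1, 2: 2, 4: 3, 5: 4, 6: 5, 8: 6, 9: 7, 10: 8,
    11: 9, 12: 10, 14: 11, 13: 12, 15: 13, 16: 14, 3: 15,
}

# Sanity check: the new IDs are exactly W1..W15 with no duplicates.
assert sorted(RENUMBER.values()) == list(range(1, 16))

_WID = re.compile(r"\bW(\d+)\b")


def renumber_wids(text: str) -> str:
    """Rewrite every W-ID in one pass so a renamed ID is never re-mapped."""

    def swap(match: re.Match) -> str:
        old = int(match.group(1))
        # IDs the mapping does not list (for example W17) are left untouched.
        return f"W{RENUMBER.get(old, old)}"

    return _WID.sub(swap, text)
```

Because `re.sub` scans the original text once and never re-reads its own output, each reference is translated exactly one time. Lowercase anchors such as `#w4` are not covered by this pattern and would need their own pass.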
* docs: update production plan with new W-ID order and phase structure - Update Section 1.1: 16→15 workstreams, module table W-IDs - Update Section 2.1.2: Checkpoint→Compression Snapshot terminology - Update Section 2.2: Architecture diagram (Checkpoints→Compression Snapshots) - Update Section 2.3: Workstream descriptions with all refinements - W15: Add dispatch bypass elimination (B1, B2) - W10: Clarify offload threshold vs truncation - W12: Add current state gap analysis reference - W14: Add subagent cache optimization - Update Section 3.1: Phased delivery plan for new W-ID order - Phase 1: W1, W2, W3 (Foundation) - Phase 2: W4, W5, W6 (Event Infrastructure) - Phase 3: W7, W8, W9, W10, W11 (Lifecycle and Policy) - Phase 4: W12, W14 (Compaction and Assembly) - Phase 5: W13, W15 (Quality and Fit) - Update Section 3.2: Gantt chart for new timeline - Update Section 3.3: Dependency diagram for new order * docs: fix all W-ID anchor links in production plan Fixed 52 incorrect anchor links throughout the production plan document. All [W\d+](#w\d+) links now correctly match the new W-ID numbering: - W1-W15 links now point to correct anchors (#w1-#w15) - Updated Section 0.1-0.3 comparison tables - Updated Section 1.2 detailed improvement table - Updated Section 2.3 memory control capabilities table - Updated Section 2.4 ClawVM adoption table - Updated Section 3.1 phase table All anchor links now follow the pattern [Wn](#wn) where n matches. 
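The `[Wn](#wn)` invariant described above lends itself to a mechanical check. A minimal sketch, assuming links are written exactly in that form; `mismatched_anchor_links` is a hypothetical helper, not part of the repository.

```python
import re

# Matches links of the form [W12](#w12) and captures both numbers.
LINK = re.compile(r"\[W(\d+)\]\(#w(\d+)\)")


def mismatched_anchor_links(markdown: str) -> list[str]:
    """Return every [Wn](#wm) link whose label and anchor numbers differ."""
    return [
        match.group(0)
        for match in LINK.finditer(markdown)
        if match.group(1) != match.group(2)
    ]
```

Running this over the production plan after a renumbering pass should return an empty list; any entry it returns is a link whose label and target disagree.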
* docs: revise W17 capacity suggestion spec * docs: rewrite Chinese production plan with new W-ID numbering - Translate updated English version (1296 lines → 1208 lines Chinese) - Move from doc/working/ to doc/working/context-management-workstreams/ - Update all W-ID references to new numbering (W1-W15) - W7 marked as retired (compression.snapshot merged into W4) - New phase structure (5 phases with correct W-ID groupings) - Professional terms kept in English where appropriate - Mermaid diagrams preserved in English - Old file deleted from previous location * docs(W2): add ADR for budget snapshot overrides and dispatch enforcement Add W2_ADR_Budget_Snapshot_Overrides_and_Dispatch_Enforcement.md defining: - Override precedence: operator column > model default > resolver fallback - Fingerprint algorithm: SHA-256 over W1 fingerprint + W2-specific fields - DB column: ag_tenant_agent_t.requested_output_tokens nullable positive int - SDK dispatch assertion: max_tokens must equal snapshot.requested_output_tokens This ADR formalizes the contracts identified in CM-028, CM-029, CM-030 and provides the design anchor for W2 implementation steps 3-5. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(W2): absorb CM-027-CM-030 findings into spec and production plan W2 spec updates: - CM-027: soft_limit_ratio default 0.8, per-tenant override via tenant_config_t - CM-028: two distinct override contracts (per-agent column + per-request API field) - CM-029: snapshots are per-model; W13 must invoke W1→W2 chain for compaction model - CM-030: CM-013 trusted-dispatch enforcement at provider call (assert max_tokens == snapshot.requested_output_tokens) Production plan updates: - Per-agent column and per-request API field documented - soft_limit_ratio default and override path - per-model snapshot chain for compaction (W13 dependency) - dispatch assertion contract All four findings from W2 post-acceptance review now integrated into the spec. 
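The W2 contracts named above (the 0.8 `soft_limit_ratio` default and the dispatch assertion on `max_tokens`) can be illustrated with a small sketch. Everything below is an assumption for illustration: `BudgetSnapshot`, `build_snapshot`, `check_dispatch`, and `DispatchBudgetMismatch` are hypothetical names, the soft budget is assumed to be the hard budget scaled by the ratio, and the uncertainty reserve is left out because the commit message does not give its formula.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_SOFT_LIMIT_RATIO = 0.8  # CM-027 default stated in the commit message


class DispatchBudgetMismatch(Exception):
    """Hypothetical error for a max_tokens value that disagrees with the snapshot."""


@dataclass(frozen=True)
class BudgetSnapshot:
    """Hypothetical stand-in for the W2 snapshot, not the real type."""

    requested_output_tokens: int
    hard_input_budget: int
    soft_input_budget: int


def build_snapshot(
    context_window_tokens: int,
    max_input_tokens: Optional[int],
    requested_output_tokens: int,
    soft_limit_ratio: float = DEFAULT_SOFT_LIMIT_RATIO,
) -> BudgetSnapshot:
    # Output space is carved out of the context window first.
    remaining = context_window_tokens - requested_output_tokens
    if remaining <= 0:
        raise ValueError("requested output leaves no room for input")
    # A null max_input_tokens falls back to context_window - requested_output.
    hard = remaining if max_input_tokens is None else min(max_input_tokens, remaining)
    # Assumption: the soft budget is the hard budget scaled by the ratio.
    soft = int(hard * soft_limit_ratio)
    return BudgetSnapshot(requested_output_tokens, hard, soft)


def check_dispatch(snapshot: BudgetSnapshot, max_tokens: int) -> None:
    # The provider call must use exactly the output size the snapshot budgeted.
    if max_tokens != snapshot.requested_output_tokens:
        raise DispatchBudgetMismatch(
            f"max_tokens={max_tokens} != snapshot {snapshot.requested_output_tokens}"
        )
```

With a 128000-token window and 4096 requested output tokens, this sketch yields a hard input budget of 123904 and a soft budget of 99123; a dispatch with any other `max_tokens` is rejected rather than silently sent.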
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Add W2 capacity budget skeleton * docs: remove retired W7 strikethrough row from Chinese production plan table * Add W2 reserve policy configuration * Implement W2 safe input budget calculator * docs: add Chinese translations for all W-ID specification documents (W1-W17) * Resolve W2 request safe input budget * Apply W2 safe budgets to context manager * Enforce W2 output tokens at dispatch * Emit W2 budget snapshots to monitoring * Surface W2 uncertainty reserve warning * Verify W2 budget fingerprint at dispatch * Verify W1 capacity identity at W2 dispatch Defense-in-depth check per CM-013: the trusted dispatch boundary now rejects a W2 safe-input-budget snapshot whose `w1_fingerprint`, `provider`, or `model_name` disagrees with the active W1 capacity snapshot threaded alongside it. This closes the model-swap mid-flight, stale-cache, and cross-tenant snapshot-reuse failure modes that the prior self-only fingerprint check would silently let through. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Backfill W2 capacity from W1 catalog for legacy deployments W1 step 7 made context_window_tokens and max_output_tokens required at the Add/Edit forms, but pre-existing model_record_t rows in production deployments still have NULL capacity columns and silently disable W2's CM-030 dispatch enforcement. This migration auto-fills the eight W1 day-one catalog entries on rows where (LOWER(model_factory), model_name) matches and capacity is still NULL. It is idempotent (re-runs are no-ops) and ships as a regular docker/sql migration so every downstream deployment picks it up on upgrade. Rows whose model_factory does not match a catalog provider key (commonly the manual-add default 'OpenAI-API-Compatible' per CM-031) are left untouched; the resolver fallback log is upgraded to WARNING with an actionable remediation message so operators can identify exactly which models still need attention before W17 ships. 
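The backfill rule described above (match on lower-cased `model_factory` plus `model_name`, fill only rows that are still NULL, leave everything else alone) can be sketched in Python over in-memory rows. This is an illustration, not the SQL migration itself: `CATALOG` and `backfill` are hypothetical names, the single catalog entry uses the values this commit log later reports for glm-5.1 on dashscope, and "capacity is still NULL" is assumed to mean both columns are NULL.

```python
# One illustrative catalog entry keyed by (provider, model_name); the real
# catalog has eight entries whose values are not listed in this commit message.
CATALOG = {
    ("dashscope", "glm-5.1"): {
        "context_window_tokens": 200000,
        "max_output_tokens": 131072,
    },
}


def backfill(rows: list[dict]) -> int:
    """Fill capacity on catalog-matching rows that are still bare; return the count."""
    filled = 0
    for row in rows:
        key = ((row.get("model_factory") or "").lower(), row.get("model_name"))
        entry = CATALOG.get(key)
        if entry is None:
            # Non-catalog factories such as 'OpenAI-API-Compatible' stay untouched.
            continue
        if (
            row.get("context_window_tokens") is not None
            or row.get("max_output_tokens") is not None
        ):
            # Operator-provided values are never overwritten.
            continue
        row.update(entry)
        filled += 1
    return filled
```

Running `backfill` a second time over the same rows returns 0, which is the idempotence property the migration relies on.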
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: add codebase gap analysis, reorder priorities, mark deferred workstreams - Add §1.5 Codebase Gap Analysis to both EN/ZH production plans - Update §1.2 improvement table with Status column and new priority order - Move W14 (prompt cache) to Phase 1: high value, zero dependencies - Mark W5, W6(full), W8(full), W10(artifact), W11(full) as tentatively deferred - Update Phase table, descriptions, Gantt chart, and dependency diagram - Add gap analysis notes to W3, W4, W6, W8, W10, W11, W12, W14 docs - Restructure README workstream index: Active / Deferred / Retired sections * Make missing-capacity warning operator-friendly and dedup it Two fixes to the WARNING surfaced when a model has no capacity configured: 1. Drop internal design-doc jargon. The previous message mentioned CM-030, CM-013, and W17 — none of which are meaningful to an operator reading backend container logs. Replaced with plain English that names what is disabled (output token cap + budget consistency check) and the exact UI path to fix it. 2. Deduplicate per process per model_id. Without this, every agent run logged the same line, so a tenant with 1k daily messages on a bare model would emit 1k duplicate warnings per day and drown real signal. A module-level set tracks already-warned model_ids; the warning fires once per process per model and is cleared only on process restart. Includes the ResolverError branch which previously had a separate WARNING line — both branches now route through the same dedup helper. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(W17): add visibility surfaces for existing bare-capacity models W17's original scope was preventing new bare rows at add/edit time. 
It did not address the complementary problem: rows that already exist in a bare state silently disable W2 enforcement, and the only signal today is a backend WARNING that the people who can fix it (model administrators, agent authors) never see. Adds a new "Visibility for Existing Bare-Capacity Models" section specifying three UI touchpoints — model management list badge, agent-edit selector warning, and an operator dashboard widget — backed by a small read-only GET /api/v1/models/capacity-coverage endpoint. The visibility work is phase-tagged as 1.5 so it can ship behind a separate small flag without waiting for the connectivity-integration and provider-discovery work in later phases. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: renumber W-IDs by priority, rename deferred to P-IDs Active workstreams renumbered by implementation priority: W1 (token capacity), W2 (output reserve) - unchanged W3 (prompt cache, was W14) - moved to Phase 1 W4 (tenant isolation, was W3) W5 (event log, was W4) W6 (compaction reliability, was W12) W7 (lifecycle APIs) - unchanged W8 (progressive reduction, was W9) W9 (quality SLOs, was W13) W10 (guaranteed fit, was W15) W11 (capacity suggestion, was W17) Deferred workstreams renamed W→P: P1 (history separation, was W5) P2 (cache validation, was W6) P3 (context policy, was W8) P4 (pollution control, was W10) P5 (trust/redaction, was W11) 58 files updated: spec files, translations, production plans, README, ADR, review documents, weekly summary. * Fix soft-delete column name in W2 catalog backfill migration The migration filtered on a non-existent column `deleted_flag = 0`, which never matched any row, so the backfill silently no-op'd on every deployment. The model_record_t soft-delete column is `delete_flag` (String(1), default 'N') per backend/database/db_models.py. 
Verified on the local cluster: with the corrected filter, the migration matched the one catalog-eligible row (glm-5.1 on dashscope) and populated context_window_tokens=200000, max_output_tokens=131072. Remaining bare rows on the cluster all carry model_factory='OpenAI-API-Compatible' (CM-031), confirming W17 as the remediation path for the default-factory population. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(W17): add bare-row production evidence and scope to LLM/VLM only Two additions to the W17 'Visibility for Existing Bare-Capacity Models' section: 1. Production evidence: a 2026-06-17 snapshot of model_record_t on a live dev cluster showed 6 of 7 non-deleted rows carrying the manual-add default model_factory ('OpenAI-API-Compatible'), and the W2 catalog backfill matched only 1 row — leaving the model the operator was actively chatting with (glm-5) bare. This grounds the workstream's motivation in a concrete observation rather than a projected concern. 2. Scope clarification: embedding, STT, and TTS rows share the same capacity columns but never traverse the W1/W2 path, so a NULL on those rows is not a missed enforcement. The badge, agent-edit selector notice, dashboard widget, and /capacity-coverage endpoint all apply a model_type IN ('llm', 'vlm') filter at the data layer to prevent noise on non-LLM rows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Raise legacy fallback threshold to 81920 and explain output reserve in UI Two coordinated changes that both came out of W2 end-to-end validation against a bare-capacity model (glm-5): 1. Bump the W1/W2 unknown-capacity fallback from 8192 to 81920 in both backend (_TOKEN_THRESHOLD_LEGACY_FALLBACK) and frontend (TokenUsageIndicator.DEFAULT_THRESHOLD). 8192 was so small that any non-trivial conversation triggered compression almost immediately, masking real usage signal. 
81920 fits the input budget of any modern 32K+ LLM; if the actual model is smaller and bare, the provider returns a clear token-overflow error at request time rather than the system silently truncating. Both sides match so the indicator denominator and the backend compression trigger stay in sync when the snapshot path is not available. 2. Add a tooltip on the agent-edit "Output Reserve" form item so model admins and agent authors understand the field's physical meaning: it carves output space out of the context window, and the trade-off between longer replies versus more retained history is explicit. Tooltip strings live in both zh and en common.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Retune legacy capacity fallback from 81920 to 32768 After bumping the bare-capacity fallback up from 8192 to 81920 in commit 689e3ec52, 81920 was on the optimistic side: it presumes most unknown models can absorb ~80K tokens of input. Many production deployments still rely on the 32K-context band (GPT-3.5 Turbo 16K, GLM-4 32K, Qwen2 32K, Llama 3 32K, Mistral 32K, etc.), and an 80K input on a 32K model produces a provider-side token-overflow rejection. 32768 is the conservative compromise: it covers the majority of production LLMs without inviting overflow on the still-common 32K class. Models with larger windows lose only a few extra compression cycles, which is the correct cost direction (slightly more work over silent overflow). Backend (_TOKEN_THRESHOLD_LEGACY_FALLBACK) and frontend (TokenUsageIndicator.DEFAULT_THRESHOLD) stay in sync so the indicator denominator matches the backend compression trigger when the W2 snapshot path is unavailable. 
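The fallback behaviour can be summarised in a few lines. A minimal sketch, assuming the compression trigger compares used tokens against the snapshot's soft input budget when one exists; `compression_threshold` and `should_compress` are hypothetical names, and only the 32768 constant comes from the commit message.

```python
from typing import Optional

LEGACY_FALLBACK_THRESHOLD = 32768  # value named in the commit message above


def compression_threshold(snapshot_soft_budget: Optional[int]) -> int:
    """Use the snapshot budget when available, else the legacy fallback."""
    if snapshot_soft_budget is None:
        return LEGACY_FALLBACK_THRESHOLD
    return snapshot_soft_budget


def should_compress(used_tokens: int, snapshot_soft_budget: Optional[int]) -> bool:
    # Assumption: compression triggers once usage reaches the threshold.
    return used_tokens >= compression_threshold(snapshot_soft_budget)
```

Keeping the same constant on the frontend means the usage indicator's denominator and this trigger describe the same limit for bare-capacity models.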
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: add capacity values explainer covering W1/W2/W3 number flow Single-file reference doc walking from UI-visible capacity columns (context_window, max_output, default_reserve) through W1 resolver output (provider_input_limit, fingerprint), W2 calculator output (soft / hard input budget, uncertainty reserve), and the four-tier override chain for requested_output_tokens (CM-028). Includes worked examples for the standard configuration, agent-level override, the RequestedOutputExceedsCap failure mode, and the bare-capacity fallback path. Intended audience: model admins, agent authors, and engineers reviewing W1/W2/W3 specs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Enforce output reserve ceiling at the agent-edit form Closes the UX gap where 'Output Reserve' accepted values exceeding the selected model's max_output_tokens. The capacity resolver caught the violation only at agent run time, raising RequestedOutputExceedsCap and failing the conversation with no surface signal to the agent author. Three additions on AgentGenerateDetail: - A conditional Form.Item rule that pins the field's max to the currently selected model's maxOutputTokens. The rule is omitted on bare-capacity models (maxOutputTokens undefined) where the resolver cannot enforce anything anyway. - A matching `max` prop on the InputNumber so the stepper UI also blocks the value, not just the validator. - A useEffect that re-runs validation on requestedOutputTokens whenever the selected model's maxOutputTokens changes, so switching from a 32K-output model down to an 8K-output one immediately surfaces the conflict rather than waiting until save. New i18n key agent.requestedOutputTokens.maxError interpolates the actual ceiling so the error message names the number. 
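The run-time check that the form rule mirrors can be sketched as follows. This is an illustration under stated assumptions: the tier order (per-request, then per-agent, then model default, then SDK constant) is assumed, the 4096 fallback is the value a later commit in this log settles on, and `resolve_requested_output` plus the local `RequestedOutputExceedsCap` class are stand-ins rather than the resolver's actual code.

```python
from typing import Optional

SDK_FALLBACK_OUTPUT_TOKENS = 4096  # fallback named later in this commit log


class RequestedOutputExceedsCap(ValueError):
    """Local stand-in for the resolver error of the same name."""


def resolve_requested_output(
    per_request: Optional[int],
    per_agent: Optional[int],
    model_default: Optional[int],
    max_output_tokens: Optional[int],
) -> int:
    # Assumed precedence: the first non-null tier wins.
    requested = SDK_FALLBACK_OUTPUT_TOKENS
    for candidate in (per_request, per_agent, model_default):
        if candidate is not None:
            requested = candidate
            break
    # Bare-capacity models (max_output_tokens is None) cannot be checked.
    if max_output_tokens is not None and requested > max_output_tokens:
        raise RequestedOutputExceedsCap(
            f"requested {requested} exceeds model cap {max_output_tokens}"
        )
    return requested
```

Validating the same ceiling in the form moves this failure from agent run time to save time, which is the gap the commit closes.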
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Reject max_input_tokens > context_window_tokens on both ends Closes the audit gap noticed alongside the W2 UX fix: an operator fills max_input_tokens above context_window_tokens, the save succeeds, and the override is silently clipped at runtime because the resolver computes provider_input_limit = min(max_input, context_window - requested_output). The administrator's value never takes effect and no error or log surfaces. Backend fix in capacity_resolver: raise InvalidCapacityConfiguration with a message that names the silent-clipping mechanism so the operator understands why the override was rejected. The check sits right next to the sibling max_output_tokens > context_window check, keeping all cross-field invariants in one place. Frontend fix in validateCapacityForm: add the same cross-field check with a matching i18n key (model.dialog.capacity.error.inputExceedsWindow, zh + en). Surfaces inside the existing ModelEditDialog and ModelAddDialog save flow that already wires validateCapacityForm. Tests: two new cases on test_capacity_resolver — rejection of max_input above the window, and acceptance of the equality boundary (max_input == context_window is legal). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Raise SDK requested_output_tokens fallback from 1024 to 4096 The four-tier override chain for requested_output_tokens ends with a hard-coded SDK constant when neither the agent ('Output Reserve' field) nor the model record (default_output_reserve_tokens column) provides a value. The model-add UI does not render default_output_reserve_tokens at all (only edit mode does), so newly added rows always carry NULL in that column and most agents reach the SDK fallback at runtime. 1024 was too small in practice. 
Tool-using agents emit a few-hundred- token JSON tool call plus a few hundred tokens of thought per step; 1024 frequently truncated the JSON mid-emission, which then surfaced as a tool-call failure instead of a capacity-config issue. The W2 fingerprint chain stays green and the indicator denominator looks healthy, but replies and tool calls get silently chopped. 4096 covers the median single-turn output for tool chains, short reports, and modest code generation. Models with a smaller max_output_tokens are still safe: the existing RequestedOutputExceedsCap check at capacity_resolver.py:276-283 (and the matching agent-edit Form.Item rule from the prior commit) catches the violation explicitly rather than silently truncating. No tests assumed 1024; the full test_capacity_resolver suite stays green (17 passing). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: refresh Capacity Values Explainer after UX gap fixes Sync the explainer with the just-landed capacity changes so the doc stops describing the older silent-failure behavior: - Override chain (§3) now names the SDK fallback as 4096 (was 1024) and includes a short note o…
1 parent 9b829f2 commit 4becd69

75 files changed

Lines changed: 8769 additions & 618 deletions

AGENTS.md

Lines changed: 127 additions & 1 deletion
@@ -8,7 +8,7 @@
 
 <!-- SKILLS_TABLE_START -->
 <usage>
-When users ask you to perform tasks, check if any of the available skills below can help complete the task more effectively. Skills provide specialized capabilities and domain knowledge.
+When users ask to perform tasks, check if any of the available skills below can help complete the task more effectively. Skills provide specialized capabilities and domain knowledge.
 
 How to use skills:
 - Invoke: `npx openskills read <skill-name>` (run in your shell)
@@ -40,3 +40,129 @@ Usage notes:
 <!-- SKILLS_TABLE_END -->
 
 </skills_system>
+
+---
+
+## Project Overview
+
+Nexent is a zero-code platform for auto-generating AI agents. Monorepo with:
+- `backend/` - FastAPI HTTP API
+- `sdk/nexent/` - Core agent framework (pip package)
+- `frontend/` - Next.js web UI
+- `docker/` & `k8s/` - Deployment configs
+
+---
+
+## Developer Commands
+
+### Backend (Python 3.10)
+
+```bash
+# Setup
+cd backend && uv sync --extra data-process --extra test
+
+# Install SDK for development
+cd backend && uv pip install -e "../sdk[dev]"
+```
+
+### Run Tests
+
+```bash
+# From project root, with backend venv activated
+source backend/.venv/bin/activate && python test/run_all_test.py
+
+# Single test file
+pytest test/backend/apps/test_agent_app.py -v
+```
+
+### Frontend (Next.js)
+
+```bash
+cd frontend
+npm run dev # Development server
+npm run check-all # type-check + lint + format + build
+```
+
+### Docker Deployment
+
+```bash
+cd docker
+cp .env.example .env # Fill required configs
+bash deploy.sh # Interactive deployment
+```
+
+---
+
+## Architecture
+
+### Environment Variables
+
+**Single source of truth**: `backend/consts/const.py`
+
+- NO direct `os.getenv()` / `os.environ.get()` outside this file
+- SDK (`sdk/nexent/`) NEVER reads env vars - accepts config via parameters
+- Services read from `consts.const` and pass to SDK
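
The environment-variable rule added in this hunk could look roughly like the sketch below, with three layers collapsed into one file for illustration. The variable name `MODEL_API_KEY`, the `ModelClient` class, and `build_client` are hypothetical and are not taken from the repository. The point is only where `os.getenv()` is allowed to appear.

```python
import os
from dataclasses import dataclass

# Stands in for backend/consts/const.py: the only place that reads env vars.
MODEL_API_KEY = os.getenv("MODEL_API_KEY", "")  # hypothetical variable name


# Stands in for sdk/nexent/: takes config as parameters, never touches os.environ.
@dataclass(frozen=True)
class ModelClient:
    api_key: str

    def auth_header(self) -> dict:
        return {"Authorization": f"Bearer {self.api_key}"}


# Stands in for backend/services/: reads the constant and passes it to the SDK.
def build_client(api_key: str = MODEL_API_KEY) -> ModelClient:
    return ModelClient(api_key=api_key)
```

Because the SDK-side class only sees an explicit `api_key` argument, it can be constructed in tests without patching the process environment.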
+
+### Backend Layer Structure
+
+| Layer | Path | Responsibility |
+|-------|------|----------------|
+| Apps | `backend/apps/` | HTTP boundary: parse input, call services, map exceptions to HTTP |
+| Services | `backend/services/` | Business logic orchestration, raise domain exceptions |
+| Consts | `backend/consts/` | Env vars (`const.py`), exceptions (`exceptions.py`), error codes |
+
+**Exception flow**: Services raise domain exceptions → Apps map to HTTP status codes
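
A framework-free sketch of the exception flow described in the table above, using only the standard library so it runs without FastAPI. `LimitExceededError` is declared locally as a stand-in for the domain exception of the same name. The function names, the quota logic, and the choice of HTTP 429 are assumptions made for illustration.

```python
from http import HTTPStatus


class LimitExceededError(Exception):
    """Local stand-in for a domain exception from backend/consts/exceptions.py."""


def run_agent_service(quota_left: int) -> dict:
    """Service layer: business logic only, raises domain exceptions."""
    if quota_left <= 0:
        raise LimitExceededError("quota exhausted")
    return {"status": "ok"}


def run_agent_endpoint(quota_left: int) -> tuple[int, dict]:
    """App layer: call the service and map domain exceptions to HTTP status codes."""
    try:
        return HTTPStatus.OK.value, run_agent_service(quota_left)
    except LimitExceededError as exc:
        # The 429 mapping is an assumption chosen for this sketch.
        return HTTPStatus.TOO_MANY_REQUESTS.value, {"detail": str(exc)}
```

Keeping the status-code mapping in the app layer means the service function can be reused from non-HTTP callers, which only need to handle the domain exception.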
+
+---
+
+## Database Migrations
+
+**Location**: `docker/sql/*.sql` (versioned migration scripts)
+
+**Critical rule**: When adding columns/tables via migration script:
+- Update `docker/init.sql` (Docker Compose fresh deploy)
+- Update `k8s/helm/nexent/charts/nexent-common/files/init.sql` (K8s fresh deploy)
+
+**Version**: Tracked in `backend/consts/const.py` as `APP_VERSION`
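
One way to guard the critical rule above is a small text check that every column added by a migration is also mentioned in a fresh-install script. The helper below is a hypothetical sketch and not part of the repository. It relies on a simple regex over `ADD COLUMN` statements rather than a SQL parser, so it is a heuristic only.

```python
import re

# Matches "ADD COLUMN name" and "ADD COLUMN IF NOT EXISTS name".
ADD_COLUMN = re.compile(r"ADD\s+COLUMN\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)", re.IGNORECASE)


def missing_columns(migration_sql: str, init_sql: str) -> list[str]:
    """Return columns added by the migration that the init script never mentions."""
    added = ADD_COLUMN.findall(migration_sql)
    return [col for col in added if not re.search(rf"\b{re.escape(col)}\b", init_sql)]


# Illustrative inputs; the table name and column types are made up.
migration = (
    "ALTER TABLE t ADD COLUMN IF NOT EXISTS tokenizer_family VARCHAR(64);\n"
    "ALTER TABLE t ADD COLUMN IF NOT EXISTS capacity_source VARCHAR(32);"
)
init = "CREATE TABLE t (id INT, tokenizer_family VARCHAR(64));"
gaps = missing_columns(migration, init)  # columns the init script still lacks
```

Running the same function against both `init.sql` files would flag a migration that updated only one of the two fresh-deploy schemas.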
+
+---
+
+## Testing Conventions
+
+- pytest only (no unittest)
+- Mock at import site with fully-qualified path:
+```python
+mocker.patch("backend.services.agent_service.AgentService.run", return_value={...})
+```
+- Async tests: `@pytest.mark.asyncio`
+- Test structure: `test/backend/` and `test/sdk/`
+
+---
+
+## Code Style
+
+- English-only comments and docstrings (enforced by `.cursor/rules/english_comments.mdc`)
+- Import order: stdlib → third-party → project
+- Line length: 119 (sdk ruff config)
+
+---
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| `backend/consts/const.py` | All env var definitions, APP_VERSION |
+| `backend/consts/exceptions.py` | Domain exceptions (AgentRunException, LimitExceededError, etc.) |
+| `docker/init.sql` | Database schema for Docker Compose |
+| `k8s/helm/.../init.sql` | Database schema for Kubernetes |
+| `test/run_all_test.py` | Test runner with coverage |
+
+---
+
+## Reference Files
+
+Existing instruction files with detailed rules:
+- `CLAUDE.md` - Backend architecture, env var management, app/service layer rules
+- `.cursor/rules/environment_variable.mdc` - Env var centralization
+- `.cursor/rules/pytest_unit_test_rules.mdc` - Testing patterns
+- `.cursor/rules/english_comments.mdc` - Comment language enforcement
