baiying/AMD395 integration (2026-06): ctx sizing, model knowledge, openclaw controls, engine-registry env, A1/A5/A6 fixes #94
Open
rjckkkkk wants to merge 5 commits into
…text

A high catalog/user ctx_size (e.g. the 128000 default we now want for agent clients) would OOM llama-server at load on memory-constrained machines. AIMA previously passed ctx_size through unchanged (estimateVRAM ignores the KV cache; CheckFit only adjusts vLLM/SGLang gpu_memory_utilization, never llama.cpp ctx).

On deploy, for llama.cpp GGUF models, size the context to the hardware:

- Read the model's real architecture from the GGUF header (new model.ReadKVArch: block_count / head_count_kv / head_dim / context_length) and compute exact f16 KV bytes/token, not a guess.
- Clamp ctx_size down so weights + projector + KV cache fit usable memory (GPU VRAM for discrete GPUs, RAM minus an OS reserve for unified/CPU hosts), and cap it at the model's trained context. Only ever lowers; never raises.
- Graceful no-op when the GGUF arch can't be read or memory is unknown.

So a high default degrades gracefully: big boxes get the full context, small ones auto-shrink to what fits instead of failing.

Bump Qwen2.5-VL-3B default ctx_size to 128000 (its trained max) now that it's memory-safe; the agent floor (~9.3K of MCP tool schemas) always fits where the model fits.

clampContextForMemory / usableMemoryMiB are pure and unit-tested; verified on the Strix Halo rig (request 200000 → clamped to trained 128000).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
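The clamp described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `clampContextForMemory`: the `kvArch` struct, the function signatures, and the sample numbers are all assumptions, and the behavior when the weights alone exceed usable memory is left to the loader here because the commit message does not specify it.

```go
package main

import "fmt"

// kvArch mirrors the GGUF header fields named in the commit message.
// The struct and its field names are hypothetical.
type kvArch struct {
	BlockCount    int // number of transformer layers
	HeadCountKV   int // KV heads per layer
	HeadDim       int // dimension of each head
	ContextLength int // trained context length
}

// kvBytesPerToken returns the f16 KV-cache cost of one token:
// K and V (x2), per layer, per KV head, per head dim, 2 bytes per element.
func kvBytesPerToken(a kvArch) int64 {
	return 2 * int64(a.BlockCount) * int64(a.HeadCountKV) * int64(a.HeadDim) * 2
}

// clampContext lowers the requested ctx so weights + projector + KV cache
// fit in usableMiB, and caps it at the trained context. It never raises ctx
// and returns it unchanged when the arch or memory is unknown (<= 0).
func clampContext(requested int, a kvArch, weightsMiB, projectorMiB, usableMiB int64) int {
	perTok := kvBytesPerToken(a)
	if requested <= 0 || perTok <= 0 || usableMiB <= 0 {
		return requested // graceful no-op
	}
	ctx := requested
	if a.ContextLength > 0 && ctx > a.ContextLength {
		ctx = a.ContextLength // cap at trained context
	}
	freeBytes := (usableMiB - weightsMiB - projectorMiB) * 1024 * 1024
	if freeBytes <= 0 {
		return ctx // assumption: the weights do not fit; let the loader report it
	}
	if fit := freeBytes / perTok; int64(ctx) > fit {
		ctx = int(fit)
	}
	return ctx
}

func main() {
	// Illustrative values only, not read from a real GGUF file.
	arch := kvArch{BlockCount: 36, HeadCountKV: 2, HeadDim: 128, ContextLength: 128000}
	fmt.Println(kvBytesPerToken(arch))                          // 36864
	fmt.Println(clampContext(200000, arch, 2000, 1300, 110000)) // 128000
	fmt.Println(clampContext(200000, arch, 2000, 1300, 4020))   // 20480
}
```

With these sample numbers, a large-memory host is limited only by the trained context, while a 4020 MiB budget leaves 720 MiB for the KV cache and shrinks the context to 20480 tokens.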
…en3-Embedding-4B

Set each model's catalog default ctx_size to its full trained context (verified loading + serving on the Strix Halo iGPU, 2026-06-16):

- glm-4.7-flash llamacpp variant: ctx_size 8192 -> 202752.
- qwen3.6-35b-a3b: add a universal llamacpp GGUF variant (ctx_size 262144) + gguf source + scan-name aliases (Qwen3.6-35B-A3B-UD-Q4_K_M, qwen3.6-35b-a3b-q4_k_m); it previously had only vLLM/Blackwell variants.
- new qwen3-embedding-4b.yaml: type embedding, llamacpp variant with embedding=true (-> llama-server --embedding) + ctx_size 40960; serves /v1/embeddings (2560-dim). Deploy-only: embedding models are not written into OpenClaw config by sync.

These large ctx defaults are memory-safe via the deploy-time auto-clamp.

Also fix usableMemoryMiB to prefer GPU VRAM over system RAM: an all-offloaded llama.cpp model is bounded by GPU memory, and on unified-memory APUs the OS-visible RAM is under-reported (Strix Halo: ~32GB OS vs ~110GB iGPU). The clamp now uses the iGPU pool, so it won't wrongly shrink contexts on such APUs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
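The memory-source preference described in the fix above might look like the following sketch. The function name `usableMiB`, its signature, and the 4096 MiB OS reserve are assumptions for illustration; the PR's real `usableMemoryMiB` may take different inputs.

```go
package main

import "fmt"

// usableMiB picks the memory pool that bounds an all-offloaded llama.cpp model.
// GPU (or iGPU) memory wins when it is known; otherwise fall back to system RAM
// minus an OS reserve. A result of 0 means "unknown", which callers can treat
// as a signal to skip clamping.
func usableMiB(gpuMiB, ramMiB, osReserveMiB int64) int64 {
	if gpuMiB > 0 {
		return gpuMiB
	}
	if ramMiB > osReserveMiB {
		return ramMiB - osReserveMiB
	}
	return 0
}

func main() {
	const reserve = 4096 // hypothetical OS reserve in MiB

	// Unified-memory APU: the OS reports ~32 GB, the iGPU pool is ~110 GB.
	fmt.Println(usableMiB(110*1024, 32*1024, reserve)) // 112640

	// CPU-only host: RAM minus the reserve.
	fmt.Println(usableMiB(0, 32*1024, reserve)) // 28672

	// Nothing detected: unknown.
	fmt.Println(usableMiB(0, 0, reserve)) // 0
}
```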
… (env)

Two product-level controls so a partner integrating AIMA owns OpenClaw behavior without code changes on their side (both honor CLI sync AND the serve auto-sync loop):

- AIMA_OPENCLAW_SET_DEFAULT=false: register the AIMA provider + models but do NOT set OpenClaw's primary/default chat model (leave the user's current primary alone). Unset = previous behavior (AIMA sets it). Threaded as Deps.SetDefaultModel *bool → SyncResult.SkipDefaultModel → mergeChatModelDefault early-returns (preserving prior ownership record so toggling back on works).
- AIMA_OPENCLAW_CONFIG=<path>/openclaw.json: write to a custom config dir (e.g. a partner using .byClaw instead of .openclaw). Skills, extensions and managed-state all follow filepath.Dir(ConfigPath), so the whole set relocates. This is purely AIMA-side and independent of the OpenClaw/byClaw product's own layout.

Verified on the Strix Halo rig: SET_DEFAULT=false → provider written, primary empty; default → primary set; custom config dir → config+skills land under .byClaw.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
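A rough sketch of how the two environment controls could be read. The helper names, the use of `strconv.ParseBool` for accepted values, and the `skills` / `extensions` subdirectory names are assumptions; only the variable names, the `*bool` threading, and the `filepath.Dir(ConfigPath)` rule come from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// setDefaultModel reads AIMA_OPENCLAW_SET_DEFAULT through a lookup function
// (os.LookupEnv in production). nil means "unset", which keeps the previous
// behavior. Assumption: values are parsed with strconv.ParseBool and
// unparseable values are treated as unset.
func setDefaultModel(lookup func(string) (string, bool)) *bool {
	raw, ok := lookup("AIMA_OPENCLAW_SET_DEFAULT")
	if !ok {
		return nil
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return nil
	}
	return &v
}

// shouldSetDefault reports whether sync may write the primary chat model.
func shouldSetDefault(p *bool) bool {
	return p == nil || *p
}

// managedDirs derives sibling directories from the config path, so a custom
// config location relocates the whole set. Subdirectory names are hypothetical.
func managedDirs(configPath string) (skills, extensions string) {
	root := filepath.Dir(configPath)
	return filepath.Join(root, "skills"), filepath.Join(root, "extensions")
}

func main() {
	fmt.Println("set default:", shouldSetDefault(setDefaultModel(os.LookupEnv)))

	skills, ext := managedDirs(filepath.Join("home", ".byClaw", "openclaw.json"))
	fmt.Println(skills)
	fmt.Println(ext)
}
```

Returning a `*bool` rather than a `bool` keeps "unset" distinguishable from an explicit `true`, which matches the "Unset = previous behavior" wording above.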
…log-miss fallback (code-only)

Issue 4 (engine image registry / offline via env) + issue 5 (openclaw sync uses backend model type when catalog misses). dist artifacts excluded from develop PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch, A6 undeploy hint

A1 (P0): deploy canonicalizes the model name (alias "Qwen2.5-VL-3B-Instruct-Q4_K_M" → "qwen2.5-vl-3b-instruct"), so undeploy/status/logs with the original alias failed to match. Add canonicalModelAlt(cat, name) and retry undeploy/status/logs with the canonical name; surface the original via a requested_model field in deploy results.

A5: make the serve OpenClaw auto-sync loop switchable: `aima serve --no-openclaw-sync` or AIMA_OPENCLAW_SYNC=manual|off|false|0|no disables it (sync then only on explicit `aima openclaw sync`). Default unchanged (auto).

A6: undeploy now prints a hint that model files are kept and how to free disk (`aima model remove --delete-files <name>`); undeploy semantics unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
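The A1 retry and the A5 switch can be sketched as below. All names here are hypothetical stand-ins (the PR's `canonicalModelAlt` takes a catalog, not a map), and the lowercase alias lookup and case-insensitive env parsing are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalAlt returns the canonical name for an alias, or "" when the
// catalog has no distinct canonical name. The alias map stands in for the
// real catalog lookup.
func canonicalAlt(aliases map[string]string, name string) string {
	if c, ok := aliases[strings.ToLower(name)]; ok && c != name {
		return c
	}
	return ""
}

// withCanonicalRetry runs op with the name as given and, if that fails,
// retries once with the canonical name.
func withCanonicalRetry(aliases map[string]string, name string, op func(string) error) error {
	err := op(name)
	if err == nil {
		return nil
	}
	if alt := canonicalAlt(aliases, name); alt != "" {
		return op(alt)
	}
	return err
}

// autoSyncEnabled applies the A5 rule: the --no-openclaw-sync flag or an
// AIMA_OPENCLAW_SYNC value of manual|off|false|0|no disables the loop.
func autoSyncEnabled(noSyncFlag bool, envValue string) bool {
	if noSyncFlag {
		return false
	}
	switch strings.ToLower(strings.TrimSpace(envValue)) {
	case "manual", "off", "false", "0", "no":
		return false
	}
	return true // default: auto
}

// fakeUndeploy simulates a backend that only knows the canonical name.
func fakeUndeploy(deployed string) func(string) error {
	return func(name string) error {
		if name == deployed {
			return nil
		}
		return fmt.Errorf("deployment %q not found", name)
	}
}

func main() {
	aliases := map[string]string{"qwen2.5-vl-3b-instruct-q4_k_m": "qwen2.5-vl-3b-instruct"}
	op := fakeUndeploy("qwen2.5-vl-3b-instruct")

	fmt.Println(withCanonicalRetry(aliases, "Qwen2.5-VL-3B-Instruct-Q4_K_M", op)) // <nil>
	fmt.Println(autoSyncEnabled(false, "manual"))                                  // false
	fmt.Println(autoSyncEnabled(false, ""))                                        // true
}
```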
Consolidated, clean develop PR for the whole 2026-06 baiying/AMD395 integration stack (delivered to partners on the `amd395-win` handoff branch; this straightens it into one mergeable PR). Supersedes #92 and #93. Code only: dist exes/docs live on `amd395-win`, not develop.

Commits (each builds + tests pass):

- …`usableMemoryMiB` prefers the iGPU pool (APU OS-RAM under-report). Lets large catalog defaults degrade gracefully.
- …`--embedding`, 40960). (A7)
- …`AIMA_OPENCLAW_SET_DEFAULT` (don't touch user's primary) + `AIMA_OPENCLAW_CONFIG` (custom config dir, e.g. `.byClaw`).
- …`AIMA_ENGINE_REGISTRIES` / `AIMA_ENGINE_REGISTRY`) + offline local bundle (issue 4 / B2); OpenClaw sync uses backend-reported model type when the catalog misses, so non-catalog models aren't skipped (issue 5 / A2).
- …`undeploy`/`status`/`logs` canonicalize the model name so the original deploy-time alias works (+ `requested_model` in deploy result); A5: `aima serve --no-openclaw-sync` / `AIMA_OPENCLAW_SYNC=manual`; A6: `undeploy` prints a disk-cleanup hint.

Tests incl.: TestClampContextForMemory, TestUsableMemoryMiB, TestMergeSkipDefaultModel, TestProbeProxyReachable, TestBinaryManagerEnsureInstallsLocalZipBundle, TestBuildDownloadSourceListUsesEnterpriseMirrors, TestSyncUsesBackendModelTypeWhenCatalogMisses, TestOpenClawAutoSyncEnabled.
🤖 Generated with Claude Code