baiying/AMD395 integration (2026-06): ctx sizing, model knowledge, openclaw controls, engine-registry env, A1/A5/A6 fixes by rjckkkkk · Pull Request #94 · Approaching-AI/AIMA

rjckkkkk · 2026-06-23T04:34:38Z

Consolidated, clean develop PR for the whole 2026-06 baiying/AMD395 integration stack (delivered to partners on the amd395-win handoff branch; this straightens it into one mergeable PR). Supersedes #92 and #93. Code only — dist exes/docs live on amd395-win, not develop.

Commits (each builds + tests pass):

1. Hardware-aware ctx_size clamp: read GGUF arch, clamp ctx to fit memory + trained context; usableMemoryMiB prefers the iGPU pool (APU OS-RAM under-report). Lets large catalog defaults degrade gracefully.
2. Partner model context windows: GLM-4.7-Flash (202752), Qwen3.6-35B-A3B (262144, new llama.cpp variant + aliases), new Qwen3-Embedding-4B (--embedding, 40960). (A7)
3. OpenClaw sync controls: AIMA_OPENCLAW_SET_DEFAULT (don't touch user's primary) + AIMA_OPENCLAW_CONFIG (custom config dir, e.g. .byClaw).
4. 06-18 round: engine image registry via env (AIMA_ENGINE_REGISTRIES/AIMA_ENGINE_REGISTRY) + offline local bundle (issue 4 / B2); OpenClaw sync uses backend-reported model type when the catalog misses, so non-catalog models aren't skipped (issue 5 / A2).
5. Integration fixes (2026-06-22 doc A-items): A1: undeploy/status/logs canonicalize the model name so the original deploy-time alias works (+ requested_model in deploy result); A5: aima serve --no-openclaw-sync / AIMA_OPENCLAW_SYNC=manual; A6: undeploy prints a disk-cleanup hint.

Tests incl.: TestClampContextForMemory, TestUsableMemoryMiB, TestMergeSkipDefaultModel, TestProbeProxyReachable, TestBinaryManagerEnsureInstallsLocalZipBundle, TestBuildDownloadSourceListUsesEnterpriseMirrors, TestSyncUsesBackendModelTypeWhenCatalogMisses, TestOpenClawAutoSyncEnabled.

🤖 Generated with Claude Code

…text

A high catalog/user ctx_size (e.g. the 128000 default we now want for agent clients) would OOM llama-server at load on memory-constrained machines. AIMA previously passed ctx_size through unchanged (estimateVRAM ignores the KV cache; CheckFit only adjusts vLLM/SGLang gpu_memory_utilization, never llama.cpp ctx).

On deploy, for llama.cpp GGUF models, size the context to the hardware:

- Read the model's real architecture from the GGUF header (new model.ReadKVArch: block_count / head_count_kv / head_dim / context_length) and compute exact f16 KV bytes/token, not a guess.
- Clamp ctx_size down so weights + projector + KV cache fit usable memory (GPU VRAM for discrete GPUs, RAM minus an OS reserve for unified/CPU hosts), and cap it at the model's trained context. Only ever lowers; never raises.
- Graceful no-op when the GGUF arch can't be read or memory is unknown.

So a high default degrades gracefully: big boxes get the full context, small ones auto-shrink to what fits instead of failing.

Bump Qwen2.5-VL-3B default ctx_size to 128000 (its trained max) now that it's memory-safe; the agent floor (~9.3K of MCP tool schemas) always fits where the model fits.

clampContextForMemory / usableMemoryMiB are pure and unit-tested; verified on the Strix Halo rig (request 200000 → clamped to trained 128000).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
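For reviewers, the clamp described in this commit can be sketched roughly as follows. This is not the code in the PR: the `KVArch` struct, the `clampCtx` signature, the behavior when the weights alone exceed the budget, and the architecture numbers in `main` are all assumptions made for illustration. The KV formula assumes a plain per-layer K and V cache stored as f16 (2 bytes per element).

```go
package main

import "fmt"

// KVArch mirrors the GGUF header fields named in the commit message.
// The struct itself and its field names are hypothetical.
type KVArch struct {
	BlockCount    int // number of transformer layers
	HeadCountKV   int // KV heads per layer
	HeadDim       int // dimension of each head
	ContextLength int // trained context length
}

const mib = int64(1) << 20

// kvBytesPerToken returns the f16 KV-cache cost of one token:
// K and V (x2), per layer, per KV head, per head dimension, 2 bytes each.
func kvBytesPerToken(a KVArch) int64 {
	return 2 * int64(a.BlockCount) * int64(a.HeadCountKV) * int64(a.HeadDim) * 2
}

// clampCtx lowers the requested context so weights + projector + KV cache
// fit in usableMiB, and caps it at the trained context. It never raises the
// request and returns it unchanged when the arch or memory is unknown.
func clampCtx(requested int, a KVArch, weightsMiB, projectorMiB, usableMiB int64) int {
	perTok := kvBytesPerToken(a)
	if requested <= 0 || perTok <= 0 || usableMiB <= 0 {
		return requested // graceful no-op
	}
	ctx := requested
	if a.ContextLength > 0 && ctx > a.ContextLength {
		ctx = a.ContextLength
	}
	budget := (usableMiB - weightsMiB - projectorMiB) * mib
	if budget <= 0 {
		// Assumption: if the weights alone do not fit, shrinking the
		// context cannot help, so leave that failure to the fit check.
		return ctx
	}
	if fit := budget / perTok; fit < int64(ctx) {
		ctx = int(fit)
	}
	return ctx
}

func main() {
	// Illustrative architecture values, not read from a real GGUF file.
	arch := KVArch{BlockCount: 36, HeadCountKV: 2, HeadDim: 128, ContextLength: 128000}
	fmt.Println("KV bytes/token:", kvBytesPerToken(arch))
	fmt.Println("large box:", clampCtx(200000, arch, 2000, 1000, 110000))
	fmt.Println("small box:", clampCtx(128000, arch, 2000, 1000, 4096))
}
```

With these made-up numbers a request of 200000 on a large memory pool is capped at the trained 128000, while the 4096 MiB box shrinks to whatever the remaining budget allows.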

…en3-Embedding-4B

Set each model's catalog default ctx_size to its full trained context (verified loading + serving on the Strix Halo iGPU, 2026-06-16):

- glm-4.7-flash llamacpp variant: ctx_size 8192 -> 202752.
- qwen3.6-35b-a3b: add a universal llamacpp GGUF variant (ctx_size 262144) + gguf source + scan-name aliases (Qwen3.6-35B-A3B-UD-Q4_K_M, qwen3.6-35b-a3b-q4_k_m); it previously had only vLLM/Blackwell variants.
- new qwen3-embedding-4b.yaml: type embedding, llamacpp variant with embedding=true (-> llama-server --embedding) + ctx_size 40960; serves /v1/embeddings (2560-dim). Deploy-only: embedding models are not written into OpenClaw config by sync.

These large ctx defaults are memory-safe via the deploy-time auto-clamp.

Also fix usableMemoryMiB to prefer GPU VRAM over system RAM: an all-offloaded llama.cpp model is bounded by GPU memory, and on unified-memory APUs the OS-visible RAM is under-reported (Strix Halo: ~32GB OS vs ~110GB iGPU). The clamp now uses the iGPU pool, so it won't wrongly shrink contexts on such APUs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
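The memory-source preference described in this commit could look something like the sketch below. The function name `usableMiB`, its parameters, and the 4096 MiB OS reserve are hypothetical; the real usableMemoryMiB signature is not shown in this PR description.

```go
package main

import "fmt"

// usableMiB is a hypothetical stand-in for usableMemoryMiB. It prefers the
// GPU/iGPU pool when one is reported and only falls back to system RAM
// minus an OS reserve when no GPU memory is known.
func usableMiB(gpuVRAMMiB, systemRAMMiB, osReserveMiB int64) int64 {
	if gpuVRAMMiB > 0 {
		return gpuVRAMMiB // an all-offloaded model is bounded by GPU memory
	}
	if systemRAMMiB > osReserveMiB {
		return systemRAMMiB - osReserveMiB
	}
	return 0 // unknown: callers treat this as "skip the clamp"
}

func main() {
	// Roughly the Strix Halo case from the commit message:
	// ~32GB visible to the OS, ~110GB in the iGPU pool.
	fmt.Println(usableMiB(110*1024, 32*1024, 4096))
	// CPU-only host: fall back to RAM minus the (assumed) reserve.
	fmt.Println(usableMiB(0, 32*1024, 4096))
}
```

Under this ordering the APU case yields the iGPU pool rather than the under-reported OS RAM, which is the behavior the fix is after.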

… (env)

Two product-level controls so a partner integrating AIMA owns OpenClaw behavior without code changes on their side (both honor CLI sync AND the serve auto-sync loop):

- AIMA_OPENCLAW_SET_DEFAULT=false: register the AIMA provider + models but do NOT set OpenClaw's primary/default chat model (leave the user's current primary alone). Unset = previous behavior (AIMA sets it). Threaded as Deps.SetDefaultModel *bool → SyncResult.SkipDefaultModel → mergeChatModelDefault early-returns (preserving prior ownership record so toggling back on works).
- AIMA_OPENCLAW_CONFIG=<path>/openclaw.json: write to a custom config dir (e.g. a partner using .byClaw instead of .openclaw). Skills, extensions and managed-state all follow filepath.Dir(ConfigPath), so the whole set relocates. This is purely AIMA-side and independent of the OpenClaw/byClaw product's own layout.

Verified on the Strix Halo rig: SET_DEFAULT=false → provider written, primary empty; default → primary set; custom config dir → config+skills land under .byClaw.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
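A minimal sketch of how the two controls above might be read, assuming the env value is parsed with `strconv.ParseBool` and that an unparseable value is treated like unset. The helper names `parseSetDefault`, `skipDefaultModel` and `skillsDir`, and the `skills` subdirectory name, are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// parseSetDefault maps the env lookup to a tri-state *bool:
// nil = unset (previous behavior), otherwise the explicit choice.
// Assumption: values that ParseBool rejects are treated as unset.
func parseSetDefault(raw string, isSet bool) *bool {
	if !isSet {
		return nil
	}
	b, err := strconv.ParseBool(raw)
	if err != nil {
		return nil
	}
	return &b
}

// skipDefaultModel reports whether sync should leave the user's primary
// model alone: only when the control was explicitly set to false.
func skipDefaultModel(setDefault *bool) bool {
	return setDefault != nil && !*setDefault
}

// skillsDir shows how sibling state can follow the config file's directory.
// The "skills" name is illustrative only.
func skillsDir(configPath string) string {
	return filepath.Join(filepath.Dir(configPath), "skills")
}

func main() {
	raw, isSet := os.LookupEnv("AIMA_OPENCLAW_SET_DEFAULT")
	fmt.Println("skip default model:", skipDefaultModel(parseSetDefault(raw, isSet)))

	if cfg := os.Getenv("AIMA_OPENCLAW_CONFIG"); cfg != "" {
		fmt.Println("skills dir:", skillsDir(cfg))
	}
}
```

Using a pointer keeps "unset" distinct from "false", which is what lets the unset case preserve the earlier behavior.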

…log-miss fallback (code-only)

Issue 4 (engine image registry / offline via env) + issue 5 (openclaw sync uses backend model type when catalog misses). dist artifacts excluded from develop PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ch, A6 undeploy hint

A1 (P0): deploy canonicalizes the model name (alias "Qwen2.5-VL-3B-Instruct-Q4_K_M" → "qwen2.5-vl-3b-instruct"), so undeploy/status/logs with the original alias failed to match. Add canonicalModelAlt(cat, name) and retry undeploy/status/logs with the canonical name; surface the original via a requested_model field in deploy results.

A5: make the serve OpenClaw auto-sync loop switchable: `aima serve --no-openclaw-sync` or AIMA_OPENCLAW_SYNC=manual|off|false|0|no disables it (sync then only on explicit `aima openclaw sync`). Default unchanged (auto).

A6: undeploy now prints a hint that model files are kept and how to free disk (`aima model remove --delete-files <name>`); undeploy semantics unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
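The A1 retry and the A5 switch can be sketched as below. This is an illustration under assumptions, not the PR's code: the alias table stands in for the catalog lookup behind canonicalModelAlt(cat, name), `findDeployment` and `autoSyncEnabled` are hypothetical helpers, and matching the env value case-insensitively after trimming whitespace is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalAlt is a hypothetical stand-in for canonicalModelAlt(cat, name):
// it returns the canonical name only when it differs from the input.
func canonicalAlt(aliases map[string]string, name string) (string, bool) {
	c, ok := aliases[name]
	if !ok || c == name {
		return "", false
	}
	return c, true
}

// findDeployment tries the name as given, then retries with the canonical
// name so the original deploy-time alias still resolves.
func findDeployment(deployed map[string]bool, aliases map[string]string, name string) (string, bool) {
	if deployed[name] {
		return name, true
	}
	if alt, ok := canonicalAlt(aliases, name); ok && deployed[alt] {
		return alt, true
	}
	return "", false
}

// autoSyncEnabled reports whether the serve auto-sync loop should run.
// The flag wins; otherwise the listed env values disable it.
func autoSyncEnabled(noSyncFlag bool, env string) bool {
	if noSyncFlag {
		return false
	}
	switch strings.ToLower(strings.TrimSpace(env)) {
	case "manual", "off", "false", "0", "no":
		return false
	}
	return true // default unchanged: auto
}

func main() {
	aliases := map[string]string{"Qwen2.5-VL-3B-Instruct-Q4_K_M": "qwen2.5-vl-3b-instruct"}
	deployed := map[string]bool{"qwen2.5-vl-3b-instruct": true}

	name, ok := findDeployment(deployed, aliases, "Qwen2.5-VL-3B-Instruct-Q4_K_M")
	fmt.Println(name, ok)
	fmt.Println(autoSyncEnabled(false, "manual"), autoSyncEnabled(false, ""))
}
```

Trying the name as given first keeps existing callers that already pass the canonical name on the same path as before.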

rjckkkkk and others added 5 commits June 23, 2026 04:32

rjckkkkk wants to merge 5 commits into develop from feat/baiying-integration-2026-06
