
baiying/AMD395 integration (2026-06): ctx sizing, model knowledge, openclaw controls, engine-registry env, A1/A5/A6 fixes #94

rjckkkkk wants to merge 5 commits into develop from feat/baiying-integration-2026-06

@rjckkkkk


Consolidated, clean develop PR for the whole 2026-06 baiying/AMD395 integration stack (delivered to partners on the amd395-win handoff branch; this straightens it into one mergeable PR). Supersedes #92 and #93. Code only: the dist exes and docs live on amd395-win, not develop.

Commits (each builds + tests pass):

  1. Hardware-aware ctx_size clamp: read the GGUF architecture and clamp ctx_size to fit available memory and the model's trained context; usableMemoryMiB prefers the iGPU pool because APUs under-report OS-visible RAM. Lets large catalog defaults degrade gracefully.
  2. Partner model context windows: GLM-4.7-Flash (202752), Qwen3.6-35B-A3B (262144, new llama.cpp variant + aliases), and the new Qwen3-Embedding-4B (--embedding, 40960). (A7)
  3. OpenClaw sync controls: AIMA_OPENCLAW_SET_DEFAULT (don't touch the user's primary model) + AIMA_OPENCLAW_CONFIG (custom config dir, e.g. .byClaw).
  4. 06-18 round: engine image registry via env (AIMA_ENGINE_REGISTRIES/AIMA_ENGINE_REGISTRY) + offline local bundle (issue 4 / B2); OpenClaw sync uses the backend-reported model type when the catalog misses, so non-catalog models aren't skipped (issue 5 / A2).
  5. Integration fixes (2026-06-22 doc A-items). A1: undeploy/status/logs canonicalize the model name so the original deploy-time alias works (+ requested_model in the deploy result); A5: aima serve --no-openclaw-sync / AIMA_OPENCLAW_SYNC=manual; A6: undeploy prints a disk-cleanup hint.

Tests incl.: TestClampContextForMemory, TestUsableMemoryMiB, TestMergeSkipDefaultModel, TestProbeProxyReachable, TestBinaryManagerEnsureInstallsLocalZipBundle, TestBuildDownloadSourceListUsesEnterpriseMirrors, TestSyncUsesBackendModelTypeWhenCatalogMisses, TestOpenClawAutoSyncEnabled.

🤖 Generated with Claude Code

rjckkkkk and others added 5 commits June 23, 2026 04:32
…text

A high catalog/user ctx_size (e.g. the 128000 default we now want for agent
clients) would OOM llama-server at load on memory-constrained machines, because AIMA
previously passed ctx_size through unchanged: estimateVRAM ignores the KV cache, and
CheckFit only adjusts vLLM/SGLang gpu_memory_utilization, never the llama.cpp ctx.

On deploy, for llama.cpp GGUF models, size the context to the hardware:
- Read the model's real architecture from the GGUF header (new model.ReadKVArch:
  block_count / head_count_kv / head_dim / context_length) and compute the exact f16
  KV-cache bytes per token instead of guessing.
- Clamp ctx_size down so weights + projector + KV cache fit usable memory (GPU
  VRAM for discrete GPUs, RAM minus an OS reserve for unified/CPU hosts), and cap
  it at the model's trained context. Only ever lowers; never raises.
- Graceful no-op when the GGUF arch can't be read or memory is unknown.
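The sizing rule above can be sketched roughly as follows. This is a hypothetical illustration, not AIMA's clampContextForMemory: the ggufArch struct, the clampCtx signature, and the example numbers are assumptions. It computes f16 KV bytes per token (K and V, per layer, per KV head, per head dimension, 2 bytes each), caps the request at the trained context, then caps it at what the memory left after weights and projector can hold.

```go
package main

import "fmt"

// ggufArch is a hypothetical stand-in for the fields the commit reads from the
// GGUF header (block_count, head_count_kv, head_dim, context_length).
type ggufArch struct {
	BlockCount    int64 // number of transformer layers
	HeadCountKV   int64 // KV heads per layer (GQA)
	HeadDim       int64 // dimension of each head
	ContextLength int64 // trained context length
}

// kvBytesPerToken is the f16 KV-cache cost of one token: K and V (factor 2)
// for every layer and KV head, HeadDim values each, 2 bytes per f16 value.
func kvBytesPerToken(a ggufArch) int64 {
	const f16Bytes = 2
	return 2 * a.BlockCount * a.HeadCountKV * a.HeadDim * f16Bytes
}

// clampCtx only ever lowers requested: first to the trained context, then to
// what the memory left after weights + projector can hold as KV cache.
// A nil arch or unknown (<= 0) memory returns requested unchanged.
func clampCtx(requested int64, a *ggufArch, usableMiB, weightsMiB, projectorMiB int64) int64 {
	if a == nil || usableMiB <= 0 {
		return requested
	}
	ctx := requested
	if a.ContextLength > 0 && ctx > a.ContextLength {
		ctx = a.ContextLength
	}
	perToken := kvBytesPerToken(*a)
	freeBytes := (usableMiB - weightsMiB - projectorMiB) << 20
	if perToken <= 0 || freeBytes <= 0 {
		// Arch fields incomplete, or the weights alone don't fit: this sketch
		// leaves that decision to other checks instead of guessing.
		return ctx
	}
	if maxCtx := freeBytes / perToken; maxCtx < ctx {
		ctx = maxCtx
	}
	return ctx
}

func main() {
	// Illustrative numbers only, not a real model's header.
	arch := &ggufArch{BlockCount: 36, HeadCountKV: 2, HeadDim: 128, ContextLength: 128000}
	fmt.Println(kvBytesPerToken(*arch))                    // bytes of KV cache per token
	fmt.Println(clampCtx(200000, arch, 100000, 2000, 800)) // large box: capped at trained context
	fmt.Println(clampCtx(200000, arch, 6000, 2000, 800))   // small box: shrunk to fit memory
	fmt.Println(clampCtx(200000, nil, 6000, 2000, 800))    // arch unknown: unchanged
}
```

Checking the trained-context cap before the memory cap keeps the function monotone: it can only return a value at or below the request, which matches the "only ever lowers" rule.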

So a high default degrades gracefully: big boxes get the full context, and small ones
auto-shrink to what fits instead of failing. Bump the Qwen2.5-VL-3B default ctx_size
to 128000 (its trained max) now that it's memory-safe; the agent floor (~9.3K tokens of
MCP tool schemas) always fits wherever the model fits.

clampContextForMemory / usableMemoryMiB are pure and unit-tested; verified on the
Strix Halo rig (request 200000 → clamped to trained 128000).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…en3-Embedding-4B

Set each model's catalog default ctx_size to its full trained context (verified
loading + serving on the Strix Halo iGPU, 2026-06-16):
- glm-4.7-flash llamacpp variant: ctx_size 8192 -> 202752.
- qwen3.6-35b-a3b: add a universal llamacpp GGUF variant (ctx_size 262144) + gguf
  source + scan-name aliases (Qwen3.6-35B-A3B-UD-Q4_K_M, qwen3.6-35b-a3b-q4_k_m);
  it previously had only vLLM/Blackwell variants.
- new qwen3-embedding-4b.yaml: type embedding, llamacpp variant with embedding=true
  (-> llama-server --embedding) + ctx_size 40960; serves /v1/embeddings (2560-dim).
  Deploy-only — embedding models are not written into OpenClaw config by sync.
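As a rough illustration of how the embedding variant's catalog fields could turn into a llama-server command line, here is a sketch. The llamacppVariant struct and serverArgs function are hypothetical. Only the llama-server flags `-m`, `--ctx-size`, and `--embedding` are real, and the commit states the embedding=true to --embedding mapping.

```go
package main

import (
	"fmt"
	"strconv"
)

// llamacppVariant is a hypothetical mirror of the catalog fields the commit
// mentions (ctx_size, embedding); AIMA's real variant type may differ.
type llamacppVariant struct {
	ModelPath string
	CtxSize   int
	Embedding bool
}

// serverArgs turns a variant into llama-server flags.
func serverArgs(v llamacppVariant) []string {
	args := []string{"-m", v.ModelPath}
	if v.CtxSize > 0 {
		args = append(args, "--ctx-size", strconv.Itoa(v.CtxSize))
	}
	if v.Embedding {
		// embedding=true maps to --embedding, which enables /v1/embeddings.
		args = append(args, "--embedding")
	}
	return args
}

func main() {
	// The file name is a placeholder, not the catalog's real GGUF source.
	v := llamacppVariant{ModelPath: "Qwen3-Embedding-4B.gguf", CtxSize: 40960, Embedding: true}
	fmt.Println(serverArgs(v))
}
```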

These large ctx defaults are memory-safe via the deploy-time auto-clamp.

Also fix usableMemoryMiB to prefer GPU VRAM over system RAM: an all-offloaded
llama.cpp model is bounded by GPU memory, and on unified-memory APUs the OS-visible
RAM is under-reported (Strix Halo: ~32GB OS vs ~110GB iGPU). The clamp now uses the
iGPU pool, so it won't wrongly shrink contexts on such APUs.
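The preference order described in these two commits can be sketched as a small pure function. This is an assumption-laden illustration, not AIMA's usableMemoryMiB. The usableMiB name, its parameters, and the 4096 MiB reserve in the example are hypothetical, because the PR does not state the reserve size.

```go
package main

import "fmt"

// usableMiB sketches the memory policy: prefer GPU memory (discrete VRAM or the
// iGPU pool on a unified-memory APU), else system RAM minus an OS reserve.
// 0 means "unknown", which makes the ctx clamp a no-op. The reserve is passed
// in because its real value is not stated in the PR.
func usableMiB(gpuMiB, ramMiB, osReserveMiB int64) int64 {
	if gpuMiB > 0 {
		return gpuMiB
	}
	if ramMiB > osReserveMiB {
		return ramMiB - osReserveMiB
	}
	return 0
}

func main() {
	// Roughly the Strix Halo case from the commit: ~110 GB iGPU vs ~32 GB OS RAM.
	fmt.Println(usableMiB(110*1024, 32*1024, 4096)) // iGPU pool wins
	fmt.Println(usableMiB(0, 32*1024, 4096))        // CPU-only host: RAM minus reserve
}
```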

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (env)

Two product-level controls let a partner integrating AIMA own OpenClaw behavior
without code changes on their side (both apply to CLI sync and to the serve auto-sync loop):

- AIMA_OPENCLAW_SET_DEFAULT=false — register the AIMA provider + models but do NOT
  set OpenClaw's primary/default chat model (leave the user's current primary alone).
  Unset = previous behavior (AIMA sets it). Threaded as Deps.SetDefaultModel *bool →
  SyncResult.SkipDefaultModel → mergeChatModelDefault early-returns (preserving the prior
  ownership record so toggling it back on works).
- AIMA_OPENCLAW_CONFIG=<path>/openclaw.json — write to a custom config dir (e.g. a
  partner using .byClaw instead of .openclaw). Skills, extensions and managed-state
  all follow filepath.Dir(ConfigPath), so the whole set relocates. This is purely
  AIMA-side and independent of the OpenClaw/byClaw product's own layout.

Verified on the Strix Halo rig: SET_DEFAULT=false → provider written, primary empty;
default → primary set; custom config dir → config+skills land under .byClaw.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…log-miss fallback (code-only)

Issue 4 (engine image registry and offline bundle via env) + issue 5 (OpenClaw sync uses
the backend model type when the catalog misses). dist artifacts are excluded from the develop PR.
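The issue-5 fallback, combined with the earlier rule that embedding models are deploy-only, can be sketched like this. The function names, the catalog map, and the "llm" type string are hypothetical. Skipping an empty (unknown) type is also an assumption of the sketch.

```go
package main

import "fmt"

// resolveModelType prefers the catalog's type and falls back to the type the
// serving backend reports, so models missing from the catalog still sync.
func resolveModelType(catalog map[string]string, name, backendType string) string {
	if t, ok := catalog[name]; ok && t != "" {
		return t
	}
	return backendType
}

// syncable: embedding models are deploy-only and never written into the
// OpenClaw config; skipping an empty (unknown) type is an assumption here.
func syncable(modelType string) bool {
	return modelType != "" && modelType != "embedding"
}

func main() {
	catalog := map[string]string{"qwen3-embedding-4b": "embedding"}
	for _, m := range []struct{ name, backend string }{
		{"my-custom-finetune", "llm"},       // catalog miss: backend type used
		{"qwen3-embedding-4b", "embedding"}, // catalog hit: embedding, not synced
	} {
		t := resolveModelType(catalog, m.name, m.backend)
		fmt.Println(m.name, t, syncable(t))
	}
}
```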

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch, A6 undeploy hint

A1 (P0) — deploy canonicalizes the model name (alias "Qwen2.5-VL-3B-Instruct-Q4_K_M"
→ "qwen2.5-vl-3b-instruct"), so undeploy/status/logs with the original alias failed
to match. Add canonicalModelAlt(cat, name) and retry undeploy/status/logs with the
canonical name; surface the original via a requested_model field in deploy results.
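The A1 retry can be illustrated with a small lookup sketch. It is not the real canonicalModelAlt(cat, name). The alias and deployment maps stand in for the catalog and deployment store, and canonicalAlt and findDeployment are hypothetical names. The lookup tries the name the user typed first and, on a miss, retries with the canonical name that deploy actually recorded.

```go
package main

import "fmt"

// canonicalAlt returns the catalog's canonical name for an alias, or "" when
// the name is unknown or already canonical. The alias map is hypothetical.
func canonicalAlt(aliases map[string]string, name string) string {
	if c, ok := aliases[name]; ok && c != name {
		return c
	}
	return ""
}

// findDeployment looks a deployment up by the name the user typed and, on a
// miss, retries with the canonical name (what deploy actually recorded).
func findDeployment(deployed, aliases map[string]string, name string) (string, bool) {
	if id, ok := deployed[name]; ok {
		return id, true
	}
	if alt := canonicalAlt(aliases, name); alt != "" {
		id, ok := deployed[alt]
		return id, ok
	}
	return "", false
}

func main() {
	aliases := map[string]string{"Qwen2.5-VL-3B-Instruct-Q4_K_M": "qwen2.5-vl-3b-instruct"}
	deployed := map[string]string{"qwen2.5-vl-3b-instruct": "deploy-1"}
	id, ok := findDeployment(deployed, aliases, "Qwen2.5-VL-3B-Instruct-Q4_K_M")
	fmt.Println(id, ok)
}
```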

A5 — make the serve OpenClaw auto-sync loop switchable: `aima serve --no-openclaw-sync`
or AIMA_OPENCLAW_SYNC=manual|off|false|0|no disables it (sync then only on explicit
`aima openclaw sync`). Default unchanged (auto).
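A minimal sketch of the A5 switch follows, assuming the flag and the env value are simply ORed together. The autoSyncEnabled name is hypothetical. Case-insensitive, whitespace-trimmed matching of AIMA_OPENCLAW_SYNC is also an assumption, since the commit only lists the accepted values.

```go
package main

import (
	"fmt"
	"strings"
)

// autoSyncEnabled decides whether `aima serve` runs the OpenClaw auto-sync loop.
// --no-openclaw-sync or AIMA_OPENCLAW_SYNC=manual|off|false|0|no disables it;
// anything else (including unset) keeps the default auto behavior.
// Case-insensitive, whitespace-trimmed matching is an assumption of this sketch.
func autoSyncEnabled(noSyncFlag bool, envValue string) bool {
	if noSyncFlag {
		return false
	}
	switch strings.ToLower(strings.TrimSpace(envValue)) {
	case "manual", "off", "false", "0", "no":
		return false
	}
	return true
}

func main() {
	fmt.Println(autoSyncEnabled(false, ""))       // default: auto
	fmt.Println(autoSyncEnabled(true, ""))        // --no-openclaw-sync
	fmt.Println(autoSyncEnabled(false, "manual")) // env opt-out
}
```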

A6 — undeploy now prints a hint that model files are kept and how to free disk
(`aima model remove --delete-files <name>`); undeploy semantics unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>