Skip to content

Commit 600df69

Browse files
mios-devclaude
andcommitted
fix(ai-plane): make a fresh irm|iex install bring up MiOS AI operational on GPU
Live-verified on a fresh dev VM: the :8640 front door now runs the full orchestration pipeline end-to-end and returns a clean answer from granite-4.1-8b on the RTX 4090 (181 tok/s). Root-caused + fixed the whole inert-AI chain: - system-sync-env.sh: generate_env ended on a false `[[ -n "$SECRET" ]] && echo` -> non-zero under set -e -> the install.env write was aborted BEFORE the mv, so the env bridge silently produced nothing on every secret-less host. Force `return 0`. Also emit resolved MIOS_PORT_* as their own numeric vars (systemd EnvironmentFile + Python don't expand ${...} from sibling lines). - mios-hermes-firstboot: VRAM probe used `command -v nvidia-smi`, but WSL2 ships it at /usr/lib/wsl/lib/nvidia-smi which is NOT on systemd's PATH -> a 24GB RTX 4090 read as 0GB -> small tier. Probe explicit candidate paths. - agent-pipe server.py: (1) _toml_section + the [agents.*] registry now expand ${MIOS_PORT_*} endpoint templates (systemd/Python don't); (2) _pick_agent is degrade-open -- a health_gate primary the liveness cache can't confirm has its endpoint blanked (-> BACKEND) and model reset; (3) at the proxy chokepoint, when dispatch resolves to the BACKEND light lane the model is pinned to BACKEND_MODEL (else llama-swap "no router for requested model"). - mios.toml: the :8643 heavy hermes-worker is health_gate=true so the orchestrator drops it when the heavy lane is gated off (degrade-open) instead of 502-ing the front door; it auto-rejoins when the lane is enabled. (firstboot EnvironmentFile=/etc/mios/install.env + userenv.sh-deploy gaps landed in a7cca48 / the overlay.) install-robustness 2026-06-21. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent a7cca48 commit 600df69

4 files changed

Lines changed: 106 additions & 14 deletions

File tree

usr/lib/mios/agent-pipe/server.py

Lines changed: 66 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -404,7 +404,23 @@ def _toml_section(section: str) -> dict:
404404
out.update(_layer)
405405
except Exception: # noqa: BLE001 -- best-effort; callers fall to literals
406406
log.warning("Failed to load overlay config section %s", section, exc_info=True)
407-
return out
407+
# Expand ${MIOS_PORT_*}/$VAR placeholders in string values against the
408+
# process env (install.env supplies MIOS_PORT_*). mios.toml stores endpoint
409+
# URLs as deferred-expansion templates ("http://localhost:${MIOS_PORT_HERMES_WORKER}/v1");
410+
# systemd EnvironmentFile and Python do NOT expand ${...}, so without this
411+
# the agent registry got a LITERAL "${MIOS_PORT_HERMES_WORKER}" port ->
412+
# httpx InvalidURL -> the :8640 front door 500'd on every request. expandvars
413+
# only touches $-prefixed tokens (ordinary values untouched; an unknown var
414+
# is left verbatim). install-robustness 2026-06-21.
415+
def _xpand(v):
416+
if isinstance(v, str):
417+
return os.path.expandvars(v) if "$" in v else v
418+
if isinstance(v, dict):
419+
return {k: _xpand(x) for k, x in v.items()}
420+
if isinstance(v, list):
421+
return [_xpand(x) for x in v]
422+
return v
423+
return _xpand(out)
408424

409425

410426
def _cfg_num(table: dict, env: str, key: str, default, cast=int):
@@ -3829,7 +3845,12 @@ def _load_agent_registry() -> dict[str, dict]:
38293845
if not isinstance(cfg, dict):
38303846
continue
38313847
registry[name] = {
3832-
"endpoint": str(cfg.get("endpoint", "")).rstrip("/"),
3848+
# expandvars: [agents.*].endpoint is stored as a deferred
3849+
# ${MIOS_PORT_*} template (e.g. the :8643 hermes-worker); the
3850+
# env supplies the numeric port (install.env). Without this the
3851+
# registry kept a literal "${MIOS_PORT_HERMES_WORKER}" -> httpx
3852+
# InvalidURL -> :8640 500 on every request. install-robustness 2026-06-21.
3853+
"endpoint": os.path.expandvars(str(cfg.get("endpoint", ""))).rstrip("/"),
38333854
"model": str(cfg.get("model", name)),
38343855
"role": str(cfg.get("role", "general")),
38353856
"default": bool(cfg.get("default", False)),
@@ -6765,18 +6786,44 @@ def _validate_enum_args(tool: str, args: dict) -> Optional[str]:
67656786

67666787
def _pick_agent(role: str) -> tuple[str, dict]:
67676788
"""Pick a sub-agent by role match. Order: exact-role -> default
6768-
-> first registered. Returns (name, cfg)."""
6789+
-> first registered. Returns (name, cfg).
6790+
6791+
Degrade-open (install-robustness 2026-06-21): if the chosen agent is a
6792+
health_gate (come-and-go) node -- e.g. the :8643 hermes-worker bound to the
6793+
heavy GPU lane, which is gated off by default -- that the liveness cache does
6794+
NOT confirm reachable, blank its endpoint so the caller's `endpoint or
6795+
BACKEND` falls back to the always-on local lane. Without this the PRIMARY
6796+
dispatch went to a dead gated worker -> httpx "All connection attempts
6797+
failed" -> 502 on EVERY turn on any host where that lane is down (a fresh
6798+
dev VM, a CPU host). The worker is still used the moment the probe confirms
6799+
it live (heavy lane enabled)."""
67696800
role = (role or "").lower().strip()
6801+
chosen = None
67706802
if role:
67716803
for name, cfg in _AGENT_REGISTRY.items():
67726804
if cfg.get("role", "").lower() == role:
6773-
return name, cfg
6774-
for name, cfg in _AGENT_REGISTRY.items():
6775-
if cfg.get("default"):
6776-
return name, cfg
6777-
# Whatever is first.
6778-
name = next(iter(_AGENT_REGISTRY))
6779-
return name, _AGENT_REGISTRY[name]
6805+
chosen = (name, cfg)
6806+
break
6807+
if chosen is None:
6808+
for name, cfg in _AGENT_REGISTRY.items():
6809+
if cfg.get("default"):
6810+
chosen = (name, cfg)
6811+
break
6812+
if chosen is None:
6813+
_n = next(iter(_AGENT_REGISTRY))
6814+
chosen = (_n, _AGENT_REGISTRY[_n])
6815+
name, cfg = chosen
6816+
if cfg.get("health_gate"):
6817+
_c = _NODE_LIVE.get(name)
6818+
if not (_c and _c[1]): # not confirmed reachable -> fall back to BACKEND
6819+
# Blank the endpoint AND swap the model: this agent's model (e.g.
6820+
# the worker's heavy "mios-heavy") is NOT served by BACKEND (the
6821+
# light llama-swap lane), so keeping it yields llama-swap "no router
6822+
# for requested model". Reset to MIOS_AI_MODEL (the light-lane
6823+
# default) so the fallback request routes. install-robustness 2026-06-21.
6824+
_fb_model = (os.environ.get("MIOS_AI_MODEL") or "").strip()
6825+
cfg = {**cfg, "endpoint": "", **({"model": _fb_model} if _fb_model else {})}
6826+
return name, cfg
67806827

67816828

67826829
# Trivial-input bypass regex -- short messages with no question
@@ -28911,6 +28958,15 @@ async def _stream_backend() -> AsyncGenerator[bytes, None]:
2891128958
# Non-streaming: run the enrich passes (no live emits on this path) and
2891228959
# build the proxy body -- same _finalize the streaming generator runs live.
2891328960
_sys_prefix, proxy_body = await _finalize()
28961+
# Pin the model to the lane this request is ACTUALLY dispatched to. The front
28962+
# door advertises a single virtual model ("MiOS AI") and sub-agents carry
28963+
# lane-specific models (e.g. the heavy worker's "mios-heavy"); when the
28964+
# primary resolves to the BACKEND light lane -- including the health-gate
28965+
# fallback when the heavy worker is down -- that incoming/heavy model is NOT
28966+
# served there, so llama-swap returns "no router for requested model". Force
28967+
# BACKEND_MODEL so the fallback request routes. install-robustness 2026-06-21.
28968+
if str(target_endpoint).rstrip("/") == str(BACKEND).rstrip("/"):
28969+
proxy_body["model"] = BACKEND_MODEL
2891428970
proxy_bytes = json.dumps(proxy_body).encode("utf-8")
2891528971
client = await _get_client()
2891628972
# Council fan-out on the NON-streaming path too (operator 2026-05-22

usr/libexec/mios/mios-hermes-firstboot

Lines changed: 19 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -149,12 +149,27 @@ _small_m="$(_mios_toml_value 'ai.host_thresholds' 'small_ram_model' "$_AI_MODEL_
149149
# neither exists or returns 0, the host is CPU-only and we stay
150150
# on small_ram_model.
151151
_vram_gb=0
152-
if command -v nvidia-smi >/dev/null 2>&1; then
153-
_vram_mib="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null \
152+
# WSL2 ships nvidia-smi at /usr/lib/wsl/lib/nvidia-smi, which is NOT on the
153+
# systemd unit PATH (/usr/local/bin:/usr/bin). So `command -v nvidia-smi`
154+
# returned nothing under firstboot's systemd context and a 24 GB RTX 4090 was
155+
# mis-detected as 0 GB -> small tier instead of mid (operator-confirmed
156+
# 2026-06-21: "WHAT ARE YOU TALKING ABOUT CPU-ONLY"). Probe explicit
157+
# locations (incl. the WSL path) so detection works regardless of PATH.
158+
# install-robustness 2026-06-21.
159+
_nvsmi=""
160+
for _c in nvidia-smi /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi /opt/cuda/bin/nvidia-smi; do
161+
if command -v "$_c" >/dev/null 2>&1; then _nvsmi="$_c"; break; fi
162+
done
163+
_rocmsmi=""
164+
for _c in rocm-smi /opt/rocm/bin/rocm-smi /usr/bin/rocm-smi; do
165+
if command -v "$_c" >/dev/null 2>&1; then _rocmsmi="$_c"; break; fi
166+
done
167+
if [[ -n "$_nvsmi" ]]; then
168+
_vram_mib="$("$_nvsmi" --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null \
154169
| awk 'BEGIN{m=0} {if ($1+0 > m) m=$1+0} END{print m}')"
155170
_vram_gb=$(( ${_vram_mib:-0} / 1024 ))
156-
elif command -v rocm-smi >/dev/null 2>&1; then
157-
_vram_mib="$(rocm-smi --showmeminfo vram --csv 2>/dev/null \
171+
elif [[ -n "$_rocmsmi" ]]; then
172+
_vram_mib="$("$_rocmsmi" --showmeminfo vram --csv 2>/dev/null \
158173
| awk -F, 'NR>1 {gsub(/[^0-9]/,"",$2); if ($2+0 > m) m=$2+0} END{print m+0}')"
159174
# rocm-smi reports bytes; convert
160175
_vram_gb=$(( ${_vram_mib:-0} / 1024 / 1024 / 1024 ))

usr/libexec/mios/system-sync-env.sh

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -151,6 +151,18 @@ EOF
151151
[[ -n "${MIOS_PORT_SGLANG:-}" ]] && echo "MIOS_AI_HEAVY_ENDPOINT=\"http://localhost:${MIOS_PORT_SGLANG}/v1\""
152152
[[ -n "${MIOS_PORT_VLLM:-}" ]] && echo "MIOS_AI_HEAVY_ALT_ENDPOINT=\"http://localhost:${MIOS_PORT_VLLM}/v1\""
153153

154+
# Resolved service ports (SSOT [ports].*). Emitted as NUMERIC vars so
155+
# EnvironmentFile= consumers (agent-pipe, hermes) AND ${MIOS_PORT_*}
156+
# templates in mios.toml endpoint URLs can resolve -- systemd and Python
157+
# do NOT expand ${...} from sibling env lines, so the ports must exist as
158+
# their own vars. Without this the agent-pipe read a LITERAL
159+
# "${MIOS_PORT_HERMES_WORKER}" worker port -> httpx InvalidURL -> :8640 500.
160+
# install-robustness 2026-06-21.
161+
for _pk in MIOS_PORT_LLM_LIGHT MIOS_PORT_HERMES MIOS_PORT_HERMES_WORKER MIOS_PORT_AGENT_PIPE MIOS_PORT_PREFILTER MIOS_PORT_OPENCODE MIOS_PORT_SGLANG MIOS_PORT_VLLM; do
162+
_pv="${!_pk:-}"
163+
if [[ -n "$_pv" ]]; then echo "${_pk}=\"${_pv}\""; fi
164+
done
165+
154166
# Image
155167
[[ -n "${MIOS_IMAGE_REF:-}" ]] && echo "MIOS_IMAGE_REF=\"${MIOS_IMAGE_REF}\""
156168
[[ -n "${MIOS_BRANCH:-}" ]] && echo "MIOS_BRANCH=\"${MIOS_BRANCH}\""

usr/share/mios/mios.toml

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -876,6 +876,15 @@ font_size = 14
876876
endpoint = "http://localhost:${MIOS_PORT_HERMES_WORKER}/v1"
877877
model = "mios-heavy"
878878
role = "general"
879+
# health_gate: the :8643 hermes-worker is a SEPARATE service bound to the heavy
880+
# GPU lane (mios-heavy), which is gated OFF by default (VRAM / operator opt-in).
881+
# Marking it health-gated makes the orchestrator liveness-probe it and DROP it
882+
# when unreachable (degrade-open) instead of dispatching the FINAL answer to a
883+
# dead endpoint -> "All connection attempts failed" / 502. It auto-rejoins once
884+
# the operator enables the heavy lane and the worker comes up. Without this the
885+
# :8640 front door 502'd on every turn on any host where the worker is down
886+
# (e.g. a fresh dev VM with the heavy lane gated). install-robustness 2026-06-21.
887+
health_gate = true
879888
# WS-2 per-agent RBAC (optional; default = NO restriction = unchanged behaviour):
880889
# cap THIS agent's tool surface to what its role should touch, enforced by
881890
# _agent_rbac_filter at dispatch. denied_verbs drops the named verbs; allowed_verbs

0 commit comments

Comments
 (0)