Skip to content

Commit e3b17a0

Browse files
mios-devclaude
andcommitted
mios-daemon + agent-pipe router: repoint at the iGPU lane (:11435)
With mios-ollama-igpu standing up the micro-LLM lane in the preceding commit, repoint the two known micro-LLM clients at it so dGPU/CUDA traffic (qwen2.5-coder:7b polish, big Hermes inference) stops competing with classifier + nudger requests for queue slots. usr/libexec/mios/mios-daemon MIOS_DAEMON_ENDPOINT default flips from :11434 -> :11435. The daemon IS the iGPU micro-LLM agent per the operator's stated architecture ("iGPU micro-llm(s)/mios-daemon agent collects Linux systems logs, journals, relevant AI files--etc-etc"); the prior default had it talking to the dGPU/CUDA lane instead. usr/lib/mios/agent-pipe/server.py MIOS_AGENT_PIPE_ROUTER_ENDPOINT default flips from :11434 -> :11435. The Layer-1 router classifier (qwen3:1.7b) now lives exclusively on the iGPU lane; under dGPU saturation router latency stays bounded because it never queues behind big-model inference. usr/lib/systemd/system/mios-agent-pipe.service Adds explicit Environment=MIOS_AGENT_PIPE_ROUTER_ENDPOINT line so an operator inspecting the unit (`systemctl cat mios-agent-pipe.service`) sees the routing decision without having to read the Python defaults. Live-verified on podman-MiOS-DEV: /health.router.endpoint -> http://localhost:11435 ✓ Chat fast-path end-to-end -> 7.2s first-call (cold ollama-rocm runner, CPU fallback because WSL kernel doesn't expose AMD /dev/kfd + /dev/dri) -> "Hello!" returned cleanly ✓ podman logs mios-ollama-igpu shows the POST landed at the new lane: [GIN] 200 | 7.195948393s | POST "/v1/chat/completions" ✓ On bare-metal MiOS-bootc with the AMD iGPU exposed in the host kernel, the same setup activates ROCm on the iGPU and router / nudger latency drops further. Same code path; deployment-time capability difference. Operator overrides remain available -- a deployment with only an NVIDIA dGPU (no AMD iGPU lane) can point both endpoints back at :11434 via /etc/mios/agent-pipe.env + the matching mios-daemon env override. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5462671 commit e3b17a0

3 files changed

Lines changed: 19 additions & 2 deletions

File tree

usr/lib/mios/agent-pipe/server.py

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -75,8 +75,13 @@
7575
ROUTER_ENABLED = os.environ.get("MIOS_AGENT_PIPE_ROUTER_ENABLED",
7676
"true").lower() not in {"false", "0", "no"}
7777
ROUTER_MODEL = os.environ.get("MIOS_AGENT_PIPE_ROUTER_MODEL", "qwen3:1.7b")
78+
# Router runs the micro-LLM classifier (qwen3:1.7b) on the iGPU lane
79+
# (mios-ollama-igpu at :11435) -- isolates micro-LLM workload from the
80+
# dGPU/CUDA queue so router latency stays sub-second even when big-model
81+
# inference is saturating :11434. Falls back to the CUDA-ollama lane
82+
# if the iGPU instance is down (operator override via the env).
7883
ROUTER_ENDPOINT = os.environ.get(
79-
"MIOS_AGENT_PIPE_ROUTER_ENDPOINT", "http://localhost:11434"
84+
"MIOS_AGENT_PIPE_ROUTER_ENDPOINT", "http://localhost:11435"
8085
).rstrip("/")
8186
ROUTER_TIMEOUT_S = int(os.environ.get("MIOS_AGENT_PIPE_ROUTER_TIMEOUT_S", "12"))
8287
ROUTER_MAX_TOKENS = int(os.environ.get("MIOS_AGENT_PIPE_ROUTER_MAX_TOKENS", "200"))

usr/lib/systemd/system/mios-agent-pipe.service

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,12 @@ EnvironmentFile=-/etc/mios/agent-pipe.env
1818
Environment=MIOS_PORT_AGENT_PIPE=8640
1919
Environment=MIOS_AGENT_PIPE_BACKEND=http://localhost:8642/v1
2020
Environment=MIOS_AGENT_PIPE_BACKEND_MODEL=hermes-agent
21+
# Router calls qwen3:1.7b on the iGPU micro-LLM lane (:11435), not
22+
# the dGPU/CUDA lane (:11434). Keeps router latency sub-second under
23+
# dGPU load. Override via /etc/mios/agent-pipe.env if a deployment
24+
# doesn't have mios-ollama-igpu standing up (e.g., bare-metal with
25+
# only NVIDIA dGPU).
26+
Environment=MIOS_AGENT_PIPE_ROUTER_ENDPOINT=http://localhost:11435
2127
Environment=MIOS_DB_URL=http://localhost:8000
2228
Environment=MIOS_DB_USER=root
2329
Environment=MIOS_DB_PASS=root

usr/libexec/mios/mios-daemon

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -72,7 +72,13 @@ log = logging.getLogger("mios-daemon")
7272
# iGPU CDI lane (wsl2-amd.yaml / wsl2-intel.yaml) when present, freeing
7373
# the dGPU for big-model work. Override via MIOS_DAEMON_MODEL.
7474
MODEL = os.environ.get("MIOS_DAEMON_MODEL", "qwen3:1.7b")
75-
ENDPOINT = os.environ.get("MIOS_DAEMON_ENDPOINT", "http://127.0.0.1:11434")
75+
# mios-ollama-igpu (sibling at :11435) is the canonical micro-LLM lane
76+
# for the daemon. Falls back to the CUDA-ollama lane (:11434) if the
77+
# iGPU instance isn't running. Operator directive 2026-05-18: "iGPU
78+
# micro-llm(s)/mios-daemon agent collects Linux systems logs, journals,
79+
# relevant AI files" -- the daemon IS the iGPU micro-LLM agent; this
80+
# default points it at the right lane.
81+
ENDPOINT = os.environ.get("MIOS_DAEMON_ENDPOINT", "http://127.0.0.1:11435")
7682
STATE_DIR = Path(os.environ.get("MIOS_DAEMON_STATE_DIR", "/var/lib/mios/daemon"))
7783
STATE_FILE = STATE_DIR / "state.json"
7884
CLASSIFY_BATCH_S = float(os.environ.get("MIOS_DAEMON_CLASSIFY_S", "30"))

0 commit comments

Comments
 (0)