Commit 4631a4f

mios-dev and claude committed
auto-sync SOUL.md on hermes restart + tighter loop caps (max_turns=20)
Two recurring pain points from the last few operator chats:

1. SOUL.md propagation. Every code change to
   /usr/share/mios/ai/hermes-soul.md required a manual touch +
   systemctl restart on every host because the firstboot seed logic
   only runs at boot. Worse: the deploy's `cp` preserved mtime (as
   `cp -p` and `cp -a` do; plain `cp` does not), so the firstboot-style
   content-diff check said "no change" and the new prose never reached
   the agent (operator-flagged earlier today -- WSLg rule didn't take
   effect after the deploy).

   Fix: new `mios-hermes-soul-sync` shim does ONLY the SOUL re-seed:
   * Compares source body (BEFORE marker) byte-by-byte with the
     matching prefix of target -- mtime-independent
   * cp + chown + chmod on diff
   * Preserves any `### MIOS-RUNTIME-CONTEXT-BEGIN` block firstboot
     appended (Discord defaults, etc.)
   * Exit 0 on noop OR sync; safe as an ExecStartPre
   * 1 = source missing, 2 = ownership failure (rare)

   Wired via a hermes-agent.service drop-in
   (etc/systemd/system/hermes-agent.service.d/50-soul-sync.conf) so
   every `systemctl restart hermes-agent.service` re-seeds. Verified
   live: restart logged
   "[mios-hermes-soul-sync] body current (23195 bytes) -- noop".

2. Tool-loop guardrails were too loose. Recent chats had the agent
   running 19-24 tool calls before giving up. Two layers fixed:

   a. New `agent.max_turns: 20` config (was using upstream default 90).
      This caps the OPERATOR-facing turn at 20 iterations.
      delegate_task children still get their own 90-turn budget. Most
      operator asks resolve in 1-3 calls; 20 is generous for the
      multi-step ones.

   b. tool_loop_guardrails thresholds tightened:
        warn_after:      exact 2 -> 1, same_tool 3 -> 2, no_progress 2 -> 1
        hard_stop_after: exact 5 -> 3, same_tool 8 -> 4, no_progress 5 -> 3

      Original incident class: agent called web_extract 4 times in a
      row against searxng (which can only search, not extract) before
      giving up. The new thresholds warn after 2 same-tool failures
      and hard-stop after 4. Idempotent no-progress (browser_navigate
      without CDP) cuts off at 3 calls.

Both changes mirrored in mios-hermes-firstboot so fresh image builds
get them out of the box. Live config patched in-place via
yaml.safe_dump; old config saved to config.yaml.bak-caps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
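The mtime-independent comparison described in point 1 can be sketched as below. This is a minimal illustration, not the shim itself: `body_is_current` is a hypothetical helper name, and the scratch files stand in for the real source and target paths.

```shell
# Hypothetical helper (not part of the commit): succeed when the first
# N bytes of TARGET equal SRC, where N is SRC's size. Timestamps are
# never consulted, so a copy that kept an old mtime cannot hide a change.
body_is_current() {
    local src="$1" target="$2" src_bytes
    # Arithmetic expansion normalizes any padding some wc builds print.
    src_bytes=$(( $(wc -c < "$src") ))
    # head -c emits at most src_bytes bytes; "-" makes cmp read them from stdin.
    head -c "$src_bytes" "$target" | cmp -s - "$src"
}

# Demo on scratch files: a target that carries an appended context block.
demo_dir=$(mktemp -d)
printf 'rule one\nrule two\n' > "$demo_dir/source.md"
{
    cat "$demo_dir/source.md"
    printf '\n\n### MIOS-RUNTIME-CONTEXT-BEGIN\ndiscord defaults\n'
} > "$demo_dir/target.md"

if body_is_current "$demo_dir/source.md" "$demo_dir/target.md"; then
    echo "body current"
else
    echo "body differs"
fi
rm -rf "$demo_dir"
```

A pure prefix check like this one cannot notice a source that only lost trailing lines, since the shorter source is still a prefix of the old target; comparing against the extracted body length as well would cover that case.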
1 parent f47cc6b commit 4631a4f

3 files changed

Lines changed: 158 additions & 8 deletions

etc/systemd/system/hermes-agent.service.d/50-soul-sync.conf

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# /etc/systemd/system/hermes-agent.service.d/50-soul-sync.conf
+#
+# Re-sync /var/lib/mios/hermes/SOUL.md from
+# /usr/share/mios/ai/hermes-soul.md on every service start so SOUL.md
+# updates propagate without a manual touch+restart. Operator-flagged
+# 2026-05-17: WSLg rule didn't take effect after my deploy because
+# `cp` preserved mtime, so the firstboot-style content-diff check
+# said "no change" -- the new prose never reached the agent.
+#
+# mios-hermes-soul-sync exits 0 on noop OR successful sync; the "-"
+# prefix below keeps a missing source (exit 1) from blocking startup.
+# The shim also preserves any appended runtime-context block (the
+# ### MIOS-RUNTIME-CONTEXT-BEGIN section firstboot adds for Discord
+# defaults, etc.).
+
+[Service]
+ExecStartPre=-/usr/libexec/mios/mios-hermes-soul-sync
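
One way to confirm the drop-in is actually in effect is to scan the merged unit text for the pre-start line. The sketch below assumes unit text arrives on stdin; `has_soul_sync_prestart` is a hypothetical helper name, and the pattern tolerates systemd's optional "-" (ignore failure) prefix.

```shell
# Hypothetical helper (not part of the commit): read unit text on stdin and
# succeed if an ExecStartPre= line runs the soul-sync shim, with or without
# systemd's "-" ignore-failure prefix.
has_soul_sync_prestart() {
    grep -Eq '^ExecStartPre=-?/usr/libexec/mios/mios-hermes-soul-sync([[:space:]]|$)'
}

# Demo against the drop-in text from this commit.
if printf '%s\n' '[Service]' \
    'ExecStartPre=/usr/libexec/mios/mios-hermes-soul-sync' | has_soul_sync_prestart
then
    echo "soul-sync pre-start is wired"
else
    echo "soul-sync pre-start is missing"
fi
```

On a host, `systemctl cat hermes-agent.service` prints the unit together with its drop-ins, so its output can be piped into the helper after a `systemctl daemon-reload`.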

usr/libexec/mios/mios-hermes-firstboot

Lines changed: 24 additions & 8 deletions
@@ -816,19 +816,28 @@ approvals:
 
 # Tool-loop guardrails. The agent can loop forever on bad tool calls;
 # hard_stop_enabled aborts after N consecutive same-kind failures so
-# a stuck delegate doesn't burn the whole agent budget. Thresholds
-# match the upstream-recommended wizard defaults.
+# a stuck delegate doesn't burn the whole agent budget.
+#
+# Operator-flagged 2026-05-17: chats hitting 19-24 tool calls were
+# below upstream's 5/8/5 thresholds (because each call was a
+# DIFFERENT tool -- agent thrashing across browser/web_extract/
+# skill_view rather than failing the same one repeatedly). The
+# total-call cap (agent.max_turns: 20 below) handles thrashing
+# across different tools; the per-failure thresholds here are now
+# tightened so the agent backs off SOONER on a single tool that
+# keeps erroring (web_extract returning "searxng is search-only"
+# four times before the agent gives up was the original incident).
 tool_loop_guardrails:
   warnings_enabled: true
   hard_stop_enabled: true
   warn_after:
-    exact_failure: 2
-    same_tool_failure: 3
-    idempotent_no_progress: 2
+    exact_failure: 1
+    same_tool_failure: 2
+    idempotent_no_progress: 1
   hard_stop_after:
-    exact_failure: 5
-    same_tool_failure: 8
-    idempotent_no_progress: 5
+    exact_failure: 3
+    same_tool_failure: 4
+    idempotent_no_progress: 3
 
 # Sessions: prune old sessions automatically. Keeps $HERMES_HOME clean
 # without manual \`hermes session prune\` after every long-running build
@@ -853,6 +862,13 @@ sessions:
 # ~2-3s typical restart, 30s worst case. Operator directive 2026-05-15.
 agent:
   restart_drain_timeout: 20
+  # Hard cap on tool-call iterations per chat. Upstream default is 90;
+  # operator-flagged 2026-05-17 had chats running 19-24 tool calls
+  # before giving up. For an OWUI conversational agent 20 is plenty
+  # (most asks resolve in 1-3 calls; multi-step in ~10). Sub-agents
+  # spawned via delegate_task still get their own 90-turn budget --
+  # this cap is for the OPERATOR-facing turn, not the children.
+  max_turns: 20
 EOF
 umask 0022
 chown 820:820 "$HERMES_DATA_CFG" 2>/dev/null || true
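
To make the effect of the new thresholds concrete, here is a small sketch of how a warn/hard-stop pair could classify a streak of consecutive failures. It is an illustration only: `classify_streak` is a hypothetical helper, and it assumes a threshold of N fires on the Nth consecutive failure, which may not match how the upstream guardrail counts.

```shell
# Hypothetical helper (not the upstream implementation): given a count of
# consecutive failures and the two thresholds, print the resulting action.
# Assumption: a threshold of N triggers on the Nth consecutive failure.
classify_streak() {
    local failures="$1" warn_after="$2" hard_stop_after="$3"
    if [ "$failures" -ge "$hard_stop_after" ]; then
        echo "hard_stop"
    elif [ "$failures" -ge "$warn_after" ]; then
        echo "warn"
    else
        echo "continue"
    fi
}

# same_tool_failure thresholds from this commit: warn_after 2, hard_stop_after 4.
for n in 1 2 3 4; do
    printf 'failure %d: %s\n' "$n" "$(classify_streak "$n" 2 4)"
done
```

Under the same assumption, the previous same_tool_failure values (3 and 8) would only have warned on the fourth web_extract failure from the original incident, whereas the new values stop the loop at that point.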
usr/libexec/mios/mios-hermes-soul-sync

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+#!/bin/bash
+# /usr/libexec/mios/mios-hermes-soul-sync
+#
+# Idempotent SOUL.md propagation. The hardened vendor SOUL.md lives at
+# /usr/share/mios/ai/hermes-soul.md and is updated by code changes
+# (commit/rebuild). The agent reads $HERMES_HOME/SOUL.md fresh on every
+# message, but only after firstboot has SEEDED that path. Without this
+# shim, SOUL.md updates between firstboot runs require a manual
+# touch+restart on every operator host (operator-flagged 2026-05-17:
+# WSLg rule didn't propagate after my deploy because cp preserved
+# mtime).
+#
+# What this does:
+#   1. Resolve $HERMES_HOME (defaults to /var/lib/mios/hermes).
+#   2. If target SOUL.md exists AND lacks 'MiOS-managed' marker, leave
+#      it alone -- the operator has taken ownership.
+#   3. If source body (BEFORE any appended runtime-context block) is
+#      byte-identical to target body, noop (preserves operator's
+#      appended Discord context, etc.).
+#   4. Otherwise: cp source over target, restore ownership +
+#      permissions, log.
+#
+# Wired as ExecStartPre to hermes-agent.service via a drop-in so SOUL.md
+# is always fresh when hermes boots. Also runnable from firstboot, the
+# operator console, or an "update SOUL.md" automation.
+#
+# Exit codes:
+#   0 = noop or successful sync
+#   1 = source missing (logged; non-fatal only if ExecStartPre= uses "-")
+#   2 = reserved for ownership/permission failure (not returned yet)
+
+set -uo pipefail
+
+SRC="${MIOS_HERMES_SOUL_SRC:-/usr/share/mios/ai/hermes-soul.md}"
+HERMES_HOME="${HERMES_HOME:-/var/lib/mios/hermes}"
+TARGET="${HERMES_HOME}/SOUL.md"
+SOUL_USER="${MIOS_HERMES_USER:-mios-hermes}"
+SOUL_GROUP="${MIOS_HERMES_GROUP:-mios-hermes}"
+# Match the marker firstboot uses; preserve appended context across syncs.
+SOUL_CTX_MARKER="### MIOS-RUNTIME-CONTEXT-BEGIN"
+
+log() { echo "[mios-hermes-soul-sync] $*"; }
+
+if [ ! -r "$SRC" ]; then
+    log "source missing: $SRC -- nothing to sync"
+    exit 1
+fi
+
+# If target is absent, seed it from source.
+if [ ! -f "$TARGET" ]; then
+    install -d -m 0755 -o "$SOUL_USER" -g "$SOUL_GROUP" "$HERMES_HOME" 2>/dev/null || \
+        install -d -m 0755 "$HERMES_HOME"
+    cp -f "$SRC" "$TARGET"
+    chown "$SOUL_USER:$SOUL_GROUP" "$TARGET" 2>/dev/null || true
+    chmod 0644 "$TARGET"
+    log "seeded $TARGET (target absent)"
+    exit 0
+fi
+
+# Operator-owned? Don't touch.
+if ! grep -q 'MiOS-managed' "$TARGET" 2>/dev/null; then
+    log "target lacks 'MiOS-managed' marker -- operator-owned, leaving alone"
+    exit 0
+fi
+
+# Compare body-only. Body is everything BEFORE the runtime-context
+# marker in the TARGET; everything in SOURCE is body (source has no
+# runtime-context, firstboot appends it after copy).
+SRC_BYTES=$(wc -c < "$SRC")
+# Extract the matching prefix of the target (up to the marker line).
+TARGET_BODY_TMP=$(mktemp)
+awk -v m="$SOUL_CTX_MARKER" '
+    { if (index($0, m) > 0) { exit } else { print } }
+' "$TARGET" > "$TARGET_BODY_TMP"
+TARGET_BODY_BYTES=$(wc -c < "$TARGET_BODY_TMP")
+
+# Drop one trailing blank line from the target body so the size check
+# ignores the blank-line padding firstboot adds before the marker.
+sed -i -e '$ { /^$/d; }' "$TARGET_BODY_TMP" 2>/dev/null || true
+# Re-measure after the trim.
+TARGET_BODY_BYTES=$(wc -c < "$TARGET_BODY_TMP")
+
+# Body is current when the target body is at least as long as the source
+# AND the target's first SRC_BYTES bytes equal the source (cmp -n).
+if [ "$TARGET_BODY_BYTES" -ge "$SRC_BYTES" ] && \
+    cmp -s -n "$SRC_BYTES" "$SRC" "$TARGET"; then
+    log "body current ($SRC_BYTES bytes) -- noop"
+    rm -f "$TARGET_BODY_TMP"
+    exit 0
+fi
+
+# Body differs -- preserve any appended runtime-context block, then
+# rewrite the body from source.
+RUNTIME_CTX_TMP=$(mktemp)
+# Collect the marker line and everything after it (marker included) so
+# the appended runtime-context block survives the body rewrite.
+awk -v m="$SOUL_CTX_MARKER" '
+    BEGIN { keep = 0 }
+    { if (index($0, m) > 0) keep = 1; if (keep) print }
+' "$TARGET" > "$RUNTIME_CTX_TMP"
+
+NEW_TMP=$(mktemp)
+cat "$SRC" > "$NEW_TMP"
+if [ -s "$RUNTIME_CTX_TMP" ]; then
+    printf '\n\n' >> "$NEW_TMP"
+    cat "$RUNTIME_CTX_TMP" >> "$NEW_TMP"
+fi
+
+# Overwrite the target in place (cp is not atomic), then restore ownership.
+cp -f "$NEW_TMP" "$TARGET"
+chown "$SOUL_USER:$SOUL_GROUP" "$TARGET" 2>/dev/null || true
+chmod 0644 "$TARGET"
+rm -f "$TARGET_BODY_TMP" "$RUNTIME_CTX_TMP" "$NEW_TMP"
+
+NEW_BYTES=$(wc -c < "$TARGET")
+log "synced $TARGET (body=$SRC_BYTES bytes, total=$NEW_BYTES bytes with runtime-context)"
+exit 0
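
The final step uses `cp -f`, which rewrites the target in place, so a reader that opens SOUL.md mid-copy could see a partial file. If that ever matters, a rename-based variant might look like the sketch below. `replace_body_keep_context` is a hypothetical helper, not code from this commit; it assumes the staging file can be created next to the target so that `mv` is a same-filesystem rename.

```shell
# Hypothetical helper (not part of the commit): rebuild TARGET as SRC plus
# the old runtime-context block, staged beside TARGET and renamed into place.
replace_body_keep_context() {
    local src="$1" target="$2" marker="$3" ctx staged
    # Everything from the first line containing the marker to end of file.
    ctx=$(awk -v m="$marker" 'index($0, m) > 0 { keep = 1 } keep { print }' "$target")
    # Stage in the target's directory so the final mv stays on one filesystem.
    staged=$(mktemp "${target}.XXXXXX") || return 1
    if ! cat "$src" > "$staged"; then
        rm -f "$staged"
        return 1
    fi
    if [ -n "$ctx" ]; then
        # Same two-newline gap the shim writes before the preserved block.
        printf '\n\n%s\n' "$ctx" >> "$staged"
    fi
    chmod 0644 "$staged"
    mv -f "$staged" "$target"
}
```

A call would pass the source path, the target path, and the marker string, for example `replace_body_keep_context "$SRC" "$TARGET" "$SOUL_CTX_MARKER"`. Ownership is not handled here; a `chown` on the staged file before the `mv` would be the natural place for it.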
