feat: Telegram adapter send/poll circuit breaker (Closes #573) #8
Da-Mikey wants to merge 657 commits into
…o README Follow-up on salvaged #50347: the event surface table was missing the billing.step_up.verification switch case, and the File map omitted lib/perfPane.tsx.
…atch_tool (#50352)
Answers a recurring plugin-author question: how to read the active profile and drive Hermes from inside a hook callback when ctx._cli_ref is None (gateway, hermes chat -q, and kanban-spawned worker sessions).
- Adds an 'Act from inside a hook' section to the plugin guide covering ctx.profile_name and ctx.dispatch_tool as the session-agnostic APIs, with a kanban_task_blocked example, and notes there is no in-process slash-command bridge for headless workers (shell out via the terminal tool instead).
- Adds the three kanban lifecycle hooks to the hook reference table with their process semantics.
- Pins the contract with a regression test: ctx.dispatch_tool invokes a tool handler with _cli_ref=None (worker/hook context).
Requested by @Smithangshu on Discord.
… launch The Windows update path can leave tracked ui-tui/ files deleted in the working tree (HEAD intact). The guard now self-heals: when ui-tui/ is missing in a git checkout, run `git restore -- ui-tui` and continue, falling back to the printed manual-recovery steps only when git can't recover it (no checkout / restore failed). Builds on konsisumer's missing-workspace guard.
… ui-tui Root cause of #49145: the Windows ZIP-update path did rmtree(dst) then copytree(src, dst). If the copy failed partway — common on that path, which only runs because file I/O is already flaky on the machine — the directory was left deleted with nothing copied back. ui-tui/ vanishing is what broke 'hermes --tui' (WinError 267), but the bug hit every top-level directory. _atomic_replace_dir stages the new copy into a sibling temp dir and only swaps it in on full success, restoring the original on failure. A failed update now leaves the live tree untouched instead of half-deleted.
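The stage-then-swap idea described above can be sketched as follows. This is a minimal illustration, not the project's `_atomic_replace_dir`: the function name, the staging directory prefix, and the backup handling are assumptions made for the example.

```python
import os
import shutil
import tempfile


def atomic_replace_dir(src: str, dst: str) -> None:
    """Replace dst with a copy of src, leaving dst untouched if the copy fails."""
    parent = os.path.dirname(os.path.abspath(dst))
    # Stage next to dst so the final swap is a rename on the same volume.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    backup = None
    try:
        staged = os.path.join(staging, "new")
        shutil.copytree(src, staged)  # may fail partway; dst is still intact here
        if os.path.exists(dst):
            backup = os.path.join(staging, "old")
            os.rename(dst, backup)  # move the live tree aside
        try:
            os.rename(staged, dst)  # swap the fully copied tree in
        except OSError:
            if backup is not None:
                os.rename(backup, dst)  # restore the original on failure
            raise
    finally:
        # Removes the staging area (and the old tree after a successful swap).
        shutil.rmtree(staging, ignore_errors=True)
```

The key property is ordering: nothing is deleted from the live tree until the new copy exists in full.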
In Telegram streaming, the typing indicator persisted through the slow final rich-text/MarkdownV2 finalize edit, so the '...typing' bubble lingered for seconds after the last streamed token. Add a one-shot on_before_finalize hook to GatewayStreamConsumer, fired once when the stream transitions into its finalization path, and wire it on both Telegram streaming call sites to call pause_typing_for_chat() before the final edit. Cover hook ordering and once-only behavior in tests. Fixes #49712
…d restart loop (#50381) When a Windows user relaunches Hermes while an in-app update is still running (the desktop vanished with no progress and looks crashed), the fresh instance spawns its own dashboard backend. That backend re-locks the venv shim, the updater's straggler cleanup (force_kill_other_hermes -> taskkill /F /T /IM hermes.exe) kills it, the launch dies with the 45s "backend didn't come up" timeout, and the user relaunches into the same trap -- an infinite respawn/kill loop (#50238). Root cause: no mutual exclusion between an applying update and a fresh desktop spawning its own local backend. Fix: the updater publishes a HERMES_HOME/.hermes-update-in-progress marker (pid + start time) for the whole run via an RAII drop-guard that removes it on every exit path (success, early return, panic). A freshly-launched desktop checks the marker before spawning its local backend and PARKS until the update finishes -- then brings the backend up itself (it is the surviving instance; the updater's own relaunch hits the single-instance lock and quits). A stale marker (dead pid or past a 20-minute ceiling) is pruned so a crashed updater can never strand future launches. No rogue backend spawns mid-update, so force_kill_other_hermes has nothing legitimate to kill. Marker parse/staleness logic is extracted to update-marker.cjs and unit-tested; the Rust guard has unit tests; the Rust-write <-> JS-read contract is E2E-verified.
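The staleness rule for the marker (dead pid, or older than the 20-minute ceiling) could look roughly like the sketch below. The real logic lives in update-marker.cjs and a Rust guard; this Python version is only an illustration, and the field names `pid` and `started_at`, as well as the injected `pid_alive` callback, are hypothetical.

```python
from typing import Callable

MAX_AGE_SECONDS = 20 * 60  # the 20-minute ceiling from the commit message


def is_marker_stale(marker: dict, now: float, pid_alive: Callable[[int], bool]) -> bool:
    """Return True when the update marker should be pruned."""
    pid = marker.get("pid")
    started_at = marker.get("started_at")
    # Assumption: an unparseable marker is treated as stale so it cannot strand launches.
    if not isinstance(pid, int) or not isinstance(started_at, (int, float)):
        return True
    if not pid_alive(pid):
        return True  # the updater crashed or exited without cleaning up
    return now - started_at > MAX_AGE_SECONDS
```

Passing the liveness check in as a callback keeps the rule itself pure and easy to unit-test.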
Baileys' jidDecode crashes ("Cannot destructure property 'user' of
jidDecode(...) as it is undefined") when handed a bare phone number, so
sending a WhatsApp message to +50766715226 / 50766715226 returned HTTP
500 and never delivered (#8637).
Add to_whatsapp_jid() to gateway/whatsapp_identity.py — the outbound
inverse of normalize_whatsapp_identifier: it builds the JID a send must
use (bare phone -> <digits>@s.whatsapp.net) and passes through already
qualified JIDs (@g.us, @lid, status@broadcast, @newsletter) unchanged.
Wire it at every outbound bridge call site in the WhatsApp adapter
(send, edit, media, typing, get_chat_info, and the standalone cron /
send_message sender).
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
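A minimal sketch of the outbound mapping described in this commit, assuming that any identifier containing `@` is already a qualified JID. The body is illustrative and is not the code in gateway/whatsapp_identity.py.

```python
import re


def to_whatsapp_jid(identifier: str) -> str:
    """Build the JID a send should use from a phone number or existing JID."""
    value = identifier.strip()
    if "@" in value:
        # Already qualified (@g.us, @lid, status@broadcast, @newsletter, ...).
        return value
    digits = re.sub(r"\D", "", value)  # drop '+', spaces, dashes
    if not digits:
        raise ValueError(f"not a phone number or JID: {identifier!r}")
    return f"{digits}@s.whatsapp.net"
```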
Per @egilewski's audit on this PR (#15544), the original fix was correct but the file has been refactored since: the four endpoint-local empty-peer checks have been consolidated into _ws_client_is_allowed and _ws_client_reason, but the helpers were left fail-open ('no peer host known means allow' / 'no reason to block').
On a loopback-bound dashboard with auth disabled, an ASGI server behind a misconfigured proxy or a unix-socket transport can deliver ws.client == None or ws.client.host == ''. The helpers were treating that as 'allowed', so the loopback-only peer gate could be bypassed by anything that suppressed the client tuple in transit. All four WebSocket endpoints (/api/pty, /api/ws, /api/pub, /api/events) route through _ws_request_is_allowed -> _ws_client_is_allowed, so the gap applied uniformly.
Fix:
* _ws_client_is_allowed: return False when client_host is empty instead of True. Only reached on loopback bind with auth disabled (auth_required=True and explicit non-loopback binds short-circuit earlier), so the fail-closed behavior is scoped to the surface that needs it.
* _ws_client_reason: return a 'missing_or_empty_peer bound=...' block reason instead of None, so the dispatcher's existing reason-based rejection path picks it up and the close gets logged with a machine-parseable token for diagnosability.
Behavior unchanged for:
* gated mode (auth_required=True): early-returns True before the empty-peer check runs. The OAuth ticket is the auth at that point.
* explicit non-loopback bind (--host 0.0.0.0/::, or a specific LAN address, always with --insecure): early-returns True before the empty-peer check runs. DNS-rebinding is still blocked by the Host/Origin guard in _ws_host_origin_is_allowed.
* legitimate loopback peers (client_host == '127.0.0.1' / '::1'): not affected by the empty-peer branch.
Regression tests added in tests/hermes_cli/test_dashboard_auth_ws_auth.py:
* test_empty_client_host_rejected_in_loopback_mode
* test_missing_client_object_rejected_in_loopback_mode
* test_empty_client_host_reason_is_block
Plus two regression guards to ensure the fix does not over-reach:
* test_empty_client_host_still_allowed_in_insecure_public_mode
* test_empty_client_host_still_allowed_in_gated_mode
All three new fail-closed tests fail without this patch (the helpers return True / None for an empty peer) and pass with it. The 45 pre-existing tests in test_dashboard_auth_ws_auth.py continue to pass.
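The fail-closed decision can be condensed into a small sketch. The function name and keyword parameters below are hypothetical simplifications; the real helpers take their inputs from the WebSocket object and dashboard state.

```python
from typing import Optional

LOOPBACK_HOSTS = {"127.0.0.1", "::1"}


def ws_client_is_allowed(
    client_host: Optional[str], *, auth_required: bool, loopback_bind: bool
) -> bool:
    """Peer gate sketch: only the loopback, auth-disabled surface checks the peer."""
    if auth_required or not loopback_bind:
        # Gated mode and explicit non-loopback binds are covered by other guards.
        return True
    if not client_host:
        # Missing or empty peer: fail closed instead of assuming a local client.
        return False
    return client_host in LOOPBACK_HOSTS
```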
… session (#50375) When a /model switch resolves a valid model but the in-place agent swap fails mid-conversation (expired key, unreachable base_url), the agent rolls itself back to the old working model+client and re-raises. The callers caught that re-raise, logged a warning, then committed the broken switch anyway: wrote the failed model to the session DB, set _session_model_overrides to the broken model/provider/key, and (gateway direct path) evicted the working cached agent. The next message then rebuilt a dead agent from the broken override -> permanently unusable conversation (#50163). Fix the whole caller class so a failed swap aborts the commit entirely: - gateway/slash_commands.py (picker + direct /model paths): on swap failure, early-return an error message; skip DB persist, session override, cache eviction, and config write. - cli.py (both /model handlers): snapshot CLI-level credential/runtime fields before mutating, restore them on swap failure, and abort the note + success print. - tui_gateway/server.py: wrap the previously-unguarded swap; on failure raise a clean error and skip worker restart, runtime persist, switch marker, session model_override, and config persist. The no-cached-agent path (apply-on-next-session) is unaffected. Adds a gateway regression test that fails on the pre-fix behavior.
… (#50373) On Windows, _pause_windows_gateways_for_update() force-kills every running gateway before mutating the venv. Gateways mapped to a profile (via profile.path/gateway.pid) were respawned afterward, but gateways with NO profile mapping — e.g. a Windows Scheduled Task running "pythonw.exe -m hermes_cli.main gateway run" — were force-killed and only told to restart manually. After an auto-update/bootstrap the Telegram bot stayed dead until manual intervention. Now we snapshot each unmapped gateway's argv (psutil, guarded by looks_like_gateway_command_line) before the kill and replay it through the same detached watcher used for profile gateways, so unmapped gateways come back automatically too. Co-authored-by: Hermes Agent <agent@nousresearch.com>
…ers (#50385) A bare custom provider configured via `model.api_base` (the intuitive name OpenAI-SDK / LiteLLM users reach for) was silently ignored: `hermes config set` accepts any dotted key, so `model.api_base` got written and confirmed, but the runtime resolver reads only `model.base_url`. Requests fell back to OpenRouter with an empty key -> 401, zero hits to the custom endpoint (issue #8919). Now api_base is migrated to base_url at load time (fixes existing broken configs) and at set time (with a notice), never overriding an explicit base_url. Closes #8919.
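A load-time migration of this kind might be sketched as below, operating on the `model` section as a plain dict. The function name is hypothetical; only the `api_base` and `base_url` keys come from the commit message.

```python
def migrate_api_base(model_cfg: dict) -> dict:
    """Return a copy of the model config with api_base folded into base_url."""
    cfg = dict(model_cfg)
    api_base = cfg.pop("api_base", None)
    # Never override an explicit base_url; only fill it when it is absent or empty.
    if api_base and not cfg.get("base_url"):
        cfg["base_url"] = api_base
    return cfg
```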
A dangerous-command gateway approval blocks the agent's execution thread inside _await_gateway_decision() on threading.Event.wait() until the user responds or the 5-minute approval timeout fires. The poll loop never checked is_interrupted(), so /stop (which flags the agent's execution thread via AIAgent.interrupt()) was silently ignored — the session stayed wedged until timeout, even though /stop reported the session unlocked. Check is_interrupted() at the top of the poll loop. The wait runs on the agent's execution thread, the exact thread interrupt() flags, so the check sees the signal and resolves the pending approval as deny — the agent loop receives a normal denial and unwinds cleanly. Covers /stop, /new, and the gateway inactivity-timeout interrupt through the single shared wait loop used by both the terminal and execute_code guards.
- Add thread-scoped regression test: interrupt on the waiting thread resolves the approval as deny well under the 300s timeout; a foreign-thread interrupt does NOT release the wait (interrupts are per-thread). - Add panghuer023 to AUTHOR_MAP for the salvaged #37994 fix.
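The shape of an interruptible wait loop, as a hedged sketch: the function name, return strings, and poll interval are illustrative assumptions, and `is_interrupted` stands in for the per-thread interrupt check mentioned above.

```python
import threading
import time
from typing import Callable


def await_decision(
    decided: threading.Event,
    is_interrupted: Callable[[], bool],
    timeout: float = 300.0,
    poll: float = 0.05,
) -> str:
    """Wait for a user decision, but bail out as 'deny' if the thread is interrupted."""
    deadline = time.monotonic() + timeout
    while True:
        if is_interrupted():  # checked at the top of every iteration
            return "deny"
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "timeout"
        if decided.wait(min(poll, remaining)):
            return "decided"
```

Because the check runs once per poll interval, an interrupt is noticed within roughly one `poll` period instead of after the full approval timeout.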
…nnecting
The email adapter read address/host purely from env vars and never stripped
them, so a missing or whitespace-padded EMAIL_IMAP_HOST reached
imaplib.IMAP4_SSL("") and surfaced as the misleading
"[Errno 8] nodename nor servname provided, or not known" — sending users down a
DNS rabbit hole when the real problem was an empty/dirty host string. A
config.yaml-only setup also left the host empty because __init__ ignored
PlatformConfig.extra, even though the "connected" check, the send helper, and
`hermes config show` already read address/imap_host/smtp_host from it.
Resolve address/imap_host/smtp_host from the env var first, then fall back to
config.extra, and strip surrounding whitespace — matching the send helper's
existing pattern. Validate the required settings at the start of connect() and
return False with an actionable message instead of attempting a connection with
an empty host.
Adds regression tests for whitespace stripping, config.extra fallback, and the
no-IMAP-attempt-on-missing-host path.
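The env-first, config-fallback resolution can be sketched like this. The helper name and the `environ` parameter are hypothetical; treating a whitespace-only environment value as missing is an assumption of the sketch.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    env_name: str,
    extra: Mapping[str, object],
    extra_key: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the stripped env value, else the stripped config.extra value, else ''."""
    env = os.environ if environ is None else environ
    value = (env.get(env_name) or "").strip()
    if value:
        return value
    return str(extra.get(extra_key) or "").strip()
```

A caller can then refuse to connect when the result is empty, rather than handing an empty host to the IMAP client.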
…ars (#40715) Fold in the #40715 blank-env OOM fix on top of the host-resolution change: - connect() now sets a non-retryable fatal error when required settings are missing, so the gateway stops reconnecting against an empty host instead of looping forever and leaking memory until the host OOM-kills. - check_email_requirements() treats blank/whitespace-only EMAIL_* values as missing, so an abandoned setup with empty keys no longer enables the platform. Credits the parallel fixes by zerone0x (#40745) and liuhao1024 (#40829).
…e delivery When a streamed Telegram reply finalizes, the stream consumer could take the fresh-final path (send a new sendRichMessage + best-effort delete the preview) purely because the time-based _should_send_fresh_final() threshold elapsed — even though Telegram's prefers_fresh_final_streaming returns False. The fresh Rich Message then overlapped the legacy MarkdownV2 preview already on screen, leaving both visible (the #47048 table + bullet double-render). Honor the adapter's decision: when prefers_fresh_final_streaming exists on the adapter (checked on the class + instance __dict__ so MagicMock auto-attrs don't false-positive) and declines, the time threshold no longer overrides it. Adapters without the hook keep the time-based fresh-final for backward compat. Fixes #47048
…bypass ipaddress.ip_address() raises ValueError on IPv6 addresses with scope IDs (e.g. 'fe80::1%eth0'). Both is_always_blocked_url() and is_safe_url() silently skipped these via `except ValueError: continue`. If ALL resolved addresses for a hostname carry scope IDs, every address is skipped and the URL passes all safety checks — a potential SSRF bypass vector against link-local or metadata endpoints. Fix: - Strip the scope ID (%eth0) before parsing in both functions - is_safe_url(): fail closed (return False) with a warning log if still unparseable after stripping - is_always_blocked_url(): use continue (not return False) to preserve multi-address scanning, with a warning log Affected: tools/url_safety.py — is_always_blocked_url(), is_safe_url()
Follow-up to the salvaged #25961 fix: regression tests asserting that scope-bearing IPv6 addresses (fe80::1%eth0, ::1%lo) are blocked by is_safe_url after the scope is stripped, that a still-unparseable address fails closed, and that a scoped IPv4-mapped IMDS address is caught by the always-blocked floor.
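A minimal sketch of strip-then-parse with a fail-closed fallback. The function names and the simplified block list (loopback, link-local, private) are assumptions; tools/url_safety.py applies its own, broader rules. Whether `ipaddress.ip_address` accepts a scope ID directly depends on the Python version, so stripping first keeps the behavior uniform.

```python
import ipaddress
from typing import Optional, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_ip_without_scope(addr: str) -> Optional[IPAddress]:
    """Parse an address after dropping any IPv6 scope ID such as '%eth0'."""
    bare = addr.split("%", 1)[0]
    try:
        return ipaddress.ip_address(bare)
    except ValueError:
        return None


def is_safe_address(addr: str) -> bool:
    ip = parse_ip_without_scope(addr)
    if ip is None:
        return False  # still unparseable: fail closed
    return not (ip.is_loopback or ip.is_link_local or ip.is_private)
```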
…Closes Lexus2016#432) (Lexus2016#436)
Add mutating/idempotent tool-aware thresholding to the loop guard, so mutating tools (terminal, write_file, execute_code, etc.) trigger spiral detection at half the threshold of read-only tools.

=== Changes ===
agent/loop_guard.py:
- Add _MUTATING_TOOLS and _IDEMPOTENT_TOOLS frozensets with category threshold constants: mutating repeat=4/fail=2/escalate=8 vs idempotent repeat=8/fail=4/escalate=15
- Add _tool_category() and _tool_spiral_score() helper functions
- Update maybe_nudge() to auto-select thresholds based on tool type
- Add ESCALATED INTERRUPT level: when a spiral exceeds the escalate threshold, the nudge becomes a directive requiring the agent to summarize progress before continuing
- Include spiral-intensity score in high-count nudges so the model sees the evidence of fixation
- Unknown tools (MCP, plugins) default to the safer mutating thresholds
- Fix _EXIT_CODE_RE regex to correctly use \s (whitespace) and \d (digit) character classes
tests/agent/test_loop_guard.py:
- Split mutating/idempotent threshold coverage across both tool types
- Add TestEscalatedInterrupt class (8 tests): verify escalated interrupt fires at correct thresholds for mutating, idempotent, and unknown tools
- Test spiral-intensity annotations appear at high counts
- Update existing tests to reflect new mutating thresholds
- Added 11 new test cases (15 -> 26 total)
agent/conversation_loop.py:
- Log a warning when an ESCALATED INTERRUPT fires, so operators and log aggregators can detect deep spiral patterns

=== Testing ===
26/26 loop_guard tests pass, 9/9 guardrail runtime tests pass (35 total)
Co-authored-by: Hermes Evolution <evolution@hermes-agent.nousresearch.com>
Secret redaction only matched `Authorization: Bearer <token>`. Other auth headers passed through verbatim into logs, tool output, and transcripts: - `Authorization: Basic <base64>` — leaks base64(user:password) - `Authorization: token <pat>` / any non-Bearer scheme - `Proxy-Authorization: ...` - `x-api-key: <key>` (Anthropic and many providers) and `api-key`, `x-goog-api-key`, `x-auth-token`, `x-access-token`, ... — opaque values with no known vendor prefix were caught by nothing A logged request or an echoed `curl -H "x-api-key: ..."` command therefore leaked live credentials. Generalize the Authorization rule to mask the credential for any scheme (and Proxy-Authorization) while preserving the header name and scheme word for debuggability, and add an api-key header rule for the single-opaque-value headers. Bearer behavior is unchanged; plain prose containing the word "authorization" (no colon-delimited value) is left untouched. Adds regression tests for Basic/token/Proxy auth and the x-api-key/api-key headers, including inside a curl command.
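The two rules could be approximated with regular expressions along these lines. The patterns, the `***` mask, and the function name are illustrative assumptions and are not the project's redaction code.

```python
import re

# Credential characters: stop at whitespace or a closing quote (e.g. inside curl -H "...").
_CRED = r"""[^\s"']+"""

# Any scheme on Authorization / Proxy-Authorization: keep name and scheme, mask the rest.
_AUTH_RE = re.compile(
    r"(?i)\b((?:proxy-)?authorization\s*:\s*)([A-Za-z][\w-]*)(\s+)" + _CRED
)
# Single-opaque-value headers.
_API_KEY_RE = re.compile(
    r"(?i)\b((?:x-api-key|api-key|x-goog-api-key|x-auth-token|x-access-token)\s*:\s*)"
    + _CRED
)


def redact_auth_headers(text: str) -> str:
    text = _AUTH_RE.sub(r"\1\2\3***", text)
    return _API_KEY_RE.sub(r"\1***", text)
```

Requiring the colon keeps ordinary prose that mentions "authorization" out of scope.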
fix(windows): prefer cmd npm shim on PATH fallback
Automated security fix generated by Orbis Security AI
…n _ensure_loaded Addresses PR #9560 review comments: applies the CWE-22 fix to current main (post-PR Lexus2016#458 rebase) and adds the requested regression tests. - SessionEntry.from_dict now raises ValueError for session_key or session_id containing '..' or starting with '/' or '\' (directory traversal guard) - SessionStore._ensure_loaded moves per-entry validation inside the loop so one malicious/corrupt entry is skipped with a warning instead of aborting the entire sessions.json load - Adds TestSessionEntryFromDictTraversalValidation (5 cases) and TestEnsureLoadedSkipsInvalidEntries covering the skip-not-abort behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion Extends the CWE-22 path traversal guard to cover Windows absolute paths of the form C:/... and D:\... — previously only leading / and \ were checked, which missed drive-letter prefixes. Replaces the inline startswith check with a compiled module-level regex (_TRAVERSAL_RE) that covers all three attack patterns: .., leading /\, and leading X: drives. Adds two regression tests for C:/windows/system32 and D:\\path\\to\\file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…den non-leading separators Follow-up to the salvaged #9560 fix: - Replace the _TRAVERSAL_RE regex with an explicit _is_path_unsafe() helper (drops the now-unused `import re`); catches a path separator ANYWHERE, not just leading, so a non-leading Windows backslash can't slip through. - Switch the per-entry skip in _ensure_loaded_locked from print() to logger.warning to match the module's logging conventions. - Add AUTHOR_MAP entry for the contributor. - Add regression tests for the non-leading-separator case.
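An explicit helper covering the three patterns named across these commits ('..', a separator anywhere, and a drive-letter prefix) might look like the sketch below. It is a guess at the shape of `_is_path_unsafe`, not a copy of it.

```python
def is_path_unsafe(value: str) -> bool:
    """Return True when a session key or id could escape its directory."""
    if ".." in value:
        return True
    # A separator anywhere, not only in the leading position.
    if "/" in value or "\\" in value:
        return True
    # Windows drive-letter prefix such as 'C:' (assumption: single letter + colon).
    return len(value) >= 2 and value[0].isalpha() and value[1] == ":"
```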
…mit/auth cooldowns (Lexus2016#510) So 429/auth failover paths record Retry-After / default cooldowns for the failed provider and skip it when advancing the fallback chain. Closes Lexus2016#478 Co-authored-by: Hermes Evolution <evolution@hermes.ai>
…ache (Lexus2016#509) Recurring cron jobs build their session_id as cron_<job_id>_<timestamp>, so each fire gets a fresh session_id. The Codex/Responses transport uses session_id as prompt_cache_key (and the cache-scope routing headers), meaning every cron fire pays a cold-start LLM cache — the static prefix (agent identity + tool guidance + job prompt) is recomputed in full on each run. Interactive sessions are unaffected (one conversation keeps one stable session_id). Thread an optional cache_key through AIAgent -> init_agent -> the transport build_kwargs, defaulting to session_id when absent so the interactive path is byte-identical. Cron passes the constant cron_<job_id>, so repeated fires share one warm cache key while session_id still rotates per run for transcript isolation. The cache split is deliberate: prompt_cache_key and the Codex cache-scope headers (session_id / x-client-request-id) follow the stable cache_key, but xAI's x-grok-conv-id stays on the per-run session_id so distinct fires aren't merged into one conversation. cache_key is threaded into all three transport build_kwargs call sites (Responses, profile, legacy), so the warm key applies regardless of which transport the cron agent selects; non-Codex transports ignore the unknown param via **params. prompt_cache_key is a routing hint, never a correctness boundary — a stale or wrong key only causes a cache miss, never a wrong result. Reimplements the idea from PR Lexus2016#488 (by @Da-Mikey) cleanly on current main; that PR was unmergeable — branched ~298 commits behind and would have reverted unrelated work across gateway/desktop/docs. 
- agent/agent_init.py: +cache_key param + agent.cache_key assignment
- run_agent.py: thread cache_key through AIAgent
- agent/chat_completion_helpers.py: pass cache_key to build_kwargs (x3)
- agent/transports/codex.py: cache_key drives prompt_cache_key + Codex cache headers + xAI extra_body; x-grok-conv-id stays on session_id
- cron/scheduler.py: pass cache_key=f"cron_{job_id}"
- tests: 4 new cases (override, fallback, codex headers, xAI body/conv split)
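The fallback and the cron split can be illustrated with two small functions. The helper names are hypothetical; only the `cron_<job_id>_<timestamp>` and `cron_<job_id>` formats come from the commit message.

```python
from typing import Optional, Tuple


def resolve_cache_key(session_id: str, cache_key: Optional[str] = None) -> str:
    """Use the explicit cache_key when given, otherwise fall back to session_id."""
    return cache_key or session_id


def cron_identifiers(job_id: str, timestamp: str) -> Tuple[str, str]:
    """Per-run session_id for transcript isolation, stable cache key across fires."""
    session_id = f"cron_{job_id}_{timestamp}"
    return session_id, resolve_cache_key(session_id, f"cron_{job_id}")
```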
Implements the cron.thinking config option, defaulting thinking mode off for cron sessions. Follow-up fixes to make CI green: - cron/evolution/hydra.yaml: dropped the stage skills: block (Hydra is a pure delegator dispatching via delegate_task; listing script-running skills under a [file, delegation] toolset was dead wiring per evolution_skill_lint). - scripts/release.py: mapped the Hermes Evolution agent author email in AUTHOR_MAP.
Adds a per-session circuit breaker around Honcho dialectic queries so a failing backend stops burning API credits after 5 consecutive failures. - Trip after 5 consecutive exceptions from the Honcho SDK/backend. - 120s cooldown; half-open probe resets on success or re-trips on failure. - Empty (but exception-free) results do NOT increment the failure counter, so sparse profiles don't disable memory. - Replace the silent logger.warning with logger.exception so failures are observable in logs. Closes Lexus2016#463 Co-authored-by: Hermes Evolution <evolution@hermes.ai>
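A compact sketch of the breaker described here (trip after 5 consecutive failures, 120s cooldown, half-open probe). The class and method names are assumptions; the injectable clock exists only to make the behavior testable.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(
        self,
        threshold: int = 5,
        cooldown: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True when a call may proceed (closed, or half-open after the cooldown)."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None  # a successful probe closes the breaker

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()  # trip, or re-trip after a failed probe
```

Callers would invoke `record_failure()` only on exceptions, so empty but exception-free results never count toward the threshold.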
…16#515, Closes Lexus2016#516) Add structured size fields to memory tool limit errors and expose an explicit compact action so the agent can shorten entries before a write.
…/3.0, analysis copies verbatim (Lexus2016#519) The watchdog re-fires LOW_SELECTION_EFFICIENCY (selection_efficiency=12%, window selected=60 merged=7) the morning after Lexus2016#507's prompt-level throttle landed. Root cause in the 2026-06-24 analysis cycle: it set max_total_effort=2.0 — neither the 3.0 default nor the 1.5 throttle Lexus2016#507 prescribed. A prompt-level "you decide 1.5 vs 3.0 from the flag" instruction is unenforced, so the LLM drifted to an arbitrary middle, under-throttling and keeping the funnel over-selected. Fix: move the decision out of the prompt into the deterministic metric script. - evolution_metrics.compute_health() now emits effort_budget (1.5 when LOW_SELECTION_EFFICIENCY is flagged, else 3.0 — the only two legal values). - format_health() renders `effort_budget=X` in the BODY of the sidecar line, before the final `| <tail>`, so evolution_watchdog's `.endswith("| healthy")` flag parsing is untouched (verified E2E on a prod-shaped flagged line). - evolution-analysis/SKILL.md: steps 5a/6 now COPY effort_budget=X verbatim as max_total_effort instead of deriving it; middle values like 2.0 are explicitly illegal. Scoring formula, weights, decomposition/split rules, and the anti-starvation slot are UNCHANGED. Tested: test_evolution_metrics (+3 cases: throttle-only-when-flagged, default-on-insufficient-signal, budget-in-body-not-watchdog-tail), test_evolution_watchdog, test_evolution_funnel, test_evolution_skill_integrity all pass (73); ruff + ty clean.
…oses Lexus2016#514) (Lexus2016#527) Adds explicit, structured logging around provider failover so cron introspection can classify rate-limit and billing events without parsing free-text logs. Emits two events: - provider_rate_limit_failover: when the primary provider is put on cooldown, including provider/model, Retry-After header, computed cooldown, and fallback chain position. - provider_fallback_activated: when the runtime actually switches to a fallback provider/model, confirming the new active backend. Closes Lexus2016#514 Co-authored-by: Hermes Evolution <evolution@hermes.ai>
… agent jobs without skills (Lexus2016#534) _normalize_skills/_normalize_toolsets return None (not []) when a stage YAML omits skills:/toolsets:. The reconcile path for an ALREADY-REGISTERED job did `list(skills)` / `list(toolsets)` unconditionally, so the moment those jobs existed every re-run raised `TypeError: 'NoneType' object is not iterable` and aborted. register_evolution_cron IS the integration stage's self-update step (refresh HERMES_HOME no_agent scripts + sync skills), so this silently froze script/skill deployment on the server — e.g. PR Lexus2016#519's funnel-side effort_budget never reached HERMES_HOME until the registrar was run by hand. Fix: only reconcile skills/toolsets when the YAML explicitly specifies them (`is not None`). None means "leave the registered value as-is", NOT "clear it", so this also prevents a no-skills YAML from clobbering a job's registered skills to []. Helper-script install (runs before reconcile) was already fine; only the existing-job reconcile branch was affected. Regression test: an existing agent job whose YAML omits skills reconciles its schedule without crashing and without clobbering skills. Tested: test_register_evolution_cron 23 pass (uv run); ruff + ty clean.
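The "None means leave as-is" rule is easy to show in isolation. The function name and the dict-shaped job record are hypothetical.

```python
from typing import Optional, Sequence


def reconcile_job(
    job: dict,
    yaml_skills: Optional[Sequence[str]],
    yaml_toolsets: Optional[Sequence[str]],
) -> dict:
    """Apply YAML values to a registered job; None leaves the registered value alone."""
    updated = dict(job)
    if yaml_skills is not None:
        updated["skills"] = list(yaml_skills)
    if yaml_toolsets is not None:
        updated["toolsets"] = list(yaml_toolsets)
    return updated
```

An explicit empty list still clears the field, which keeps "omitted" and "cleared" distinguishable.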
…overspent analysis budgets (Lexus2016#535) PR Lexus2016#519 made the effort budget deterministic at the source (the metric script prescribes 1.5/3.0; analysis SKILL.md tells the agent to copy it verbatim). But a prompt instruction is still not enforced: the 2026-06-24 cycle wrote max_total_effort=2.0 — neither legal value — and under-throttled. This adds the deterministic teeth. - scripts/evolution_analysis_audit.py: pure audit_analysis(report) returns BUDGET_ILLEGAL (max_total_effort not in {1.5, 3.0}) and BUDGET_OVERSPENT (total_effort_selected > max_total_effort). Missing/non-numeric fields are skipped (no false alarm on partial/legacy/idle reports). audit_latest() reads the most recent YYYY-MM-DD.json under analysis/ (ignores issues_*/prs_*). - evolution_watchdog.py: new check_analysis_integrity wired into the morning run so a drift surfaces to the owner — the same deterministic-verdict + alert pattern as the realized-impact / regression gates. Silent when clean. Read+flag only: the analysis stage merges nothing, so a bad selection self-corrects next cycle; an alert is the right enforcement teeth for this stage. Tested: test_evolution_analysis_audit (18 cases) + test_evolution_watchdog + skill-integrity — 62 pass; ruff + ty clean. E2E: flags the real 2026-06-24 max_total_effort=2.0 (BUDGET_ILLEGAL).
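The two verdicts can be sketched as a pure function over the report dict. The field names and verdict strings follow the commit message; everything else (return type, helper name) is an assumption.

```python
from typing import List

LEGAL_BUDGETS = {1.5, 3.0}


def _is_number(value: object) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def audit_analysis(report: dict) -> List[str]:
    """Return verdict codes; missing or non-numeric fields are skipped silently."""
    findings = []
    budget = report.get("max_total_effort")
    spent = report.get("total_effort_selected")
    if _is_number(budget) and budget not in LEGAL_BUDGETS:
        findings.append("BUDGET_ILLEGAL")
    if _is_number(budget) and _is_number(spent) and spent > budget:
        findings.append("BUDGET_OVERSPENT")
    return findings
```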
…ts rejections (Lexus2016#83 class) (Lexus2016#536) Gate #2 of the enforcement campaign. The analysis stage can CLOSE an issue claiming the feature already exists, citing a repo path — but the prompt-level "cite an exact path you verified" rule is unenforced. The real Lexus2016#83 was closed citing scripts/evolution_watchdog.sh; the actual script is .py, so a wanted idea was killed on fabricated evidence. - evolution_analysis_audit.audit_rejections(report, repo_root): for each already-exists rejection, extract cited repo paths from the prose reason and flag FABRICATED_REJECTION when it cites concrete paths and NONE exist. A single missing path among existing ones is treated as a typo / secondary reference (not fabrication) → no false positive. Needs the repo; silent without it. - audit_latest now also runs the rejection check when a repo_root is given. - evolution_watchdog.check_analysis_integrity passes _resolve_repo_dir() so the morning run verifies the latest cycle's closes against the actual checkout. Read+flag only (alerts the owner; the close already happened — the value is catching the pattern so the idea can be re-opened). Tested: test_evolution_analysis_audit (+8 rejection cases) + watchdog — 61 pass; ruff + ty clean. E2E: 5 real prod already-exists rejections → 0 false positives (all cited paths exist); a fabricated scripts/evolution_watchdog.sh → flagged.
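The "flag only when every cited path is missing" rule might be sketched as follows. The path-extraction regex and the function name are assumptions made for illustration; the real audit_rejections may extract paths differently.

```python
import os
import re

# Hypothetical pattern: slash-separated segments ending in a file extension.
_PATH_RE = re.compile(r"\b[\w.-]+(?:/[\w.-]+)+\.\w+\b")


def is_fabricated_rejection(reason: str, repo_root: str) -> bool:
    """True when the reason cites concrete repo paths and none of them exist."""
    cited = _PATH_RE.findall(reason)
    if not cited:
        return False  # nothing concrete to verify
    return not any(os.path.exists(os.path.join(repo_root, path)) for path in cited)
```

Requiring all cited paths to be missing is what lets a single typo among valid references pass without an alert.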
…alive (Lexus2016#538) The approval hooks (register/unregister_gateway_notify) are monkeypatched globally, so concurrent daemon build threads from sibling session.create tests in the same xdist shard append THEIR session keys to closed_workers / unregistered_keys. This test even nudges a late sibling's build thread into its own replaced-cleanup by clearing server._sessions. Comparing the lists to [] then fails on that sibling noise (observed failing key 20260625_..._955d26 — a sibling session, not this test's), blocking unrelated PRs (Lexus2016#534, Lexus2016#536) until a flaky rerun. With evolution-integration's require_all_checks_green, it also randomly blocks the autonomous pipeline's own merges. Fix: scope the assertions to THIS session's own key (session["session_key"]) instead of asserting the global lists are empty. The own-key check is immune to sibling noise while still catching this build thread over-cleaning its own session — the regression the test guards. Test-only change.
…e (gate #3) (Lexus2016#539) The integration stage self-merges its OWN evolution/issue-* PRs and auto-deploys — a bad self-merge lands on every install. Branch protection already requires green CI; the remaining gaps are unenforced prompt-level checks. This adds the deterministic operational-safety gate, designed with a cross-AI review (Gemini): - scripts/evolution_merge_gate.py — pure check_merge_policy(files, max_lines): * DIFF_TOO_LARGE: > 200 changed lines (cap, EVOLUTION_MERGE_MAX_LINES) — a large autonomous change is the agent spiraling / a refactor that needs a human. * HIGH_RISK_PATH: never self-merge a PR touching CI workflows, dependency lockfiles/manifests (supply-chain), container/infra, secrets, or the pipeline's OWN approval / merge-gate / cron-registrar machinery. The --merge mode then merges ATOMICALLY via `gh api PUT .../merge -f sha=<head>` so a push landing between review and merge returns 409 and aborts instead of merging unreviewed code — closing the TOCTOU race the prompt-level branch-integrity check had. - evolution-integration/SKILL.md: the merge step now goes through the gate instead of a manual commit-set check + raw `gh pr merge`. Dead-code detection is deliberately NOT a blocking gate (AST across Python/JS is false-positive-prone and would stall the pipeline on reflection/decorators/route registries) — it belongs as a future detect+alert, per the same review. Tested: test_evolution_merge_gate (14 policy cases) + skill-integrity — 25 pass; ruff + ty clean. E2E: typical small PR OK; uv.lock → HIGH_RISK; 520-line → DIFF_TOO_LARGE.
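A sketch of the policy check, assuming `files` is a list of `(path, changed_lines)` pairs. The glob patterns are hypothetical stand-ins for the high-risk categories listed above, not the gate's actual list.

```python
import fnmatch
from typing import List, Sequence, Tuple

# Hypothetical examples of high-risk paths (CI, lockfiles, container, secrets).
HIGH_RISK_PATTERNS = (
    ".github/workflows/*",
    "uv.lock",
    "Dockerfile*",
    ".env*",
)


def check_merge_policy(files: Sequence[Tuple[str, int]], max_lines: int = 200) -> List[str]:
    """Return policy violations for a PR; an empty list means it may self-merge."""
    findings = []
    if sum(lines for _, lines in files) > max_lines:
        findings.append("DIFF_TOO_LARGE")
    if any(
        fnmatch.fnmatchcase(path, pattern)
        for path, _ in files
        for pattern in HIGH_RISK_PATTERNS
    ):
        findings.append("HIGH_RISK_PATH")
    return findings
```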
…g flag (Closes Lexus2016#517) (Lexus2016#537) - MemoryStore accepts allow_batch_override (default False). - apply_batch(target='memory', memory_char_limit=N) honours N only when allow_batch_override is True; user target and system-prompt snapshot keep the configured limit, preserving the prefix cache. - load_on_disk_store() reads memory.allow_batch_memory_char_limit_override. - memory_tool schema exposes the optional memory_char_limit parameter. - Tests cover enabled/disabled, user-target guard, and snapshot stability. Closes Lexus2016#517 Co-authored-by: Hermes Evolution <evolution@hermes.ai>
…2016#542) Adds aux_health_ping() in agent/auxiliary_client.py: a tiny, best-effort chat-completion probe of the auxiliary provider layer. It resolves the configured provider/model for a task and returns provider:model on success, or None (with a warning) on failure. The gateway calls it once whenever a new session is created, so auxiliary problems (compression, memory, title-gen sidecars) surface immediately instead of waiting for the first background task. Co-authored-by: Hermes Evolution <evolution@hermes.ai>
…r, model, retries (Closes Lexus2016#521) (Lexus2016#541)

When a cron job fails because of a provider-layer timeout, the failure record now includes:

- provider / model (resolved from the job's model or HERMES_MODEL)
- failure_category (via agent.error_classifier.classify_api_error)
- retry_count (best-effort parse from the error text)

The chat delivery summary also appends the failure category so operators see e.g. 'provider timeout [timeout]' instead of a generic message.

Co-authored-by: Hermes Evolution <evolution@hermes.ai>
…me (Lexus2016#537) (Lexus2016#563) The required meta-check "All required checks pass" was RED on main because tests/run_agent/test_percentage_clamp.py::TestSourceLinesAreClamped:: test_memory_tool_clamped failed (count 1, expected >= 2), blocking ALL merges (PR Lexus2016#561 was BLOCKED, the evolution pipeline's autonomous merges, and PRs Lexus2016#542/Lexus2016#546/Lexus2016#552). Root cause (Case B — stale guard, NOT a correctness regression): The guard counted the literal substring `min(100, int((current / limit)` and required >= 2. Commit 59502d3 (Lexus2016#537, per-apply_batch memory_char_limit override) renamed the local `limit` -> `effective_limit` inside _success_response, so its clamp line became `min(100, int((current / effective_limit) * 100))`. The clamp is fully preserved — only the variable name changed — but the brittle literal no longer matched, dropping the count from 2 to 1. All three percentage sites in tools/memory_tool.py remain correctly clamped (verified): L847 _compact (new_total/limit), L1099 _success_response (current/effective_limit), L1128 _render_block (current/limit). No display path can emit >100%; there is no unclamped percentage. tools/memory_tool.py is intentionally left untouched. Fix: replace the variable-specific literal count with a refactor-resilient guard that asserts the REAL invariant — every `int(... * 100)` percentage expression is immediately wrapped in `min(100, ...)`. This is strictly stronger than the old line-count: it actively detects an unclamped regression (verified with negative tests for both the double-paren and the standard single-paren `int(current / limit * 100)` forms) and survives future variable renames. A secondary `min(100, int((` >= 2 site-count sanity check is retained. A cross-AI (Gemini) review confirmed the new guard is at least as strong as the old one and prompted broadening the regex to also cover the single-paren cast form.
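The refactor-resilient invariant described above (every `int(... * 100)` percentage cast is immediately wrapped in `min(100, ...)`) could be checked along these lines. This is a sketch under assumptions, not the guard in tests/run_agent/test_percentage_clamp.py: the regex, the single-line restriction, and the helper name `unclamped_percentages` are all hypothetical.

```python
import re

# A percentage cast: int( ... * 100) on a single line. \b keeps "print(" from
# matching. Hypothetical pattern; the real test's regex is not shown.
PERCENT_CAST = re.compile(r"\bint\([^\n]*?\*\s*100\s*\)")
CLAMP_PREFIX = re.compile(r"min\(\s*100\s*,\s*$")


def unclamped_percentages(source):
    """Return every percentage cast in `source` not directly preceded by `min(100, `."""
    offenders = []
    for match in PERCENT_CAST.finditer(source):
        prefix = source[:match.start()]
        if not CLAMP_PREFIX.search(prefix):
            offenders.append(match.group(0))
    return offenders
```

Because the check keys on the shape of the expression rather than on a variable name, a rename such as `limit` to `effective_limit` leaves it green, while removing the `min(100, ...)` wrapper turns it red for both the double-paren and the single-paren form.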
…(client-wide) (Lexus2016#561) Symptom: every onboarded client fired a daily watchdog alert claiming "fork is ~13000 commits behind upstream/main". Root cause: scripts/install.sh (lines 1214/1219) and install.ps1 clone Hermes with `git clone --depth 1`, so every fresh client install is a SHALLOW repo. upgrade.sh then adds the `upstream` remote and registers the evolution-watchdog cron. On a shallow clone HEAD shares no ancestry with upstream/main, so check_upstream_lag()'s `git rev-list --count HEAD..upstream/main` counts ~ALL upstream history (~13031) instead of the true distance, tripping the threshold every day. The shallow guard already exists in hermes_cli/banner.py (_check_via_local_git) and hermes_cli/main.py (update-check), but check_upstream_lag was the one update-check site that was missed. Fix: add _upstream_lag_unmeasurable(runner, repo) — returns True when the repo is shallow (`git rev-parse --is-shallow-repository` == "true") OR HEAD/upstream have no shared history (`git merge-base` non-zero with empty stdout). When True, check_upstream_lag returns [] SILENTLY (no alert). Shallow is the INTENDED client default and upstream-lag is the fork maintainer's concern — the evolution server is a full clone and still gets the real count. The helper is fail-open: any inconclusive probe falls through to the existing exact rev-list/threshold path, so it can never make the check worse than today. Consolidates two competing local fixes: silent-skip RESPONSE (mirrors the banner.py/main.py precedent) + broader DETECTION (also covers the grafted/no-common-ancestor case). Tests: 3 new TestUpstreamLag cases — shallow -> silent (rev-list not consulted), unresolved merge-base -> silent, full clone behind>80 -> still alerts (regression guard). 38/38 watchdog tests pass.
…at fatigue (Lexus2016#564)

The evolution watchdog re-emitted the SAME pipeline-health alert (LOW_SELECTION_EFFICIENCY, LOW_SUCCESS, REALIZED_* …) on EVERY cron run while a known, already-throttled condition persisted (e.g. selection_efficiency=11%, self-corrected by PR Lexus2016#519's deterministic effort_budget). The owner got the identical alert every single day: pure fatigue, zero new information.

Add a state-aware edge-trigger layer around the HEALTH alerts only. We now alert on TRANSITIONS, not steady state:

• a NEW flag/condition appears -> emit
• a condition WORSENS (new/harsher flag, or an embedded counter like MERGED_ZERO x3 -> x5; both change the flag tail) -> emit
• a condition CLEARS -> emit one recovery line
• a condition unchanged for >= EDGE_COOLDOWN_DAYS (7) -> emit one "still unresolved" nudge, so it is never silent forever

Only the verbatim repeat of an already-reported, non-worsening condition within the cooldown is suppressed.

Signature = the sorted set of flag TAILS (text after the final "|"), ignoring the metrics body whose per-run counts drift even when the condition is unchanged. State persists in <evolution_dir>/watchdog-alert-state.json using the same EVOLUTION_PROFILE_DIR resolution the health checks already use.

NO-MASK SAFETY PROPERTY: suppression keys on the condition signature, so any new fault, any worsening, any new distinct flag, and any post-recovery recurrence change the signature and emit immediately. Operational alerts (stage reports, jobs, gh, and the Lexus2016#561 upstream-lag guard) bypass the layer entirely and fire every run.

FAIL-OPEN: every state read/write is best-effort: a missing/unreadable/corrupt state file means "unknown previous state" and we emit exactly as today; a write failure is swallowed. Edge-triggering can only ever reduce noise, never mask a fault.

TDD: 13 new tests cover steady-suppress, new-flag, worsening, counter-growth, recovery, recovery-then-recurrence, cooldown re-reminder, the three fail-open paths, signature stability, and a main() wiring test asserting upstream-lag/gh are never edge-suppressed. Watchdog module fully green (51 passed).
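The transition rules above can be sketched as a small decision function. Only the signature definition (sorted set of flag tails) and the 7-day cooldown come from the commit message; the state layout (`signature`, `last_emit`), the action names, and the function names are assumptions for illustration, and persistence to watchdog-alert-state.json is left out.

```python
import time

EDGE_COOLDOWN_DAYS = 7  # value stated in the commit message
DAY = 86400


def signature(alerts):
    """Sorted set of flag tails: the text after the final '|' of each alert line."""
    return sorted({line.rsplit("|", 1)[-1].strip() for line in alerts})


def decide(alerts, state, now=None):
    """Return (action, new_state) for one watchdog run.

    `state` is the previously persisted dict, or anything else (None, garbage)
    when the state file was missing or corrupt, which is treated as
    "unknown previous state" so the alert is emitted as before.
    """
    now = time.time() if now is None else now
    state = state if isinstance(state, dict) else {}
    sig = signature(alerts)
    prev = state.get("signature")
    if not sig:
        action = "recovery" if prev else "quiet"
        return action, {"signature": [], "last_emit": now}
    if sig != prev:  # new flag, worsened counter, or recurrence after recovery
        return "emit", {"signature": sig, "last_emit": now}
    if now - state.get("last_emit", 0) >= EDGE_COOLDOWN_DAYS * DAY:
        return "nudge", {"signature": sig, "last_emit": now}
    return "suppress", state
```

Since the metrics body before the final "|" is ignored, a drifting count such as `selection_efficiency=11%` versus `12%` does not re-trigger the alert, whereas a tail change such as `MERGED_ZERO x3` to `MERGED_ZERO x5` does.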
…M] issue (Lexus2016#566) Two robustness improvements to the evolution watchdog that close the gaps behind the frozen nightly self-update and the manually-opened upstream issue. Feature 1 — silent-freeze detection (check_runtime_divergence): The runtime checkout self-updates with `git pull --ff-only`. When the evolution pipeline (or a contributor) leaves LOCAL commits on the tracking branch that later squash-merge upstream under a DIFFERENT SHA, local HEAD diverges from origin/main, ff-only can no longer fast-forward, and the nightly update silently no-ops — freezing the box on an old revision with no signal (root cause seen on osoba: the runtime checkout commits in-place as "Hermes Evolution" and checks out main in the same tree). The new check alerts on the high-confidence DIVERGED signal only: rev-list --count origin/main..HEAD > 0 AND merge-base --is-ancestor HEAD origin/main is FALSE. The is-ancestor probe is authoritative for "can ff-only advance", so a behind-but-fast-forwardable box (healthy, just hasn't pulled yet) is NOT alerted — avoids a daily false-positive storm. DETECT + ALERT only; never auto-reset (that would risk losing the local commits). Routed through the existing Lexus2016#564 edge-trigger HEALTH layer so a steady divergence emits once and suppresses the verbatim daily repeat (cooldown nudge backstop intact). Feature 2 — idempotent GitHub [UPSTREAM] tracking issue (ensure_upstream_issue): On a REAL upstream escalation (full clone, behind > threshold — the Lexus2016#561 shallow case never reaches here) check_upstream_lag now ensures the [UPSTREAM] tracking issue exists instead of leaving the owner to open it by hand (as with Lexus2016#562). Idempotency key = an OPEN issue whose title starts with "[UPSTREAM]"; if present, no duplicate is created. All gh interaction goes through the injectable runner seam (unit-testable, no network in tests). 
Fail-open everywhere: repo unresolved, any git/gh spawn error, failed search, or unparseable output → no alert / no-op, never a crash and never a false alarm. Gated by WATCHDOG_FILE_UPSTREAM_ISSUE (default on). TDD: 18 new tests (divergence detection + edge-trigger routing, idempotent issue create/no-duplicate/fail-open, Lexus2016#561 shallow regression still silent).
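The DIVERGED signal described under Feature 1 can be sketched as follows. The two git probes are the ones named in the commit message; the function name `runtime_diverged` and the runner signature (`runner(args, cwd)` returning an object with `returncode` and `stdout`) are assumptions, and this is not the body of check_runtime_divergence.

```python
def runtime_diverged(runner, repo):
    """True only when HEAD carries local commits AND cannot fast-forward to origin/main."""
    try:
        ahead = runner(["rev-list", "--count", "origin/main..HEAD"], repo)
        if ahead.returncode != 0 or int(ahead.stdout.strip()) <= 0:
            return False  # no local-only commits, or the probe was inconclusive
        ancestor = runner(["merge-base", "--is-ancestor", "HEAD", "origin/main"], repo)
        # git exits 0 when HEAD is an ancestor, 1 when it is not, and with
        # another status on error; only the definite "not an ancestor" alerts.
        return ancestor.returncode == 1
    except (OSError, ValueError):
        return False  # fail-open: never crash, never raise a false alarm
```

A box that is merely behind origin/main reports zero local-only commits and returns False at the first probe, which is how the daily false-positive storm is avoided.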
…exus2016#568) Reduce the Hydra system prompt from ~3678 chars to ~1876 chars (49% smaller) while preserving all stage definitions, safety rules, and dispatch output requirements. Replaces verbose prose and repeated paths with compact variable abbreviations and one-line stage toolset/goal summaries. - cron/evolution/hydra.yaml: condensed prompt, no functional behavior change. - tests/cron/test_hydra_prompt.py: regression tests that pin prompt size (<2500 chars), stage coverage, safety rules, and the absence of terminal from Hydra's own toolsets. Closes Lexus2016#549 Co-authored-by: Hermes Evolution <evolution@hermes.ai>
…failure (Lexus2016#570) When Telegram (or any platform) delivery fails, the undelivered content is now preserved at a well-known deterministic path: {HERMES_HOME}/cron/output/delivery_fallback/{target}_{job_id}_{timestamp}.md This allows the user or a secondary channel to retrieve cron job results even when the primary messaging channel is unreachable. Closes Lexus2016#525 Co-authored-by: Hermes Evolution <evolution@hermes-agent.nousresearch.com> Co-authored-by: Hermes Evolution <evolution@hermes.ai>
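As a small illustration of the deterministic fallback path, the sketch below builds `{HERMES_HOME}/cron/output/delivery_fallback/{target}_{job_id}_{timestamp}.md`. The directory layout is taken from the commit message; the helper name `fallback_path` and the UTC `YYYYMMDDTHHMMSSZ` timestamp format are assumptions, since the actual format is not stated.

```python
from datetime import datetime, timezone
from pathlib import Path


def fallback_path(hermes_home, target, job_id, now=None):
    """Build the fallback file path for one undelivered cron result."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    return (Path(hermes_home) / "cron" / "output" / "delivery_fallback"
            / f"{target}_{job_id}_{stamp}.md")
```

A caller would create the parent directory (for example with `path.parent.mkdir(parents=True, exist_ok=True)`) before writing the undelivered content.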
…forced guard) (Lexus2016#569) * feat(agent): learn from user corrections — lean Phase 1 (per-user) Principle: a real user correcting a real agent on a real task is the highest-signal feedback an agent gets. Today Hermes captures some of it (the post-turn LLM background_review writes preferences to per-profile memory/skills) but misses the loudest signals — interrupted and denied turns are skipped by the `not interrupted` guard in turn_finalizer. Lean Phase 1 scope (per-user Fast Loop only): 1. Deterministic correction detection (agent/correction_learning.py detect_correction): classifies a completed turn as INTERRUPT / DENY / STEER from runtime markers only — no fuzzy text regex, no LLM. Returns a small CorrectionRecord {kind, signature, context, session_id, ts}. 2. Stop skipping corrections: turn_finalizer now triggers the background review when the turn IS a structured correction, even when interrupted or denied. Non-correction interrupted turns and normal turns keep their exact prior behavior. The detected correction is passed to the reviewer as a tier-aware hint. 3. Generalization guard (the load-bearing safety piece): a captured correction is TRANSIENT by default. It promotes to DURABLE (a write to the per-profile memory store, which re-injects next session) ONLY on evidence — the same signature recurs across >= 2 DISTINCT sessions, or an explicit "remember this". A minimal recurrence tracker (signature -> distinct sessions) lives in a fail-open per-profile JSON store; a broken store degrades to transient-only and never disturbs a turn. Promotion is idempotent (no duplicate ledger/memory writes on later sightings). The correction-triggered LLM review prompt is tier-aware: a first-sighting transient correction is explicitly NOT persisted durably by the reviewer, so the deterministic guard is the single durable gate (closes a one-off leak found in independent review). 4. 
Provenance + unlearn: every durable item is tagged with origin signal kind, session, signature, timestamps, and promotion reason in a ledger. unlearn(provenance_id) removes the durable item from the memory store AND resets the signature's recurrence evidence (so unlearn is not silently undone by the next sighting). Reversible by construction. Deferred to later phases (explicitly NOT built): the multi-dimensional evidence vector, the fleet/global consensus path, calibration / positive- negative controls, the adversarial counter-reviewer, TTL / model-version tagging, config-over-prompt routing, and the parallel codex_runtime guard. TDD: tests written first (tests/agent/test_correction_learning.py, test_correction_learning_wiring.py, test_turn_finalizer_correction_review.py) covering the acceptance test (transient on first sighting -> durable on a second distinct session; explicit remember promotes immediately), the negative control (one-off never injects, neither deterministically nor via the LLM review), idempotent promotion, unlearn-resets-recurrence, regression (non-correction turns unchanged), and provenance. 37 tests green; background_review regression suite (52) green. No auto-merge — fleet- affecting behavior change requires owner review. * fix(agent): enforce durable-write gate + discriminate user denials (X1, X2) Two serious defects in the lean Phase-1 "learn from user corrections" slice. X1 — the deterministic guard was NOT the single durable gate it claimed to be. The correction-triggered review forced review_memory=True on every detected correction and spawned the background-review LLM fork holding the memory/skill WRITE toolset (shared _memory_store). For a TRANSIENT one-off correction, only an advisory preamble — not enforcement — stopped the LLM from writing it durably. This widened an ungated durable-write path. Now ENFORCED: turn_finalizer computes block_durable_writes (transient correction that is the SOLE reason the review spawned — i.e. 
the legacy nudge path would not have fired and the recurrence guard has not promoted it) and threads it through _spawn_background_review -> spawn_background_review_thread -> _run_review_in_thread. The fork's runtime tool whitelist (_review_tool_whitelist) then strips the durable writers (memory, skill_manage), so get_pre_tool_call_block_message denies any durable write at dispatch. Durable persistence for the correction path now happens ONLY via the deterministic CorrectionLearner promotion (recurrence>=2 or explicit remember). Pre-existing nudge-driven review behavior is untouched. X2 — _detect_deny keyed on status=="blocked", which automatic terminal blocks (dangerous-command denial and workdir shell-injection validation) also emit with no user involvement, minting false "user corrections". A genuine user denial is now stamped with an explicit user_denied marker at the approval sites (tools/approval.py), propagated into the terminal tool result only for real denials (tools/terminal_tool.py), and _detect_deny fires solely on that marker. Automatic blocks are excluded. Also: wired CorrectionLearner.unlearn to a real CLI surface (hermes corrections list / unlearn <id>) so reversibility is not paper-only. Tests: repaired the misleading "negative control" (it only exercised the deterministic learner, never the LLM leak path) and added real enforcement tests — the fork whitelist excludes durable writers for a transient correction; an automatic dangerous-command/workdir block is NOT detected as a denial; the unlearn CLI reverses a durable item end-to-end. * test(agent): prove X1 enforcement at the dispatch gate Add a deterministic end-to-end test: under the transient-correction whitelist, get_pre_tool_call_block_message (the gate the review fork's tool dispatch consults) denies memory and skill_manage while allowing read-only skill_view. Proves the LLM fork cannot persist a one-off correction durably — denied at dispatch, not merely discouraged by a prompt. 
* fix(agent): close 5 audit defects in learn-from-corrections Phase 1 Adversarial audit (6/10) flagged five issues; all now fixed surgically. 1. CI red: register "corrections" in _BUILTIN_SUBCOMMANDS so test_builtin_set_covers_every_registered_subcommand passes. The subcommand was wired in main() but missing from the frozenset, so the startup-gating parity check (a required CI check) failed. 2. Codex parity: extract correction detection + recurrence recording + the spawn/block decision into a shared seam, agent/correction_review.py (decide_correction_review). Both turn_finalizer.finalize_turn and the Codex finalizer (codex_runtime.run_codex_app_server_turn) now route through it, so they cannot drift. Previously the Codex path carried an unmodified nudge-only gate and silently never learned from a correction. 3. X1 co-occurrence hole now universal: block_durable_writes = (correction present AND not durable), dropping the prior `and not _healthy_review` term. A transient correction co-occurring with a nudge can no longer ride the nudge's durable write into the store; the nudge's own durable write is deferred to the next interval. 4. No wasted aux-model spend: a pure-transient correction with no nudge is recorded deterministically but no longer spawns the (write-blocked) LLM review fork. Spawn = (nudge fired) OR (correction promoted to durable). 5. Promote atomicity: in CorrectionLearner._promote write the ledger entry FIRST, then the durable memory line, so a failure between them leaves a cleanable ledger entry (injected:false) rather than an un-unlearnable orphaned MEMORY.md line. Still fail-open. Tests: rewrote the turn-finalizer correction-review tests for the new spawn/block contract; added tests/agent/test_codex_runtime_correction_review.py proving the Codex path detects+records and obeys the same spawn/block rules. * fix(agent): revive INTERRUPT corrections on default runtime (capture-before-clear) DEFECT 1 — capture-before-clear. 
finalize_turn called clear_interrupt() (nulls _interrupt_message) ~46 lines before the correction detector read it, so the INTERRUPT branch of detect_correction was dead on the default runtime. Capture _interrupt_message into a local BEFORE the clear and feed that local to detection and result["interrupt_message"]. DEFECT 2 — un-mask the tests. The _StubAgent/_Agent.clear_interrupt() stubs were no-ops that never nulled _interrupt_message, so INTERRUPT tests passed against broken production code. Mirror production (null the message + request flag). Add an explicit capture-before-clear test asserting, through the REAL finalize_turn ordering, that clear_interrupt ran (attribute is None) yet the INTERRUPT correction was still detected with its exact redirect text. The stub fix turns the existing INTERRUPT tests RED pre-fix; DEFECT 1 turns them GREEN. DEFECT 3 — explicit "remember this" durable promotion is unreachable in production (no caller threads remember=True). Downgrade the docstrings/comments: cross-session recurrence (>=2 distinct sessions) is the SOLE Phase-1 production durable trigger; explicit-remember is deferred. Keep the remember param as the tested seam, documented as not-yet-wired. DEFECT 4 — codex INTERRUPT honesty. On the codex runtime user interrupts are never propagated into the session (request_interrupt has no production callers; codex interrupted is only a deadline-timeout), so codex INTERRUPT stays inert even after DEFECT 1 — a pre-existing platform gap, deferred. Scope the code comments honestly: default runtime = INTERRUPT/DENY/STEER all live; codex = DENY/STEER live, INTERRUPT blocked. No behavior change beyond reviving the INTERRUPT branch. DO NOT MERGE.
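The generalization guard at the heart of this change (transient by default, durable only after the same signature recurs across at least 2 distinct sessions, idempotent promotion, and unlearn resetting the evidence) can be sketched in memory. The class name `RecurrenceTracker`, its method names, and the returned strings are hypothetical; the real CorrectionLearner persists to a per-profile JSON store and writes a provenance ledger, neither of which is modeled here.

```python
class RecurrenceTracker:
    """In-memory sketch of the signature -> distinct-sessions promotion guard."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.sessions = {}     # signature -> set of session ids that showed it
        self.promoted = set()  # signatures already written durably

    def record(self, signature, session_id):
        """Record one sighting and return the tier of this correction."""
        seen = self.sessions.setdefault(signature, set())
        seen.add(session_id)
        if signature in self.promoted:
            return "already_durable"  # idempotent: no second durable write
        if len(seen) >= self.threshold:
            self.promoted.add(signature)
            return "durable"
        return "transient"

    def unlearn(self, signature):
        """Drop the durable mark AND the recurrence evidence for a signature."""
        self.promoted.discard(signature)
        self.sessions.pop(signature, None)
```

Counting distinct sessions rather than raw sightings means a user repeating the same correction several times in one session still leaves it transient, and clearing the evidence in `unlearn` keeps the very next sighting from silently re-promoting it.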
Adds a _FailureCounter-based circuit breaker to the Telegram adapter's send and poll operations, so repeated delivery failures disable the adapter for the rest of the session instead of retrying indefinitely. Depends on _FailureCounter (PR #6) and per-entry failure tracking (PR #7). Closes Lexus2016#573 Co-Authored-By: Hermes Evolution <evolution@hermes.ai>
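The breaker behavior described here might look roughly like the sketch below. `FailureCounter` is a hypothetical stand-in: the real _FailureCounter from PR #6 is not shown, so its API, the threshold of 3, and the `GuardedChannel` wrapper are all assumptions made for illustration.

```python
class FailureCounter:
    """Hypothetical stand-in for _FailureCounter; the real class's API is not shown."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed value
        self.consecutive = 0
        self.tripped = False

    def record_failure(self):
        self.consecutive += 1
        if self.consecutive >= self.threshold:
            self.tripped = True  # stays tripped for the rest of the session

    def record_success(self):
        self.consecutive = 0


class GuardedChannel:
    """Wrap a send or poll callable so repeated failures disable it."""

    def __init__(self, operation, counter):
        self._operation = operation
        self.counter = counter

    def call(self, *args):
        if self.counter.tripped:
            return None  # breaker open: skip the network call entirely
        try:
            result = self._operation(*args)
        except Exception:
            self.counter.record_failure()
            return None
        self.counter.record_success()
        return result
```

In this sketch a success resets the consecutive count but never re-closes a tripped breaker, matching the stated intent of disabling the adapter for the session rather than retrying indefinitely.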
Blocked: upstream dependency removed. This PR depends on PR #7 (per-entry failure tracking) which in turn depends on Without the No further automated action possible this cycle.
Blocked: same fork-divergence issue as PR #14. The Telegram circuit breaker idea (issue Lexus2016#573, 1 file, 31 lines) needs to be recreated from da-mikey/main. Create a fresh branch, cherry-pick just the circuit breaker commit(s) from the Lexus-based history, and open a new PR.
Cherry-pick of the Telegram circuit breaker onto current Lexus2016 main. Depends on _FailureCounter (PR #6) and per-entry failure tracking (PR #7). 1 file, 31 insertions.