Description
Multiple issues observed with Hermes Gateway causing cron job outputs to silently fail delivery to Telegram, combined with OpenRouter credential exhaustion breaking the primary model and all auxiliary services simultaneously.
Issues
1. Silent Telegram delivery failure (live adapter bug)
Cron jobs complete successfully and log "delivered to telegram:<chat_id> via live adapter" — but the message never arrives. The live adapter object exists in memory but the underlying Telegram socket is disconnected (due to DNS failure or polling conflict). The adapter's send() does not raise an exception on a dead connection, so adapter_ok stays True, delivery is logged as successful, and the fallback standalone HTTP path is never reached.
Relevant code path: cron/scheduler.py lines ~588–629
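A minimal sketch of the fall-through this issue asks for. It assumes a hypothetical is_connected() health check on the adapter and a hypothetical send_via_http fallback callable. Neither name comes from the Hermes codebase, and the actual logic in cron/scheduler.py may be structured differently.

```python
from typing import Callable, Optional, Protocol


class LiveAdapter(Protocol):
    """Hypothetical adapter interface; not the actual Hermes adapter API."""

    def is_connected(self) -> bool: ...

    def send(self, chat_id: str, text: str) -> None: ...


def deliver(
    adapter: Optional[LiveAdapter],
    send_via_http: Callable[[str, str], None],
    chat_id: str,
    text: str,
) -> str:
    """Deliver a message and return the path that was used: 'live' or 'http'."""
    # Only trust the live adapter if it reports a working connection.
    if adapter is not None and adapter.is_connected():
        try:
            adapter.send(chat_id, text)
            return "live"
        except Exception:
            # Any send error falls through to the standalone HTTP path.
            pass
    send_via_http(chat_id, text)
    return "http"
```

The point of the sketch is that "delivered via live adapter" is only logged after an explicit liveness check, so a dead socket can no longer be reported as a success.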
2. Telegram polling conflict — multiple gateway instances
When the gateway crashes and launchd/systemd restarts it, the old Telegram polling connection is not fully torn down before the new one starts. This causes repeated:
Conflict: terminated by other getUpdates request; make sure that only one bot instance is running
Both instances then fail to poll, and any cron delivery during this window is dropped.
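One possible mitigation is a single-instance lock taken before polling starts, so a restarted gateway waits until the previous process has released it. The sketch below uses an advisory fcntl.flock lock; the function name and lock path are hypothetical, and this only guards against overlapping processes on the same host, not a stale long-poll held on Telegram's side.

```python
import fcntl
import os


def acquire_single_instance_lock(path: str):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    fh = open(path, "a+")
    try:
        # Non-blocking exclusive lock; fails immediately if another holder exists.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    # Record the owning PID for debugging; the lock itself is what matters.
    fh.seek(0)
    fh.truncate()
    fh.write(str(os.getpid()))
    fh.flush()
    return fh
```

The caller keeps the returned file object open for the lifetime of the process. The lock is released when the file is closed or the process exits, including on a crash.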
3. OpenRouter credential pool — cascading failure
When all OpenRouter keys in the credential pool are exhausted or invalid (401/429), the entire system fails, because the primary model and all 10 auxiliary services (compression, vision, session_search, etc.) are routed through OpenRouter. A single provider failure takes everything down at once, with no fallback to the other configured providers (Nous, Copilot, OpenAI Codex).
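A minimal sketch of an ordered provider fallback, assuming each provider is exposed as a plain callable that raises a ProviderUnavailable error on 401/429. The exception class and the call_with_fallback helper are hypothetical and do not reflect how agent.auxiliary_client resolves providers.

```python
from typing import Callable, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Raised by a provider call when its credentials are exhausted or rejected."""


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Remember why this provider was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

With a list such as openrouter, nous, copilot, openai-codex, an exhausted OpenRouter pool would cost one failed attempt per request instead of taking every service down.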
Steps to Reproduce
Issue 1:
- Start gateway with Telegram connected
- Trigger a network blip or DNS failure that drops the Telegram socket
- Wait for a cron job to fire
- Observe: agent log says "delivered to telegram via live adapter", but the message never arrives on Telegram
Issue 2:
- Configure gateway as a launchd service with KeepAlive: { SuccessfulExit: false }
- Cause the gateway to crash (e.g. OpenRouter 429 flood)
- Observe repeated polling conflict warnings in gateway.log as old and new instances race
Issue 3:
- Configure all auxiliary providers to use openrouter
- Exhaust both OpenRouter keys in the credential pool at the same time
- Observe: primary model fails, all auxiliary services fail, no fallback occurs
Expected Behavior
- If the live adapter send fails silently (no exception but connection is dead), fall through to the standalone HTTP delivery path
- Gateway restart should cleanly terminate the previous Telegram polling session before starting a new one
- When OpenRouter is unavailable, auxiliary services should fall back to other configured providers (Nous, Copilot, etc.)
Actual Behavior
- Silent delivery failure logged as success
- Polling conflict storm, both instances eventually time out
- Complete system failure when OpenRouter credential pool is exhausted
Logs / Error Output
INFO cron.scheduler: Job 'AI Tools & Features Daily Brief' completed successfully
INFO cron.scheduler: Job 'dcad58f7f082': delivered to telegram:6456820456 via live adapter
# ^ message never arrived on Telegram
WARNING gateway.platforms.telegram: Telegram polling conflict (1/3), will retry in 10s. Error: Conflict: terminated by other getUpdates request
WARNING agent.auxiliary_client: Auxiliary: marking openrouter unhealthy for 60s (payment / credit error)
WARNING agent.auxiliary_client: resolve_provider_client: openrouter requested but OpenRouter credential pool has no usable entries
WARNING root: Fallback to openrouter failed: provider not configured
Environment
- OS: macOS 15 (Darwin 25.5.0)
- Hermes: latest (gateway mode, launchd service)
- Telegram: polling mode (not webhook)
- OpenRouter: free-tier models (arcee-ai/trinity-large-thinking:free, openai/gpt-oss-20b:free)
- Multiple providers configured in credential pool: openrouter, nous, copilot, openai-codex