sync: upstream daily catch-up 2026-06-21b (9 commits → 0 behind) #427
Merged
…#47056, #43014)
Consolidates three cron-delivery defects in cron/scheduler.py::_deliver_result
that all stem from how the live-adapter send result is interpreted.
#38922 — duplicate message on confirmation timeout.
future.result(timeout=60) raising TimeoutError bubbled to the outer
except handler, which left delivered=False, so `if not delivered:` re-sent
the identical message via the standalone path. future.cancel() cannot
un-send a request already in flight on the wire, so a slow confirmation
deterministically produced a duplicate. The send was already dispatched onto
the gateway loop, so a bare timeout is now treated as delivered
(assume-delivered is safer than guaranteed-duplicate) and the standalone
fallback is skipped. The live-adapter media attempt is also skipped on
timeout since the contended loop would re-block each 30s media budget.
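
The timeout rule above can be sketched as follows. This is a minimal illustration, not the code in cron/scheduler.py: the helper names `wait_for_live_send` and `should_use_standalone_fallback` and the outcome strings are hypothetical. Only the `future.result(timeout=...)` call and the three outcomes (timeout, genuine exception, completed) come from the description.

```python
import concurrent.futures


def wait_for_live_send(future, timeout=60.0):
    """Classify a send that was already dispatched onto the gateway loop.

    Hypothetical helper: returns "timeout", "error", or "completed".
    """
    try:
        future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The request may already be on the wire; re-sending would duplicate it.
        return "timeout"
    except Exception:
        # A genuine send failure: the standalone fallback is still appropriate.
        return "error"
    return "completed"


def should_use_standalone_fallback(outcome, confirmed):
    """Fall back only when the send failed or completed without confirmation."""
    if outcome == "timeout":
        return False  # assume delivered; skip the fallback and the media attempt
    if outcome == "error":
        return True
    return not confirmed
```

A future that never resolves is classified as "timeout", which suppresses the fallback instead of triggering it.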
#47056 — silent drop when the gateway has an active session.
The old check `if send_result is None or not getattr(send_result,
"success", True)` let a result object missing a `success` attribute default
to True, so it was counted as a successful delivery and the scheduler logged
"delivered via live adapter" while the gateway never processed the message.
Delivery is now confirmed via _confirm_adapter_delivery(): only an explicit,
truthy `success` attribute counts; None or a `success`-less object falls
through to the standalone path so the message actually arrives.
A genuine send Exception (not a slow confirmation) still falls through to
the standalone path, and is caught by run_job's outer handler — it is
recorded as the job's last_error and never crashes the cron ticker.
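
To make the difference between the old and new checks concrete, here is a small sketch. `old_check_failed` reproduces the condition quoted above. `confirm_adapter_delivery` is an assumed reading of what `_confirm_adapter_delivery()` does, based only on this description, and the `SimpleNamespace` objects stand in for whatever result type the adapter really returns.

```python
from types import SimpleNamespace


def old_check_failed(send_result):
    """The pre-fix condition: True means "treat the send as failed"."""
    return send_result is None or not getattr(send_result, "success", True)


def confirm_adapter_delivery(send_result):
    """Sketch: only an explicit, truthy `success` attribute confirms delivery."""
    if send_result is None:
        return False
    # A missing attribute now defaults to False instead of True.
    return bool(getattr(send_result, "success", False))


no_attr = SimpleNamespace()             # result object without `success`
ok = SimpleNamespace(success=True)      # explicit confirmation
failed = SimpleNamespace(success=False)
```

With the old condition, `no_attr` is not flagged as failed, so the message is counted as delivered. With the stricter check it is unconfirmed and falls through to the standalone path.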
#43014 — deliver=origin fails to resolve in CLI sessions.
A CLI-created job has no {platform, chat_id} origin, so deliver=origin (and
auto-detect / deliver=None) was unresolvable and emitted "no delivery target
resolved" on every run. An unresolvable origin with no configured home
channel is now treated as local (output stays in last_output), matching the
documented auto-deliver contract; a concrete unresolvable platform target
still reports a real error.
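
The resolution order described above might look roughly like this. Everything here is illustrative: `resolve_delivery_target`, the `(kind, value)` return shape, and the `home_channels` mapping of platform name to chat id are assumptions, not the scheduler's actual API.

```python
def resolve_delivery_target(deliver, origin=None, home_channels=None):
    """Hypothetical resolver returning (kind, value).

    kind is "platform", "local", or "error".
    """
    home_channels = home_channels or {}
    if deliver in (None, "origin"):
        if origin and origin.get("platform") and origin.get("chat_id"):
            return ("platform", origin)
        if home_channels:
            platform, chat_id = next(iter(home_channels.items()))
            return ("platform", {"platform": platform, "chat_id": chat_id})
        # CLI-created job with no home channel: keep output in last_output.
        return ("local", None)
    if deliver == "local":
        return ("local", None)
    if deliver in home_channels:
        return ("platform", {"platform": deliver, "chat_id": home_channels[deliver]})
    # A concrete platform target that cannot be resolved is still a real error.
    return ("error", f"no delivery target resolved for {deliver!r}")
```

Under these assumptions a CLI job with `deliver=origin` resolves to local, while an explicit but unconfigured platform name still reports an error.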
Salvaged from #41007 (timeout discriminator), folding in #47127's
_confirm_adapter_delivery hardening and #38937 / #43063's origin→local
fallback. Tests rewritten as behavior contracts (timeout => no duplicate;
None / success-less result => standalone fallback; confirmed success => no
fallback; CLI origin => local, explicit platform => still errors).
Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
`cronjob(action='run')` (and `hermes cron run`) only set `next_run_at = now` and returned success, relying on the scheduler ticker to actually execute the job on its next tick. When no gateway/ticker is running (a CLI-only setup, or the Windows case in #41037) the job never executed: `run` reported success, but `last_run_at` stayed null forever, with no output and no delivery.

A manual `run` should actually run. `_execute_job_now` now:

- **claims the job via `claim_job_for_fire`**, the same at-most-once CAS the scheduler/external-provider fire path uses. This both advances `next_run_at` for recurring jobs and blocks a concurrently-running gateway ticker from double-firing the same job; if the claim is lost, the run is skipped (the tool reports `execution_skipped`). This closes the double-fire race that a bare `advance_next_run` left open (a tick whose `get_due_jobs` already captured the job between trigger and advance would still fire it).
- **delegates firing to `run_one_job`**, the single shared execute→save→deliver→mark body the ticker and external providers use, so failure delivery, `[SILENT]` handling, and live-adapter delivery stay identical across paths and can't drift. (The original salvage re-implemented this sequence inline and had already dropped failure delivery + `[SILENT]`.)

The tool response carries `executed`, `execution_success`, and either `execution_error` or `execution_skipped`. The `hermes cron run` CLI message no longer claims "It will run on the next scheduler tick"; it reports the actual "Ran now: succeeded/failed" outcome (or the skip).

Salvaged from #41130 by @kyssta-exe (authorship preserved); reworked to reuse `claim_job_for_fire` + `run_one_job` per review rather than re-implementing the fire sequence inline. Adds tests for the claim-then-fire path, claim-lost skip, failure reporting, and exception capture.

Fixes #41037

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
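
The claim-then-fire sequence can be illustrated with a toy in-memory version. The function names mirror the ones in this message, but the signatures, the dict-based job store, and the exact values placed in the response keys are assumptions made for the sketch.

```python
import threading

_lock = threading.Lock()


def claim_job_for_fire(jobs, job_id, seen_next_run_at, new_next_run_at):
    """Toy compare-and-swap: win only if next_run_at is unchanged since it was read."""
    with _lock:
        job = jobs[job_id]
        if job["next_run_at"] != seen_next_run_at:
            return False  # another path already claimed this fire
        job["next_run_at"] = new_next_run_at
        return True


def execute_job_now(jobs, job_id, run_one_job, new_next_run_at):
    """Sketch of claim-then-fire; returns a dict shaped like the tool response."""
    seen = jobs[job_id]["next_run_at"]
    if not claim_job_for_fire(jobs, job_id, seen, new_next_run_at):
        return {"executed": False, "execution_skipped": "claim lost"}
    try:
        ok = run_one_job(jobs[job_id])
    except Exception as exc:
        # Exceptions are captured and reported instead of propagating.
        return {"executed": True, "execution_success": False,
                "execution_error": str(exc)}
    result = {"executed": True, "execution_success": bool(ok)}
    if not ok:
        result["execution_error"] = "job reported failure"
    return result
```

A ticker that captured the old `next_run_at` before the manual run loses its own claim afterwards, which is what prevents the double fire.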
fix(cron): make live-adapter delivery confirmation reliable (#38922, #47056, #43014)
A recurring cron job persists `next_run_at` as an absolute timestamp with a
UTC offset (e.g. `2026-05-19T21:00:00+10:00`). Cron expressions, however,
describe *local wall-clock* intent ("run at 21:00"). When Hermes/system
timezone changes after the timestamp was persisted, the stored instant is
re-interpreted in the new zone: `21:00+10:00` is the instant `13:00+02:00`,
which is `<= now` (13:02+02:00) — so the job fires HOURS EARLY, then
`compute_next_run` advances it via croniter to `21:00+02:00` the same day,
producing a SECOND fire. (#28934, recurrence of #24289.)
`_get_due_jobs_locked` now detects this precise migration case before the
due check: for a `cron` job whose converted instant looks due, whose stored
UTC offset differs from the current zone's, AND whose stored *wall-clock*
time is still in the future (distinguishing a migrated offset from a
genuinely missed run), it recomputes `next_run_at` from the schedule and
skips the early fire — preserving the local wall-clock intent.
Verified against the issue's reproducer: stored `21:00+10` under runtime
`+02:00` at wall-clock `13:02` is rescheduled to `21:00+02` instead of
firing early + again.
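
The four-condition guard can be checked against the reproducer with plain `datetime` arithmetic. `looks_like_migrated_offset` is a hypothetical name and signature. The real check lives inside `_get_due_jobs_locked` and then recomputes `next_run_at` via `compute_next_run`, which is not modeled here.

```python
from datetime import datetime, timedelta, timezone


def looks_like_migrated_offset(schedule_kind, stored_next_run, now):
    """Sketch: True when a due-looking cron timestamp was stored under a
    different UTC offset and its wall-clock time is still ahead of `now`."""
    is_cron = schedule_kind == "cron"
    # Aware datetimes compare as absolute instants.
    looks_due = stored_next_run <= now
    offset_changed = stored_next_run.utcoffset() != now.utcoffset()
    # Dropping tzinfo compares the local wall-clock readings only.
    wall_clock_ahead = stored_next_run.replace(tzinfo=None) > now.replace(tzinfo=None)
    return is_cron and looks_due and offset_changed and wall_clock_ahead


plus10 = timezone(timedelta(hours=10))
plus02 = timezone(timedelta(hours=2))
stored = datetime(2026, 5, 19, 21, 0, tzinfo=plus10)  # 11:00 UTC
now = datetime(2026, 5, 19, 13, 2, tzinfo=plus02)     # 11:02 UTC
```

For the reproducer values the guard fires, so the job would be rescheduled rather than run. A timestamp stored under the current offset, or one whose wall-clock time has already passed, is left to the normal missed-run handling.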
Salvaged from #28941 by @Tranquil-Flow (authorship preserved). Chosen over
the alternative approaches (#28951 normalize-to-UTC, #28985 rebase-and-match)
because UTC-normalization does not change the absolute-instant comparison and
so does not fix the early fire, and this guard is the tightest: it only acts
when all four conditions hold and reuses the existing `compute_next_run`.
Fixes #28934
…er (#22773)
PR #22410 added three-mode Telegram topic routing to the live message path
(TelegramAdapter.send via the gateway DeliveryRouter), but the cron delivery
path never got it. cron/scheduler.py::_deliver_result sent through the live
adapter with a bare ``{"thread_id": ...}`` and fell back to the standalone
_send_telegram, neither of which addresses Bot API Direct Messages topics
correctly. After Bot API 10.0 (2026-05-08), sending to a private chat with a
bare ``message_thread_id`` is rejected/mis-routed, so cron deliveries to a
private DM topic landed in the General topic instead of the requested lane.
Fix: the cron live-adapter branch now routes the text send through the
gateway's ``DeliveryRouter._deliver_to_platform`` — the same canonical path
live messages use — so it inherits all three Telegram routing modes:
1. Forum/supergroup (negative chat_id) -> message_thread_id
2. Bot API DM topics (private chat_id + numeric topic id) ->
direct_messages_topic_id (the case #22773 reported)
3. Hermes-created named private DM-topic lanes -> ensure_dm_topic +
reply anchor
For mode 2, a private-chat target with a numeric topic id is passed as
``direct_messages_topic_id`` metadata (verified end-to-end:
TelegramAdapter._thread_kwargs_for_send turns it into
``{message_thread_id: None, direct_messages_topic_id: <int>}``), instead of a
bare message_thread_id. Forum/supergroup and home-channel deliveries are
unchanged. The standalone fallback (gateway down) is preserved.
No new config knob and no duplicated routing logic — this reuses the existing
DeliveryRouter rather than reimplementing topic routing in the cron path.
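
As a rough illustration of the three modes, the selector below maps a target to the thread-related send arguments. It is not the DeliveryRouter or `TelegramAdapter._thread_kwargs_for_send` code: `telegram_thread_kwargs`, its parameters, the mode labels, and the `lane` key are hypothetical, and mode 3's `ensure_dm_topic` + reply anchor step is only noted in a comment.

```python
def telegram_thread_kwargs(chat_id, topic_id=None, named_lane=None):
    """Hypothetical selector for the three routing modes; returns (mode, kwargs)."""
    if named_lane is not None:
        # Mode 3: Hermes-created named private DM-topic lane. The real path
        # calls ensure_dm_topic and sends with a reply anchor (not modeled).
        return ("named_dm_lane", {"lane": named_lane})
    if topic_id is None:
        return ("plain", {})
    if int(chat_id) < 0:
        # Mode 1: forum/supergroup chats have negative chat ids.
        return ("forum", {"message_thread_id": int(topic_id)})
    # Mode 2: private chat plus numeric topic id (the case #22773 reported).
    return ("dm_topic", {"message_thread_id": None,
                         "direct_messages_topic_id": int(topic_id)})
```

The mode 2 result matches the shape quoted above: `message_thread_id` is explicitly cleared and the topic travels as `direct_messages_topic_id`.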
Salvaged from #42051 (stepanov1975) and #23249 (devsart95), which both
diagnosed the missing three-mode routing in the cron/standalone path;
reimplemented onto the canonical DeliveryRouter that landed since those PRs
were opened.
Co-authored-by: Alex <9785479+stepanov1975@users.noreply.github.com>
Co-authored-by: devsart95 <devsart95@gmail.com>
fix(cron): route Telegram DM-topic cron delivery through DeliveryRouter (#22773)
fix(cron): execute job immediately on action=run
…pair fix(cron): repair migrated timezone offsets to prevent double-fire
…dialog
#43496 added a per-provider hide-all sentinel ('provider::') so emptying a provider in the Edit Models dialog stopped re-expanding its defaults. That fixed the single-provider case, but the dialog's toggle handler seeds its working set from effectiveVisibleKeys(), which strips ALL sentinels before returning. So persisting after any toggle silently dropped every OTHER provider's hide-all sentinel; those providers then looked 'never customized' and re-enabled all their models on the next render.
Split resolution into two functions:
- resolveVisibleKeys(): stored keys + curated default expansion, with hide-all sentinels PRESERVED — the canonical working set the toggle handler mutates and persists.
- effectiveVisibleKeys(): resolveVisibleKeys() then strips sentinels, for display only (unchanged contract).
Move the toggle set-computation into a pure, unit-tested toggleModelVisibility() that seeds from resolveVisibleKeys(), so sibling sentinels survive the persist. Add regression tests that drive the real toggle handler across multiple providers.
Follow-up to #43496; completes the fix for #43485 (cross-provider case).
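
The split can be modeled in a few lines. This is a Python sketch of the logic only: the desktop code itself is not shown here, and the `provider::model` key format, the function signatures, and the data shapes are assumptions chosen to match the `'provider::'` sentinel described above.

```python
def is_sentinel(key):
    """A hide-all sentinel is a key with an empty model part, e.g. 'openai::'."""
    return key.endswith("::")


def resolve_visible_keys(stored, curated_defaults):
    """Stored keys plus default expansion for untouched providers; sentinels kept."""
    keys = set(stored or ())
    touched = {k.split("::", 1)[0] for k in keys}
    for provider, models in curated_defaults.items():
        if provider not in touched:
            keys.update(f"{provider}::{m}" for m in models)
    return keys


def effective_visible_keys(stored, curated_defaults):
    """Display-only view: the resolved set with sentinels stripped."""
    return {k for k in resolve_visible_keys(stored, curated_defaults)
            if not is_sentinel(k)}


def toggle_model_visibility(stored, curated_defaults, provider, model):
    """Pure toggle seeded from the sentinel-preserving set."""
    keys = resolve_visible_keys(stored, curated_defaults)
    key, sentinel = f"{provider}::{model}", f"{provider}::"
    if key in keys:
        keys.discard(key)
        if not any(k.startswith(sentinel) and k != sentinel for k in keys):
            keys.add(sentinel)  # provider emptied: remember hide-all
    else:
        keys.add(key)
        keys.discard(sentinel)
    return keys
```

Seeding the toggle from `effective_visible_keys` instead would drop every sibling sentinel before the persist, which is the regression described above.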
Follow-up to the salvaged #47450 fix:
- Extract expandProviderDefaults() so the curated-default expansion rule lives in one place (was duplicated between defaultVisibleKeys and resolveVisibleKeys).
- Drop the redundant new Set() wrap in toggleModelVisibility (resolveVisibleKeys already returns a fresh Set; effectiveVisibleKeys already relied on this).
- Document the intentional re-enable behavior (re-enabling one model of a hidden-all provider restores only that model, not the curated defaults) and tighten the toggleModelVisibility JSDoc.
- Add 7 hardening tests: re-enable-restores-only-that-model, full hide/re-enable round-trip, empty-non-null stored, single toggle-off from null defaults, zero-model provider, and direct resolveVisibleKeys null/empty assertions.
…-cross-provider-47450 fix(desktop): preserve other providers' hide-all in model visibility dialog (salvage #47450)
A server that doesn't implement the optional 'ping' utility answers a keepalive ping with JSON-RPC method-not-found. _is_method_not_found_error latches that condition so the probe falls back to list_tools instead of reconnect-looping. The substring fallback only matched 'method not found' / '-32601' / 'not found: ping'. Servers that surface method-not-found as the common 'Unknown method: <name>' phrasing without a structural -32601 code (e.g. agentmemory's MCP server) slipped through, so the fallback never latched and the keepalive reconnect-looped every cycle. Add 'unknown method' to the substring fallback so the ping->list_tools keepalive fallback latches for these servers too. Fixes #50028.
Two regression tests for the agentmemory reconnect-loop:
- _is_method_not_found_error matches the plain 'Unknown method: ping' phrasing (no structural -32601 code).
- _keepalive_probe latches _ping_unsupported and falls back to list_tools when send_ping raises 'Unknown method: ping', instead of propagating (which would reconnect-loop).
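
A synchronous sketch of the latch-and-fall-back behavior follows. The substring list is the one given above, but the rest is assumed: reading a `code` attribute off the exception is a guess at the structural check, `KeepaliveProbe` is a hypothetical stand-in for the real session object, and the real probe is presumably async.

```python
def is_method_not_found_error(exc):
    """Sketch: structural JSON-RPC code first, then the substring fallback."""
    if getattr(exc, "code", None) == -32601:
        return True
    text = str(exc).lower()
    needles = ("method not found", "-32601", "not found: ping", "unknown method")
    return any(n in text for n in needles)


class KeepaliveProbe:
    """Hypothetical wrapper holding the ping-unsupported latch."""

    def __init__(self, send_ping, list_tools):
        self._send_ping = send_ping
        self._list_tools = list_tools
        self.ping_unsupported = False

    def probe(self):
        if not self.ping_unsupported:
            try:
                return self._send_ping()
            except Exception as exc:
                if not is_method_not_found_error(exc):
                    raise  # real connection problems still propagate
                self.ping_unsupported = True  # latch: stop pinging this server
        return self._list_tools()
```

Once latched, later probes skip the ping entirely and go straight to `list_tools`.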
Routine daily sync now that the pipeline is healthy. Brings the 9 upstream commits that landed after the big #405 sync: desktop model-visibility cross-provider fix (#47450/#50100), MCP ping keepalive 'unknown method' fallback, and cron fixes (tz-offset repair, run-immediate, Telegram DM-topic delivery, live-adapter delivery confirmation). One conflict: tools/cronjob_tools.py action='run'. Upstream now EXECUTES the job immediately via _execute_job_now (#41037) instead of only scheduling it (trigger_job). Took upstream's behavior and kept our _reset_cron_failure(task_id). trigger_job import dropped by upstream.
🔎 Lint report:

| Rule | Count |
|---|---|
| invalid-assignment | 2 |
| invalid-argument-type | 1 |

First entries
tests/cron/test_scheduler.py:2837: [invalid-assignment] invalid-assignment: Object of type `def never_dispatched_cancel() -> Unknown` is not assignable to attribute `cancel` of type `def cancel(self) -> bool`
cron/scheduler.py:907: [invalid-argument-type] invalid-argument-type: Argument to `DeliveryRouter.__init__` is incorrect: Expected `dict[Platform, Any]`, found `Unknown | None`
tests/cron/test_scheduler.py:2776: [invalid-assignment] invalid-assignment: Object of type `def in_flight_cancel() -> Unknown` is not assignable to attribute `cancel` of type `def cancel(self) -> bool`
✅ Fixed issues: none
Unchanged: 6083 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Why
Answers the version-parity check: after #405 the fork was 9 commits behind upstream (normal daily drift). These 9 include fixes in BOTH areas we don't own (apps/desktop) and areas we do (cron, mcp) — exactly the upstream changes we want. Pulling them brings the fork to 0 behind.
What's in the 9
`action='run'` (#41037), Telegram DM-topic delivery routing, live-adapter delivery confirmation.
Conflict (1)
- `tools/cronjob_tools.py` `action='run'`: upstream now executes the job immediately via `_execute_job_now` (#41037) instead of only scheduling it (`trigger_job`). Took upstream's behavior, kept our `_reset_cron_failure(task_id)`. `trigger_job` import dropped by upstream; `get_job` already imported.
Verification
- `git rev-list --count HEAD..upstream/main` = 0 after merge.
- `package.json` stays 0.16.0: deliberate fork branding (our docs/chore(fork): MIT license, fork metadata, English EVOLUTION_README, pytest marks #162), not a sync gap.
- `test_script_timeout` failure is the local os.kill live-guard, env-only). Full suite on CI.