sync: merge upstream/main (286 commits) — 2026-06-23 #487

Merged
Lexus2016 merged 289 commits into main from sync/upstream-2026-06-23
Jun 23, 2026
@Lexus2016

Owner

Upstream sync: merge upstream/main (286 commits) — 2026-06-23

Authorship-first merge (git merge upstream/main, real merge commit, ancestry preserved). NOT auto-merged — escalated to owner review per the big-sync policy (>80 commits / core touch; PR #405 precedent).

  • Commits merged: 286 (origin was 286 behind, 273 ahead).
  • Real conflicts resolved: 11 (far fewer than the ~48 overlapping files — git auto-merged the rest).
  • Merge parents: 7cafa44a1 (origin/main) + 6cc07b6cd (upstream/main).

Conflict resolutions

| File | Resolution | Notes |
| --- | --- | --- |
| .github/workflows/build-windows-installer.yml | UPSTREAM (delete) | Upstream removed the unused job (c820eb6a5); origin only had dependabot bumps. |
| .github/workflows/docker-publish.yml | MERGED | Upstream's if: github.event_name != 'pull_request' build-on-main-only structure (2977e7454) + kept fork's newer dependabot action pins (buildx v4.1.0, build-push v7.2.0, setup-uv v8.2.0, login v4.2.0). |
| .github/workflows/tests.yml | UPSTREAM | Converted to workflow_call sub-workflow of new ci.yml orchestrator. |
| .github/workflows/supply-chain-audit.yml | UPSTREAM | Same: path-gating moved to ci.yml's detect job via new .github/actions/detect-changes. Fork's inline changes/#108 logic superseded. |
| scripts/release.py | UNION | AUTHOR_MAP: kept both fork sync-authors and upstream PR-salvage authors. 0 new duplicate keys introduced. |
| tools/lazy_deps.py | UNION | Kept fork's telemetry.otel (#167) + upstream's tool.computer_use. |
| tools/terminal_tool.py | UPSTREAM (superset) | Upstream wraps fork's watcher-routing in async_delivery_supported() guard (#10760). Fork routing fully preserved + bug fix gained. |
| hermes_cli/tools_config.py | MERGED | Took upstream's safer shell=use_shell + env=_cua_driver_env() (Windows uses argv list, no shell); kept fork's #165 security-review comment. |
| run_agent.py | UPSTREAM | See gemini-cli note below. |
| agent/agent_runtime_helpers.py | UPSTREAM | (1) memory mirror → notify_memory_tool_write (consolidates fork's on_memory_write, adds success-gating); (2) gemini-cli client branch removed (see below). |
| agent/conversation_loop.py | UPSTREAM | #39550 token-aware 413/context compression retry. Fork loop-guard #432/#436 + cron-digest hooks untouched (auto-merged, verified intact). |

⚠️ Notable divergence resolved — google-gemini-cli provider (#50492)

Upstream deleted the google-gemini-cli / google-antigravity OAuth providers wholesale (#50492). The fork had this as a flagship feature wired across ~23 files. During the merge, git auto-merged away the fork's gemini-cli wiring in 13+ non-conflicting files (auth.py 12→0 refs, models.py 5→0, providers.py 4→0, …) and deleted agent/google_oauth.py — leaving only 2 conflicted hunks where the fork branch survived.

Keeping the provider would have required surgical re-integration across 13+ files + undeleting google_oauth.py — out of scope for a sync PR and a guaranteed source of perpetual future conflicts. Resolution: followed upstream's clean removal — reverted the 2 keep-ours hunks and dropped the fork's gemini_cloudcode_adapter.py / test_gemini_cloudcode.py / plans/gemini-oauth-provider.md. The merged tree now has zero google-gemini-cli references and is internally consistent.

👉 Owner decision required: if the google-gemini-cli provider must stay in the fork, re-add it as a dedicated feature branch, not in this sync.

Owner verification items

  1. Branch protection (from CI refactor): update required checks to require only the single all-checks-pass gate from the new ci.yml. The old per-job statuses (test (1..6), etc.) no longer report directly, so leaving them required will permanently block PRs. (all-checks-pass uses if: always() and treats skipped as success, so the fork's markdown-only-PR concern from #108, "[FIX] Markdown-only PRs are permanently BLOCKED: tests.yml paths-ignore vs required test (1-6) statuses", is handled.)
  2. google-gemini-cli removal: confirm dropping the provider is acceptable (see above).

Evolution features verified intact

Tests

  • tests/agent/test_system_prompt.py + tests/hermes_cli/test_tqmemory_setup.py: 29 passed
  • tests/agent/test_gemini_fast_fallback.py (+ above): 36 passed
  • Syntax OK on all 11 resolved files + critical evolution files; bash -n setup-hermes.sh OK
  • Import smoke test on the 5 de-gemini'd modules (auth, models, providers, provider_catalog, runtime_provider): all import cleanly (no dangling refs)
  • Full suite not run (large; left for CI under the new ci.yml).

🤖 Prepared by automated sync agent. Independent second opinions (Gemini via consilium) consulted on the CI-orchestration and gemini-cli removal decisions.

teknium1 and others added 30 commits June 21, 2026 11:27
… an empty summary (#50297)

When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels)
returns a well-formed HTTP 200 whose summary content is null or empty/
whitespace-only, _generate_summary coerced it to "" and stored a prefix-only
summary — silently replacing the compacted turns with nothing. The model then
lost all in-progress context after compression (#11978, #11914).

_validate_llm_response already guards None / empty-choices, so those never
reach the compressor; the gap was a well-formed response with empty *content*.
Now treat empty content as a summary failure: raise so it routes through the
existing main-model fallback then transient cooldown, dropping the turns
without a summary rather than wiping context with an empty one.

Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider
configured' errors take the 600s no-provider cooldown; empty/invalid-response
RuntimeErrors from a configured provider now correctly get the main-model
fallback instead of being misrouted into the long no-provider cooldown.
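
The empty-content rule described above can be sketched roughly as follows. This is an illustration only, not the project's code: SummaryError, extract_summary and summarize_with_fallback are hypothetical names, and the real fallback chain also involves cooldowns that are left out here.

```python
class SummaryError(RuntimeError):
    """Raised when a summarizer returns no usable text (hypothetical name)."""


def extract_summary(content):
    """Return the stripped summary, or raise if it is None, empty or whitespace."""
    if content is None or not str(content).strip():
        raise SummaryError("summary response had empty content")
    return str(content).strip()


def summarize_with_fallback(primary, fallback):
    """Try the primary summarizer, then the fallback.

    Returns None when both fail, meaning "no summary" rather than an empty one.
    """
    for call in (primary, fallback):
        try:
            return extract_summary(call())
        except SummaryError:
            continue
    return None
```

The point of raising instead of coercing to "" is that a blank summary then takes the same failure path as any other summarizer error.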

Reported by @Hung2124; area identified by @annguyenNous in #39590.
…#36908)

The 'Session compressed N times — accuracy may degrade' warning went
through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord
never saw it — unlike the two other compression warnings in the same
module, which route through _emit_status (and store _compression_warning
for late-bound gateway status_callback replay).

Set agent._compression_warning + call agent._emit_status() for this
warning too, matching the sibling pattern. _emit_status still _vprints
for the CLI, so CLI output is unchanged; TUI / gateway surfaces now
receive it via status_callback (and replay_compression_warning can
re-deliver it once a late-bound gateway callback is wired).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Funnel session finalization through AIAgent.close() — the single terminal
path every agent (CLI, gateway, subagent, cron) funnels through — so finished
agents stop leaving rows with ended_at IS NULL. The biggest leak source was
delegate_task subagent + background-review forks whose close() never ended
their row.

end_session() is first-reason-wins and no-ops on an already-ended row, so a
'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal
path is never clobbered. /resume already calls reopen_session(), so
finalizing-on-close does not break resumability.
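
A minimal sketch of first-reason-wins finalization, assuming a sessions table with ended_at and end_reason columns (the schema and the function signature here are assumptions, not the project's actual store). The WHERE ended_at IS NULL clause is what makes a second call a no-op.

```python
import sqlite3
import time


def end_session(conn, session_id, reason):
    """Mark a session ended unless it already is; return True if this call won."""
    cur = conn.execute(
        "UPDATE sessions SET ended_at = ?, end_reason = ? "
        "WHERE id = ? AND ended_at IS NULL",
        (time.time(), reason, session_id),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, ended_at REAL, end_reason TEXT)"
)
conn.execute("INSERT INTO sessions (id) VALUES ('s1')")

first = end_session(conn, "s1", "compression")   # sets the reason
second = end_session(conn, "s1", "cli_close")    # row already ended, no change
reason = conn.execute(
    "SELECT end_reason FROM sessions WHERE id = 's1'"
).fetchone()[0]
```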

Temporary helper agents that rotate/share the session forward (manual
compression, gateway session-hygiene) opt out via _end_session_on_close=False.

Also stop the long-running gateway heartbeat once the executor is done or the
session slot is rebound to a different agent, preventing a stale
'running: delegate_task' bubble from outliving its run.

Closes #12029.
The background-review fork (fires ~every 10 turns) pins
review_agent.session_id = agent.session_id — the parent's LIVE id — for
prefix-cache parity, then calls close(). With session finalization now in
close(), that would end the still-active parent session mid-conversation.
Set _end_session_on_close = False on the fork so the real owner (CLI close /
gateway reset / cron) finalizes the session instead.

Follow-up to the #12029 fix.
The TUI /compress slash side-effect compressed the session, synced the
key, and emitted session.info — but returned an empty string, so the
user saw no 'Compressed: N → M messages / ~X → ~Y tokens' feedback. The
CLI (_manual_compress) and gateway (slash_commands) paths both already
call summarize_manual_compression; the TUI slash path was the lone gap.

Snapshot history + rough token estimate before and after compaction and
return the formatted summarize_manual_compression() feedback, mirroring
the session.compress RPC handler. The estimate uses the same
estimate_request_tokens_rough(system_prompt, tools) inputs as the RPC
path, re-reading the system prompt after compaction (it may be rebuilt).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
PostgreSQL's initdb refuses to run as root, so the embedded Hindsight
daemon could never initialize its data directory under root. The
daemon-start thread would fail, retry, and loop forever — each cycle
reloading embedding models (~958MB RAM, ~33% CPU) with no user-visible
error, leaving Hermes sluggish on a common VPS/cloud root setup.

initialize() now detects root (os.geteuid() == 0) before spawning the
daemon thread, disables local_embedded mode, and surfaces a clear
warning to both the log and the terminal so the user knows to run as a
non-root user or switch to cloud / local_external mode.
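
The root check above might look roughly like this. Only os.geteuid() == 0 comes from the commit text; choose_mode and its return shape are hypothetical, and the getattr guard is an added assumption for platforms without geteuid.

```python
import os


def running_as_root():
    """True when the effective UID is 0; False where os.geteuid does not exist."""
    geteuid = getattr(os, "geteuid", None)
    return geteuid is not None and geteuid() == 0


def choose_mode(requested, is_root):
    """Return (mode, warning). Hypothetical helper: refuse local_embedded under root."""
    if requested == "local_embedded" and is_root:
        return None, (
            "local_embedded cannot run as root; "
            "use a non-root user, cloud, or local_external"
        )
    return requested, None
```

Deciding before the daemon thread is spawned avoids the silent retry loop entirely, since there is nothing left to retry.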

Closes #13125.

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Bedrock Claude routes through the AnthropicBedrock SDK and injects
cache_control, so cached tokens are always reported — but the pricing
table had no cache cost fields for any Bedrock model, so /usage showed
"cost unknown" on every cached session. Also, cross-region inference
profiles (us./global./eu. prefixes) never matched the bare pricing keys.

- Add cache_read/cache_write rates to the four Bedrock Claude rows
  (read 0.1x input, write 1.25x input per the Bedrock pricing page).
- Normalize the cross-region prefix in the Bedrock pricing lookup,
  mirroring is_anthropic_bedrock_model's prefix list.
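
As a rough illustration of the two changes, assuming a pricing table keyed by bare model id: the model id and the input rate below are placeholders rather than real Bedrock prices, and the prefix tuple only lists the three prefixes named above.

```python
REGION_PREFIXES = ("us.", "eu.", "global.")  # from the commit text; real list may differ

# Placeholder model id and input rate, not actual Bedrock pricing.
PRICING = {"anthropic.example-model": {"input": 3.0}}


def normalize_model_id(model_id):
    """Strip a cross-region inference-profile prefix, if present."""
    for prefix in REGION_PREFIXES:
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


def lookup_rates(model_id):
    """Return input plus derived cache rates, or None for an unknown model."""
    row = PRICING.get(normalize_model_id(model_id))
    if row is None:
        return None
    rates = dict(row)
    rates["cache_read"] = round(row["input"] * 0.1, 6)    # read = 0.1x input
    rates["cache_write"] = round(row["input"] * 1.25, 6)  # write = 1.25x input
    return rates
```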

Closes #50295.
hermes config show printed the model dict raw via print(), bypassing the
logging redactor; a custom-provider api_key (e.g. Cloudflare cfut_...) was
shown in plaintext even with security.redact_secrets=true. Opaque tokens
don't match any vendor-prefix regex, so structural key-name masking is
required.

- Add redact_config_value(): recursively masks credential-shaped keys
  (api_key/token/secret/... exact-match) via mask_secret.
- Wrap the show_config model dump in it.
- Mask the set_config_value echo when the leaf key is credential-shaped
  (config set model.api_key routes to config.yaml, lowercase misses the
  .env allowlist).
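
Structural key-name masking could be sketched as below. The key set is an assumption (the commit elides the full list), and this mask_secret is a stand-in for the project's own helper, whose output format is not shown in the source.

```python
CREDENTIAL_KEYS = {"api_key", "token", "secret"}  # assumed subset of the real list


def mask_secret(value):
    """Stand-in masker: keep a short head and tail, hide the rest."""
    text = str(value)
    if len(text) <= 8:
        return "***"
    return text[:4] + "..." + text[-2:]


def redact_config_value(value, key=None):
    """Recursively copy value, masking leaves whose key name looks like a credential."""
    if isinstance(value, dict):
        return {k: redact_config_value(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_config_value(v, key) for v in value]
    if isinstance(key, str) and key.lower() in CREDENTIAL_KEYS and value:
        return mask_secret(value)
    return value
```

Matching on the key name rather than the value is what covers opaque tokens that no vendor-prefix regex would recognize.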
The salvaged #19820 unifies the write_file guard under
_is_internal_file_tool_content with the message 'internal read_file
display text'. Two tests added to test_file_read_guards.py after the PR
branch point still asserted the old 'status text' wording. Update them
to match the new (correct, more general) message.
…stion

Inbound image/audio/video payloads were buffered fully into process memory
before being written to the cache, with no size limit. A large upload
(Discord Nitro allows 500 MB) or a remote media URL in an inbound message
pointing at a huge file could spike RAM and OOM-kill the gateway.

Enforce a configurable cap in the shared cache helpers (gateway/platforms/
base.py) so the protection holds across every platform adapter, not one:

- cache_image/audio/video_from_bytes reject oversized payloads before writing
  (video was the gap in the original report — now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized
  Content-Length header and re-checking the running total per chunk so an
  absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized
  attachment is rejected before any bytes are pulled into memory.

Configurable via gateway.max_inbound_media_bytes in config.yaml (default
128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml.
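
The streaming check can be illustrated with a small helper, assuming the body arrives as an iterable of byte chunks. read_capped and PayloadTooLarge are hypothetical names; the real helpers live in gateway/platforms/base.py and are not reproduced here.

```python
class PayloadTooLarge(ValueError):
    """Raised when inbound media exceeds the configured cap (hypothetical name)."""


def read_capped(chunks, max_bytes, content_length=None):
    """Join chunks into bytes, enforcing max_bytes (0 disables the cap).

    The declared length is checked first, then the running total per chunk,
    so a missing or understated header cannot bypass the limit.
    """
    if max_bytes and content_length is not None and content_length > max_bytes:
        raise PayloadTooLarge(f"declared {content_length} bytes > cap {max_bytes}")
    total = 0
    parts = []
    for chunk in chunks:
        total += len(chunk)
        if max_bytes and total > max_bytes:
            raise PayloadTooLarge(f"received more than {max_bytes} bytes")
        parts.append(chunk)
    return b"".join(parts)
```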

Salvaged and extended from @sgaofen's PR #13341 (the original report and the
shared-helper approach). Reapplied onto current main (Discord adapter has
since moved to plugins/platforms/discord/), the configurable knob moved from
an env var to config.yaml, and the video cache helper added.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
…timeout (#50312)

A turn forcibly interrupted by the drain-timeout escalation never reaches
turn_finalizer.finalize_turn (the only place that flushes the turn to
state.db). Its in-flight tool rounds live only in the in-memory
_session_messages, so the immediate pre-restart turn was silently dropped
from load_transcript() on resume.

_finalize_shutdown_agents now flushes _session_messages to the SQLite
session store before teardown. The flush is idempotent (identity-tracked
in _flush_messages_to_session_db), so agents that finished gracefully
re-flush nothing. The resume_pending / fresh-tool-tail branches in
_handle_message_with_agent already expect a transcript whose tail may be a
pending tool result.

Fixes #13121.
Rich messages are not ready for primetime: current Telegram clients can
render Bot API 10.1 rich messages as blank/unsupported bubbles and make
them hard to copy as plain text, which is worse than the legacy
MarkdownV2 path for command snippets and mobile handoffs. Default the
rich_messages toggle to False so replies stay on the copyable legacy
path; users opt in per bot via platforms.telegram.extra.rich_messages:
true. Updates adapter, gateway config default, example config, English +
zh-Hans docs, and the default/opt-in tests.
… (#50325)

hermes backup only walks HERMES_HOME, so memory providers that keep
config/credentials in home-anchored dotdirs (honcho -> ~/.honcho,
hindsight -> ~/.hindsight, openviking -> ~/.openviking) lost that data
across a backup/import cycle — the peer IDs, session pairings, and API
keys never made it into the archive.

Add an optional MemoryProvider.backup_paths() hook (default []). The
active provider declares its external paths; backup resolves them from
config only (no init, no network), archives the ones under the home dir
into a reserved _external/ subtree encoded relative to home, and import
restores them to their original location with a home-anchored traversal
guard and 0600 on credential-shaped files. Paths outside home are
skipped as non-portable.

honcho, hindsight, and openviking override the hook. E2E-validated full
backup->import cycle plus 7 new tests.
… DB corruption (#50331)

A shell-launched 'hermes gateway run --replace' / 'gateway restart' on a
systemd/launchd host can leave an orphan gateway whose kanban dispatcher
escapes the service cgroup, survives 'systemctl restart', and becomes a
second long-lived writer on the shared kanban.db. Two dispatchers that each
believe they own the file both pass SQLite busy_timeout and then race on WAL
frames — the documented root cause of multi-writer corruption (issue #35240).

The existing _guard_supervised_gateway_conflict startup guard blocks the
common way an orphan is born, but does nothing once a second dispatcher
already exists. This adds the defense-in-depth: dispatch_once now wraps every
tick in a non-blocking, board-scoped flock (_dispatch_tick_lock). A losing
dispatcher returns DispatchResult(skipped_locked=True) and does zero DB writes
this tick — so two dispatchers can never run a reclaim/spawn/write sequence
concurrently regardless of how the second one got there.

- Non-blocking (LOCK_NB): never stalls the gateway's async watcher.
- Board-scoped: lock file is a .dispatch.lock sibling of each board's
  kanban.db, so unrelated boards tick in parallel.
- POSIX + Windows (fcntl / msvcrt LK_NBLCK), no-op degrade where neither
  exists — mirrors the existing _cross_process_init_lock pattern.

Verified with a real two-process orphan repro: while a separate process holds
the lock, dispatch_once skips; after release it runs.
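
A minimal sketch of a non-blocking tick lock, POSIX branch only. dispatch_tick_lock here is an illustrative context manager, not the project's _dispatch_tick_lock; the Windows msvcrt branch is omitted and the sketch simply degrades to a no-op when fcntl is missing.

```python
import contextlib
import os

try:
    import fcntl
except ImportError:  # e.g. Windows; this sketch then degrades to a no-op
    fcntl = None


@contextlib.contextmanager
def dispatch_tick_lock(lock_path):
    """Yield True if this caller holds the lock, False if someone else does."""
    if fcntl is None:
        yield True
        return
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB: fail immediately instead of waiting for the holder.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A caller would skip all database writes for the tick when the manager yields False, which is the behavior the skipped_locked result describes.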
_collect_delegate_child_ids() walks the _delegate_from marker chain to
gather delegate subagents for cascade deletion, but started its visited
set empty. When the chain loops back onto a parent — a delegation cycle,
or a parent that is also another parent's delegate child when several ids
are deleted together — that parent was collected as one of its own
descendants and then permanently deleted, along with all of its messages,
by _delete_delegate_children().

Seed the visited set with the parent ids so they can never be re-collected,
and exclude them from the returned child set. Callers (delete_session,
bulk delete) remove the parents separately, so this only prevents the
unintended parent deletion; legitimate child collection is unchanged.
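
The fix can be shown on a plain adjacency map. This is a sketch under assumptions: children_of stands in for the _delegate_from marker lookup, and the function below is not the project's _collect_delegate_child_ids.

```python
def collect_delegate_child_ids(parent_ids, children_of):
    """Collect every descendant of parent_ids without ever returning a parent."""
    parents = set(parent_ids)
    visited = set(parents)  # seeding with the parents is the fix described above
    children = set()
    stack = list(parents)
    while stack:
        current = stack.pop()
        for child in children_of.get(current, ()):
            if child in visited:
                continue  # already seen, or one of the parents themselves
            visited.add(child)
            children.add(child)
            stack.append(child)
    return children
```

With an empty initial visited set, the cycle p -> c1 -> c2 -> p would put p in the result; seeded, the result is just c1 and c2.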

Add regression tests (in-memory sqlite) covering single/multi-level
delegate chains, the parent_session_id+marker branch, untagged children
(orphan-don't-delete contract), and the cycle case that previously leaked
the parent into the deletion set.

Fixes #49148
…HTTP path (#50319)

* fix(api-server): stop silently promising async delivery on stateless HTTP path

terminal(notify_on_complete=True / watch_patterns) and delegate_task(background=True)
silently no-op'd on the API server / WebUI path (#10760): the watcher / detached
child registered, but every API-server route (OpenAI-spec /v1/chat/completions
and /v1/responses, plus the proprietary /v1/runs SSE stream) tears down its
channel when the turn ends, and APIServerAdapter.send() is a no-op stub. A
completion that fires after the response closed had nowhere to go — from the
agent side, indistinguishable from a hang.

There is no spec-compliant surface to wake the agent later on a stateless HTTP
client, so make the no-op honest instead of silent:

- Add a per-adapter capability flag supports_async_delivery (default True;
  APIServerAdapter = False), propagated into a HERMES_SESSION_ASYNC_DELIVERY
  contextvar via async_delivery_supported(). Toggle on the adapter, not a
  hardcoded platform string — a future stateless adapter is correct-by-default.
- terminal: when delivery is unsupported, skip watcher registration, force
  notify_on_complete off, and return a notify_unsupported note telling the
  agent to process(action='poll').
- delegate_task: when delivery is unsupported, fall back to SYNCHRONOUS
  execution (work runs and returns in the same response) with a note, instead
  of handing out a handle that never resolves.

CLI (in-process completion_queue) and the real gateway platforms are unchanged.

Fixes #10760

* refactor(api-server): route session binding through a single no-delivery chokepoint

Add APIServerAdapter._bind_api_server_session() and route both agent-entry
paths (_run_agent for /v1/chat/completions + /v1/responses, and the /v1/runs
_run_sync path) through it. The helper hardwires platform="api_server" and
async_delivery=False with no async_delivery parameter to pass, so a future
route added to the API server physically cannot reintroduce the silent
no-op (#10760) by forgetting to mark the channel as non-delivering.

The binding stays request-scoped (cleared per turn), so a session resumed
later on a delivering interface (CLI / gateway platform) re-binds fresh and
is NOT blocked — the no-delivery decision tracks the interface handling the
current turn, never the session.
…cooldown

Closes #50185

Two independent gaps let a transient Photon/Spectrum upstream overflow
degrade message delivery and amplify gRPC pressure:

1. _is_retryable_error did not recognise Photon- or Envoy-specific error
   strings ("internal sidecar error", "upstream connect error",
   "reset reason: overflow"), so _send_with_retry fell through to the
   plain-text fallback immediately instead of backing off and retrying.

2. send_typing had no rate gate, so a burst of typing-indicator calls
   during an overflow event kept hitting the upstream gRPC connection and
   widened the failure window.

Fix:
- Add _PHOTON_RETRYABLE_PATTERNS with the three high-specificity Envoy /
  sidecar substrings and override _is_retryable_error on PhotonAdapter to
  check them after delegating to the base-class patterns.  base.py and all
  other adapters are untouched.
- Add a 5 s per-chat cooldown in send_typing backed by _typing_last_sent.
  stop_typing clears the entry so the next start after a completed turn
  fires immediately — only rapid consecutive starts without a stop are
  suppressed.
- Reduce PhotonAdapter._send_with_retry default max_retries from 2 to 1
  (single 2 s back-off check) — enough to confirm whether the Envoy
  circuit-breaker has opened, without adding unnecessary latency.

All changes are scoped to plugins/platforms/photon/adapter.py.
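
The per-chat cooldown can be sketched as a small gate object. TypingGate is a hypothetical class; in the adapter the state is the _typing_last_sent mapping, and the injectable clock is an addition here to make the behavior easy to check.

```python
import time


class TypingGate:
    """Suppress repeated typing indicators for one chat within a cooldown window."""

    def __init__(self, cooldown=5.0, clock=time.monotonic):
        self._cooldown = cooldown
        self._clock = clock
        self._last_sent = {}

    def should_send(self, chat_id):
        """Return True and record the send time, or False while cooling down."""
        now = self._clock()
        last = self._last_sent.get(chat_id)
        if last is not None and now - last < self._cooldown:
            return False
        self._last_sent[chat_id] = now
        return True

    def stop(self, chat_id):
        """Clear the entry so the next start fires immediately."""
        self._last_sent.pop(chat_id, None)
```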
When the Node spectrum-ts sidecar process exited mid-session (crash,
OOM, upstream overflow escalation), _supervise_sidecar returned
silently — readline hit EOF, the log-pump loop broke, and nothing
notified the gateway. _inbound_loop entered an infinite retry loop
against a dead port, _running stayed True, and the adapter remained
in self.adapters with no path to self-recovery short of a manual
gateway restart.

Add a death-detection tail to _supervise_sidecar: after the log-pump
exits (EOF or exception), guard on _inbound_running to distinguish
unexpected death from a deliberate disconnect(). On unexpected exit,
call _set_fatal_error("SIDECAR_CRASHED", retryable=True) followed by
_notify_fatal_error() so the reconnect watcher picks up the platform
within 30 s and retries with exponential backoff (30 s → 300 s cap)
until the sidecar comes back up. All other platforms remain unaffected.

The _inbound_running guard is safe against races: disconnect() sets
_inbound_running = False before _stop_sidecar() cancels the supervisor
task. CancelledError is BaseException, not Exception, so it bypasses
the except clause and propagates normally — the detection block never
runs during a clean shutdown.
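
The exception-hierarchy point can be demonstrated in isolation. This sketch leaves out the _inbound_running guard and only shows that code placed after an except Exception block is skipped when the task is cancelled; supervise and its event list are illustrative.

```python
import asyncio


async def supervise(events, pump):
    """Run pump; the line after the try block plays the role of the detection tail."""
    try:
        await pump()
    except Exception:
        events.append("pump_error")
    # Reached only if pump returned or raised an Exception subclass.
    events.append("death_detected")


async def demo():
    crashed, cancelled = [], []

    async def eof():
        return None  # simulates the log pump hitting EOF

    async def forever():
        await asyncio.sleep(3600)

    await supervise(crashed, eof)

    task = asyncio.create_task(supervise(cancelled, forever))
    await asyncio.sleep(0)  # let the task start and block in sleep
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return crashed, cancelled


crashed, cancelled = asyncio.run(demo())
```

After the run, crashed holds the detection marker while cancelled stays empty, because CancelledError propagated past the except clause.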
…tection

Follow-up for salvaged PR #50256. Unit tests for the three behaviors:
retryable classification of Envoy/sidecar overflow strings, per-chat typing
cooldown with stop_typing reset, and the _supervise_sidecar crash-detection
path that raises a retryable fatal (and the clean-shutdown no-op).
The read_file device guard now walks symlink hops before the file operation
layer, but that hop walk still interpreted relative paths against the Python
process cwd. In sessions where TERMINAL_CWD points at the task workspace, a
relative workspace symlink to a blocked alias such as /dev/../dev/stdin could
therefore miss the intermediate device target before later task-cwd resolution.

Anchor relative device checks to the task base before symlink-hop inspection so
the pre-I/O guard sees the same workspace path that read_file would otherwise
read. Absolute device paths and the existing final realpath fallback remain
unchanged.

Refs #10141
Refs #29158
…50341)

On resource-contended hosts the embedded Hindsight daemon can exceed a
single 2s /health check; upstream then waits a grace window before
treating it as stale and killing+restarting it (hindsight-embed reads
HINDSIGHT_EMBED_PORT_HEALTH_GRACE_TIMEOUT, default 30s, into a
module-level constant at import time). Users on busy boxes had no
Hermes-side way to raise it short of hand-setting an env var.

Add a 'port_health_grace_timeout' config.json option to the Hindsight
plugin. When set, initialize() exports it to the process env BEFORE
daemon_embed_manager is imported (the import-time read is the contract).
setdefault() so an explicit operator env override always wins. Exposed
in 'hermes memory setup' for local_embedded mode.
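
The export step might look like the following. export_grace_timeout is a hypothetical helper; the environment variable and config key names come from the commit text, and the environ parameter is added so the logic can be exercised without touching the real process environment.

```python
import os

ENV_NAME = "HINDSIGHT_EMBED_PORT_HEALTH_GRACE_TIMEOUT"


def export_grace_timeout(config, environ=os.environ):
    """Copy the config value into the environment unless the operator already set it.

    Must run before the module that reads the variable at import time is imported.
    """
    value = config.get("port_health_grace_timeout")
    if value is not None:
        environ.setdefault(ENV_NAME, str(value))  # explicit env override wins
    return environ.get(ENV_NAME)
```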

Follow-up to #50308 / issue #13125 comment thread.
…imeout

Fixes a regression introduced by the prior approach (synchronous import
hermes_cli.gateway inside _lifespan) that caused a new failure mode:
the blocking import stalled the asyncio event loop before uvicorn could
bind its port, pushing HERMES_DASHBOARD_READY past the desktop shell's
45 s announcement deadline and triggering a respawn loop that accumulated
orphaned backend processes.

Two-part fix:

_lifespan: replace the blocking import with a fire-and-forget
run_in_executor call (_warm_gateway_module).  The import runs in a
worker thread while the server socket is already open, so
HERMES_DASHBOARD_READY fires without delay.

get_status: replace the inline lazy import with
await run_in_executor(None, _resolve_restart_drain_timeout).  This is
the root fix for the original 15 s socket-timeout: the blocking
.pyc-compilation + Defender scan is offloaded to a thread, keeping the
event loop free for every /api/status probe.  After the first call the
module is in sys.modules and the executor returns in microseconds.

Both helpers are extracted as module-level sync functions so they can
be unit-tested independently of FastAPI or uvicorn.
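
The offloading idea can be reproduced without FastAPI or uvicorn. In this sketch slow_resolve stands in for the blocking import (the 0.3 s sleep and the return value 30 are arbitrary), and get_status / get_version are plain coroutines rather than the real route handlers.

```python
import asyncio
import time


def slow_resolve():
    """Stand-in for a blocking import or .pyc compilation."""
    time.sleep(0.3)
    return 30


async def get_status():
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool so the loop stays free.
    return await loop.run_in_executor(None, slow_resolve)


async def get_version():
    return "ok"


async def demo():
    start = time.monotonic()
    status_task = asyncio.create_task(get_status())
    await asyncio.sleep(0)  # let get_status hand its work to the executor
    version = await get_version()
    version_latency = time.monotonic() - start
    status = await status_task
    return version, version_latency, status


version, version_latency, status = asyncio.run(demo())
```

Had slow_resolve been called directly inside get_status, get_version could not have answered until the sleep finished.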

Closes #50209

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three tests covering the scenarios from issue #50209 that could not be
validated with real Defender on a fresh install:

1. test_lifespan_warmup_is_nonblocking
   Patches _warm_gateway_module to sleep 3 s. Measures TestClient startup
   time — must complete in < 1.5 s, proving the fire-and-forget
   run_in_executor does not block the event loop before port binding
   (HERMES_DASHBOARD_READY timing proxy).

2. test_get_status_does_not_block_event_loop
   Patches _resolve_restart_drain_timeout to sleep 3 s. Fires concurrent
   GET /api/status and GET /api/version requests. /api/version must
   respond in < 3 s while /api/status waits — proving the event loop
   stays free during the slow import (15 s socket timeout would not fire).

3. test_concurrent_status_probes_all_respond
   Three simultaneous /api/status probes with the slow patch — all must
   return HTTP 200 (no connection resets, no orphan accumulation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The port-announcement clock in waitForDashboardPort starts the instant the
backend process is spawned — before uvicorn binds its socket. On a cold
install the child first compiles and imports the whole hermes_cli.main ->
web_server -> FastAPI/uvicorn chain, and on Windows real-time AV scans every
freshly written .pyc. That pre-bind cost can exceed the old hardcoded 45s
deadline, so the desktop killed a healthy-but-still-starting backend and
respawned it, piling up orphaned processes (#50209).

Raise the default to 90s and make it overridable via
HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS, clamped to a 45s floor so a bad
override can't reintroduce the loop. Warm starts still announce in well under
a second; both call sites inherit the new default with no change. Adds
backend-ready.test.cjs (wired into test:desktop:platforms).
…) (#50342)

Add a platform-neutral send-failure vocabulary so consumers can branch on a
typed category instead of substring-matching the raw provider message.

- base.py: SEND_ERROR_KINDS + classify_send_error() (too_long / bad_format /
  forbidden / not_found / rate_limited / transient / unknown), and an optional
  SendResult.error_kind field (defaults None — fully backward compatible).
- telegram.py: populate error_kind on send() failures; message_too_long keeps
  its existing error token plus error_kind='too_long'.

Purely additive: no behavioral change to the existing degrade-and-deliver
paths (MarkdownV2->plain-text fallback, overflow split, retry classification
all untouched). 22 new tests + 210 adapter regression tests green.
After the agent's final response, the '...typing' bubble persisted ~5s.
send() re-triggers send_typing() after every delivery so the bubble
survives intermediate progress messages (Telegram clears typing on each
delivered message). But that re-trigger also fired on the FINAL send,
re-arming Telegram's ~5s timer AFTER the gateway had already torn down
its typing-refresh loop — and Telegram exposes no stop-typing API, so
nothing cancelled it.

Gate the post-send re-trigger on the absence of metadata['notify'] (set
only on the final user-visible reply via _mark_notify_metadata). Both
the rich-message and legacy send paths are covered; intermediate
progress sends still re-trigger so the bubble stays alive mid-response.

Fixes #48678
OutThisLife and others added 21 commits June 22, 2026 19:23
Drop the inline-code border; halve the expanded tool block radius.
…leanup

fix(desktop): manual tool previews via status stack
…+ #50993) (#51116)

* Revert "fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993)"

This reverts commit 660e36f.

* Revert "fix(cron): anchor cron storage at the default root home (not the active profile)"

This reverts commit a5c09fd.
… project.facts RPC (#51259)

Follow-up to the coding-context posture (#43316): that PR detects each repo's
verify loop (manifests, package manager, exact test/lint/build commands, context
files) and bakes it into the system-prompt snapshot — but only as a string, for
the model. Non-prompt consumers (the desktop verify UI) had no way to read it
without re-sniffing and drifting from the prompt.

Split detection from rendering, keeping one source of truth:

- `detect_project_facts(root) -> ProjectFacts` (frozen) holds the structured
  facts; `_project_facts()` now renders it into the same snapshot lines, so the
  prompt block stays byte-identical (cache-safe).
- `project_facts_for(cwd)` resolves the workspace root (git, else marker) and
  returns the structured facts, or None outside a workspace.
- `project.facts` gateway RPC surfaces it to any client (desktop/TUI/ACP).

Tests assert the structured output and that the UI-facing commands never drift
from what the prompt block renders (one detector feeds both).
A "one-shot" is a single stateless model call that runs OUTSIDE any conversation:
it never touches session history, never breaks prompt caching, and returns plain
text. UI surfaces need this for small generative chores — a commit message from a
diff, a rename suggestion, a summary — where an agent turn would pollute the
thread and hand-rolling an LLM call at every call site would be worse.

- `agent/oneshot.py`: `run_oneshot(...)` over the existing auxiliary-client
  plumbing (same path as title generation). Two call shapes: explicit
  instructions/input, or a registered `template` + `variables` (templates own the
  prompt engineering so it stays consistent across CLI/TUI/desktop). Ships a
  `commit_message` template. Model selection inherits the live session via
  `main_runtime`, else the configured aux `task` backend.
- `tui_gateway/server.py`: `llm.oneshot` RPC (long-handler) inheriting the
  session's model when `session_id` resolves.

Stateless by construction — no session mutation, cache untouched.
… management plane (#51248)

The gateway half of Phase 6 Unit ζ: project the agent's existing relevance
knobs into the connector's platform-agnostic vocabulary and declare them at boot
over the /relay/policy route, so the SAME mention-gating / free-response /
allow-bots behavior the agent applies directly also governs relay delivery (and
excluded chatter never wakes a scaled-to-zero agent).

- gateway/relay/__init__.py:
  - relay_relevance_policy(): project require_mention -> requireAddress,
    free_response_channels -> freeResponseScopes, {PLATFORM}_ALLOW_BOTS in
    {mentions,all} -> allowOtherBots. Reads the fronted platform's config block
    + bridged top-level keys. Returns None when all-default (the connector's
    quiet default already matches) or no concrete platform is fronted.
  - send_relay_policy(): POST /relay/policy authenticated with the gateway's own
    per-gateway upgrade token (make_upgrade_token — same bearer as the WS
    upgrade), so the connector attaches it to the authenticated instance, never
    a body-asserted id. Re-declares every boot (self-healing, full replace).
    NEVER raises, NEVER blocks boot — relevance is an optimization layered on
    the δ/ε authorization gate. Reuses the per-gateway secret + the
    /relay/provision host; no new inbound surface, no new credential.
  - _policy_url(): ws(s)://…/relay -> http(s)://…/relay/policy.
- gateway/run.py: call send_relay_policy() after register_relay_adapter()
  succeeds (the secret is resolved by then).
- docs/relay-connector-contract.md: new §7 documenting per-instance delivery +
  the management plane (/manage/* + /relay/policy) + the relevance-declaration
  contract; versioning renumbered to §8. Contract conformance test stays green
  (§2/§3 tables untouched).
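
The projection and URL derivation described in the list above might look roughly like this. It is a sketch under stated assumptions, not the gateway's code: the defaults (mention required, no free-response channels, other bots ignored), the boolean value for `allowOtherBots`, and the helper names `project_policy` and `policy_url` are all illustrative.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlsplit, urlunsplit


def _as_list(value: Any) -> list:
    """Accept a list or a comma-separated string of channel ids."""
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    return list(value or [])


def project_policy(cfg: Mapping[str, Any], allow_bots: str = "") -> Optional[dict]:
    """Map agent knobs to connector vocabulary; None when all-default.

    Assumed defaults (not confirmed by the source): mention required,
    no free-response channels, other bots ignored.
    """
    policy: dict = {}
    if cfg.get("require_mention", True) is False:
        policy["requireAddress"] = False
    scopes = _as_list(cfg.get("free_response_channels"))
    if scopes:
        policy["freeResponseScopes"] = scopes
    if allow_bots.strip().lower() in {"mentions", "all"}:
        policy["allowOtherBots"] = True
    return policy or None


def policy_url(relay_url: str) -> str:
    """ws(s)://host/relay -> http(s)://host/relay/policy."""
    parts = urlsplit(relay_url)
    scheme = {"ws": "http", "wss": "https"}.get(parts.scheme, parts.scheme)
    path = parts.path.rstrip("/") + "/policy"
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

Returning `None` for an all-default config lets the caller skip the POST entirely, matching the "quiet default already matches" case.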

Tests: +12 (projection mapping incl. comma-string + top-level fallback; send
auth/skip/fail-soft/non-200). Full relay suite 118 pass. The connector route is
already E2E-proven (connector repo gateway_policy_driver.py); this adds the real
gateway send-path it pairs with.

This completes Phase 6 (Team Gateway per-user isolation) end to end.
…ailing

Slack in-app voice clips ("record a clip") arrive as MP4/AAC containers
(mimetype audio/mp4, filename audio_message*.mp4), and Slack sometimes
labels them video/mp4. The inbound audio handler derived the cache
extension from the mimetype and fell back to ".ogg" for anything not in
{.ogg,.mp3,.wav,.webm,.m4a} — so audio/mp4 voice messages were cached as
.ogg. OpenAI STT (whisper-1, gpt-4o-transcribe) sniffs the container from
the FILENAME extension, so it received MP4 bytes named .ogg and rejected
them. WhatsApp .ogg and uploaded .m4a worked only because their extension
happened to match the bytes.

Fix:
- _resolve_slack_audio_ext(): pick the cache extension from the real
  filename first, then a mimetype map (audio/mp4 -> .m4a), defaulting to
  .m4a — never the bogus .ogg fallback. Mirrors the video branch and the
  audio map already in gateway/platforms/bluebubbles.py.
- _is_slack_voice_clip(): detect audio-only clips mislabeled video/mp4
  via the slack_audio subtype / audio_message* filename, and route them
  through the audio path (cached as audio, reported as audio/*) so they
  reach STT instead of video understanding. Genuine videos (and
  slack_video screen recordings) are left on the video path.

Verified end-to-end against a real audio-only MP4: old path cached it as
.ogg (ffprobe shows MP4 bytes -> container mismatch -> OpenAI rejects);
new path caches it as .mp4 (extension matches bytes -> accepted).
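
A minimal sketch of the filename-first resolution order, assuming an illustrative extension set and mimetype map (the real `_resolve_slack_audio_ext()` and its tables are not reproduced in this message):

```python
import os

# Hypothetical values; the real map and extension set live in the gateway.
KNOWN_AUDIO_EXTS = {".ogg", ".mp3", ".wav", ".webm", ".m4a", ".mp4"}
MIME_TO_EXT = {
    "audio/mp4": ".m4a",
    "audio/mpeg": ".mp3",
    "audio/ogg": ".ogg",
    "audio/wav": ".wav",
    "audio/webm": ".webm",
}


def resolve_audio_ext(filename: str, mimetype: str) -> str:
    """Prefer the real filename extension, then the mimetype, then .m4a."""
    ext = os.path.splitext(filename or "")[1].lower()
    if ext in KNOWN_AUDIO_EXTS:
        return ext
    return MIME_TO_EXT.get((mimetype or "").lower(), ".m4a")
```

With this order a clip named `audio_message*.mp4` keeps `.mp4` even when Slack labels it `video/mp4`, and an unknown mimetype falls back to `.m4a` rather than `.ogg`.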

Adds inbound-audio tests (previously none): helper unit tests plus
_handle_slack_message E2E coverage for audio/mp4, video/mp4-mislabeled
voice clips, and a real video staying on the video path. Confirmed the
two voice-message tests fail without the fix (mutation check).

Follow-up to the salvaged voice-clip fix: the rerouted video/mp4 branch
used {".m4a": "audio/mp4"}.get(ext, "audio/mp4"), whose sole key's value
equals the default, so it always returned "audio/mp4" regardless of the
cached extension (dead lookup + a throwaway dict per inbound voice clip).

Replace it with a module-level _SLACK_EXT_TO_AUDIO_MIME map so the reported
media_type matches the bytes we cached (e.g. a clip cached as .wav now
reports audio/wav instead of audio/mp4). STT routing already keys on the
audio/ prefix + cached filename extension, so behavior is unchanged; this
just removes the dead construct and keeps the reported mimetype coherent.
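
The dead lookup and its replacement can be illustrated like this. The map contents and the name `EXT_TO_AUDIO_MIME` are assumptions for the example; the real `_SLACK_EXT_TO_AUDIO_MIME` may list different entries.

```python
def dead_lookup(ext: str) -> str:
    # The only key maps to the same value as the default,
    # so the result never depends on ext.
    return {".m4a": "audio/mp4"}.get(ext, "audio/mp4")


# Illustrative contents; the real _SLACK_EXT_TO_AUDIO_MIME may differ.
EXT_TO_AUDIO_MIME = {
    ".m4a": "audio/mp4",
    ".mp4": "audio/mp4",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
    ".wav": "audio/wav",
    ".webm": "audio/webm",
}


def audio_mime_for(ext: str) -> str:
    """Report a mimetype that matches the cached extension."""
    return EXT_TO_AUDIO_MIME.get(ext.lower(), "audio/mp4")
```

Defining the map once at module level also avoids building a fresh dict for every inbound clip.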
…(#51121)

A Medium-integrity Hermes agent cannot drive High-integrity (admin)
windows on Windows — UIPI blocks UIA enumeration and mouse injection
(SOM returns 0 elements, clicks silently no-op, screenshots still work,
keyboard partially bypasses). OS constraint affecting every Windows
automation stack, not a cua-driver bug. Document the symptom + the
run-elevated workaround. Closes #49067.

Heavy PR checks run on every PR because the workflows deliberately avoid
`on.paths` filters — a path-gated workflow leaves its required check pending
forever when no matching file changes, blocking merge. So a docs-only PR
still spins up the TypeScript matrix, the full Python suite, and ruff/ty.

Keep every workflow triggering on every PR (checks always report) but gate
the expensive *steps* on what the PR touches. Skipping a step (not the job)
leaves the job green, so required checks never hang — the same idiom already
proven in contributor-check.yml.

A classifier (scripts/ci/classify_changes.py) maps the PR diff to three
lanes — python, frontend, site — surfaced as step outputs by a composite
action (.github/actions/detect-changes). Fail-open: an empty diff or any
.github/ change runs everything; python is a denylist (skipped only when
every file is provably prose or a frontend-only package); skills/**/SKILL.md
counts as python-relevant since the skill-doc tests read that tree. Non-PR
events always run the full pipeline.
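
The fail-open, denylist-style classification described above could be sketched as follows. The path prefixes and suffixes here are hypothetical stand-ins (the actual rules live in `scripts/ci/classify_changes.py` and are not quoted in this message), and `classify` is an illustrative name.

```python
from typing import Dict, Iterable

# Hypothetical path rules; the real ones live in scripts/ci/classify_changes.py.
PROSE_SUFFIXES = (".md", ".txt", ".rst")
FRONTEND_PREFIXES = ("ui-tui/", "apps/")
SITE_PREFIXES = ("website/",)


def _python_relevant(path: str) -> bool:
    """Denylist: a file is python-relevant unless provably not."""
    if path.startswith("skills/") and path.endswith("/SKILL.md"):
        return True  # skill-doc tests read this tree
    if path.startswith(FRONTEND_PREFIXES):
        return False
    return not path.endswith(PROSE_SUFFIXES)


def classify(paths: Iterable[str]) -> Dict[str, bool]:
    files = list(paths)
    # Fail open: no diff, or any CI config change, runs everything.
    if not files or any(p.startswith(".github/") for p in files):
        return {"python": True, "frontend": True, "site": True}
    return {
        "python": any(_python_relevant(p) for p in files),
        "frontend": any(p.startswith(FRONTEND_PREFIXES) for p in files),
        "site": any(p.startswith(SITE_PREFIXES) for p in files),
    }
```

The asymmetry is deliberate: the frontend and site lanes opt in by path, while the python lane opts out, so an unrecognized file still triggers the Python suite.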
The image build + smoke test + integration suite are the heaviest jobs in CI
(~9-11 min) and ran on every PR. Gate them to push-to-main and release: a
broken build surfaces on the main push, while the cheap pre-merge guards
(docker-lint hadolint/shellcheck, uv-lockfile-check) still run on PRs to
catch the common Dockerfile/lockfile breakage. Steps skip on PRs so the job
stays green; the dead PR-only arm64 cache-warm build is removed.

`npm ci` / `uv sync` / toolchain header fetches occasionally die on
transient network blips — e.g. node-pty's node-gyp fetching Node headers
(an undici assert) during the typecheck job's `npm ci`, which killed the job
before `tsc` ever ran. "Re-run and it goes green" is exactly what CI should
do itself.

- New reusable `.github/actions/retry` composite action wraps a command and
  retries on failure (3x / 10s, command passed via env so it can't inject).
  Applied to every PR-path network install: npm ci (typecheck, desktop
  build, docs site), uv sync (tests, e2e), uv tool install (lint),
  pip install (docs site).
- typecheck now runs `npm ci --ignore-scripts`: `tsc` needs only sources +
  type defs, so skipping install scripts drops node-pty's native rebuild
  (whose header fetch was the flake) and is faster. Validated locally — tsc
  passes for ui-tui, apps/shared, and apps/desktop with scripts skipped.
- ripgrep download uses `curl --retry`.

Docker (main-only) and the release/windows workflows are intentionally left
for a follow-up.
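
The retry behavior can be sketched in Python as below. The real `.github/actions/retry` is a composite action, so this only models the loop; reading "3x / 10s" as three total attempts with a 10-second pause is an assumption, and `retry` with its injected `run` and `sleep` callables is an illustrative shape.

```python
import time
from typing import Callable


def retry(
    run: Callable[[], int],
    attempts: int = 3,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run() until it returns exit code 0 or attempts run out."""
    code = 1
    for attempt in range(1, attempts + 1):
        code = run()
        if code == 0:
            return 0
        if attempt < attempts:
            sleep(delay)  # back off only between attempts
    return code
```

In a workflow the callable could wrap `subprocess.run(["bash", "-c", os.environ["RETRY_CMD"]]).returncode`, where `RETRY_CMD` is a made-up variable name. Reading the command from the environment instead of interpolating it into the script text is what keeps it from injecting.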
ci: centralize path-gating behind single orchestrator + all-checks-pass gate

Replace the scattered per-workflow detect-changes pattern with a single
ci.yml orchestrator that runs the classifier once, then conditionally
calls sub-workflows via workflow_call based on lane outputs. A final
all-checks-pass job (if: always()) aggregates all results so branch
protection only needs to require one check.

Changes:
- New .github/workflows/ci.yml orchestrator (detect + conditional calls
  + all-checks-pass gate)
- Extend classify_changes.py with scan/deps/mcp_catalog lanes, absorbing
  supply-chain-audit's internal changes job
- Update detect-changes/action.yml to expose the new lane outputs
- Convert all 10 PR-gated sub-workflows to workflow_call-only triggers,
  removing their push/pull_request triggers and per-step detect-changes
  guards (gating now happens at the orchestrator level)
- lint.yml + supply-chain-audit.yml receive event_name as a workflow_call
  input to replace github.event_name (which is "workflow_call" inside
  called workflows)
- supply-chain-audit.yml: remove internal changes job + *-gate jobs
  (orchestrator handles gating, booleans arrive as inputs)
- contributor-check.yml: remove internal filter step
- Update test_classify_changes.py for 6-lane output + new supply-chain
  test cases
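
The aggregation performed by the final gate can be sketched as a pure function over job results. Treating `skipped` as passing is an assumption (it fits lanes the orchestrator chose not to call), and `all_checks_pass` here is a Python stand-in for what the workflow expresses in YAML.

```python
from typing import Mapping

# Result strings GitHub Actions reports for a job in `needs.<job>.result`.
OK_RESULTS = {"success", "skipped"}


def all_checks_pass(results: Mapping[str, str]) -> bool:
    """True when no upstream job failed or was cancelled.

    Treating "skipped" as passing is an assumption: it matches lanes
    the orchestrator chose not to run for this PR.
    """
    return all(result in OK_RESULTS for result in results.values())
```

Since the gate job itself always runs, branch protection can require that one check and still get a definite answer on docs-only PRs.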
…verride

The installer scanned PATH/well-known locations for a Chrome/Chromium binary
and, when found, skipped the bundled Playwright Chromium download and wrote that
path into ~/.hermes/.env as AGENT_BROWSER_EXECUTABLE_PATH. On Snap-based systems
`command -v chromium` resolves to /snap/bin/chromium, whose sandbox blocks
agent-browser's control socket under /tmp -- so every browser_navigate hung
until the 60s timeout fired ("opening web page failed").

Drop the system-browser fallback entirely (per maintainer direction):
find_system_browser()/Find-SystemBrowser now honor ONLY an explicit, user-set
AGENT_BROWSER_EXECUTABLE_PATH override -- no PATH scan, no well-known-path scan.
A /snap/* path is rejected even when set explicitly, since its confinement is
the bug. Applied to both install.sh (Linux/macOS) and install.ps1 (Windows).

Crucially, also auto-repair already-affected installs: the bad snap path
persists in .env and is read directly by the runtime, and the installer skips
re-config when AGENT_BROWSER_EXECUTABLE_PATH is already set ("already
configured"), so a plain reinstall/update never recovered an existing user. New
strip_snap_browser_override() removes a snap-pointing AGENT_BROWSER_EXECUTABLE_PATH
(and its auto-written comment) from .env on every install/update, run from both
browser-setup paths (install_node_deps and ensure_browser), so updating is
enough to recover. A deliberately-set non-snap override is left untouched.

docker/stage2-hook.sh is intentionally untouched: it discovers the bundled
Playwright Chromium, not a system browser.
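
The auto-repair step can be modeled on the `.env` text alone. The installers implement it in shell and PowerShell, so this Python version is only a sketch; the `AUTO_COMMENT` marker is a hypothetical placeholder for whatever comment line the installer actually wrote.

```python
KEY = "AGENT_BROWSER_EXECUTABLE_PATH"
# Hypothetical marker for the installer's auto-written comment line.
AUTO_COMMENT = "# Auto-detected system browser"


def strip_snap_override(env_text: str) -> str:
    """Drop a /snap/-pointing override and its auto-written comment."""
    kept = []
    for line in env_text.splitlines():
        name, sep, value = line.partition("=")
        path = value.strip().strip("'\"")
        if sep and name.strip() == KEY and path.startswith("/snap/"):
            if kept and kept[-1].strip() == AUTO_COMMENT:
                kept.pop()  # remove the comment written with the bad path
            continue
        kept.append(line)
    return "\n".join(kept) + ("\n" if env_text.endswith("\n") else "")
```

A non-snap override passes through unchanged, which mirrors the rule that a deliberately set path is left untouched.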
…epair

Replace the old "skips download when a system browser exists" assertions with
tests for the new behavior:
- no PATH scan for browser command names, and the "use the system browser" path
  is gone;
- find_system_browser consults only an explicit AGENT_BROWSER_EXECUTABLE_PATH
  override (which still skips the bundled download);
- strip_snap_browser_override runs on both install paths and a /snap/* path is
  rejected, so already-affected installs auto-recover on update.
…tyle (#51168)

Adds a per-platform display.reasoning_style setting (code | blockquote |
subtext) controlling how the show_reasoning summary renders on the gateway.
Discord defaults to "subtext" (-# small grey metadata text); every other
platform keeps the fenced code block. Resolves through the existing
display.platforms.<platform>.reasoning_style override chain.
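
The three styles and the per-platform default could be rendered along these lines. This is a sketch: `render_reasoning`, `style_for`, and the flat `overrides` dict are illustrative names, and the real override chain is richer than a single lookup.

```python
FENCE = "`" * 3  # three backticks, built here to keep this example readable


def render_reasoning(summary: str, style: str = "code") -> str:
    """Render a reasoning summary in one of the three styles."""
    lines = summary.splitlines() or [""]
    if style == "blockquote":
        return "\n".join(f"> {line}" for line in lines)
    if style == "subtext":
        # Discord subtext: each line prefixed with "-# ".
        return "\n".join(f"-# {line}" for line in lines)
    if style == "code":
        return f"{FENCE}\n{summary}\n{FENCE}"
    raise ValueError(f"unknown reasoning_style: {style!r}")


# Assumed shape of the per-platform default described above.
DEFAULT_STYLE = {"discord": "subtext"}


def style_for(platform: str, overrides: dict) -> str:
    """An explicit per-platform override wins over the built-in default."""
    return overrides.get(platform) or DEFAULT_STYLE.get(platform, "code")
```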
Authorship-first upstream sync. 11 real conflicts resolved (see PR body).

KEEP OURS (fork evolution features preserved):
- scripts/release.py AUTHOR_MAP, tools/lazy_deps.py telemetry.otel (#167):
  union-merged with upstream additions.
- hermes_cli/tools_config.py: kept #165 security review trail, took upstream's
  safer use_shell + env=_cua_driver_env() form.
- prompt_builder.py / system_prompt.py / tqmemory_setup.py: auto-merged clean,
  verified #485 guidance + TQMEMORY_PROJECT_ROOT intact.
- conversation_loop.py loop-guard #432/#436 + cron digest hooks intact.

TAKE UPSTREAM (upstream-owned infra / supersets):
- .github/workflows/{tests,supply-chain-audit}.yml: workflow_call orchestration
  via new ci.yml + detect-changes action (replaces fork's inline #108 changes job).
- .github/workflows/docker-publish.yml: upstream's build-on-main-only structure
  (`if: github.event_name != 'pull_request'`) + fork's newer dependabot action pins.
- build-windows-installer.yml: accepted upstream deletion (c820eb6a5).
- tools/terminal_tool.py: upstream superset (fork watcher routing + #10760 guard).
- agent_runtime_helpers.py memory mirror: upstream notify_memory_tool_write
  (consolidates fork on_memory_write; adds success-gating).
- conversation_loop.py: upstream #39550 token-aware compression retry.

FOLLOW UPSTREAM REMOVAL — google-gemini-cli provider (#50492):
  Upstream removed this provider across 13+ files in non-conflicting ways
  (auto-merged) + deleted agent/google_oauth.py. Keeping the fork's provider
  would require surgical re-integration across 13+ files (out of scope for a sync
  PR, perpetual future conflicts). Followed upstream's clean removal: reverted the
  2 keep-ours hunks (run_agent.py, agent_runtime_helpers.py) and dropped the
  fork's adapter/test/plan. Tree is now consistent; gemini-cli tests no longer
  exist. ** Owner: if the gemini-cli provider must stay, re-add it as a dedicated
  feature branch, NOT in this sync. **

NOT auto-merged into main — escalated to owner review per big-sync policy (PR #405 precedent).
@github-advanced-security

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

@github-actions

github-actions Bot commented Jun 23, 2026

🔎 Lint report: sync/upstream-2026-06-23 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 11396 on HEAD, 11215 on base (🆕 +181)

🆕 New issues (160):

| Rule | Count |
| --- | --- |
| unresolved-import | 57 |
| unresolved-attribute | 41 |
| invalid-argument-type | 19 |
| invalid-assignment | 14 |
| unsupported-operator | 10 |
| not-subscriptable | 9 |
| invalid-method-override | 2 |
| not-iterable | 2 |
| invalid-return-type | 2 |
| call-non-callable | 1 |
| unresolved-reference | 1 |
| invalid-parameter-default | 1 |
| no-matching-overload | 1 |

First entries
tests/tools/test_approval_interrupt.py:131: [invalid-assignment] invalid-assignment: Object of type `() -> dict[str, int]` is not assignable to attribute `_get_approval_config` of type `def _get_approval_config() -> dict[Unknown, Unknown]`
tests/gateway/test_whatsapp_bridge_pidfile.py:179: [unresolved-attribute] unresolved-attribute: Attribute `readline` is not defined on `None` in union `IO[Any] | None`
plugins/memory/mem0/_backend.py:163: [unresolved-import] unresolved-import: Cannot resolve imported module `psycopg2`
tests/gateway/test_approval_prompt_redaction.py:114: [unresolved-attribute] unresolved-attribute: Object of type `AST` has no attribute `lineno`
tests/gateway/test_whatsapp_bridge_pidfile.py:85: [unsupported-operator] unsupported-operator: Operator `+` is not supported between objects of type `int | None` and `Literal[1]`
tools/process_registry.py:541: [invalid-argument-type] invalid-argument-type: Argument to constructor `float.__new__` is incorrect: Expected `str | Buffer | SupportsFloat | SupportsIndex`, found `Unknown | int | str | ... omitted 16 union elements`
tests/honcho_plugin/test_oauth_flow.py:17: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/hermes_cli/test_web_server_boot_handshake.py:29: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/hermes_cli/test_update_zip_atomic_replace.py:17: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/gateway/test_tui_approval_redaction.py:14: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tools/process_registry.py:540: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `bound method str.__getitem__(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str` cannot be called with key of type `Literal["daemon_term_grace_seconds"]` on object of type `str`
tests/gateway/test_whatsapp_to_jid.py:10: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
hermes_cli/tools_config.py:3360: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `dict[Unknown, Unknown]` and `str | list[dict[str, str | list[Unknown]] | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | list[dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | ... omitted 5 union elements`
tools/computer_use/cua_backend.py:390: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `str`, found `Any | None | Literal[""]`
plugins/memory/mem0/_setup.py:855: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["_mode_from_flag"]` and value of type `Literal[False]` on object of type `dict[str, str]`
tests/hermes_cli/test_kanban_lifecycle_hooks.py:131: [unresolved-attribute] unresolved-attribute: Attribute `status` is not defined on `None` in union `Task | None`
tests/hermes_cli/test_goals.py:1339: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["run pytest"]` and `str | None`
plugins/memory/honcho/oauth_flow.py:270: [invalid-method-override] invalid-method-override: Invalid override of method `log_message`: Definition is incompatible with `BaseHTTPRequestHandler.log_message`
tests/hermes_cli/test_web_server_boot_handshake.py:95: [unresolved-import] unresolved-import: Cannot resolve imported module `anyio`
tools/url_safety.py:359: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["%"]` and `str | int`
tests/honcho_plugin/test_oauth_flow.py:337: [unresolved-import] unresolved-import: Cannot resolve imported module `fastapi`
tests/hermes_cli/test_goals.py:1078: [invalid-assignment] invalid-assignment: Object of type `int | float` is not assignable to attribute `waiting_until` on type `GoalState | None`
tests/gateway/test_whatsapp_bridge_pidfile.py:20: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tools/browser_tool.py:1354: [unresolved-import] unresolved-import: Cannot resolve imported module `psutil`
tests/plugins/memory/test_mem0_setup.py:6: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
... and 135 more

✅ Fixed issues (28):

| Rule | Count |
| --- | --- |
| unresolved-attribute | 13 |
| unresolved-import | 7 |
| invalid-argument-type | 4 |
| invalid-assignment | 3 |
| unsupported-operator | 1 |

First entries
agent/google_oauth.py:782: [unresolved-attribute] unresolved-attribute: Attribute `set` is not defined on `None` in union `Event | None`
tests/skills/test_google_oauth_setup.py:332: [invalid-argument-type] invalid-argument-type: Argument to function `module_from_spec` is incorrect: Expected `ModuleSpec`, found `ModuleSpec | None`
agent/google_oauth.py:187: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to `<module 'fcntl'>`
tests/skills/test_google_oauth_setup.py:365: [unresolved-import] unresolved-import: Cannot resolve imported module `googleapiclient`
tests/hermes_cli/test_tools_config.py:735: [invalid-argument-type] invalid-argument-type: Argument to function `_detect_active_provider_index` is incorrect: Expected `list[Unknown]`, found `str | list[dict[str, str | list[Unknown]] | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | list[dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | ... omitted 4 union elements`
tests/skills/test_google_oauth_setup.py:108: [unresolved-attribute] unresolved-attribute: Unresolved attribute `Flow` on type `ModuleType`
hermes_cli/inventory.py:178: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to `def is_aggregator(provider: str) -> bool`
agent/google_oauth.py:235: [unresolved-attribute] unresolved-attribute: Module `msvcrt` has no member `locking`
tests/agent/test_gemini_cloudcode.py:334: [unresolved-attribute] unresolved-attribute: Attribute `refresh_token` is not defined on `None` in union `GoogleCredentials | None`
agent/google_oauth.py:235: [unresolved-attribute] unresolved-attribute: Module `msvcrt` has no member `LK_UNLCK`
agent/google_oauth.py:907: [invalid-assignment] invalid-assignment: Object of type `() -> Literal[True]` is not assignable to `def _can_open_graphical_browser() -> bool`
hermes_cli/cli_commands_mixin.py:994: [unresolved-attribute] unresolved-attribute: Object of type `Self@_handle_gquota_command` has no attribute `_console_print`
tests/agent/test_gemini_cloudcode.py:289: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `str`, found `Unknown | str | int`
tests/skills/test_google_oauth_setup.py:334: [unresolved-attribute] unresolved-attribute: Attribute `loader` is not defined on `None` in union `ModuleSpec | None`
tests/agent/test_gemini_cloudcode.py:246: [unresolved-attribute] unresolved-attribute: Attribute `project_id` is not defined on `None` in union `GoogleCredentials | None`
tests/skills/test_google_oauth_setup.py:13: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
agent/background_review.py:527: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `str`, found `(Any & ~AlwaysFalsy & ~Literal["codex_app_server"]) | None | Literal["codex_responses"]`
agent/gemini_cloudcode_adapter.py:523: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `None` in union `Any | None | dict[str, Any]`
plugins/memory/mem0/__init__.py:175: [unresolved-import] unresolved-import: Cannot resolve imported module `mem0`
tests/plugins/memory/test_mem0_v2.py:198: [unresolved-attribute] unresolved-attribute: Attribute `join` is not defined on `None` in union `None | Thread`
tests/agent/test_gemini_cloudcode.py:247: [unresolved-attribute] unresolved-attribute: Attribute `managed_project_id` is not defined on `None` in union `GoogleCredentials | None`
tests/skills/test_google_oauth_setup.py:366: [unresolved-import] unresolved-import: Cannot resolve imported module `google_auth_oauthlib`
tests/plugins/memory/test_mem0_v2.py:10: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/agent/test_gemini_cloudcode.py:22: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/skills/test_google_oauth_setup.py:109: [unresolved-attribute] unresolved-attribute: Unresolved attribute `flow` on type `ModuleType`
... and 3 more

Unchanged: 5870 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…ibution

Two CI regressions on the sync/upstream-2026-06-23 PR, both resolved minimally:

1. tests/gateway/test_13121_shutdown_inflight_transcript_flush.py (`assert 7 == 5`):
   This is upstream's #13121 test, brought in by the merge. It sets up the flush
   dedup state via upstream's `_flushed_db_message_ids` (set of id()) API, but the
   fork's `_flush_messages_to_session_db` uses the fork's `_flushed_db_messages`
   (list of pinned objects, the #160 id-recycling fix) which the merge correctly
   kept. The set attribute was ignored → already-flushed prior_history was
   re-written → 7 rows instead of 5. Fix: adapt the upstream test's declarative
   dedup setup to the fork's canonical `_flushed_db_messages` API (pin the actual
   prior_history dicts). Flush production logic is byte-identical to origin/main —
   no flush regression, only the imported test spoke the wrong API.

2. contributor-check / check-attribution: the 286-commit merge range includes one
   upstream contributor whose email is a bare `<id>@users.noreply.github.com`
   (no `+username`, so the workflow's auto-skip regex misses it) and was not in
   AUTHOR_MAP. Added `wnuuee1` mapping — the exact fix the check prescribes.

Verified: test_13121 6/6 pass (CPython); contributor-check reproduction 0 unmapped.
NOTE: the "test (6)" shard failure shares root cause with #1 (same gateway test;
slice assignment is duration-cache-dependent). No other CPython-level regression
found — fork flush + compaction code unchanged by the merge.
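
The difference between the two dedup APIs in item 1 comes down to what is remembered. A minimal sketch of the pinned-object approach, assuming messages are plain dicts and using an illustrative `FlushDedup` class rather than the fork's actual `_flush_messages_to_session_db`:

```python
from typing import Any, Dict, List


class FlushDedup:
    """Track flushed messages by holding the objects, not their id()."""

    def __init__(self) -> None:
        # Pinned references keep each message alive, so its identity
        # cannot be recycled for a different object.
        self.flushed: List[Dict[str, Any]] = []

    def pending(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Messages that have not been flushed yet, compared by identity."""
        return [m for m in messages if not any(m is f for f in self.flushed)]

    def flush(self, messages: List[Dict[str, Any]], db: List[Dict[str, Any]]) -> int:
        """Write only unflushed messages and remember them."""
        new = self.pending(messages)
        db.extend(new)
        self.flushed.extend(new)
        return len(new)
```

A test that seeds a separate set of `id()` values leaves `flushed` empty, so the prior history is written again; seeding `flushed` with the actual prior-history dicts is the equivalent setup for this API.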
#365/#368)

My earlier merge resolution for tools/terminal_tool.py was WRONG: I labeled
upstream's version a "superset" and took it wholesale (2746 lines), silently
dropping the fork's entire terminal streak-classification mechanism (#365/#368,
related to loop-guard #432) — the classifier module survived but its integration
in terminal_tool.py was lost. `tests/tools/test_terminal_failure_classifier.py`
(26 tests) failed with AttributeError: no attribute '_reset_terminal_streak'.

Fix: redid the resolution as a proper git 3-way merge of just this file
(merge-base ↔ origin/main ↔ upstream/main). Only ONE region conflicted — the
watcher-routing block — which correctly resolves to upstream (#10760
async_delivery_supported guard is a true superset of the fork's routing there).
Everything else auto-reapplied, restoring:
  - _increment_terminal_streak / _reset_terminal_streak / get_terminal_streak
  - `from tools.terminal_failure_classifier import ... streak_recommendation`
  - streak_recommendation() usage + terminal_streak telemetry in the failure paths
…while keeping upstream's #10760 watcher-routing guard AND the fork watcher
routing metadata. Result: 3149 lines (origin 3123 + upstream guard).

Verified: test_terminal_failure_classifier 26/26, test_terminal_tool 25/25,
test_13121 6/6 (CPython). Structural audit (top-level defs origin→HEAD across
201 fork-modified files) now shows only ONE remaining delta:
_check_cua_driver_asset_for_arch — which upstream DELETED on purpose
(#5f1d23cfb "delete broken pre-install asset probe") with a regression test
asserting its absence; left removed as intended (not a loss).
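
A structural audit of this kind can be approximated with the standard `ast` module. The sketch below compares one file's top-level definitions between two versions; `top_level_defs` and `lost_defs` are illustrative names, and how the real audit walked the 201 files is not described in this message.

```python
import ast
from typing import Set


def top_level_defs(source: str) -> Set[str]:
    """Names of functions and classes defined at module top level."""
    tree = ast.parse(source)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return {node.name for node in tree.body if isinstance(node, kinds)}


def lost_defs(origin_source: str, head_source: str) -> Set[str]:
    """Definitions present in the origin version but missing at HEAD."""
    return top_level_defs(origin_source) - top_level_defs(head_source)
```

A check like this would have flagged the dropped `_reset_terminal_streak` helper before the test suite did, since it compares names rather than line counts.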
@Lexus2016 Lexus2016 merged commit 3e09f1f into main Jun 23, 2026
36 checks passed
@Lexus2016 Lexus2016 deleted the sync/upstream-2026-06-23 branch June 23, 2026 19:35
Lexus2016 added a commit that referenced this pull request Jun 24, 2026
…odel-resolution fix (#486)

Restore the evolution cached-digest pre-flight as a tracked feature and fix
the root bug that made it fail on prod with 'no model configured for
pre-flight ping'.

What:
- cron/evolution_preflight.py: lightweight non-streaming provider ping plus
  most-recent on-disk digest fallback for evolution pipeline cron stages
  (introspection/analysis/implementation/research/funnel/integration).
- cron/scheduler.py: integrate the pre-flight between runtime resolution and
  AIAgent construction. On ping failure, return the latest stale digest
  (graceful degradation) or raise if none exists. Purely additive — the
  #487 native model-resolution path is unchanged.

Why:
- When the configured provider is unreachable (e.g. Kimi timeout storms),
  evolution sessions burn retries/timeouts and deliver nothing. The pre-flight
  detects this fast and falls back to the last good digest so the pipeline
  keeps moving with stale-but-structured input instead of failing silently.

ROOT-FIX:
- preflight_provider() reads runtime['model'], but resolve_runtime_provider()
  never populates it — the scheduler resolves the model into a separate local
  'model' variable (job.model > HERMES_MODEL > config.yaml model.default) and
  passes it to AIAgent(model=...) directly. On prod runtime['model'] was empty
  so the ping always returned 'no model configured' and the cached-digest
  fallback could never trigger. Fixed by syncing runtime['model'] = model
  (the already-resolved local) before the ping, without clobbering an
  ACP-resolved model.

Provenance:
- This feature previously existed only as untracked local code on the prod
  host; git stash -u hid it and upstream-merge #487 overwrote scheduler.py
  with the native version. Now restored into git from that stash, with the
  root bug fixed.

Tested:
- tests/cron/test_evolution_preflight.py (27 tests) pass, including a new
  unit test and a scheduler-level test that reproduces the prod failure
  (runtime returned with NO model + empty job.model + config.yaml
  model.default) and asserts runtime['model'] is synced to the resolved
  default. Verified the new test FAILS without the root-fix (captured
  model == None) and PASSES with it.
- Adjacent suites green: test_scheduler_provider.py (29), scheduler
  model/runtime/run_job slice (77), test_run_one_job.py + codex paths (8).
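
The root fix can be sketched in a few lines. The function names and the dict-shaped `runtime` and `config` are assumptions for illustration; in this sketch any model already present in `runtime` is preserved, which covers the ACP-resolved case without modeling ACP itself.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_model(
    job_model: Optional[str],
    config: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Precedence from the message: job.model > HERMES_MODEL > config default."""
    return (
        job_model
        or env.get("HERMES_MODEL")
        or (config.get("model") or {}).get("default")
    )


def sync_runtime_model(runtime: Dict[str, Any], model: Optional[str]) -> None:
    """Fill runtime['model'] from the resolved local without clobbering."""
    if model and not runtime.get("model"):
        runtime["model"] = model
```

Calling `sync_runtime_model` before the ping gives the pre-flight the same model the agent would be constructed with, so a failed ping can reach the cached-digest fallback.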
Lexus2016 added a commit that referenced this pull request Jun 24, 2026
…odel-resolution fix (#486) (#490)

* feat(evolution): cached-digest provider pre-flight for cron stages, model-resolution fix (#486)

* test(evolution): skip anthropic pre-flight tests when package absent (#486)

CI test shard 5 lacks the optional 'anthropic' package, so
test_anthropic_success / test_anthropic_failure failed with
ModuleNotFoundError instead of being skipped. Add a per-test
pytest.importorskip('anthropic') guard so they SKIP when the
optional dependency is missing. Applied per-test (not class-level)
because the other TestPreflightProvider cases (openai, missing_api_key,
missing_model, acp, root-fix unit) do not touch anthropic and must
keep running.