Skip to content

fix(runner): always emit MESSAGES_SNAPSHOT to prevent compaction failures#1693

Merged
mergify[bot] merged 1 commit into
mainfrom
fix/messages-snapshot-missing
Jun 16, 2026
Merged

fix(runner): always emit MESSAGES_SNAPSHOT to prevent compaction failures#1693
mergify[bot] merged 1 commit into
mainfrom
fix/messages-snapshot-missing

Conversation

@markturansky

@markturansky markturansky commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • C1 — grpc_transport.py: Skip gRPC messages with empty payloads instead of building an empty-content message. An empty-content message causes process_messages to return "", which triggers an early RunFinishedEvent before _stream_claude_sdk ever runs — so run_messages stays empty and MESSAGES_SNAPSHOT is never emitted. Affects both Operator-managed and control-plane-reconciled runner sessions.

  • C2 — run.py: Assign UUIDs to message dicts missing an id in to_run_agent_input. Without ids, upsert_message falls back to append on every call, creating duplicates in the snapshot for multi-turn sessions going through the gRPC dict message path.

  • C3 — adapter.py: Remove the if run_messages: guard around MESSAGES_SNAPSHOT emission. Runs that produce no assistant output (interrupted, halted on frontend tool, state-management-tool-only turns) still need a snapshot so compactFinishedRun can succeed. When run_messages is empty the snapshot contains the stamped input history, which is sufficient. Without a snapshot, compaction logs "session corrupted, keeping raw events" and chat history is lost after the tail-read optimization (ffe4a21).

Root cause

compactFinishedRun (backend agui_store.go) requires MESSAGES_SNAPSHOT to be present in the JSONL to perform compaction. If absent, it aborts and leaves raw streaming delta events in the file. On the next large-session tail-read (8e3bac3), those deltas may be missing from the head scan and the user sees blank chat history on reconnect.

The MESSAGES_SNAPSHOT gap was introduced when ambient-control-plane added a second reconciled runner path: control-plane sessions reach the runner via gRPC (GRPCSessionListener) rather than HTTP, and the gRPC path has different message-shape guarantees that exposed all three gaps simultaneously.

Test plan

  • Operator-managed session: send multiple messages, navigate away and back — chat history intact
  • Control-plane-reconciled session: same navigation test
  • Session where Claude calls only state-management tools (no text output): navigate away, verify prior history still present on return
  • Session interrupted mid-run (stop button): reconnect, verify history from previous complete turns is intact
  • Check runner logs: MESSAGES_SNAPSHOT log line should appear for every RUN_FINISHED/RUN_ERROR turn
  • Check backend logs: compaction log should show X raw events → Y snapshot events (not "session corrupted")

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved handling of empty or malformed messages to prevent processing failures
    • Enhanced message snapshot construction to be more reliable across scenarios
    • Added automatic message ID generation to ensure data consistency
    • Strengthened validation for incoming message payloads with better error logging

…ures

Three fixes to ensure MESSAGES_SNAPSHOT is always emitted, which is required
for compactFinishedRun to succeed. Without it, sessions are marked corrupted
and chat history is lost after the JSONL tail-read optimization (ffe4a21).

- adapter.py: Remove `if run_messages:` guard around MESSAGES_SNAPSHOT
  emission. Runs that produce no assistant output (interrupted, halted on
  frontend tool, state-tool-only) still need a snapshot so compaction can
  succeed. When run_messages is empty the snapshot is the stamped input
  history, which is sufficient for compaction to find MESSAGES_SNAPSHOT
  and atomically replace the JSONL.

- grpc_transport.py: Skip empty-payload gRPC messages instead of building
  a message with empty content. An empty-content message causes process_messages
  to return "" which triggers an early RunFinishedEvent before _stream_claude_sdk
  runs, so run_messages stays empty and MESSAGES_SNAPSHOT is never emitted.
  Affects both Operator-managed and control-plane-reconciled runner sessions.

- run.py: Assign UUIDs to message dicts that lack an id in
  to_run_agent_input. Without ids, upsert_message falls back to append for
  every message, creating duplicates in the MESSAGES_SNAPSHOT for multi-turn
  sessions going through the gRPC dict message path.

Co-Authored-By: Claude <noreply@anthropic.com>
@netlify

netlify Bot commented Jun 16, 2026

Copy link
Copy Markdown

Deploy Preview for cheerful-kitten-f556a0 canceled.

Name Link
🔨 Latest commit 2412f5a
🔍 Latest deploy log https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a309928119a8c0009e843b7

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2f549f75-7f65-4ce3-a3cd-2e0b6ba5eece

📥 Commits

Reviewing files that changed from the base of the PR and between 7e918dd and 2412f5a.

📒 Files selected for processing (3)
  • components/runners/ambient-runner/ag_ui_claude_sdk/adapter.py
  • components/runners/ambient-runner/ambient_runner/bridges/claude/grpc_transport.py
  • components/runners/ambient-runner/ambient_runner/endpoints/run.py

📝 Walkthrough

Walkthrough

Three defensive fixes in the ambient runner message pipeline: incoming messages are normalized with injected UUIDs when id is missing; the gRPC transport skips empty-payload user events with a warning; and MESSAGES_SNAPSHOT construction is unconditional, gated only on the final combined list being non-empty.

Changes

Ambient Runner Message Pipeline Robustness

Layer / File(s) Summary
UUID injection for incoming messages
ambient_runner/endpoints/run.py
to_run_agent_input() iterates self.messages and injects a UUID id into any dict missing one before constructing RunAgentInput.
Empty-payload guard in gRPC transport
ambient_runner/bridges/claude/grpc_transport.py
_handle_user_message returns early with a seq-keyed warning when msg.payload is absent or whitespace-only, preventing an empty bridge.run() turn.
Unconditional MESSAGES_SNAPSHOT construction
ag_ui_claude_sdk/adapter.py
Removes the run_messages-truthy gate; always builds enriched, stamps input_data.messages with the run-start timestamp, and skips yield only when all_messages is empty.
🚥 Pre-merge checks | ✅ 7 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (7 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title follows Conventional Commits format (fix(runner): ...) and accurately describes the main fix: ensuring MESSAGES_SNAPSHOT is always emitted to prevent compaction failures.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Performance And Algorithmic Complexity ✅ Passed Three changes analyzed: (1) adapter.py removes conditional guard around MESSAGES_SNAPSHOT emission—adds one O(N) loop at shutdown (N typically <100 msgs, not nested); (2) run.py adds UUID assignmen...
Security And Secret Handling ✅ Passed No security violations found. Caller token not logged; user_id sanitized before logging; UUIDs generated securely; no hardcoded credentials, SQL/command/path-traversal injection, K8s Secret OwnerRe...
Kubernetes Resource Safety ✅ Passed PR modifies only Python source code (adapter.py, grpc_transport.py, run.py); contains no Kubernetes manifests or resource definitions. Kubernetes Resource Safety check is not applicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/messages-snapshot-missing
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix/messages-snapshot-missing

Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify mergify Bot added the queued label Jun 16, 2026
@mergify

mergify Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Merge Queue Status

  • Entered queue2026-06-16 00:52 UTC · Rule: default
  • Checks skipped · PR is already up-to-date
  • Merged2026-06-16 00:53 UTC · at 2412f5ad7cc89c83fc020141c4020c8e4033966b · squash

This pull request spent 12 seconds in the queue, including 2 seconds running CI.

Required conditions to merge

@mergify mergify Bot merged commit b85feee into main Jun 16, 2026
68 checks passed
@mergify mergify Bot removed the queued label Jun 16, 2026
@mergify mergify Bot deleted the fix/messages-snapshot-missing branch June 16, 2026 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant