
Refactor wake registry sync and fix child wake delivery #4632

Open. KyleAMathews wants to merge 25 commits into main from fix-child-wake-delivery

KyleAMathews (Contributor) commented Jun 18, 2026

Refactor the agents-server wake registry onto TanStack DB/Electric collections and fix child wake delivery across the server and runtime. The user-visible impact is that parent agents reliably receive child completion wakes, cron wake setup is safer under concurrent tests, and wake-registry sync behavior is covered by isolated CI-stable integration tests.

Root Cause

Child wake delivery had multiple loss and flake points across the end-to-end path.

On the runtime side, the pull-wake runner intentionally avoids claiming multiple wakes for the same stream concurrently. When another wake arrived while that stream already had an active claim or handler, it was deferred, but deferred wakes were stored as a single event per stream path:

Map<string, PullWakeEvent>

That meant later deferred wakes for the same parent stream overwrote earlier ones.
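
To make the failure mode concrete, here is a minimal sketch of how a single-slot map drops an earlier deferred wake. The `PullWakeEvent` fields, the `deferWake` helper, and the stream path are simplified assumptions for illustration, not the runner's actual code.

```typescript
// Hypothetical, simplified event shape; the real PullWakeEvent likely has more fields.
type PullWakeEvent = { streamPath: string; wakeId: string };

// One slot per stream path, as in the original deferred-wake storage.
const deferred = new Map<string, PullWakeEvent>();

function deferWake(event: PullWakeEvent): void {
  // Map.set replaces any value already stored under the same key.
  deferred.set(event.streamPath, event);
}

// Two children finish while the parent stream already has an active claim.
deferWake({ streamPath: "/parent/1", wakeId: "child-a-finished" });
deferWake({ streamPath: "/parent/1", wakeId: "child-b-finished" });

// Only the later wake is still recorded; the first notification is gone.
const remaining = [...deferred.values()].map((e) => e.wakeId);
console.log(remaining);
```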

On the server side, the wake registry was a manual ShapeStream-backed cache. Registration mutations, Electric visibility, and cache lifecycle had to be coordinated by hand, which made terminal runFinished wake evaluation vulnerable to stale or missing in-memory state. Follow-up CI failures also exposed a test isolation problem: integration tests that reset the shared Postgres/Electric backend can delete schema/data while other concurrently running tests are polling Electric or evaluating wakes.

Approach

The branch fixes both the runtime delivery issue and the server registry implementation.

  • The runtime same-stream wake notification buffer is now a FIFO queue per stream path:
Map<string, Array<PullWakeEvent>>

Queued same-stream wake notifications are used to trigger serialized claim attempts after the active stream claim drains. The next successful claim is expected to drain the stream’s available pending wake rows together, while the queue prevents losing trigger notifications and avoids concurrent claims for the same stream.

  • Heartbeat state now records the AbortSignal associated with the in-flight heartbeat. A stale heartbeat from an aborted runner cannot suppress heartbeat startup after restart.

  • The server wake registry now uses TanStack DB collections and optimistic actions over the wake_registrations table, backed by Electric sync. This removes the custom ShapeStream cache and the stale-cache reload fallback.

  • Agents server startup now requires an Electric URL for the wake-registry runtime instead of silently falling back to a non-syncing local load path.

  • Cron stream creation now tolerates a concurrent 409 if another caller created the stream after the existence check.

  • Sensitive integration tests now either avoid shared-backend resets or run on isolated Docker Compose backends/ports, preventing one test from dropping schema/data while another test is still using it.
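
The cron stream bullet above describes a check-then-create race. A minimal sketch of tolerating the resulting 409 could look like the following; the `StreamClient` interface, `ensureCronStream`, and the error shape are hypothetical stand-ins, not the agents-server API.

```typescript
// Hypothetical client interface for illustration only.
interface StreamClient {
  exists(path: string): Promise<boolean>;
  // Assumed to reject with an error carrying `status: 409` on conflict.
  create(path: string): Promise<void>;
}

type EnsureResult = "already-existed" | "created" | "created-concurrently";

function isConflict(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 409
  );
}

async function ensureCronStream(
  client: StreamClient,
  path: string
): Promise<EnsureResult> {
  if (await client.exists(path)) return "already-existed";
  try {
    await client.create(path);
    return "created";
  } catch (err) {
    // Another caller created the stream between our check and our create.
    // The stream exists either way, so the conflict counts as success.
    if (isConflict(err)) return "created-concurrently";
    throw err; // any other failure is still an error
  }
}

// In-memory fake that simulates losing the race.
const racingClient: StreamClient = {
  exists: async () => false,
  create: async () => {
    throw Object.assign(new Error("conflict"), { status: 409 });
  },
};
const raced: Promise<EnsureResult> = ensureCronStream(
  racingClient,
  "/cron/hypothetical"
);
```

Errors other than a 409 are rethrown, so the tolerance stays narrow.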

Key Invariants

  • Server-side wake registrations are sourced from the synced wake_registrations collection.
  • Terminal child run events must produce parent wake rows for every matching registration.
  • Wakes for the same stream are never claimed concurrently.
  • Multiple same-stream wake trigger notifications are preserved while an active claim/handler is already running, so the runner will attempt another claim after the active work drains.
  • Queued same-stream wake trigger retries stop cleanly on runner shutdown.
  • A stale heartbeat promise from a previous run cannot clear or block newer heartbeat state.
  • Tests that reset Postgres/Electric state must own their backend or avoid racing with tests using the shared backend.
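
The stale-heartbeat invariant above could be enforced roughly as follows. This is a sketch under assumptions: `HeartbeatTracker` and its methods are hypothetical names, and only the idea of recording the owning `AbortSignal` comes from the description.

```typescript
// Hypothetical tracker; names and structure are assumptions, not the runner's code.
type InFlightHeartbeat = { signal: AbortSignal };

class HeartbeatTracker {
  private inFlight: InFlightHeartbeat | undefined;

  /** Starts a heartbeat for the run owning `signal`; returns false if one is already live. */
  start(signal: AbortSignal, beat: () => Promise<void>): boolean {
    // A recorded heartbeat only blocks startup while its owning run is not aborted.
    if (this.inFlight !== undefined && !this.inFlight.signal.aborted) return false;

    const mine: InFlightHeartbeat = { signal };
    this.inFlight = mine;
    const clearIfMine = (): void => {
      // A stale heartbeat must not clear state that now belongs to a newer run.
      if (this.inFlight === mine) this.inFlight = undefined;
    };
    // Errors are swallowed here only to keep the sketch short.
    beat().then(clearIfMine, clearIfMine);
    return true;
  }

  get hasInFlight(): boolean {
    return this.inFlight !== undefined;
  }
}

const tracker = new HeartbeatTracker();
const oldRun = new AbortController();
let finishOldBeat: () => void = () => {};
const startedOld = tracker.start(
  oldRun.signal,
  () => new Promise<void>((resolve) => { finishOldBeat = resolve; })
);
// Same live run: the in-flight heartbeat suppresses a duplicate.
const duplicate = tracker.start(oldRun.signal, async () => {});

oldRun.abort(); // the runner stops while its heartbeat is still in flight
const newRun = new AbortController();
// The aborted run's heartbeat no longer suppresses startup.
const startedNew = tracker.start(newRun.signal, () => new Promise<void>(() => {}));
finishOldBeat(); // the stale promise settles later and must not clear the new state
```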

Non-goals

  • This does not change the persisted wake_registrations schema.
  • This does not change durable-stream subscription claim semantics.
  • This does not broaden runtime wake batching semantics beyond preserving queued same-stream wake notifications.
  • This does not attempt to make shared-backend schema resets safe under arbitrary concurrent test execution; the reset-owning tests are isolated instead.

Trade-offs

The main implementation choice was to replace the custom ShapeStream cache with TanStack DB rather than continue adding cache-reload fallbacks. TanStack DB adds explicit package dependencies to @electric-ax/agents-server, but it gives the registry a collection/effect model that better matches the rest of the agents stack and removes bespoke cache mutation logic.

For CI stabilization, isolating reset-heavy tests costs some additional Docker Compose setup and ports. That is preferable to serializing the whole package test suite or weakening assertions, because it keeps the tests representative while preventing cross-file data deletion races.

The runtime queue remains per stream path, matching the existing concurrency guard. This preserves the “one active claim per stream” behavior while fixing the loss of trigger notifications that occurred when only one same-stream wake notification could be stored during an active claim. It does not mean wake rows are handled one at a time; the next successful claim can still drain the stream’s available pending wake rows together.
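
As a rough illustration of the per-stream queue described above, the sketch below keeps one active claim per stream and appends later notifications to a FIFO. `StreamClaimGate`, its method names, and the synchronous style are assumptions made for brevity; the real runner is asynchronous.

```typescript
// Simplified, hypothetical event shape and class.
type PullWakeEvent = { streamPath: string; wakeId: string };

class StreamClaimGate {
  private readonly activeStreams = new Set<string>();
  private readonly queued = new Map<string, Array<PullWakeEvent>>();

  /** Returns true if the caller should claim now; otherwise the notification is queued. */
  notify(event: PullWakeEvent): boolean {
    if (!this.activeStreams.has(event.streamPath)) {
      this.activeStreams.add(event.streamPath);
      return true;
    }
    const queue = this.queued.get(event.streamPath) ?? [];
    queue.push(event); // append, never overwrite
    this.queued.set(event.streamPath, queue);
    return false;
  }

  /**
   * Call when the active claim for a stream has drained. Returns the oldest
   * queued notification, which should trigger the next claim attempt (the
   * stream stays active). Returns undefined and marks the stream idle when
   * nothing is queued.
   */
  drained(streamPath: string): PullWakeEvent | undefined {
    const queue = this.queued.get(streamPath);
    const next = queue?.shift();
    if (queue !== undefined && queue.length === 0) this.queued.delete(streamPath);
    if (next === undefined) this.activeStreams.delete(streamPath);
    return next;
  }
}

const gate = new StreamClaimGate();
const path = "/parent/1";
const claimedNow = gate.notify({ streamPath: path, wakeId: "a" });
const deferredB = !gate.notify({ streamPath: path, wakeId: "b" });
const deferredC = !gate.notify({ streamPath: path, wakeId: "c" });

// Each drain hands back the next trigger in arrival order.
const triggerOrder: string[] = [];
for (let next = gate.drained(path); next !== undefined; next = gate.drained(path)) {
  triggerOrder.push(next.wakeId);
}
```

Because a successful claim may drain several pending rows at once, a later trigger can find nothing left to claim. In this sketch that is harmless: the queue only guarantees that another attempt happens.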

Relation to the previous child wake fix

The previous fix in #4613 addressed an acknowledgement bug later in the runtime path: when multiple wake rows were already present in one pending handler window, processWake selected one wake but acknowledged the whole window, effectively consuming the sibling wake rows without handling them. That fix batches coalesced wake rows into one wake_batch payload.
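
The difference between the two behaviors can be sketched as below. Only the `wake_batch` name comes from the description; the row and payload fields, the helper names, and the choice of the first row as the selected wake are hypothetical.

```typescript
// Hypothetical shapes for illustration.
type WakeRow = { offset: number; source: string };
type WakeBatch = { type: "wake_batch"; wakes: WakeRow[]; ackThrough: number };

function lastOffset(rows: WakeRow[]): number {
  const last = rows[rows.length - 1];
  if (last === undefined) throw new Error("empty wake window");
  return last.offset;
}

/** Old behavior as described: handle one row, acknowledge the whole window. */
function selectOneButAckAll(rows: WakeRow[]): { handled: WakeRow[]; ackThrough: number } {
  return { handled: rows.slice(0, 1), ackThrough: lastOffset(rows) };
}

/** Batched behavior: every row in the window travels in one payload. */
function batchWindow(rows: WakeRow[]): WakeBatch {
  return { type: "wake_batch", wakes: [...rows], ackThrough: lastOffset(rows) };
}

const pendingWindow: WakeRow[] = [
  { offset: 10, source: "child-a" },
  { offset: 11, source: "child-b" },
];

const before = selectOneButAckAll(pendingWindow);
const after = batchWindow(pendingWindow);

// Rows acknowledged but never handed to the handler.
const lostBefore = pendingWindow.length - before.handled.length;
const lostAfter = pendingWindow.length - after.wakes.length;
console.log({ lostBefore, lostAfter });
```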

This PR fixes earlier loss points:

  • Pull-wake delivery: preserve same-stream wake notifications while a parent stream already has an active claim/handler.
  • Server registry sync: use a TanStack DB/Electric-backed collection for wake registrations instead of a manual ShapeStream cache.
  • Test isolation: prevent reset-heavy integration tests from deleting data used by other tests under full-suite concurrency.

Verification

Targeted runtime verification:

cd packages/agents-runtime
pnpm exec vitest run test/pull-wake-runner.test.ts --reporter=dot

Targeted agents-server verification, run while working on the branch:

pnpm --filter @electric-ax/agents-server test test/wake-registry.test.ts -t "reloads wake registrations" --run
pnpm --filter @electric-ax/agents-server test test/wake-registry.test.ts -t "delivers concurrent runFinished" --run
pnpm exec vitest run test/wake-registry-sync.test.ts --reporter verbose
pnpm exec tsc --noEmit

Changeset validation:

GITHUB_BASE_REF=main node scripts/check-changeset.mjs

Result:

✅ Changesets cover all affected packages: @electric-ax/agents-runtime, @electric-ax/agents-server

CI status for the latest pushed commit is green for all active checks.

Files changed

  • .changeset/fix-deferred-pull-wakes.md
    • Adds patch changesets for @electric-ax/agents-runtime and @electric-ax/agents-server.
  • packages/agents-runtime/src/pull-wake-runner.ts
    • Preserves same-stream wake notifications in a per-stream FIFO queue.
    • Uses queued same-stream wake notifications to trigger serialized claim attempts after the active stream claim drains.
    • Tracks the signal owning an in-flight heartbeat to avoid stale heartbeat suppression.
  • packages/agents-runtime/test/pull-wake-runner.test.ts
    • Adds regression coverage for multiple deferred wake events on the same stream.
  • packages/agents-server/package.json and pnpm-lock.yaml
    • Adds TanStack DB and Electric collection dependencies for the server wake registry.
  • packages/agents-server/src/wake-registry.ts
    • Refactors wake registration storage/sync to TanStack DB collections and optimistic actions over wake_registrations.
    • Removes the manual ShapeStream-backed registration cache path.
  • packages/agents-server/src/entity-manager.ts
    • Requires Electric-backed wake registry startup and tolerates concurrent cron stream creation.
  • packages/agents-server/src/host.ts
    • Requires an Electric URL before starting the wake-registry runtime.
  • packages/agents-server/test/*.test.ts
    • Updates wake-registry, host, server-start, pg-sync, Horton pull-wake, scheduler, and wake-registry-sync coverage for the TanStack DB registry and isolated-backend behavior.
  • docs/superpowers/plans/2026-06-19-wake-registry-tanstack-db.md
    • Adds the implementation plan for the wake-registry TanStack DB refactor.
  • docs/superpowers/specs/2026-03-23-wake-registry-tanstack-db-design.md
    • Adds the design/spec context for the wake-registry TanStack DB work.

Related: #4613

github-actions Bot (Contributor) commented Jun 18, 2026

Electric Agents Desktop Builds

Build artifacts for commit c73fb91.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |


codecov Bot commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 82.94574% with 66 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.53%. Comparing base (ee0da19) to head (c73fb91).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-server/src/wake-registry.ts | 81.93% | 57 Missing and 1 partial ⚠️ |
| packages/agents-server/src/entity-manager.ts | 54.54% | 5 Missing ⚠️ |
| packages/agents-runtime/src/pull-wake-runner.ts | 96.15% | 2 Missing ⚠️ |
| packages/agents-server/src/host.ts | 66.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4632      +/-   ##
==========================================
+ Coverage   59.46%   59.53%   +0.07%     
==========================================
  Files         385      385              
  Lines       43039    43163     +124     
  Branches    12383    12420      +37     
==========================================
+ Hits        25591    25696     +105     
- Misses      17371    17391      +20     
+ Partials       77       76       -1     
| Flag | Coverage Δ |
| --- | --- |
| packages/agents | 72.64% <ø> (ø) |
| packages/agents-mcp | 77.70% <ø> (ø) |
| packages/agents-mobile | 80.67% <ø> (ø) |
| packages/agents-runtime | 83.47% <96.15%> (+<0.01%) ⬆️ |
| packages/agents-server | 75.57% <80.89%> (+0.09%) ⬆️ |
| packages/agents-server-ui | 7.51% <ø> (ø) |
| packages/electric-ax | 51.06% <ø> (ø) |
| packages/experimental | 87.73% <ø> (ø) |
| packages/react-hooks | 86.48% <ø> (ø) |
| packages/start | 82.83% <ø> (ø) |
| packages/typescript-client | 91.83% <ø> (+0.11%) ⬆️ |
| packages/y-electric | 56.05% <ø> (ø) |
| typescript | 59.53% <82.94%> (+0.07%) ⬆️ |
| unit-tests | 59.53% <82.94%> (+0.07%) ⬆️ |


github-actions Bot (Contributor) commented Jun 18, 2026

Electric Agents Mobile Build

Local mobile checks ran for commit c73fb91.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.


KyleAMathews changed the title from "Fix deferred pull wake delivery" to "Refactor wake registry sync and fix child wake delivery" on Jun 19, 2026