# Bug Report

## Versions

- effect: 3.19.15
- @effect/cluster: 0.56.1
- @effect/platform-bun: 0.87.1
- @effect/sql-pg: 0.50.1
- Bun: 1.3.10
- PostgreSQL: 17
## Description

When using `BunClusterSocket.layer({ storage: "sql" })`, the runner registers in PostgreSQL and receives shard assignments, but it never transitions to entity/cron registration. The shard acquisition loop runs indefinitely.

### Expected behavior

The runner starts, acquires its shards, starts the `ClusterCron` entities, and logs "tick" every 10 seconds.

### Actual behavior

The runner logs `Shard acquisition loop` / `New shard assignments` / `RunnerStorage sync` forever. Entities never register; the runner is alive but non-functional.

## Reproduction

```bash
git clone https://github.com/noktadev/effect-cluster-shard-stall-repro
cd effect-cluster-shard-stall-repro
bun install
bun run db:up
bun run start
# Wait 30+ seconds - no "tick" output appears
```
Repo: https://github.com/noktadev/effect-cluster-shard-stall-repro
## Minimal code

```ts
import { ClusterCron, ClusterWorkflowEngine } from "@effect/cluster";
import { BunClusterSocket, BunRuntime } from "@effect/platform-bun";
import { PgClient } from "@effect/sql-pg";
import { Cron, Effect, Either, Layer, Redacted } from "effect";

// Cron entity that should log "tick" every 10 seconds.
const TickCron = ClusterCron.make({
  name: "TickCron",
  cron: Cron.parse("*/10 * * * * *").pipe(Either.getOrThrow),
  execute: Effect.log("tick"),
});

const PgLive = PgClient.layer({
  url: Redacted.make("postgres://postgres:postgres@localhost:25432/cluster_test"),
});

// SQL-backed runner storage: the configuration that stalls.
const RunnerLive = BunClusterSocket.layer({ storage: "sql" });

const EnvLayer = Layer.mergeAll(TickCron, ClusterWorkflowEngine.layer).pipe(
  Layer.provideMerge(RunnerLive),
  Layer.provideMerge(PgLive),
);

Layer.launch(EnvLayer).pipe(BunRuntime.runMain);
```
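
The shard-lifecycle messages quoted in the observations below only appear at DEBUG level. For anyone reproducing this, a minimal sketch of how to surface them using effect's standard `Logger` utilities; this tweak to the entry point is mine, not part of the repro repo:

```ts
import { Logger, LogLevel } from "effect";

// Replaces the final Layer.launch line above: raise the minimum log level so
// "Shard acquisition loop" / "New shard assignments" / "RunnerStorage sync"
// become visible in the output.
Layer.launch(EnvLayer).pipe(
  Logger.withMinimumLogLevel(LogLevel.Debug),
  BunRuntime.runMain,
);
```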
## Observations

- The PostgreSQL `cluster_runners` table shows the runner registered with `healthy: true`.
- Shard assignments ARE received (visible in DEBUG logs as `New shard assignments` entries with shard IDs).
- The runner never progresses past shard acquisition to entity registration.
- `SingleRunner.layer({ runnerStorage: "memory" })` works perfectly (see the sketch after this list).
- Truncating `cluster_runners`/`messages`/`replies`/`locks` and restarting sometimes (but not always) resolves the stall.
- Ghost runners accumulate across restarts (no deregistration happens on pod/process shutdown).
- Discovered in production on Kubernetes with Bun-based runners.
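
For comparison, a minimal sketch of the working in-memory configuration, assuming `SingleRunner` is imported from `@effect/cluster` the same way the repro repo does; everything except the runner layer matches the minimal code above:

```ts
import { SingleRunner } from "@effect/cluster"; // assumption: same import as the repro repo

// Identical wiring to EnvLayer above, with only the runner layer swapped.
// With this layer in place, "tick" is logged every 10 seconds as expected.
const SingleRunnerLive = SingleRunner.layer({ runnerStorage: "memory" });

const WorkingLayer = Layer.mergeAll(TickCron, ClusterWorkflowEngine.layer).pipe(
  Layer.provideMerge(SingleRunnerLive),
  Layer.provideMerge(PgLive),
);

Layer.launch(WorkingLayer).pipe(BunRuntime.runMain);
```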