[pull] main from triggerdotdev:main#103
Merged
…over (#3548)

## Summary

During an ElastiCache role swap (failover) or node-type change (vertical scale), the ioredis TCP/TLS connection stays open but the server starts answering with `READONLY` (the client is talking to a node that became a replica) or `LOADING` (node still loading data from disk). Without an explicit hook, those errors surface to caller code as `ReplyError` instances: every write op on the affected connection fails until the cluster fully cuts over.

This PR adds `reconnectOnError` to every prod ioredis client so the disconnect + reconnect + retry cycle absorbs these errors and caller code never sees them.

## Fix

```ts
export function defaultReconnectOnError(err: Error): boolean | 1 | 2 {
  const msg = err.message ?? "";
  if (msg.startsWith("READONLY") || msg.startsWith("LOADING")) return 2;
  return false;
}
```

Returning `2` tells ioredis to disconnect, reconnect, and re-issue the failed command. After reconnect, DNS / SG state routes the new socket to a writable node.

The helper lives in `@internal/redis` and is wired into both the shared `createRedisClient` (which covers RunQueue, schedule-engine, redis-worker, and every other internal-package consumer) and the direct `new Redis(...)` call sites in the webapp. V1-only marqs files are intentionally not migrated.

## Test plan

- [x] `pnpm run typecheck --filter webapp`
- [x] `pnpm run typecheck --filter @internal/run-engine`
- [x] Verified end-to-end against a live ElastiCache vertical-scale event: caller-surfaced errors went from tens of thousands during the cutover window down to a handful per ioredis client
- [ ] Confirm steady-state behavior unchanged after deploy
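The wiring described above could look roughly like the sketch below. It is not the code from the PR: `withFailoverReconnect` and `RedisOptionsLike` are hypothetical names, and the options interface is a local stand-in for the small slice of ioredis options involved, so the snippet runs without ioredis installed. The only library behavior assumed is that ioredis accepts a `reconnectOnError` option whose return value of `2` means "reconnect and resend the failed command".

```ts
// Return type of an ioredis-style reconnectOnError hook.
type ReconnectDecision = boolean | 1 | 2;

// Hypothetical stand-in for the subset of ioredis options used here.
interface RedisOptionsLike {
  host?: string;
  port?: number;
  reconnectOnError?: ((err: Error) => ReconnectDecision) | null;
}

// Same shape as the helper in the commit message above.
function defaultReconnectOnError(err: Error): ReconnectDecision {
  const msg = err.message ?? "";
  if (msg.startsWith("READONLY") || msg.startsWith("LOADING")) return 2;
  return false;
}

// Hypothetical wrapper: fill in the default hook unless the caller
// already supplied one. Note that `??` treats an explicit `null`
// the same as "not set".
function withFailoverReconnect(options: RedisOptionsLike): RedisOptionsLike {
  return {
    ...options,
    reconnectOnError: options.reconnectOnError ?? defaultReconnectOnError,
  };
}

// Usage: the resulting object would be passed to `new Redis(...)`.
const clientOptions = withFailoverReconnect({ host: "localhost", port: 6379 });
```

Applying the default in one shared place is what lets a single factory such as `createRedisClient` cover many consumers, while still leaving room for a call site to pass its own hook.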
…3549)

## Summary

When ElastiCache demotes a primary to replica (during a Multi-AZ failover or a vertical node-type change), the demoting primary issues an `UNBLOCKED` reply to any in-flight blocking commands (`BLPOP`, `BRPOP`, `BLMOVE`, `XREADGROUP ... BLOCK`, etc.) to clear them before the role flips. ioredis surfaces these as `ReplyError` to caller code.

The shared `defaultReconnectOnError` added in #3548 only matches `READONLY` and `LOADING`. This extends it to `UNBLOCKED` so the disconnect-reconnect-retry cycle handles BLPOP-shaped errors the same way the existing two cases handle non-blocking-command errors.

## Fix

```ts
export function defaultReconnectOnError(err: Error): boolean | 1 | 2 {
  const msg = err.message ?? "";
  if (
    msg.startsWith("READONLY") ||
    msg.startsWith("LOADING") ||
    msg.startsWith("UNBLOCKED")
  ) {
    return 2;
  }
  return false;
}
```

Returning `2` tells ioredis to disconnect, reconnect, and re-issue the command. For a BLPOP that means a fresh BLPOP against the new primary instead of the `UNBLOCKED` error escaping to the caller.

## Test plan

- [ ] CI green
- [ ] Trigger a Multi-AZ failover or a vertical scale event on an ElastiCache replication group whose clients are running blocking commands and confirm no `UNBLOCKED` errors surface to caller code during the cutover.
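As a quick way to see which replies the extended matcher would catch, the sketch below expresses the same prefix check as a table and runs it over a few sample errors. This is an illustrative variant, not the code from the PR: `RECONNECT_PREFIXES` and `shouldReconnectAndRetry` are hypothetical names, and the sample messages are illustrative of what a server might send rather than captured output.

```ts
// Error-reply prefixes that should trigger disconnect + reconnect + resend.
const RECONNECT_PREFIXES = ["READONLY", "LOADING", "UNBLOCKED"] as const;

// Hypothetical table-driven equivalent of the three startsWith checks above.
function shouldReconnectAndRetry(err: Error): boolean | 1 | 2 {
  const msg = err.message ?? "";
  return RECONNECT_PREFIXES.some((prefix) => msg.startsWith(prefix)) ? 2 : false;
}

// Illustrative messages only; exact server wording may differ.
const sampleErrors: Error[] = [
  new Error("READONLY You can't write against a read only replica."),
  new Error("LOADING Redis is loading the dataset in memory"),
  new Error("UNBLOCKED force unblock from blocking operation"),
  new Error("ERR wrong number of arguments"),
];

// Collect the decision for each sample so it can be inspected or logged.
const decisions = sampleErrors.map((err) => ({
  message: err.message,
  decision: shouldReconnectAndRetry(err),
}));

for (const { message, decision } of decisions) {
  console.log(`${decision === 2 ? "retry" : "surface"}: ${message}`);
}
```

Matching with `startsWith` rather than `includes` keeps the check anchored to the error code at the front of the reply, so an unrelated error whose text merely mentions one of these words is still surfaced to the caller.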
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )