Skip to content

Commit f8a0529

Browse files
dealakoclaude
andauthored
fix(server): add graceful shutdown for zero-downtime rolling deploys (LFXV2-1703) (#738)
* feat(server): add graceful shutdown for zero-downtime rolling deploys (LFXV2-1703) - Wire SIGTERM/SIGINT handlers in server.ts: capture http.Server, drain in-flight requests (25s window) before SIGKILL - Flip /readyz to 503 during shutdown so kube-proxy/Traefik stops routing new traffic to the terminating pod - Add shutdown.ts registry to coordinate ordered teardown hooks - Track active SSE streams on CopilotController; send event:shutdown frame and res.end() before HTTP drain - Add NatsService.shutdownAll() static method to drain all instantiated NATS connections concurrently - Set PM2 kill_timeout: 30000 (exceeds 25s in-app drain window) - Add Helm chart values: strategy (maxSurge 100%/maxUnavailable 0), terminationGracePeriodSeconds: 60, lifecycle.preStop sleep 10s - Expose all three as value-driven fields for per-env tuning Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(server): address review findings on graceful shutdown (LFXV2-1703) - Export markShuttingDown() from shutdown.ts; call it first in gracefulShutdown() so /readyz flips synchronously, removing the dual-flag race between server.ts and shutdown.ts - Increase PM2 kill_timeout to 45000ms (was 30000) so NATS/Snowflake drain has 15s instead of 5s after the 25s HTTP drain window - Add isShuttingDown() guard at copilot chat() entry to reject any connections that arrive during the LB race window - Replace res.write()+res.end() with res.end(payload) in closeAllStreams for atomic write+close; add logger.debug for swallowed stream errors - Document autorestart/process.exit(0) interaction in ecosystem.config.js - Update values.yaml comment to reflect corrected timing chain Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(server): address review findings on graceful shutdown (LFXV2-1703) - Wrap signal handlers with .catch() so unhandled rejections no longer silently crash the process before process.exit(0) is reached (C-001) - Enforce 15s service drain budget via Promise.race() so a hung NATS or Snowflake drain cannot outlast PM2's kill_timeout (C-002) - Bump terminationGracePeriodSeconds 60→75 and correct comment to reflect that preStop runs inside the grace period, not on top of it (C-003) - Add SnowflakeService.shutdownIfInitialized() to avoid lazy-creating a connection pool on pods that never served a Snowflake-backed route (C-004) - Wrap each shutdown hook in Promise.resolve().then() so synchronous throws are caught by allSettled; log failed hooks at ERROR level (C-005) - Make closeAllStreams() async and await write flush so SSE clients receive the shutdown event before closeAllConnections() destroys the socket (M-003) - Gate sendEvent() on isShuttingDown() to close the write-after-end window between res.end() and the async close event (M-005) - Move NatsService instance deregistration to finally block so a drain failure no longer leaves the connection untracked (M-004) - Return deregistration function from addShutdownHook() to prevent hook accumulation across multiple controller instantiations in tests (M-006) - Differentiate expected "already closed" stream errors (DEBUG) from unexpected errors (WARN) in closeAllStreams() (m-001) Jira: LFXV2-1703 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: David Deal <ddeal@linuxfoundation.org> * docs(chart): fix terminationGracePeriodSeconds default and timing note Update README row to match values.yaml: default 60→75 and timing note (35s)→(55s), reflecting 10s preStop + 45s PM2 kill_timeout = 55s. LFXV2-1703 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(review): address PR #738 review feedback Address review comments from copilot-pull-request-reviewer[bot]: - copilot.controller.ts: remove unused removeShutdownHook class field; addShutdownHook() return value discarded since the hook lives for the full process lifetime and deregistration is never needed in production - ecosystem.config.js: add stop_exit_codes: [0] so PM2 does not restart on a clean shutdown regardless of pm2 vs pm2-runtime context; update autorestart comment to reflect the actual intent Resolves 2 review threads. Thread on res.end(resolve) left open with explanation — Node.js end(cb) overload treats a function first-arg as the callback, not as data (confirmed by @types/node signature and runtime source). LFXV2-1703 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(review): address PR #738 review feedback (round 2) Address review comments from copilot-pull-request-reviewer[bot]: - server.ts: use consistent operation name 'graceful_shutdown' for both startOperation and success calls; the previous names 'shutdown_initiated' and 'shutdown_complete' split one operation across two names, violating the repo logging convention of pairing startOperation/success with the same operation identifier - server.ts: add rejection handlers to NATS and Snowflake drain .then() calls so drain failures emit a logger.warning instead of silently disappearing; previously a rejected drain promise would skip the success callback with no log entry at all Resolves 2 review threads. LFXV2-1703 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: David Deal <ddeal@linuxfoundation.org> * Fixed formatting Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(review): address PR #738 review feedback (round 3) Address review comments from coderabbitai[bot] and copilot-pull-request-reviewer[bot]: - charts/lfx-self-serve/README.md: fix image pull secrets parameter name from image.pullSecrets to imagePullSecrets to match the actual top-level value path used in values.yaml and the deployment template (per copilot-pull-request-reviewer[bot]) - server.ts: add 5s budget to runShutdownHooks() call; without it a blocked SSE write (backpressure) can stall the entire shutdown sequence past PM2 kill_timeout (per copilot-pull-request-reviewer[bot]) - server.ts: expand raceDrain to accept a name and log a warning when the 15s budget fires before the drain completes, making timeout events visible in logs instead of silently succeeding (per copilot-pull-request-reviewer[bot]) - copilot.controller.ts: add 2s per-stream timeout in closeAllStreams(); if a client's TCP receive buffer is full (backpressure), res.write()'s flush callback stalls indefinitely; the timeout ensures the hook completes within budget regardless (per copilot-pull-request-reviewer[bot]) - Prettier CI failure flagged by coderabbitai[bot] is already resolved (format:check passes; previous commits reformatted affected files) Resolves 5 review threads. LFXV2-1703 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(review): address PR #738 review feedback (round 4) Address review comments from copilot-pull-request-reviewer: - apps/lfx-one/src/server/server.ts: add hooksCompleted flag to hooks budget race so shutdown_hooks_timeout is not spuriously logged when runShutdownHooks() completes before the 5s budget fires (mirrors completed guard in raceDrain) (per copilot-pull-request-reviewer) - apps/lfx-one/src/server/server.ts: rename shutdown_nats_drained → shutdown_nats_complete and remove dead failure handler; NatsService.shutdownAll() uses Promise.allSettled so always resolves — per-connection drain failures are logged at ERROR inside NatsService.shutdown() (per copilot-pull-request-reviewer) - apps/lfx-one/src/server/server.ts: rename shutdown_snowflake_drained → shutdown_snowflake_complete; SnowflakeService.shutdown() has internal try/catch so pool drain failures are logged inside the service — kept rejection handler since shutdownIfInitialized() can reject before the internal try/catch (per copilot-pull-request-reviewer) Resolves 3 review threads. Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(server): add LB drain sleep and fix shutdown ordering (LFXV2-1703) Address review feedback from @emsearcy: - apps/lfx-one/src/server/server.ts: replace the Promise.race hook-budget block with Promise.all([lbSleep, hooks]) so the server waits a mandatory 15s (1.5× readyz periodSeconds) after flipping readyz to 503 before closing the HTTP listener — covers the window where the LB still routes requests despite the 503 response. SSE shutdown hooks run concurrently within the 15s window so clients are notified and can reconnect before HTTP closes. Added comment above NATS/Snowflake drain making explicit that they drain *after* HTTP (not before), preserving in-flight NATS request/reply calls. - apps/lfx-one/ecosystem.config.js: increase kill_timeout from 45s to 60s to account for the new 15s LB drain sleep (15s LB + 25s HTTP + 15s service + 5s margin = 60s); terminationGracePeriodSeconds (75s) still exceeds preStop (10s) + kill_timeout (60s) = 70s - charts/lfx-self-serve/values.yaml: update timing breakdown comment to include the 15s LB drain sleep line item; adjust active-time summary from 55s to 70s and safety margin from 20s to 5s - charts/lfx-self-serve/README.md: update terminationGracePeriodSeconds timing note from (55s) to (70s) to match new kill_timeout Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(review): address PR #738 review feedback (round 5) Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(review): address PR #738 review feedback (round 6) Signed-off-by: David Deal <ddeal@linuxfoundation.org> * fix(review): address PR #738 review feedback - server.ts: replace `Promise.all([lbDrain, hooks])` with `await lbDrain` to hard-cap the hooks wait at LB_DRAIN_MS — a hung hook can no longer delay the HTTP drain beyond the 15 s LB window (per copilot-pull-request-reviewer[bot]) - server.ts: introduce `hooksStartTime` for `shutdown_hooks_error` log so `duration_ms` reflects time spent in hooks, not time since graceful shutdown began (per copilot-pull-request-reviewer[bot]) - server.ts: update comment block to describe best-effort / hard-ceiling semantics of the revised hooks/LB-drain sequence Resolves 2 review threads on PR #738. LFXV2-1703 Signed-off-by: David Deal <ddeal@linuxfoundation.org> --------- Signed-off-by: David Deal <ddeal@linuxfoundation.org> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent c985714 commit f8a0529

9 files changed

Lines changed: 334 additions & 20 deletions

File tree

apps/lfx-one/ecosystem.config.js

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,12 @@ module.exports = {
1515
max_restarts: 10, // Restart limit for unstable apps
1616
exp_backoff_restart_delay: 100, // Exponential backoff restart delay
1717
watch: false, // Disable file watching in production
18-
autorestart: true, // Auto restart on crashes
18+
autorestart: true, // Restart on crash (non-zero exit)
19+
stop_exit_codes: [0], // Do not restart on clean shutdown — process.exit(0) in gracefulShutdown
1920
instances: 1, // Number of instances to run
2021
exec_mode: 'cluster', // Enable cluster mode for load balancing
22+
kill_timeout: 60000, // 15s LB sleep + 25s HTTP drain + 15s service drain (budget-capped) + 5s margin; terminationGracePeriodSeconds (75s) must exceed preStop (10s, inside grace period) + kill_timeout (60s) = 70s
23+
shutdown_with_message: false, // Use real SIGTERM, not PM2 IPC message
2124
},
2225
],
2326
};

apps/lfx-one/src/server/controllers/copilot.controller.ts

Lines changed: 74 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,12 +7,23 @@ import { NextFunction, Request, Response } from 'express';
77
import { ServiceValidationError } from '../errors';
88
import { CopilotService } from '../services/copilot.service';
99
import { logger } from '../services/logger.service';
10+
import { addShutdownHook, isShuttingDown } from '../utils/shutdown';
1011
import { getEffectiveSub } from '../utils/auth-helper';
1112

1213
export class CopilotController {
1314
private readonly copilotService = new CopilotService();
15+
private readonly activeStreams = new Set<Response>();
16+
17+
public constructor() {
18+
addShutdownHook(() => this.closeAllStreams());
19+
}
1420

1521
public async chat(req: Request, res: Response, next: NextFunction): Promise<void> {
22+
if (isShuttingDown()) {
23+
res.status(503).json({ status: 'shutting_down' });
24+
return;
25+
}
26+
1627
const { message, sessionId, context } = req.body as CopilotChatRequest;
1728

1829
if (!message || typeof message !== 'string' || !message.trim()) {
@@ -51,13 +62,15 @@ export class CopilotController {
5162
const abortController = new AbortController();
5263
let clientDisconnected = false;
5364

65+
this.activeStreams.add(res);
5466
res.on('close', () => {
5567
clientDisconnected = true;
68+
this.activeStreams.delete(res);
5669
abortController.abort();
5770
});
5871

5972
const sendEvent = (type: CopilotSSEEventType, data: unknown): void => {
60-
if (clientDisconnected) return;
73+
if (clientDisconnected || isShuttingDown()) return;
6174
res.write(`event: ${type}\ndata: ${JSON.stringify(data)}\n\n`);
6275
(res as FlushableResponse).flush?.();
6376
};
@@ -94,9 +107,69 @@ export class CopilotController {
94107
logger.error(req, 'copilot_chat', startTime, error, { user_id: userId });
95108
sendEvent('error', 'Something went wrong. Please try again.');
96109
} finally {
110+
this.activeStreams.delete(res);
97111
if (!clientDisconnected) {
98112
res.end();
99113
}
100114
}
101115
}
116+
117+
private async closeAllStreams(): Promise<void> {
118+
const streams = [...this.activeStreams];
119+
this.activeStreams.clear();
120+
// 2s per-stream timeout guards against backpressure: if the client's TCP receive
121+
// buffer is full, res.write()'s flush callback stalls until the buffer drains.
122+
const STREAM_CLOSE_TIMEOUT_MS = 2_000;
123+
await Promise.all(
124+
streams.map(
125+
(res) =>
126+
new Promise<void>((resolve) => {
127+
let done = false;
128+
const finish = (): void => {
129+
if (!done) {
130+
done = true;
131+
resolve();
132+
}
133+
};
134+
const timer = setTimeout(() => {
135+
// Timeout fired: the res.write() flush callback never arrived (likely
136+
// TCP backpressure). Force-close the socket so httpServer.close() isn't
137+
// held open by this stream until the 25s closeAllConnections() cutoff.
138+
logger.debug(undefined, 'sse_stream_shutdown_timeout', 'SSE stream close timed out; force-closing', {});
139+
try {
140+
if (!res.writableEnded) res.end();
141+
} catch {
142+
// already ended — harmless
143+
}
144+
res.socket?.destroy();
145+
finish();
146+
}, STREAM_CLOSE_TIMEOUT_MS);
147+
try {
148+
if (!res.writableEnded) {
149+
res.write('event: shutdown\ndata: {"reason":"server_shutdown"}\n\n', () => {
150+
clearTimeout(timer);
151+
res.end(finish);
152+
});
153+
} else {
154+
clearTimeout(timer);
155+
finish();
156+
}
157+
} catch (error) {
158+
clearTimeout(timer);
159+
const isExpected = error instanceof Error && (error.message.includes('write after end') || error.message.includes('Cannot call end'));
160+
if (isExpected) {
161+
logger.debug(undefined, 'sse_stream_shutdown_close', 'Stream already closed during shutdown', {
162+
err: error,
163+
});
164+
} else {
165+
logger.warning(undefined, 'sse_stream_shutdown_close', 'Unexpected error closing SSE stream', {
166+
err: error,
167+
});
168+
}
169+
finish();
170+
}
171+
})
172+
)
173+
);
174+
}
102175
}

apps/lfx-one/src/server/server.ts

Lines changed: 137 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ import { AuthContext, RuntimeConfig, User } from '@lfx-one/shared/interfaces';
88
import dotenv from 'dotenv';
99
import express, { NextFunction, Request, Response } from 'express';
1010
import { attemptSilentLogin, auth, ConfigParams } from 'express-openid-connect';
11+
import { Server as HttpServer } from 'node:http';
1112
import { dirname, resolve } from 'node:path';
1213
import { fileURLToPath } from 'node:url';
1314
import pinoHttp from 'pino-http';
@@ -50,7 +51,10 @@ import userRouter from './routes/user.route';
5051
import votesRouter from './routes/votes.route';
5152
import { reqSerializer, resSerializer, serverLogger } from './server-logger';
5253
import { logger } from './services/logger.service';
54+
import { NatsService } from './services/nats.service';
55+
import { SnowflakeService } from './services/snowflake.service';
5356
import { clearImpersonationSession, decodeJwtPayload } from './utils/auth-helper';
57+
import { isShuttingDown, markShuttingDown, runShutdownHooks } from './utils/shutdown';
5458
import { resolvePersonaForSsr } from './utils/persona-helper';
5559

5660
if (process.env['NODE_ENV'] !== 'production') {
@@ -100,6 +104,10 @@ app.get('/livez', (_req: Request, res: Response) => {
100104
// failures are handled at the route level, not by pulling the whole pod out
101105
// of the Service endpoints list.
102106
app.get('/readyz', (_req: Request, res: Response) => {
107+
if (isShuttingDown()) {
108+
res.status(503).json({ status: 'shutting_down' });
109+
return;
110+
}
103111
res.status(200).json({ status: 'ready' });
104112
});
105113

@@ -358,9 +366,129 @@ app.use((error: Error, req: Request, res: Response, next: NextFunction) => {
358366
apiErrorHandler(error, req, res, next);
359367
});
360368

369+
let httpServer: HttpServer | undefined;
370+
371+
async function gracefulShutdown(signal: string): Promise<void> {
372+
if (isShuttingDown()) return;
373+
markShuttingDown(); // flip /readyz to 503 synchronously before anything async runs
374+
375+
const startTime = logger.startOperation(undefined, 'graceful_shutdown', { signal });
376+
377+
// Mandatory LB drain window: after readyz flips to 503, the load balancer
378+
// continues routing requests until its next probe fires (readyz periodSeconds: 10s).
379+
// Wait 1.5× that interval before closing the HTTP listener so no new requests
380+
// land on an already-closed server. SSE shutdown hooks run concurrently so
381+
// clients are notified and can reconnect within this window.
382+
//
383+
// Hooks are best-effort: if they exceed the LB drain window we log a warning
384+
// and proceed — we never wait beyond LB_DRAIN_MS for hooks. `await lbDrain`
385+
// after the race guarantees the full 15s always elapses (no-op when lbDrain
386+
// already resolved via the race) while placing a hard ceiling on how long
387+
// hooks can delay the HTTP drain.
388+
const LB_DRAIN_MS = 15_000; // 1.5 × readyz periodSeconds (10s)
389+
const lbDrain = new Promise<void>((resolve) => setTimeout(resolve, LB_DRAIN_MS));
390+
let hooksCompleted = false;
391+
const hooksStartTime = Date.now();
392+
const hooks = runShutdownHooks()
393+
.then(() => {
394+
hooksCompleted = true;
395+
})
396+
.catch((err: unknown) => {
397+
logger.error(undefined, 'shutdown_hooks_error', hooksStartTime, err as Error, {});
398+
});
399+
await Promise.race([hooks, lbDrain]);
400+
if (!hooksCompleted) {
401+
logger.warning(undefined, 'shutdown_hooks_slow', 'Shutdown hooks exceeded LB drain window', { budget_ms: LB_DRAIN_MS });
402+
}
403+
await lbDrain; // hard ceiling: never wait beyond LB_DRAIN_MS for hooks
404+
405+
if (!httpServer) {
406+
logger.success(undefined, 'graceful_shutdown', startTime, { reason: 'no_http_server' });
407+
process.exit(0);
408+
return;
409+
}
410+
411+
// Stop accepting new connections and drain in-flight requests (25s window).
412+
await new Promise<void>((resolve) => {
413+
const drainTimeout = setTimeout(() => {
414+
httpServer!.closeAllConnections();
415+
resolve();
416+
}, 25_000);
417+
418+
httpServer!.closeIdleConnections();
419+
httpServer!.close(() => {
420+
clearTimeout(drainTimeout);
421+
resolve();
422+
});
423+
});
424+
425+
logger.info(undefined, 'shutdown_http_drained', 'HTTP server drained', {});
426+
427+
// Drain NATS and Snowflake *after* HTTP is fully closed. This ordering ensures any
428+
// in-flight HTTP request that issues a NATS request/reply can complete before the
429+
// connection is torn down — draining NATS before HTTP would break those requests.
430+
// Each drain is race'd against a 15s budget so a hung drain cannot exceed PM2's kill_timeout.
431+
// SnowflakeService.shutdownIfInitialized() skips pool creation for pods that never used Snowflake.
432+
const SERVICE_DRAIN_BUDGET_MS = 15_000;
433+
const raceDrain = (name: string, p: Promise<void>): Promise<void> => {
434+
let completed = false;
435+
const tracked = p.then(
436+
() => {
437+
completed = true;
438+
},
439+
() => {
440+
completed = true;
441+
}
442+
);
443+
return Promise.race([
444+
tracked,
445+
new Promise<void>((resolve) =>
446+
setTimeout(() => {
447+
if (!completed) {
448+
logger.warning(undefined, 'shutdown_drain_timeout', `${name} drain budget exceeded`, { budget_ms: SERVICE_DRAIN_BUDGET_MS });
449+
}
450+
resolve();
451+
}, SERVICE_DRAIN_BUDGET_MS)
452+
),
453+
]);
454+
};
455+
456+
await Promise.allSettled([
457+
raceDrain(
458+
'nats',
459+
// shutdownAll() uses Promise.allSettled — always resolves regardless of individual
460+
// drain outcomes. Per-connection failures are already logged at ERROR inside
461+
// NatsService.shutdown(). Log "complete" here (not "drained") to avoid implying
462+
// all drains succeeded when some may have been swallowed.
463+
NatsService.shutdownAll().then(() => {
464+
logger.info(undefined, 'shutdown_nats_complete', 'NATS shutdown complete', {});
465+
})
466+
),
467+
raceDrain(
468+
'snowflake',
469+
// shutdown() has an internal try/catch that logs pool drain errors and resolves.
470+
// Per-drain failures are already logged at ERROR inside SnowflakeService.shutdown().
471+
// Log "complete" here (not "drained") for the same reason as NATS above.
472+
// Keep the rejection handler: unlike shutdownAll(), shutdownIfInitialized() can
473+
// reject if pre-pool code throws before the internal try/catch.
474+
SnowflakeService.shutdownIfInitialized().then(
475+
() => {
476+
logger.info(undefined, 'shutdown_snowflake_complete', 'Snowflake shutdown complete', {});
477+
},
478+
(err) => {
479+
logger.warning(undefined, 'shutdown_snowflake_failed', 'Snowflake shutdown failed', { err });
480+
}
481+
)
482+
),
483+
]);
484+
485+
logger.success(undefined, 'graceful_shutdown', startTime, {});
486+
process.exit(0);
487+
}
488+
361489
export function startServer() {
362490
const port = process.env['PORT'] || 4000;
363-
app.listen(port, () => {
491+
httpServer = app.listen(port, () => {
364492
logger.debug(undefined, 'server_startup', 'Node Express server started', {
365493
port,
366494
url: `http://localhost:${port}`,
@@ -376,6 +504,14 @@ const isPM2 = process.env['PM2'] === 'true';
376504

377505
if (isMain || isPM2) {
378506
startServer();
507+
const handleSignal = (sig: string): void => {
508+
gracefulShutdown(sig).catch((err) => {
509+
logger.error(undefined, 'shutdown_fatal', Date.now(), err, { signal: sig });
510+
process.exit(1);
511+
});
512+
};
513+
process.on('SIGTERM', () => handleSignal('SIGTERM'));
514+
process.on('SIGINT', () => handleSignal('SIGINT'));
379515
}
380516

381517
export const reqHandler = createNodeRequestHandler(app);

apps/lfx-one/src/server/services/nats.service.ts

Lines changed: 19 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,8 @@ import { logger } from './logger.service';
2020
* This service handles only infrastructure concerns, not business logic
2121
*/
2222
export class NatsService {
23+
private static readonly instances = new Set<NatsService>();
24+
2325
private connection: NatsConnection | null = null;
2426
private connectionPromise: Promise<NatsConnection> | null = null;
2527
private natsHostname: string;
@@ -30,6 +32,11 @@ export class NatsService {
3032
const parsedUrl = new URL(natsUrl.replace(/^nats:/, 'http:'));
3133
this.natsHostname = parsedUrl.hostname;
3234
this.natsPort = parseInt(parsedUrl.port, 10) || 4222;
35+
NatsService.instances.add(this);
36+
}
37+
38+
public static async shutdownAll(): Promise<void> {
39+
await Promise.allSettled([...NatsService.instances].map((i) => i.shutdown()));
3340
}
3441

3542
/**
@@ -144,17 +151,20 @@ export class NatsService {
144151
* Gracefully shutdown NATS connection
145152
*/
146153
public async shutdown(): Promise<void> {
147-
if (this.connection && !this.connection.isClosed()) {
148-
const startTime = logger.startOperation(undefined, 'nats_shutdown', {});
149-
150-
try {
151-
await this.connection.drain();
152-
logger.success(undefined, 'nats_shutdown', startTime, {});
153-
} catch (error) {
154-
logger.error(undefined, 'nats_shutdown', startTime, error, {});
154+
try {
155+
if (this.connection && !this.connection.isClosed()) {
156+
const startTime = logger.startOperation(undefined, 'nats_shutdown', {});
157+
try {
158+
await this.connection.drain();
159+
logger.success(undefined, 'nats_shutdown', startTime, {});
160+
} catch (error) {
161+
logger.error(undefined, 'nats_shutdown', startTime, error, {});
162+
}
155163
}
164+
this.connection = null;
165+
} finally {
166+
NatsService.instances.delete(this);
156167
}
157-
this.connection = null;
158168
}
159169

160170
/**

apps/lfx-one/src/server/services/snowflake.service.ts

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,15 @@ export class SnowflakeService {
6666
return SnowflakeService.instance;
6767
}
6868

69+
/**
70+
* Shuts down the singleton only if it was ever initialized.
71+
* Safe to call from server shutdown — avoids creating a pool just to tear it down.
72+
*/
73+
public static shutdownIfInitialized(): Promise<void> {
74+
if (!SnowflakeService.instance) return Promise.resolve();
75+
return SnowflakeService.instance.shutdown();
76+
}
77+
6978
/**
7079
* Reset the singleton instance (primarily for testing)
7180
*/

0 commit comments

Comments
 (0)