Skip to content

Commit b6dc2be

Browse files
sirtimidclaude
andauthored
test(kernel-node-runtime): Fix flaky remote-comms E2E tests (#927)
## Description Two E2E tests in `remote-comms.test.ts` were consistently flaky on `main`, failing more often than passing: 1. **"rejects new messages when queue reaches MAX_QUEUE limit"** — timed out at 120s 2. **"resolves promise after reconnection when retries have not been exhausted"** — failed with `URL redemption timed out after 8000ms` **Root cause:** Both tests used `testBackoffOptions` with `ackTimeoutMs: 2_000`. The `RemoteHandle` gives up after `ackTimeoutMs × (MAX_RETRIES + 1) = 8s`, which is too short for kernel2 to restart (new DB connection + kernel init + libp2p + relay connection + vat launch). ## Changes - Increase `ackTimeoutMs` to `5_000` for both tests, giving a 20s give-up window instead of 8s - Extract named options objects (`queueTestOptions`, `reconnectOptions`) for clarity and consistency between `setupAliceAndBob` and `restartKernelAndReloadVat` calls - Restructure the reconnection test to establish the connection first (completing URL redemption), then stop kernel2 before sending the test message — eliminating both the URL redemption race and the non-deterministic message delivery race - Remove unnecessary `delay()` calls that were masking the underlying timing issue ## Testing Both tests pass reliably after the fix. Verified by running the full `remote-comms.test.ts` E2E suite twice — all 67 E2E tests pass across all 11 test files (including both previously-flaky tests). 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Low Risk** > Low risk: changes are limited to E2E test timing/configuration and message sequencing; main risk is masking real regressions by increasing timeouts or altering test ordering. > > **Overview** > Stabilizes `remote-comms.test.ts` E2E coverage by introducing per-test option objects that override `ackTimeoutMs` (to `5_000`) and related rate-limit settings, and then using those same options consistently across `setupAliceAndBob` and `restartKernelAndReloadVat`. > > Reworks the "reconnect without exhausting retries" test to first establish a connection (complete URL redemption), then stop the remote kernel and queue the message while it’s definitely offline, removing race-prone delays and making the queued-message reconnection behavior deterministic. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 11d93ec. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 882fd94 commit b6dc2be

1 file changed

Lines changed: 31 additions & 21 deletions

File tree

packages/kernel-node-runtime/test/e2e/remote-comms.test.ts

Lines changed: 31 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -569,20 +569,25 @@ describe.sequential('Remote Communications E2E', () => {
569569
it(
570570
'rejects new messages when queue reaches MAX_QUEUE limit',
571571
async () => {
572-
// Use high rate limits to avoid rate limiting interference with queue limit test
572+
// Use high rate limits to avoid rate limiting interference with queue limit test.
573573
// maxConnectionAttemptsPerMinute is needed because async kernel service invocations
574-
// can cause multiple concurrent connection attempts when processing many messages
574+
// can cause multiple concurrent connection attempts when processing many messages.
575+
// ackTimeoutMs must be long enough for kernel2 to restart + reconnect before
576+
// the RemoteHandle gives up (give-up fires after ackTimeoutMs × (MAX_RETRIES + 1)).
577+
const queueTestOptions = {
578+
...testBackoffOptions,
579+
maxMessagesPerSecond: 500,
580+
maxConnectionAttemptsPerMinute: 500,
581+
ackTimeoutMs: 5_000,
582+
};
583+
575584
const { aliceRef, bobURL } = await setupAliceAndBob(
576585
kernel1,
577586
kernel2,
578587
kernelStore1,
579588
kernelStore2,
580589
testRelays,
581-
{
582-
maxMessagesPerSecond: 500,
583-
maxConnectionAttemptsPerMinute: 500,
584-
...testBackoffOptions,
585-
},
590+
queueTestOptions,
586591
);
587592

588593
await sendRemoteMessage(kernel1, aliceRef, bobURL, 'hello', ['Alice']);
@@ -607,7 +612,7 @@ describe.sequential('Remote Communications E2E', () => {
607612
false,
608613
testRelays,
609614
bobConfig,
610-
testBackoffOptions,
615+
queueTestOptions,
611616
);
612617
// eslint-disable-next-line require-atomic-updates
613618
kernel2 = restartResult.kernel;
@@ -1069,44 +1074,49 @@ describe.sequential('Remote Communications E2E', () => {
10691074
it(
10701075
'resolves promise after reconnection when retries have not been exhausted',
10711076
async () => {
1077+
// Use a longer ACK timeout so the RemoteHandle doesn't give up before
1078+
// kernel2 can restart and reconnect (give-up = ackTimeoutMs × 4).
1079+
const reconnectOptions = {
1080+
...testBackoffOptions,
1081+
ackTimeoutMs: 5_000,
1082+
};
1083+
10721084
const { aliceRef, bobURL } = await setupAliceAndBob(
10731085
kernel1,
10741086
kernel2,
10751087
kernelStore1,
10761088
kernelStore2,
10771089
testRelays,
1078-
testBackoffOptions,
1090+
reconnectOptions,
10791091
);
10801092

1081-
// Send a message that creates a promise with remote as decider
1093+
// Establish the connection and complete URL redemption first so that
1094+
// the subsequent message doesn't race URL redemption against kernel2.stop().
1095+
await sendRemoteMessage(kernel1, aliceRef, bobURL, 'hello', ['Alice']);
1096+
1097+
// Stop kernel2 so the next message is guaranteed to be pending
1098+
await kernel2.stop();
1099+
1100+
// Send a message while kernel2 is down - it will be queued
10821101
const messagePromise = kernel1.queueMessage(
10831102
aliceRef,
10841103
'sendRemoteMessage',
10851104
[bobURL, 'hello', ['Alice']],
10861105
);
10871106

1088-
// Stop kernel2 before it can respond
1089-
await kernel2.stop();
1090-
1091-
// Wait a bit for connection loss to be detected
1092-
await delay(100);
1093-
1094-
// Restart kernel2 quickly (before max retries, since default is infinite)
1107+
// Restart kernel2 (before max retries, since default is infinite)
10951108
// The promise should remain unresolved and resolve normally after reconnection
10961109
const bobConfig = makeRemoteVatConfig('Bob');
10971110
const restartResult = await restartKernelAndReloadVat(
10981111
dbFilename2,
10991112
false,
11001113
testRelays,
11011114
bobConfig,
1102-
testBackoffOptions,
1115+
reconnectOptions,
11031116
);
11041117
// eslint-disable-next-line require-atomic-updates
11051118
kernel2 = restartResult.kernel;
11061119

1107-
// Wait for reconnection
1108-
await delay(200);
1109-
11101120
// The message should eventually be delivered and resolved
11111121
// The promise was never rejected because retries weren't exhausted
11121122
const result = await messagePromise;

0 commit comments

Comments
 (0)