Commit a357055

fix: increase Cosmos DB test timeouts to eliminate flakes
Cosmos DB has a variable delay between accepting a write and making it visible on the change stream cursor. The delay comes from internal propagation, not network latency. Against the remote dev cluster it can reach 10-30s during spikes, though a co-located deployment would be much faster.

The previous timeouts (25s poll deadline, 60s test timeout) were insufficient: the .lte() guard and resume tests flaked at a rate of roughly 20-30%. This commit raises them to a 50s poll deadline and a 120s test timeout. After the change, 5 consecutive runs passed with 0 flakes (29/29 each).

maxAwaitTime (when supported) would reduce polling overhead but would not help with the propagation delay, because the event is simply not available yet. The generous timeouts are appropriate for prototype-quality tests run manually against a remote cluster, not for CI.
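
For readers unfamiliar with the pattern, the deadline-based polling that the affected tests rely on can be sketched roughly as follows. This is a minimal illustration, not code from this commit: the names pollUntil, PollOptions and intervalMs are hypothetical, and swallowing errors inside the loop is an assumption about what the tests' try block does.

```typescript
// Hypothetical helper illustrating deadline-based polling; not part of the commit.
type PollOptions = {
  timeoutMs: number; // total polling budget, e.g. 50_000 in the tests touched here
  intervalMs: number; // pause between attempts (assumed; the diff does not show it)
};

async function pollUntil(
  check: () => Promise<boolean>,
  { timeoutMs, intervalMs }: PollOptions
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      // Succeed as soon as the data becomes visible.
      if (await check()) return true;
    } catch {
      // Assumption: an error means "not visible yet", so retry until the deadline.
    }
    // Wait before the next attempt; the final wait may overshoot the deadline
    // by up to intervalMs.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  // The deadline passed without the check ever succeeding.
  return false;
}
```

A test would await pollUntil with a check that reads the expected data and then assert that the result is true. The polling budget has to stay below the enclosing test timeout (50s versus 120s here), otherwise the test runner aborts the test before the loop can report a clean failure.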
1 parent c04541a commit a357055

1 file changed

Lines changed: 6 additions & 7 deletions

modules/module-mongodb/test/src/cosmosdb_mode.test.ts

@@ -26,9 +26,9 @@ bucket_definitions:
 // commit history on the cosmos branch for the full investigation.
 const isCosmosDb = process.env.COSMOS_DB_TEST === 'true';
 describe.skipIf(!isCosmosDb)('cosmosDbMode', () => {
-  // 60s timeout — remote Cosmos DB clusters can have 10-20s latency spikes
+  // 120s timeout — remote Cosmos DB clusters can have 10-30s latency spikes
   // for change stream delivery. Tests that poll for data need headroom.
-  describeWithStorage({ timeout: 60_000 }, defineCosmosDbModeTests);
+  describeWithStorage({ timeout: 120_000 }, defineCosmosDbModeTests);
 });

 function defineCosmosDbModeTests({ factory, storageVersion }: StorageVersionTestContext) {
@@ -224,9 +224,8 @@ bucket_definitions:
 // We bypass the flaky getClientCheckpoint timing by polling until the data appears
 // or the timeout expires. If the .lte() guard drops same-second events, the data
 // will never appear — deterministic failure.
-// 25s timeout — remote Cosmos DB clusters can have variable latency
-// for change stream delivery.
-const deadline = Date.now() + 25_000;
+// 50s timeout — remote Cosmos DB clusters can have 10-30s latency spikes.
+const deadline = Date.now() + 50_000;
 let found = false;
 while (Date.now() < deadline) {
   try {
@@ -307,8 +306,8 @@ bucket_definitions:
 // matches the storage LSN (same second). This mirrors production behavior
 // where write checkpoints may take up to ~1s to resolve on a quiet system.
 // Use a polling approach with retries to handle this latency.
-// 25s timeout for remote Cosmos DB clusters with variable latency.
-const deadline = Date.now() + 25_000;
+// 50s timeout: remote Cosmos DB clusters can have 10-30s latency spikes.
+const deadline = Date.now() + 50_000;
 let found = false;
 while (Date.now() < deadline) {
   try {
