
test(cloudflare): Unflake integration test#20208

Open
JPeer264 wants to merge 1 commit into develop from
jp/unflake-cf-integration

Conversation

@JPeer264
Member

There was one flaky test, which led me a little deeper into the runner.ts logic. This test only passed when it ran or finished first. With the shuffle flag it failed consistently, which is why the flag is added in this PR as well.

Furthermore, each runner now gets a random port by setting --port 0. This makes sure that when you run wrangler dev in another tab while the tests are running, local development keeps the default :8787 port.
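
For context, a port of 0 asks the operating system to pick any free port. The sketch below shows that behaviour with Node's built-in net module rather than with wrangler itself; `listenOnRandomPort` and `demoRandomPorts` are hypothetical names for illustration and are not part of runner.ts.

```typescript
import * as net from 'node:net';

/** Hypothetical helper: binds a TCP server to port 0 and resolves with the port the OS assigned. */
function listenOnRandomPort(): Promise<{ server: net.Server; port: number }> {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);
    // Port 0 means "any free port"; the actual port is only known after listening.
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address() as net.AddressInfo;
      resolve({ server, port });
    });
  });
}

/** Opens two servers at the same time and returns the two ports they were given. */
async function demoRandomPorts(): Promise<number[]> {
  const a = await listenOnRandomPort();
  const b = await listenOnRandomPort();
  a.server.close();
  b.server.close();
  return [a.port, b.port];
}
```

Because both servers are bound at the same time, they cannot share a port, and neither depends on :8787 being free.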

@JPeer264 JPeer264 requested review from nicohrubec and s1gr1d April 10, 2026 13:53
@JPeer264 JPeer264 self-assigned this Apr 10, 2026
@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

Core

  • Automatically disable truncation when span streaming is enabled in LangGraph integration by andreiborza in #20231
  • Automatically disable truncation when span streaming is enabled in LangChain integration by andreiborza in #20230
  • Automatically disable truncation when span streaming is enabled in Google GenAI integration by andreiborza in #20229
  • Automatically disable truncation when span streaming is enabled in Anthropic AI integration by andreiborza in #20228
  • Automatically disable truncation when span streaming is enabled in Vercel AI integration by andreiborza in #20232
  • Automatically disable truncation when span streaming is enabled in OpenAI integration by andreiborza in #20227
  • Add enableTruncation option to Vercel AI integration by nicohrubec in #20195
  • Add enableTruncation option to Google GenAI integration by andreiborza in #20184
  • Add enableTruncation option to Anthropic AI integration by andreiborza in #20181
  • Add enableTruncation option to LangGraph integration by andreiborza in #20183
  • Add enableTruncation option to LangChain integration by andreiborza in #20182
  • Add enableTruncation option to OpenAI integration by andreiborza in #20167
  • Export a reusable function to add tracing headers by JPeer264 in #20076

Deps

  • Bump axios from 1.13.5 to 1.15.0 by dependabot in #20180
  • Bump hono from 4.12.7 to 4.12.12 by dependabot in #20118
  • Bump defu from 6.1.4 to 6.1.6 by dependabot in #20104

Other

  • (cloudflare) Propagate traceparent to RPC calls - via fetch by JPeer264 in #19991

Bug Fixes 🐛

Deno

  • Handle reader.closed rejection from releaseLock() in streaming by andreiborza in #20187
  • Avoid inferring invalid span op from Deno tracer by Lms24 in #20128

Other

  • (ci) Prevent command injection in ci-metadata workflow by fix-it-felix-sentry in #19899
  • (e2e) Add op check to waitForTransaction in React Router e2e tests by copilot-swe-agent in #20193
  • (node-integration-tests) Fix flaky kafkajs test race condition by copilot-swe-agent in #20189

Internal Changes 🔧

Deps

  • Bump hono from 4.12.7 to 4.12.12 in /dev-packages/e2e-tests/test-applications/cloudflare-hono by dependabot in #20119
  • Bump axios from 1.13.5 to 1.15.0 in /dev-packages/e2e-tests/test-applications/nestjs-basic by dependabot in #20179

Other

  • (bugbot) Add rules to flag test-flake-provoking patterns by Lms24 in #20192
  • (cloudflare) Unflake integration test by JPeer264 in #20208
  • (deps-dev) Bump vite from 7.2.0 to 7.3.2 in /dev-packages/e2e-tests/test-applications/tanstackstart-react by dependabot in #20107
  • (react) Remove duplicated test mock by s1gr1d in #20200
  • (size-limit) Bump failing size limit scenario by Lms24 in #20186
  • Fix flaky ANR test by increasing blocking duration by JPeer264 in #20239
  • Add automatic flaky test detector by nicohrubec in #18684

🤖 This preview updates automatically when you update the PR.

// This is needed because wrangler dev may not guarantee waitUntil completion
// the same way production Cloudflare does. Without this delay, the last
// envelope's HTTP request may not complete before the test moves on.
const delay = () => new Promise(resolve => setTimeout(resolve, 50));
Member

m: Can we solve this differently? Specifically, is there some event that we could await before moving on to the next request instead of adding a timeout? This might already help, but I am worried that it will not fully resolve the flakiness.

Member Author

Right now there is no way, as the runner doesn't provide a way of doing this. Wrangler, as of now, seems to drop waitUntil runs entirely when another request comes in. For that to work we would have to change how the runner works: instead of registering all expects at once and then calling the worker, it would call the worker and wait for the expect right after. But that would be a bigger piece of work AFAICT.

Member

ok, so to understand correctly: We delay here by 50ms so that a kicked off waitUntil task finishes before we start a new request? And we do this due to a local wrangler limitation (?)

Taking a step back: Why is this test doing 5 request repetitions? I see we always assert on the same payload, without cross-envelope checks, so what do we gain from it? (Not saying we shouldn't, just that it's not clear.)

Assuming I understand correctly, I'd say the delay is fine (for lack of better options). But can we make sure this is enough for CI? 50ms seems a bit short, but then again, I'm not sure if it's necessary to wait longer. Maybe just deferring to the next tick is already enough?
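
One way to make a wait less sensitive to CI load, sketched here purely as an illustration, is to poll a condition up to a deadline instead of sleeping for a fixed 50ms. `waitForCondition` is a hypothetical helper, not something the runner provides, and the predicate it checks would still need some observable signal from the test.

```typescript
/**
 * Hypothetical helper: resolves as soon as `predicate` returns true,
 * and rejects if that has not happened within `timeoutMs`.
 */
async function waitForCondition(
  predicate: () => boolean,
  timeoutMs = 2000,
  intervalMs = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!predicate()) {
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    // Yield to the event loop so pending timers and I/O can make progress.
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}
```

The fast path returns almost immediately on a quick machine, while a slow CI run gets the full timeout budget before the test fails.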

@Lms24
Member

Lms24 commented Apr 10, 2026

any chance this resolves #20209?

@JPeer264
Member Author

> any chance this resolves #20209?

Potentially. I also have a bit more code locally that waits between tests until wrangler dev and its child processes have exited entirely. I didn't add it here, as I think using different ports should be enough, so I hope this already fixes the flakiness.

Regardless, I'll take a closer look at the other flake.

@JPeer264 JPeer264 force-pushed the jp/unflake-cf-integration branch from 7aed56c to f77c6b1 Compare April 13, 2026 11:38

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Fixed-delay sleep may still cause test flakiness
    • Replaced the 50ms setTimeout delay with an event-based waitForEnvelopes() method that waits for actual envelope delivery, eliminating the race condition and test flakiness.

Create PR

Or push these changes by commenting:

@cursor push 8dc8b96c93
Preview (8dc8b96c93)
diff --git a/dev-packages/cloudflare-integration-tests/runner.ts b/dev-packages/cloudflare-integration-tests/runner.ts
--- a/dev-packages/cloudflare-integration-tests/runner.ts
+++ b/dev-packages/cloudflare-integration-tests/runner.ts
@@ -50,6 +50,7 @@
     path: string,
     options?: { headers?: Record<string, string>; data?: BodyInit; expectError?: boolean },
   ): Promise<T | undefined>;
+  waitForEnvelopes(count: number): Promise<void>;
 };
 
 /** Creates a test runner */
@@ -112,12 +113,23 @@
       let child: ReturnType<typeof spawn> | undefined;
       let childSubWorker: ReturnType<typeof spawn> | undefined;
 
+      // Track promises waiting for specific envelope counts
+      const envelopeWaiters: Array<{ count: number; resolve: () => void }> = [];
+
       /** Called after each expect callback to check if we're complete */
       function expectCallbackCalled(): void {
         envelopeCount++;
         if (envelopeCount === expectedEnvelopeCount) {
           resolve();
         }
+
+        // Resolve any waiters that are waiting for this envelope count
+        for (let i = envelopeWaiters.length - 1; i >= 0; i--) {
+          if (envelopeCount >= envelopeWaiters[i].count) {
+            envelopeWaiters[i].resolve();
+            envelopeWaiters.splice(i, 1);
+          }
+        }
       }
 
       function assertEnvelopeMatches(expected: Expected, envelope: Envelope): void {
@@ -308,6 +320,15 @@
             return;
           }
         },
+        waitForEnvelopes: async function (count: number): Promise<void> {
+          if (envelopeCount >= count) {
+            return Promise.resolve();
+          }
+
+          return new Promise<void>(resolveWaiter => {
+            envelopeWaiters.push({ count, resolve: resolveWaiter });
+          });
+        },
       };
     },
   };

diff --git a/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts b/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
--- a/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
+++ b/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
@@ -45,20 +45,14 @@
   // Expect 5 transaction envelopes — one per call.
   const runner = createRunner(__dirname).expectN(5, assertDoWorkEnvelope).start(signal);
 
-  // Small delay between requests to allow waitUntil to process in wrangler dev.
-  // This is needed because wrangler dev may not guarantee waitUntil completion
-  // the same way production Cloudflare does. Without this delay, the last
-  // envelope's HTTP request may not complete before the test moves on.
-  const delay = () => new Promise(resolve => setTimeout(resolve, 50));
-
   await runner.makeRequest('get', '/');
-  await delay();
+  await runner.waitForEnvelopes(1);
   await runner.makeRequest('get', '/');
-  await delay();
+  await runner.waitForEnvelopes(2);
   await runner.makeRequest('get', '/');
-  await delay();
+  await runner.waitForEnvelopes(3);
   await runner.makeRequest('get', '/');
-  await delay();
+  await runner.waitForEnvelopes(4);
   await runner.makeRequest('get', '/');
   await runner.completed();
 });
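
The waiter idea in the preview above can be tried in isolation. The following is a minimal stand-alone sketch, assuming a simple in-memory counter; `EnvelopeCounter`, `record`, `waitFor`, and `demoCounter` are hypothetical names and not the runner's actual API.

```typescript
/** Hypothetical stand-alone counter illustrating the waiter pattern. */
class EnvelopeCounter {
  private received = 0;
  private waiters: Array<{ count: number; resolve: () => void }> = [];

  /** Records one delivered envelope and releases every waiter whose threshold is now met. */
  record(): void {
    this.received++;
    this.waiters = this.waiters.filter(waiter => {
      if (this.received >= waiter.count) {
        waiter.resolve();
        return false; // drop resolved waiters
      }
      return true;
    });
  }

  /** Resolves once at least `count` envelopes have been recorded. */
  waitFor(count: number): Promise<void> {
    if (this.received >= count) {
      return Promise.resolve();
    }
    return new Promise<void>(resolve => {
      this.waiters.push({ count, resolve });
    });
  }
}

/** Simulates two envelopes arriving asynchronously and waits for each in turn. */
async function demoCounter(): Promise<string[]> {
  const counter = new EnvelopeCounter();
  const log: string[] = [];

  setTimeout(() => counter.record(), 10);
  setTimeout(() => counter.record(), 20);

  await counter.waitFor(1);
  log.push('first');
  await counter.waitFor(2);
  log.push('second');
  await counter.waitFor(1); // threshold already met, resolves immediately
  log.push('already-met');
  return log;
}
```

Unlike a fixed delay, each wait here ends when the envelope is actually recorded, so the test only proceeds once the previous request's delivery has been observed.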

Reviewed by Cursor Bugbot for commit f77c6b1. Configure here.

await runner.makeRequest('get', '/');
await delay();
await runner.makeRequest('get', '/');
await delay();

Fixed-delay sleep may still cause test flakiness

Low Severity

The test introduces a hardcoded 50ms setTimeout delay between requests as a workaround for wrangler dev's waitUntil behavior. Per the review rules, timeouts or sleeps in tests are flagged as likely to introduce flakes — concrete events or signals to wait on are preferred. A 50ms delay may be insufficient under CI load, potentially causing the same flakiness this PR aims to fix. The PR discussion already acknowledges this limitation, noting that the runner doesn't currently provide an event-based mechanism.

Triggered by project rule: PR Review Guidelines for Cursor Bot

Reviewed by Cursor Bugbot for commit f77c6b1. Configure here.

@github-actions
Contributor

size-limit report 📦

| Path | Size | % Change | Change |
| --- | --- | --- | --- |
| @sentry/browser | 25.72 kB | - | - |
| @sentry/browser - with treeshaking flags | 24.21 kB | - | - |
| @sentry/browser (incl. Tracing) | 42.73 kB | - | - |
| @sentry/browser (incl. Tracing, Profiling) | 47.35 kB | - | - |
| @sentry/browser (incl. Tracing, Replay) | 81.54 kB | - | - |
| @sentry/browser (incl. Tracing, Replay) - with treeshaking flags | 71.11 kB | - | - |
| @sentry/browser (incl. Tracing, Replay with Canvas) | 86.25 kB | - | - |
| @sentry/browser (incl. Tracing, Replay, Feedback) | 98.45 kB | - | - |
| @sentry/browser (incl. Feedback) | 42.51 kB | - | - |
| @sentry/browser (incl. sendFeedback) | 30.39 kB | - | - |
| @sentry/browser (incl. FeedbackAsync) | 35.38 kB | - | - |
| @sentry/browser (incl. Metrics) | 27.04 kB | - | - |
| @sentry/browser (incl. Logs) | 27.18 kB | - | - |
| @sentry/browser (incl. Metrics & Logs) | 27.86 kB | - | - |
| @sentry/react | 27.48 kB | - | - |
| @sentry/react (incl. Tracing) | 45.05 kB | - | - |
| @sentry/vue | 30.56 kB | - | - |
| @sentry/vue (incl. Tracing) | 44.59 kB | - | - |
| @sentry/svelte | 25.74 kB | - | - |
| CDN Bundle | 28.41 kB | - | - |
| CDN Bundle (incl. Tracing) | 43.75 kB | - | - |
| CDN Bundle (incl. Logs, Metrics) | 29.78 kB | - | - |
| CDN Bundle (incl. Tracing, Logs, Metrics) | 44.83 kB | - | - |
| CDN Bundle (incl. Replay, Logs, Metrics) | 68.59 kB | - | - |
| CDN Bundle (incl. Tracing, Replay) | 80.64 kB | - | - |
| CDN Bundle (incl. Tracing, Replay, Logs, Metrics) | 81.66 kB | - | - |
| CDN Bundle (incl. Tracing, Replay, Feedback) | 86.17 kB | - | - |
| CDN Bundle (incl. Tracing, Replay, Feedback, Logs, Metrics) | 87.2 kB | - | - |
| CDN Bundle - uncompressed | 82.99 kB | - | - |
| CDN Bundle (incl. Tracing) - uncompressed | 129.77 kB | - | - |
| CDN Bundle (incl. Logs, Metrics) - uncompressed | 87.14 kB | - | - |
| CDN Bundle (incl. Tracing, Logs, Metrics) - uncompressed | 133.19 kB | - | - |
| CDN Bundle (incl. Replay, Logs, Metrics) - uncompressed | 210.12 kB | - | - |
| CDN Bundle (incl. Tracing, Replay) - uncompressed | 246.65 kB | - | - |
| CDN Bundle (incl. Tracing, Replay, Logs, Metrics) - uncompressed | 250.05 kB | - | - |
| CDN Bundle (incl. Tracing, Replay, Feedback) - uncompressed | 259.56 kB | - | - |
| CDN Bundle (incl. Tracing, Replay, Feedback, Logs, Metrics) - uncompressed | 262.95 kB | - | - |
| @sentry/nextjs (client) | 47.47 kB | - | - |
| @sentry/sveltekit (client) | 43.2 kB | - | - |
| @sentry/node-core | 57.86 kB | +0.02% | +7 B 🔺 |
| @sentry/node | 174.93 kB | +0.01% | +11 B 🔺 |
| @sentry/node - without tracing | 97.97 kB | +0.03% | +22 B 🔺 |
| @sentry/aws-serverless | 115.22 kB | +0.02% | +19 B 🔺 |

View base workflow run

Member

@Lms24 Lms24 left a comment

Approving to unblock but please take a look at my comments

},
},
sequence: {
shuffle: true,
Member

> With the shuffle flag it was consistently failing, this is why this is added in this PR as well

Maybe I misunderstand but shouldn't shuffle be set to false then?

Comment on lines 29 to 33
expect.objectContaining({ description: 'task-1', op: 'task' }),
expect.objectContaining({ description: 'task-2', op: 'task' }),
expect.objectContaining({ description: 'task-3', op: 'task' }),
expect.objectContaining({ description: 'task-4', op: 'task' }),
expect.objectContaining({ description: 'task-5', op: 'task' }),
Member

@Lms24 Lms24 Apr 13, 2026

out of scope (so no need to do it in this PR) but I just saw this: Is task a valid op? I didn't find it in our list of span operations. Not sure if this was discussed and agreed upon but if yes, let's update the span operations doc in develop.

JPeer264 added a commit that referenced this pull request Apr 14, 2026
The test is timing out intermittently in CI, causing spurious failures.
This will be fixed as part of #20208

Co-authored-by: Claude Opus 4 <noreply@anthropic.com>
mydea pushed a commit that referenced this pull request Apr 15, 2026
The test is timing out intermittently in CI, causing spurious failures.
This will be fixed as part of #20208

Co-authored-by: Claude Opus 4 <noreply@anthropic.com>

3 participants