Skip to content

Measure retained heap in lint memory test to remove flakiness#5097

Open
habdelra wants to merge 2 commits into
mainfrom
worktree-flaky-lint-memory-test
Open

Measure retained heap in lint memory test to remove flakiness#5097
habdelra wants to merge 2 commits into
mainfrom
worktree-flaky-lint-memory-test

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented Jun 3, 2026

What

The realm-endpoints/lint-test.ts > memory usage during lint operations test is intermittently flaky. It asserts a hard ceiling on the process heapUsed delta across ten concurrent /_lint requests, but never forces a garbage collection before reading. The "after" reading therefore includes transient garbage that GC simply hasn't reclaimed yet — so the delta measures GC timing, not retained memory, and occasionally tips over the threshold on a busy CI runner.

Fix

Make the measurement reflect retained memory — the actual leak signal:

  • Warm up once before the baseline reading, so one-time, legitimately-retained initialization (eslint rule definitions, prettier plugins, module caches) isn't miscounted as growth from the measured loop.
  • Force a full GC before both the baseline and the final reading. A new forceGc helper enables GC at runtime via the V8 flag API, because the test runner doesn't start Node with --expose-gc.
  • Assert on the post-GC retained-growth delta with a 20MB leak-guard bound. Locally the retained growth measures ~3MB, so the bound has ample headroom while still catching a genuine per-operation leak.
  • Log a diagnostic line ([lint-memory-test] initial=… final=… retainedGrowth=… gcForced=…) so any future failure shows the real retained number and whether GC actually ran — distinguishing a real leak from measurement noise.

Verification

Ran the module locally against the realm-server test stack:

[lint-memory-test] initial=172.63MB final=175.58MB retainedGrowth=2.95MB gcForced=true
ok ... memory usage during lint operations

🤖 Generated with Claude Code

The "memory usage during lint operations" test asserted a hard ceiling on
the process heapUsed delta across ten concurrent /_lint requests without
forcing a collection first, so the reading was dominated by transient
garbage that GC had not yet reclaimed rather than by retained memory. The
delta therefore tracked GC timing, not leaks, and intermittently tipped
over the threshold.

The test now warms up once (so one-time eslint/prettier initialization is
not miscounted), forces a full GC before both the baseline and final
reading, and asserts on the resulting retained-growth delta. A `forceGc`
helper enables GC at runtime via the V8 flag API since the runner does not
pass --expose-gc. Retained growth measures ~3MB locally; the bound is set
to 20MB as a leak guard with ample headroom. A diagnostic line logs the
retained-growth number and whether GC ran, so any future failure can be
told apart from measurement noise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces flakiness in the realm-server lint endpoint memory test by shifting the measurement from “heapUsed at arbitrary GC timing” to a post-GC retained-heap delta, using a warm-up request plus an explicit GC trigger.

Changes:

  • Add a forceGc() helper to attempt to enable and run V8 garbage collection at runtime (without starting Node with --expose-gc).
  • Update the lint memory test to warm caches, force GC before baseline/final readings, tighten the retained-growth threshold, and emit a diagnostic line.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
packages/realm-server/tests/realm-endpoints/lint-test.ts Reworks the memory-usage test to measure post-GC retained growth and adds diagnostics.
packages/realm-server/tests/helpers/prettier-test-utils.ts Introduces forceGc() helper used by the memory test to trigger GC without --expose-gc.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/realm-server/tests/helpers/prettier-test-utils.ts Outdated
Comment thread packages/realm-server/tests/realm-endpoints/lint-test.ts
Comment thread packages/realm-server/tests/realm-endpoints/lint-test.ts
- forceGc resolves the GC function once and caches it, so a missing
  globalThis.gc no longer creates a throwaway VM context on every call
  (which would itself allocate and perturb the heap reading). The
  runtime-enable path now swallows a ReferenceError and returns false
  instead of letting it escape, honoring the documented contract.
- The memory test now keys its threshold on whether a collection actually
  ran: the tight 20MB retained-memory bound when GC was forced, falling
  back to the looser historical 45MB bound otherwise, so the test never
  becomes more flaky than before in an environment without runtime GC.
- The warm-up request's status is asserted so a regression there is
  localized rather than silently shifting the baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from a team June 3, 2026 20:37
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

Host Test Results

    1 files      1 suites   1h 38m 56s ⏱️
2 931 tests 2 916 ✅ 15 💤 0 ❌
2 950 runs  2 935 ✅ 15 💤 0 ❌

Results for commit 7102eeb.

Realm Server Test Results

    1 files      1 suites   10m 36s ⏱️
1 549 tests 1 548 ✅ 1 💤 0 ❌
1 640 runs  1 639 ✅ 1 💤 0 ❌

Results for commit 7102eeb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants