[ISSUE #10373] Quarantine flaky tests and add detection plan docs #10374

Merged: RongtongJin merged 3 commits into apache:develop from lizhimins:zhimin/quarantine-flaky-tests on May 25, 2026

lizhimins (Member) commented May 25, 2026

Summary

#10373

  • Quarantine flaky test methods with @Ignore annotations across broker, client, filter, and tieredstore modules
  • Add flaky test detection plan documentation (CN + EN) under docs/cn/
  • Switch LiteLifecycleManagerTest to MockitoJUnitRunner.Silent to fix unnecessary stubbing error after @Ignore

Motivation

Flaky tests cause intermittent CI failures that erode developer trust in build signals. These methods were identified by running the full test suite 100× across 10 ECS nodes using a three-layer funnel (module → class → method). Methods with ≥1% failure rate are quarantined.
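The quarantine rule above (failure rate of at least 1% over repeated runs) can be sketched as follows. This is a minimal illustration, not the tooling used for this PR: the class `FlakyTestSelector`, its method names, and the input shape (failure counts keyed by `Class#method`) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: picks test methods whose observed failure rate meets a threshold. */
final class FlakyTestSelector {

    private FlakyTestSelector() {
    }

    /** Failure rate as a percentage of the total number of runs. */
    static double failureRatePercent(int failures, int totalRuns) {
        if (totalRuns <= 0) {
            throw new IllegalArgumentException("totalRuns must be positive");
        }
        return 100.0 * failures / totalRuns;
    }

    /**
     * Returns the methods whose failure rate is at least thresholdPercent,
     * preserving the iteration order of the input map.
     */
    static List<String> selectForQuarantine(Map<String, Integer> failuresByMethod,
                                            int totalRuns,
                                            double thresholdPercent) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : failuresByMethod.entrySet()) {
            if (failureRatePercent(entry.getValue(), totalRuns) >= thresholdPercent) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

With 100 runs and a threshold of 1.0, a method that failed once is selected and a method that never failed is not.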

Test plan

  • Verify CI passes with quarantined tests ignored
  • Confirm no test coverage loss for non-flaky paths
  • Review that @Ignore annotations include failure rate metadata

Ran all RocketMQ module tests 100x across 10 ECS nodes to identify
non-deterministic failures. Quarantined methods with @Ignore across
broker, client, filter, and tieredstore modules.

Flaky tests quarantined:
- broker: LiteLifecycleManagerTest#testCleanByParentTopic (2%)
- broker: ConsumerOrderInfoManagerLockFreeNotifyTest#testRecover (2%)
- broker: TransactionalMessageServiceImplTest#testDeletePrepareMessage_maxSize (1%)
- client: DefaultMQConsumerWithTraceTest#testPullMessage_WithTrace_Success (1%)
- client: DefaultMQLitePullConsumerWithTraceTest#testSubscribe_PollMessageSuccess_WithCustomizedTraceTopic (5%)
- client: DefaultMQLitePullConsumerWithTraceTest#testSubscribe_PollMessageSuccess_WithDefaultTraceTopic (6%)
- filter: BloomFilterTest#testCheckFalseHit (1%)
- tieredstore: IndexStoreServiceTest#queryCrossFileBoundaryTest (35%)
- tieredstore: IndexStoreServiceTest#concurrentGetTest (1.5%)

Additional changes:
- LiteLifecycleManagerTest: Switch to MockitoJUnitRunner.Silent
- Add flaky test detection plan docs (CN + EN)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov-commenter commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.79%. Comparing base (7e5d22d) to head (42ababd).
⚠️ Report is 1 commit behind head on develop.

Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10374      +/-   ##
=============================================
- Coverage      48.98%   48.79%   -0.19%     
+ Complexity     13482    13430      -52     
=============================================
  Files           1376     1376              
  Lines         100539   100539              
  Branches       12983    12983              
=============================================
- Hits           49244    49057     -187     
- Misses         45287    45471     +184     
- Partials        6008     6011       +3     

- Quarantine PopPriorityIT at class level (multiple methods fail
  intermittently with 'expected:<8> but was:<2>' due to async race)
- Fix ConsumerOrderInfoManagerLockFreeNotifyTest
- Fix IndexStoreServiceTest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
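The 'expected:<8> but was:<2>' symptom noted for PopPriorityIT is typical of asserting on a counter before asynchronous work has finished. The sketch below shows one common way to avoid that kind of race, by waiting on a latch with a timeout before reading the count. It is not PopPriorityIT's code or the fix applied in this PR (which quarantines the class); `AsyncCountExample` and `submitAndAwait` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical example: wait for async tasks to finish before reading their result. */
final class AsyncCountExample {

    private AsyncCountExample() {
    }

    /**
     * Runs `expected` tasks on a thread pool and returns how many completed.
     * Returns -1 if they do not all complete within the timeout or the wait is interrupted.
     */
    static int submitAndAwait(int expected, long timeoutMillis) {
        AtomicInteger received = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(expected);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < expected; i++) {
                pool.execute(() -> {
                    received.incrementAndGet();
                    done.countDown();
                });
            }
            // Reading received.get() at this point, without waiting, may observe a partial count.
            if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return -1;
            }
            return received.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A test built this way asserts on the value returned after the latch is released, so the result does not depend on thread scheduling as long as the timeout is generous.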
lizhimins force-pushed the zhimin/quarantine-flaky-tests branch from 5633427 to 4ae4e39 on May 25, 2026 05:45
Move English doc from docs/cn/ to docs/en/ and rename both files
to match existing docs naming convention (underscore + PascalCase).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RongtongJin merged commit 980f3d7 into apache:develop on May 25, 2026
10 checks passed
lizhimins added a commit to lizhimins/rocketmq that referenced this pull request May 26, 2026
…workflow

Fix root causes of flaky tests quarantined in apache#10374:

- BloomFilterTest#testCheckFalseHit: use single seeded Random instance
  instead of per-character Random(System.nanoTime()) which produced
  duplicate strings in tight loops
- TransactionalMessageServiceImplTest#testDeletePrepareMessage_maxSize:
  increase verify timeout from 50ms to 3000ms to accommodate slow
  thread scheduling
- DefaultMQConsumerWithTraceTest#testPullMessage_WithTrace_Success:
  call pullMessage directly instead of async PullMessageService to
  eliminate race condition
- DefaultMQLitePullConsumerWithTraceTest: set RebalanceService.waitInterval
  as static field in @Before to avoid instance-level race condition

Also remove rerun-workflow.yml to stop masking flaky tests with
automatic CI retries.
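The BloomFilterTest fix described above replaces per-character `new Random(System.nanoTime())` with a single seeded `Random`. A minimal sketch of that pattern follows; it is not the test's actual code, and the class `SeededStrings`, its methods, and the alphabet are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical example: build test strings from one seeded Random instance. */
final class SeededStrings {

    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    private SeededStrings() {
    }

    /** Draws every character from the same Random, so consecutive calls keep advancing one sequence. */
    static String randomString(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Generates `count` strings from a single Random created with a fixed seed. */
    static List<String> generate(long seed, int count, int length) {
        Random random = new Random(seed);
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(randomString(random, length));
        }
        return result;
    }
}
```

Because the seed is fixed and only one generator is used, two calls to `generate` with the same arguments return the same list, which makes a failing input reproducible.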