[smoke-test] Decrease retries from 16 to 4 #19301
# Reduced from 3 to 1: combined with SWARM_BUILD_NUM_RETRIES=1, gives 2×2=4 total
# attempts per test (was 4×4=16). Fewer retries reduce resource contention when
# multiple tests compete for ports on the same CI instance.
retries = 1
Silly question: if we reduce this, we won't allow a single smoke test flake anymore? Am I reading it correctly? 🤔
It's 1 retry, so that's 2 tries
I see... 🤔 This may be too aggressive without fixing the flakes first 😄 (I worry we'll block folks and make them unhappy).
Will unblock this for you now, and take a look at the latest set of partitioned smoke test runs and try to fix any that I understand 🙏
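For reference, the attempt counts discussed in this thread can be reproduced with a small sketch. It assumes the two retry layers multiply, as the diff comment states; the function and parameter names are hypothetical, and the previous outer value of 3 is inferred from the "4×4=16" figure rather than stated directly.

```python
def total_attempts(test_retries: int, swarm_build_retries: int) -> int:
    """Worst-case attempts per test when two retry layers are nested.

    Each layer runs once plus its configured retries, so the layers multiply.
    Both parameter names are illustrative, not taken from the repository.
    """
    return (test_retries + 1) * (swarm_build_retries + 1)


# Before this PR: retries = 3 at both layers (inferred from "4x4=16").
before = total_attempts(test_retries=3, swarm_build_retries=3)

# After this PR: retries = 1 combined with SWARM_BUILD_NUM_RETRIES=1.
after = total_attempts(test_retries=1, swarm_build_retries=1)

print(f"before: {before} attempts, after: {after} attempts")
```

With `retries = 1`, a test still gets a second try at each layer, so a single flake does not fail the run by itself.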
Remove tests that were explicitly marked as subsets of their epoch_changes counterparts. These tests were permanently #[ignore]'d and provided no additional coverage over the active variants.

Removed from state_sync.rs:
- test_fullnode_output_sync_no_epoch_changes
- test_fullnode_execution_sync_no_epoch_changes
- test_validator_sync_and_participate_no_epoch_changes
- test_validator_fast_sync_and_participate_no_epoch_changes
- test_validator_fast_sync_exponential_backoff_no_epoch_changes

Removed from consensus_observer.rs:
- test_consensus_observer_fast_sync_no_epoch_changes
The Move Prover z3 version check was silently failing when an existing z3 binary at $INSTALL_DIR couldn't report its version (e.g., wrong glibc, corrupt download, incompatible binary from CI runner image). This caused ~30% of unit test failures with:

"cannot extract version from /home/runner/bin/z3"

Changes:
- Explicitly check if existing z3 binary can run and report version
- Delete and reinstall broken binaries instead of skipping install
- Add post-install verification to warn early on download corruption
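The check described in this commit message could look roughly like the sketch below. This is an illustration in Python, not the actual install script (which is not shown in this PR page); `probe_version`, `needs_reinstall`, and the version string in the usage line are hypothetical. It relies only on `z3 --version` printing a dotted version number.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional


def probe_version(binary: Path, timeout: float = 10.0) -> Optional[str]:
    """Return the dotted version printed by `<binary> --version`, or None if it cannot run."""
    try:
        result = subprocess.run(
            [str(binary), "--version"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.SubprocessError):
        # Missing file, wrong architecture, not executable, or a hang past the timeout.
        return None
    if result.returncode != 0:
        return None
    match = re.search(r"\d+\.\d+\.\d+", result.stdout + result.stderr)
    return match.group(0) if match else None


def needs_reinstall(binary: Path, wanted: str) -> bool:
    """Decide whether to (re)install, deleting a binary that cannot report its version."""
    if not binary.exists():
        return True
    found = probe_version(binary)
    if found is None:
        # Broken binary: remove it so the installer does not skip the download.
        binary.unlink()
        return True
    return found != wanted
```

A caller might run `needs_reinstall(Path("/home/runner/bin/z3"), "4.11.2")` before installing (the version here is a placeholder), then call `probe_version` again after installing as the post-install verification.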
9b2ae23 to ab2312c
This issue is stale because it has been open 45 days with no activity. Remove the |
✅ Forge suite
Rebase failed
Description
Reduces the number of test retries, both to surface flaky tests and to spend less time re-running them.
How Has This Been Tested?
Key Areas to Review
Type of Change
Which Components or Systems Does This Change Impact?
Checklist