Go: Fix flaky TestScriptKillWithRoute race condition #5950
Merged
… sleep with polling

The test was flaky because:
1. The script invoked in a goroutine hadn't started executing on the server before SCRIPT KILL was attempted, causing 'NotBusy' errors.
2. The 5-second timeout was insufficient for slow CI environments.
3. The fixed time.Sleep(1s) at the end was unreliable for confirming the script was no longer running.

Changes:
- Increase script duration from 6s to 10s to ensure it's still running when the kill succeeds.
- Increase kill polling timeout from 5s to 8s to accommodate slow script startup in CI.
- Replace time.Sleep(1s) at the end with a polling loop that waits for the 'notbusy' state, making the test deterministic.

Signed-off-by: Thomas Zhou <thomas.zhou@improving.com>
Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>
Restructure the test to match the working testFunctionKillNoWrite pattern:
- Run InvokeScriptWithRoute in the main thread (blocking) to guarantee the script is executing on the server before kill is attempted
- Run ScriptKill polling in a goroutine
- Use a longer request timeout (12s) so the invoke call blocks until killed

This eliminates the race condition where ScriptKill was called before the script started, causing 'NotBusy' errors.

Verified: 250/250 sequential runs with 0 failures.

Fixes #5576

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>
836f675 to
b3f6416
yipin-chen approved these changes May 21, 2026
Summary
Fix flaky TestGlideTestSuite/TestScriptKillWithRoute test that intermittently fails with "NotBusy: No scripts in execution right now" errors and timeouts.
Issue link
This Pull Request is linked to issue: [Go][Flaky Test] TestGlideTestSuite/TestScriptKillWithRoute
Closes #5576
Features / Behaviour Changes
No behaviour changes. This PR fixes test flakiness only.
Implementation
Root cause: The test launches a long-running Lua script in a goroutine, then immediately starts polling SCRIPT KILL. In slow CI environments, the script has not yet started executing on the server, so every kill attempt returns "NotBusy". The 5-second timeout expires before the script begins, causing the test to fail.
Fix:
- Replace time.Sleep(1 * time.Second) at the end with a deterministic polling loop that waits for the server to confirm the "notbusy" state, eliminating another potential timing issue.
Limitations
None
Testing
- gofmt checks.
- testFunctionKillNoWrite, which handles the same race condition reliably.
Checklist