Cap RequestErrUnprepared retry recursion on query and batch paths#1952
Open
arloliu wants to merge 1 commit into
Open
Cap RequestErrUnprepared retry recursion on query and batch paths#1952arloliu wants to merge 1 commit into
arloliu wants to merge 1 commit into
Conversation
Conn.executeQuery and Conn.executeBatch both responded to a
*RequestErrUnprepared by evicting the prepared-statement cache entry
and recursing on themselves with no upper bound. When the server
persistently re-reports the same statement as unprepared after
re-prepare (a coordinator thrashing its prepared-statement cache,
or a misbehaving proxy/fork), the recursion never terminates. The
goroutine stack eventually exceeds runtime.SetMaxStack (1 GiB by
default), at which point Go runtime.throw crashes the entire process
with an unrecoverable stack-overflow — no recover() can intercept it.
The failure mode is reachable from any prepared-statement workload;
the batch path is on the hot write path.
Cap recursion on both paths at maxUnprepRetries = 5 by threading an
unprepAttempt counter through unexported helpers
executeQueryWithUnprepRetries and executeBatchWithUnprepRetries. When
the cap fires, return an Iter whose err is
fmt.Errorf("...after N re-prepare attempts: %w", serverErr)
The %w wrap means callers can:
- Detect cap-driven failures via error message pattern.
- Recover the underlying *RequestErrUnprepared with errors.As to
inspect the StatementId the server kept rejecting.
Behavior is a strict superset of the prior code: queries and batches
that succeed within 5 attempts behave exactly as before. Only the
pathological no-progress case is changed.
Test fake-server (conn_test.go) gains an always-unprep opPrepare
case returning id=99, an opBatch arm that replies ErrCodeUnprepared
when any statement carries id=99, and case-insensitive verb-trimming
in the opPrepare query-name parser (select/insert/update/delete) so
DML in batches can reach the always-unprep case. Two new tests in
unprep_retry_test.go drive the always-unprep path through Query.Exec
and Session.ExecuteBatch respectively, assert no infinite recursion,
verify the wrap is recoverable via errors.As, and check that the
server received exactly maxUnprepRetries+1 prepare/execute (resp.
prepare/batch) pairs.
Patch by Arlo Liu
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Conn.executeQueryandConn.executeBatchboth respond to a*RequestErrUnpreparedby evicting the prepared-statement cache entry and recursing on themselves with no upper bound.When the server persistently re-reports the same statement as unprepared after re-prepare — for example a coordinator thrashing its prepared-statement cache under high statement cardinality, or a misbehaving proxy/fork — the recursion never terminates. The goroutine stack eventually exceeds
runtime.SetMaxStack(1 GiB by default), at which point Go's runtime crashes the entire process with an unrecoverable stack-overflow throw. Norecover()can intercept it.The failure mode is reachable from any prepared-statement workload; the batch path is on the hot write path. No attacker is required — organic Cassandra prep-cache thrash is sufficient.
This PR caps re-prepare retries on both paths at 5 and surfaces a descriptive error wrapping the underlying
*RequestErrUnpreparedsoerrors.Ascontinues to work.Approach
executeQueryWithUnprepRetries/executeBatchWithUnprepRetriestake anunprepAttempt intand increment it on each retry.executeQuery,executeBatch) become thin wrappers that start the counter at 0.maxUnprepRetries = 5, the iter is returned witherr = fmt.Errorf("…after N re-prepare attempts: %w", serverErr).Tests
Test fake-server (
conn_test.go) gains:always-unprepopPreparecase returningid=99,opBatcharm that repliesErrCodeUnpreparedwhen any statement carriesid=99,opPreparequery-name parser (select|insert|update|delete) so DML in batches can reach thealways-unprepcase.unprep_retry_test.go(new):TestExecuteQuery_UnprepRetryIsCappeddrives the always-unprep path throughQuery.Exec.TestExecuteBatch_UnprepRetryIsCappeddrives it throughSession.ExecuteBatch."re-prepare attempts",errors.As(err, &*RequestErrUnprepared)succeeds, and the server received exactlymaxUnprepRetries + 1prepare/execute (resp. prepare/batch) pairs.Test plan
make checkcleanmake test-unitgreen (both new tests pass)Notes
No CASSGO ticket attached; happy to file one and amend the commit / CHANGELOG entry if the committer prefers.