ptstorage: lock the meta row during protect and release #170655
Merged
trunk-io[bot] merged 1 commit on May 22, 2026
I hope to follow up by removing this write entirely, but the FOR UPDATE does seem to help in high-concurrency situations.
tbg approved these changes on May 21, 2026
Both protect and release read the singleton system.protected_ts_meta
row via currentMetaCTE, then upsert a new version of it. Concurrent
callers all read the row at the same HLC, all try to write a new
version, and the losers retry on WriteTooOld. Beyond ~100 retries the
call errors out. Successful retries still leave MVCC versions piled up
on the meta row, so subsequent reads scan more history and slow down
over the life of the cluster.
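The collision described above can be sketched with a deliberately pessimistic toy model. It assumes all callers proceed in lockstep: in every round each pending caller reads the same version, exactly one write wins, and every loser spends one retry. The function name `simulateLockstep` and the lockstep assumption are illustrative only; real transactions interleave less regularly, so this is an upper bound on the damage, not a measurement.

```go
package main

import "fmt"

// simulateLockstep is a hypothetical worst-case model of read-then-upsert on a
// single row. All pending callers read the same version each round, one write
// wins, and the rest retry until their retry budget is used up.
func simulateLockstep(callers, retryBudget int) (succeeded, exhausted, retries int) {
	pending := callers
	used := 0 // retries already consumed by every still-pending caller
	for pending > 0 {
		// Exactly one of the pending callers commits a new version.
		succeeded++
		pending--
		if used == retryBudget {
			// The losers have no retries left and return an error.
			exhausted = pending
			break
		}
		// Every loser hits a WriteTooOld-style conflict and retries once.
		retries += pending
		used++
	}
	return succeeded, exhausted, retries
}

func main() {
	// 128 callers and a budget of 100 mirror the numbers quoted in this PR.
	s, e, r := simulateLockstep(128, 100)
	fmt.Printf("succeeded=%d exhausted=%d retries=%d\n", s, e, r)
}
```

Under this model, 128 callers with a budget of 100 retries leave 27 callers erroring out, and the 101 successful writes cost 7750 wasted attempts along the way.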
Acquire an exclusive lock on the meta row during the CTE read.
Concurrent callers queue at the lock instead of colliding on the
upsert. FOR UPDATE is bound to the real-table leg of the CTE so the
synthetic zero-row fallback (used when no meta row exists yet) does
not lock anything. getMetadataQuery is read-only and keeps using the
non-locking CTE.
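One way to picture the two CTE variants is the sketch below. It is not the query from this PR: the column names, the helper names (`realLeg`, `fallbackLeg`, `metaCTESketch`, `lockCount`), the reading of the fallback as a single all-zero row, and the exact placement of the locking clause inside a parenthesized set-operation leg are all assumptions made for illustration. The point it tries to show is only that the lock clause is attached to the leg that reads the real table, while the synthetic leg and the read-only variant carry no lock.

```go
package main

import (
	"fmt"
	"strings"
)

// metaCols is a hypothetical column list for the meta row.
const metaCols = "version, num_records, num_spans, total_bytes"

// realLeg reads the singleton row from the real table. The locking clause is
// attached here, and only here, when lock is true.
func realLeg(lock bool) string {
	q := "SELECT " + metaCols + " FROM system.protected_ts_meta"
	if lock {
		q += " FOR UPDATE"
	}
	return q
}

// fallbackLeg synthesizes a placeholder row for the case where no meta row
// exists yet. It reads no table, so there is nothing for it to lock.
func fallbackLeg() string {
	return "SELECT 0 AS version, 0 AS num_records, 0 AS num_spans, 0 AS total_bytes"
}

// metaCTESketch combines both legs and keeps the row with the highest version.
func metaCTESketch(lock bool) string {
	return "current_meta AS (\n" +
		"  (" + realLeg(lock) + ")\n" +
		"  UNION ALL\n" +
		"  (" + fallbackLeg() + ")\n" +
		"  ORDER BY version DESC LIMIT 1\n" +
		")"
}

// lockCount reports how many locking clauses a query string contains.
func lockCount(q string) int { return strings.Count(q, "FOR UPDATE") }

func main() {
	fmt.Println("-- protect/release (locking):")
	fmt.Println(metaCTESketch(true))
	fmt.Println("-- read-only metadata lookup (non-locking):")
	fmt.Println(metaCTESketch(false))
}
```

Generating both variants from one builder keeps the read-only path free of locks by construction, which is the property the description calls out for getMetadataQuery.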
BenchmarkProtect and BenchmarkRelease, added alongside, measure
throughput at 128 concurrent writers on an in-process server:
    name                    old sec/op     new sec/op    delta
    Protect/workers=128-10  44.744m ± 25%  9.911m ± 42%  -77.85% (p=0.000 n=10)
    Release/workers=128-10  55.683m ± 41%  9.768m ± 52%  -82.46% (p=0.000 n=10)
The baseline runs also produced retry-budget-exhaustion errors during
ramp-up (errs/op 0.13-0.32 in the first trial); after the change
errs/op is 0 across all trials.
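As a quick check on the delta column, the percentages can be recomputed from the reported sec/op values. The helper names below (`pctDelta`, `speedup`, `within`) are made up for this sketch, and the inputs are the rounded medians from the table, so the last digit could in principle differ from what the benchmark tool computed on unrounded data.

```go
package main

import (
	"fmt"
	"math"
)

// pctDelta returns the relative change from oldV to newV in percent.
// Negative values mean the new measurement is faster.
func pctDelta(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

// speedup returns how many times faster the new measurement is.
func speedup(oldV, newV float64) float64 {
	return oldV / newV
}

// within reports whether a and b differ by at most tol.
func within(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol
}

func main() {
	// Values are milliseconds per op, copied from the table above.
	fmt.Printf("Protect: %.2f%% (%.1fx)\n", pctDelta(44.744, 9.911), speedup(44.744, 9.911))
	fmt.Printf("Release: %.2f%% (%.1fx)\n", pctDelta(55.683, 9.768), speedup(55.683, 9.768))
}
```

The recomputed deltas match the table (-77.85% and -82.46%), which corresponds to roughly 4.5x and 5.7x higher throughput per operation. Given the wide ± ranges in both columns, the exact factors should be read loosely.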
Epic: none
Release note (performance improvement): Concurrent protected timestamp
protect and release calls (used heavily by backup and changefeed) now
serialize on the meta row rather than racing into WriteTooOld retries.
Workloads that create or release many protected timestamp records at
once see substantially higher throughput.
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
/trunk merge TFTR!