kvstorage: Add batching to the WAGTruncator #168351
trunk-io[bot] merged 2 commits into cockroachdb:master
Conversation
😎 Merged directly without going through the merge queue, as the queue was empty and the PR was up to date with the target branch.
Detected infrastructure failure. Automatically rerunning failed jobs.
Force-pushed from 4236da9 to 55a5cbd (compare)
Force-pushed from 55a5cbd to 5ca62af (compare)
iskettaneh
left a comment
@iskettaneh made 4 comments.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).
pkg/kv/kvserver/kvstorage/wag_truncator_test.go line 528 at r6 (raw file):
```go
// thing, but it should give an idea of the improvement of different batch
// sizes.
func BenchmarkWAGTruncation(b *testing.B) {
```
@pav-kv I am not sure the benchmark is really needed; it just helped me make sure that the batching works and pick a default batchSize.
pav-kv
left a comment
The non-test code LGTM. I'll review tests a bit later, stepping off for today.
…learRaftState

Move batch creation, commit, and truncIndex advancement from truncateAppliedNodes into truncateAppliedWAGNodeAndClearRaftState, making the latter fully self-contained. This simplifies the caller loop and makes the method signature cleaner (bool instead of uint64).

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Force-pushed from 5ca62af to cf21b21 (compare)
iskettaneh
left a comment
@iskettaneh made 5 comments and resolved 3 discussions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).
```go
truncated := t.truncIndex.Load()
iterStartKey := keys.StoreWAGNodeKey(truncated)
for index, node := range iter.IterFrom(ctx, raft.RO, iterStartKey) {
	if index != truncateIndex && index > t.initIndex {
```
Looks like we started seeking one key below the first truncatable one. This is fine w.r.t. correctness, but maybe use truncated+1 here for precision?
Also, should we consider hiding the key generation inside IterFrom, now that it's always StoreWAGNodeKey(index)? Maybe in a separate commit if so.
Will address the IterFrom in another PR (left a TODO)
```go
wantRemaining  []uint64
wantTruncIndex uint64
}{
	{initIndex: 0, batchSize: 1, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
```
nit: wantTruncated is redundant - can be inferred either from wantTruncIndex != 0 or len(wantRemaining) != len(wagNodeIndices).
optionally: wantRemaining is inferrable from wantTruncIndex too - a suffix of wagNodeIndices that are > wantTruncIndex.
or vice versa (wantTruncIndex is inferrable): wantRemaining is always a suffix of wagNodeIndices, and wantTruncIndex is the index preceding this suffix.
I removed wantTruncated but I kept wantRemaining and wantTruncIndex.
```go
}{
	{initIndex: 0, batchSize: 1, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
	{initIndex: 0, batchSize: 8, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
	{initIndex: 3, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
```
Overall, this could save a good deal of screen space / redundancy:
```diff
-{initIndex: 3, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
+{init: 3, batch: 8, wantTrunc: 3},
```
```go
	{initIndex: 7, batchSize: 8, wantTruncated: true, wantRemaining: []uint64{10, 11, 12}, wantTruncIndex: 7},
	{initIndex: 10, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
	{initIndex: 10, batchSize: 4, wantTruncated: true, wantRemaining: []uint64{12}, wantTruncIndex: 11},
	// Node 11 isn't applied yet, so it's not truncated.
```
Understanding why the last node isn't applied is subtle and relies on writeWAGNodesAt generating a raft log the same size as the WAG, with indices that are exactly consecutive and start from 1.
Could you try making it more explicit in some way / less tightly coupled?
- Maybe return the applied index from writeWAGNodesAt, and use applied-1 for SetRangeAppliedState. This depends only on consecutiveness, but not on the other 2 things.
- Or, make a const for the applied index, next to the WAG indices definition. Assert that writeWAGNodesAt ends up exactly at that index (either it returns the applied index, or we pass the target applied index in).

Please also clarify in this comment why node 11 isn't applied yet.
```go
// Write numNodes WAG nodes that are all eligible for
// truncation.
```
nit: looks like it should be one line
```go
	b.Fatal(err)
}
if err := eng.Flush(); err != nil {
	b.Fatal(err)
```
Do we want to use require here and above, since the timer is paused? Same as the code below does.
Or would it still skew the allocation counts?
I tried with require, and the allocations remained similar:
```
BenchmarkWAGTruncation/batchSize=1    1048576   16071 ns/op   959 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=4    1048576    7111 ns/op   620 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=8    1048576    5575 ns/op   548 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=16   1048576    2861 ns/op   502 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   1048576    3992 ns/op   491 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   1048576    2105 ns/op   486 B/op    9 allocs/op
```
```go
err := truncator.truncateAppliedNodes(ctx)
b.StopTimer()
require.NoError(b, err)
require.Equal(b, truncator.truncIndex.Load(), uint64(b.N))
```
Swap parameters: first comes the "expected" value, last comes the tested value.
Previously, truncateAppliedWAGNodeAndClearRaftState deleted one WAG node per batch and committed immediately. This commit does the following:

1) Rename truncateAppliedWAGNodeAndClearRaftState() to truncateBatch().
2) Introduce a cluster setting that controls the batch size.
3) Try to fit up to batchSize deletions in each call to truncateBatch().

Benchmark results:
```
BenchmarkWAGTruncation/batchSize=1    100000   15413 ns/op   1003 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=1    100000   17085 ns/op    959 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=1    100000   16959 ns/op   1003 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=1    100000   15240 ns/op   1003 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=1    100000   15508 ns/op   1003 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    7110 ns/op    611 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    5439 ns/op    610 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    5314 ns/op    610 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    7148 ns/op    590 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    7059 ns/op    612 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    3677 ns/op    539 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    5388 ns/op    540 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    5329 ns/op    539 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    3750 ns/op    540 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    5428 ns/op    541 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    3021 ns/op    495 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    4555 ns/op    496 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    2834 ns/op    495 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    2827 ns/op    495 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    4440 ns/op    501 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    2374 ns/op    487 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    3944 ns/op    484 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    2514 ns/op    486 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    2406 ns/op    485 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    4016 ns/op    484 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    3946 ns/op    479 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    2328 ns/op    480 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    2247 ns/op    480 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    3747 ns/op    480 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    2192 ns/op    480 B/op    9 allocs/op
```

Release note: None
Epic: none
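A minimal sketch of the batching idea described in this commit message, not the actual truncateBatch code: accumulate up to batchSize deletions before each commit instead of committing once per WAG node. The store, batch, and function names here are stand-ins.

```go
package main

import "fmt"

// fakeStore stands in for the storage engine; commits counts how many
// batches were committed, which batching should reduce.
type fakeStore struct {
	nodes   map[uint64]struct{}
	commits int
}

func (s *fakeStore) commit(batch []uint64) {
	for _, idx := range batch {
		delete(s.nodes, idx)
	}
	s.commits++
}

// truncateApplied deletes all nodes with index <= applied, committing at
// most batchSize deletions per batch, and returns the new trunc index.
func truncateApplied(s *fakeStore, indices []uint64, applied uint64, batchSize int) uint64 {
	var truncIndex uint64
	batch := make([]uint64, 0, batchSize)
	for _, idx := range indices {
		if idx > applied {
			break // not applied yet; stop truncating
		}
		batch = append(batch, idx)
		truncIndex = idx
		if len(batch) == batchSize {
			s.commit(batch)
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		s.commit(batch) // flush the final partial batch
	}
	return truncIndex
}

func main() {
	indices := []uint64{3, 7, 10, 11, 12}
	s := &fakeStore{nodes: map[uint64]struct{}{}}
	for _, i := range indices {
		s.nodes[i] = struct{}{}
	}
	trunc := truncateApplied(s, indices, 10, 4)
	fmt.Println(trunc, s.commits, len(s.nodes)) // 10 1 2
}
```

With batchSize=1 this degenerates to the old one-commit-per-node behavior; larger batches amortize the commit cost, matching the trend in the benchmark output above.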
Force-pushed from cf21b21 to 681bd67 (compare)
iskettaneh
left a comment
@iskettaneh made 6 comments and resolved 1 discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).
TFTR!

/trunk merge
This PR adds the ability to truncate multiple WAG nodes in a single batch. It adds the cluster setting kv.wag.truncator_batch_size to control the batch size.

Batch sizes benchmark:

References: #167607

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>