Skip to content

kvstorage: Add batching to the WAGTruncator#168351

Merged
trunk-io[bot] merged 2 commits intocockroachdb:masterfrom
iskettaneh:rse_truncate_7
Apr 20, 2026
Merged

kvstorage: Add batching to the WAGTruncator#168351
trunk-io[bot] merged 2 commits intocockroachdb:masterfrom
iskettaneh:rse_truncate_7

Conversation

@iskettaneh
Copy link
Copy Markdown
Contributor

@iskettaneh iskettaneh commented Apr 14, 2026

This PR adds the ability to truncate multiple WAG nodes in a single batch. It adds the cluster setting kv.wag.truncator_batch_size to control the batch size.

Batch sizes benchmark:

BenchmarkWAGTruncation/batchSize=1                247114             15261 ns/op            1002 B/op         20 allocs/op
BenchmarkWAGTruncation/batchSize=4                396939              5533 ns/op             608 B/op         11 allocs/op
BenchmarkWAGTruncation/batchSize=8                452824              5220 ns/op             549 B/op         10 allocs/op
BenchmarkWAGTruncation/batchSize=16               447286              4400 ns/op             505 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=32               512752              2402 ns/op             503 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=64               484096              3737 ns/op             481 B/op          9 allocs/op

References: #167607

Release note: None

Co-Authored-By: roachdev-claude roachdev-claude-bot@cockroachlabs.com

@iskettaneh iskettaneh requested a review from pav-kv April 14, 2026 18:19
@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented Apr 14, 2026

😎 Merged directly without going through the merge queue, as the queue was empty and the PR was up to date with the target branch - details.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented Apr 14, 2026

Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link)

@iskettaneh iskettaneh marked this pull request as ready for review April 15, 2026 14:08
@iskettaneh iskettaneh requested review from a team as code owners April 15, 2026 14:08
@iskettaneh iskettaneh requested a review from sumeerbhola April 15, 2026 14:08
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
@iskettaneh iskettaneh requested review from a team and removed request for a team and sumeerbhola April 17, 2026 20:20
Copy link
Copy Markdown
Contributor Author

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iskettaneh made 4 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).


pkg/kv/kvserver/kvstorage/wag_truncator_test.go line 528 at r6 (raw file):

// thing, but it should give an idea of the improvement of different batch
// sizes.
func BenchmarkWAGTruncation(b *testing.B) {

@pav-kv I am not sure if the Benchmark is needed really, it just helped me make sure that the batching works and helped me pick a default batchSize.

Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
@iskettaneh iskettaneh requested a review from pav-kv April 17, 2026 20:24
Copy link
Copy Markdown
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-test code LGTM. I'll review tests a bit later, stepping off for today.

Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
…learRaftState

Move batch creation, commit, and truncIndex advancement from
truncateAppliedNodes into truncateAppliedWAGNodeAndClearRaftState,
making the latter fully self-contained. This simplifies the caller
loop and makes the method signature cleaner (bool instead of uint64).

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Copy link
Copy Markdown
Contributor Author

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iskettaneh made 5 comments and resolved 3 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).

Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
Comment thread pkg/kv/kvserver/kvstorage/wag_truncator.go Outdated
@iskettaneh iskettaneh requested a review from pav-kv April 20, 2026 00:45
Copy link
Copy Markdown
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nits

for index, node := range iter.IterFrom(ctx, raft.RO, iterStartKey) {
if index != truncateIndex && index > t.initIndex {
truncated := t.truncIndex.Load()
iterStartKey := keys.StoreWAGNodeKey(truncated)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we started seeking one key below the first truncatable one. This is fine w.r.t. correctness, but maybe use truncated+1 here for precision?

Also, should we consider hiding the key generation inside IterFrom, now that it's always StoreWAGNodeKey(index)? Maybe in a separate commit if so.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address the IterFrom in another PR (left a TODO)

wantRemaining []uint64
wantTruncIndex uint64
}{
{initIndex: 0, batchSize: 1, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wantTruncated is redundant - can be inferred either from wantTruncIndex != 0 or len(wantRemaining) != len(wagNodeIndices).

optionally: wantRemaining is inferrable from wantTruncIndex too - a suffix of wagNodeIndices that are > wantTruncIndex.

or vice versa (wantTruncIndex is inferrable): wantRemaining is always a suffix of wagNodeIndices, and wantTruncIndex is the index preceding this suffix.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed wantTruncated but I kept wantRemaining and wantTruncIndex.

}{
{initIndex: 0, batchSize: 1, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
{initIndex: 0, batchSize: 8, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
{initIndex: 3, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, could save some deal of screen space / redundancy:

Suggested change
{initIndex: 3, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
{init: 3, batch: 8, wantTrunc: 3},

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{initIndex: 7, batchSize: 8, wantTruncated: true, wantRemaining: []uint64{10, 11, 12}, wantTruncIndex: 7},
{initIndex: 10, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
{initIndex: 10, batchSize: 4, wantTruncated: true, wantRemaining: []uint64{12}, wantTruncIndex: 11},
// Node 11 isn't applied yet, so it's not truncated.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understanding why the last node isn't applied is subtle and relies on writeWAGNodesAt generating the raft log the same size as the WAG, and its indices being exactly consecutive and start from 1.

Could you try making it more explicit in some way / less tightly coupled?

  • Maybe return the applied index from writeWAGNodesAt, and use applied-1 for SetRangeAppliedState. This depends only on consecutiveness, but not on the other 2 things.
  • Or, make a const for the applied index, next to WAG indices definition. Assert that writeWAGNodesAt ends up exactly at that index (either it returns the applied index, or we pass the target applied index in).

Please also clarify in this comment why node 11 isn't applied yet.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +544 to +545
// Write numNodes WAG nodes that are all eligible for
// truncation.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looks like it should be one line

b.Fatal(err)
}
if err := eng.Flush(); err != nil {
b.Fatal(err)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use require here and above, since the timer is paused? Same as the code below does.

Or it would still screw allocations?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried with require, and the allocations remained similar:

BenchmarkWAGTruncation/batchSize=1 1048576 16071 ns/op 959 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=4 1048576 7111 ns/op 620 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=8 1048576 5575 ns/op 548 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=16 1048576 2861 ns/op 502 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 1048576 3992 ns/op 491 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 1048576 2105 ns/op 486 B/op 9 allocs/op

err := truncator.truncateAppliedNodes(ctx)
b.StopTimer()
require.NoError(b, err)
require.Equal(b, truncator.truncIndex.Load(), uint64(b.N))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap parameters: first comes the "expected" value, last comes the tested value.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Previously, truncateAppliedWAGNodeAndClearRaftState deleted one WAG node
per batch and committed immediately.

This commit does the following:

1) Rename truncateAppliedWAGNodeAndClearRaftState() to truncateBatch().

2) Introduce a cluster-setting that controls the batch size.

3) Try to fit up-to batchSize deletion in each call to truncateBatch().

Benchmark results:

```
BenchmarkWAGTruncation/batchSize=1                100000             15413 ns/op            1003 B/op         20 allocs/op
BenchmarkWAGTruncation/batchSize=1                100000             17085 ns/op             959 B/op         20 allocs/op
BenchmarkWAGTruncation/batchSize=1                100000             16959 ns/op            1003 B/op         20 allocs/op
BenchmarkWAGTruncation/batchSize=1                100000             15240 ns/op            1003 B/op         20 allocs/op
BenchmarkWAGTruncation/batchSize=1                100000             15508 ns/op            1003 B/op         20 allocs/op
BenchmarkWAGTruncation/batchSize=4                100000              7110 ns/op             611 B/op         11 allocs/op
BenchmarkWAGTruncation/batchSize=4                100000              5439 ns/op             610 B/op         11 allocs/op
BenchmarkWAGTruncation/batchSize=4                100000              5314 ns/op             610 B/op         11 allocs/op
BenchmarkWAGTruncation/batchSize=4                100000              7148 ns/op             590 B/op         11 allocs/op
BenchmarkWAGTruncation/batchSize=4                100000              7059 ns/op             612 B/op         11 allocs/op
BenchmarkWAGTruncation/batchSize=8                100000              3677 ns/op             539 B/op         10 allocs/op
BenchmarkWAGTruncation/batchSize=8                100000              5388 ns/op             540 B/op         10 allocs/op
BenchmarkWAGTruncation/batchSize=8                100000              5329 ns/op             539 B/op         10 allocs/op
BenchmarkWAGTruncation/batchSize=8                100000              3750 ns/op             540 B/op         10 allocs/op
BenchmarkWAGTruncation/batchSize=8                100000              5428 ns/op             541 B/op         10 allocs/op
BenchmarkWAGTruncation/batchSize=16               100000              3021 ns/op             495 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=16               100000              4555 ns/op             496 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=16               100000              2834 ns/op             495 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=16               100000              2827 ns/op             495 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=16               100000              4440 ns/op             501 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=32               100000              2374 ns/op             487 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=32               100000              3944 ns/op             484 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=32               100000              2514 ns/op             486 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=32               100000              2406 ns/op             485 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=32               100000              4016 ns/op             484 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=64               100000              3946 ns/op             479 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=64               100000              2328 ns/op             480 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=64               100000              2247 ns/op             480 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=64               100000              3747 ns/op             480 B/op          9 allocs/op
BenchmarkWAGTruncation/batchSize=64               100000              2192 ns/op             480 B/op          9 allocs/op
```

Release note: None
Epic: none
Copy link
Copy Markdown
Contributor Author

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iskettaneh made 6 comments and resolved 1 discussion.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).

for index, node := range iter.IterFrom(ctx, raft.RO, iterStartKey) {
if index != truncateIndex && index > t.initIndex {
truncated := t.truncIndex.Load()
iterStartKey := keys.StoreWAGNodeKey(truncated)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address the IterFrom in another PR (left a TODO)

wantRemaining []uint64
wantTruncIndex uint64
}{
{initIndex: 0, batchSize: 1, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed wantTruncated but I kept wantRemaining and wantTruncIndex.

}{
{initIndex: 0, batchSize: 1, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
{initIndex: 0, batchSize: 8, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
{initIndex: 3, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{initIndex: 7, batchSize: 8, wantTruncated: true, wantRemaining: []uint64{10, 11, 12}, wantTruncIndex: 7},
{initIndex: 10, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
{initIndex: 10, batchSize: 4, wantTruncated: true, wantRemaining: []uint64{12}, wantTruncIndex: 11},
// Node 11 isn't applied yet, so it's not truncated.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

b.Fatal(err)
}
if err := eng.Flush(); err != nil {
b.Fatal(err)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried with require, and the allocations remained similar:

BenchmarkWAGTruncation/batchSize=1 1048576 16071 ns/op 959 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=4 1048576 7111 ns/op 620 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=8 1048576 5575 ns/op 548 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=16 1048576 2861 ns/op 502 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 1048576 3992 ns/op 491 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 1048576 2105 ns/op 486 B/op 9 allocs/op

err := truncator.truncateAppliedNodes(ctx)
b.StopTimer()
require.NoError(b, err)
require.Equal(b, truncator.truncIndex.Load(), uint64(b.N))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@iskettaneh
Copy link
Copy Markdown
Contributor Author

TFTR!

/trunk merge

@trunk-io trunk-io Bot merged commit ba589e3 into cockroachdb:master Apr 20, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants