kvstorage: Add batching to the WAGTruncator #168351
trunk-io[bot] merged 2 commits into cockroachdb:master
Conversation
😎 Merged directly without going through the merge queue, as the queue was empty and the PR was up to date with the target branch.
Detected infrastructure failure. Automatically rerunning failed jobs.
Force-pushed from 4236da9 to 55a5cbd (compare)
Force-pushed from 55a5cbd to 5ca62af (compare)
iskettaneh
left a comment
@iskettaneh made 4 comments.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).
pkg/kv/kvserver/kvstorage/wag_truncator_test.go line 528 at r6 (raw file):
```go
// thing, but it should give an idea of the improvement of different batch
// sizes.
func BenchmarkWAGTruncation(b *testing.B) {
```
@pav-kv I am not sure the benchmark is really needed; it just helped me make sure that the batching works and pick a default batchSize.
pav-kv
left a comment
The non-test code LGTM. I'll review tests a bit later, stepping off for today.
…learRaftState

Move batch creation, commit, and truncIndex advancement from truncateAppliedNodes into truncateAppliedWAGNodeAndClearRaftState, making the latter fully self-contained. This simplifies the caller loop and makes the method signature cleaner (bool instead of uint64).

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Force-pushed from 5ca62af to cf21b21 (compare)
iskettaneh
left a comment
@iskettaneh made 5 comments and resolved 3 discussions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).
```go
truncated := t.truncIndex.Load()
iterStartKey := keys.StoreWAGNodeKey(truncated)
for index, node := range iter.IterFrom(ctx, raft.RO, iterStartKey) {
	if index != truncateIndex && index > t.initIndex {
```
Looks like we started seeking one key below the first truncatable one. This is fine w.r.t. correctness, but maybe use truncated+1 here for precision?
Also, should we consider hiding the key generation inside IterFrom, now that it's always StoreWAGNodeKey(index)? Maybe in a separate commit if so.
Will address the IterFrom in another PR (left a TODO)
```go
wantRemaining  []uint64
wantTruncIndex uint64
}{
	{initIndex: 0, batchSize: 1, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
```
nit: wantTruncated is redundant - can be inferred either from wantTruncIndex != 0 or len(wantRemaining) != len(wagNodeIndices).
optionally: wantRemaining is inferrable from wantTruncIndex too - a suffix of wagNodeIndices that are > wantTruncIndex.
or vice versa (wantTruncIndex is inferrable): wantRemaining is always a suffix of wagNodeIndices, and wantTruncIndex is the index preceding this suffix.
I removed wantTruncated but I kept wantRemaining and wantTruncIndex.
```go
}{
	{initIndex: 0, batchSize: 1, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
	{initIndex: 0, batchSize: 8, wantTruncated: false, wantRemaining: []uint64{3, 7, 10, 11, 12}, wantTruncIndex: 0},
	{initIndex: 3, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
```
Overall, this could save a good deal of screen space / redundancy:
```diff
-{initIndex: 3, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
+{init: 3, batch: 8, wantTrunc: 3},
```
```go
	{initIndex: 7, batchSize: 8, wantTruncated: true, wantRemaining: []uint64{10, 11, 12}, wantTruncIndex: 7},
	{initIndex: 10, batchSize: 1, wantTruncated: true, wantRemaining: []uint64{7, 10, 11, 12}, wantTruncIndex: 3},
	{initIndex: 10, batchSize: 4, wantTruncated: true, wantRemaining: []uint64{12}, wantTruncIndex: 11},
	// Node 11 isn't applied yet, so it's not truncated.
```
Understanding why the last node isn't applied is subtle and relies on writeWAGNodesAt generating a raft log the same size as the WAG, with indices that are exactly consecutive and start from 1.
Could you try making it more explicit in some way / less tightly coupled?
- Maybe return the applied index from writeWAGNodesAt, and use applied-1 for SetRangeAppliedState. This depends only on consecutiveness, but not on the other 2 things.
- Or, make a const for the applied index, next to the WAG indices definition. Assert that writeWAGNodesAt ends up exactly at that index (either it returns the applied index, or we pass the target applied index in).

Please also clarify in this comment why node 11 isn't applied yet.
```go
// Write numNodes WAG nodes that are all eligible for
// truncation.
```
nit: looks like it should be one line
```go
	b.Fatal(err)
}
if err := eng.Flush(); err != nil {
	b.Fatal(err)
```
Do we want to use require here and above, since the timer is paused? Same as the code below does.
Or would it still skew the allocation counts?
I tried with require, and the allocations remained similar:
```
BenchmarkWAGTruncation/batchSize=1    1048576   16071 ns/op   959 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=4    1048576    7111 ns/op   620 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=8    1048576    5575 ns/op   548 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=16   1048576    2861 ns/op   502 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   1048576    3992 ns/op   491 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   1048576    2105 ns/op   486 B/op    9 allocs/op
```
```go
err := truncator.truncateAppliedNodes(ctx)
b.StopTimer()
require.NoError(b, err)
require.Equal(b, truncator.truncIndex.Load(), uint64(b.N))
```
Swap parameters: first comes the "expected" value, last comes the tested value.
Previously, truncateAppliedWAGNodeAndClearRaftState deleted one WAG node per batch and committed immediately. This commit does the following:

1) Rename truncateAppliedWAGNodeAndClearRaftState() to truncateBatch().
2) Introduce a cluster setting that controls the batch size.
3) Try to fit up to batchSize deletions in each call to truncateBatch().

Benchmark results:
```
BenchmarkWAGTruncation/batchSize=1    100000   15413 ns/op   1003 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=1    100000   17085 ns/op    959 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=1    100000   16959 ns/op   1003 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=1    100000   15240 ns/op   1003 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=1    100000   15508 ns/op   1003 B/op   20 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    7110 ns/op    611 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    5439 ns/op    610 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    5314 ns/op    610 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    7148 ns/op    590 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=4    100000    7059 ns/op    612 B/op   11 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    3677 ns/op    539 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    5388 ns/op    540 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    5329 ns/op    539 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    3750 ns/op    540 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=8    100000    5428 ns/op    541 B/op   10 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    3021 ns/op    495 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    4555 ns/op    496 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    2834 ns/op    495 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    2827 ns/op    495 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=16   100000    4440 ns/op    501 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    2374 ns/op    487 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    3944 ns/op    484 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    2514 ns/op    486 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    2406 ns/op    485 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=32   100000    4016 ns/op    484 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    3946 ns/op    479 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    2328 ns/op    480 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    2247 ns/op    480 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    3747 ns/op    480 B/op    9 allocs/op
BenchmarkWAGTruncation/batchSize=64   100000    2192 ns/op    480 B/op    9 allocs/op
```

Release note: None
Epic: none
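A minimal sketch of the batching idea described in this commit message, not the actual truncateBatch code: accumulate up to batchSize deletions before each commit instead of committing once per WAG node. The store, batch, and function names here are stand-ins.

```go
package main

import "fmt"

// fakeStore stands in for the storage engine; commits counts how many
// batches were committed, which batching should reduce.
type fakeStore struct {
	nodes   map[uint64]struct{}
	commits int
}

func (s *fakeStore) commit(batch []uint64) {
	for _, idx := range batch {
		delete(s.nodes, idx)
	}
	s.commits++
}

// truncateApplied deletes all nodes with index <= applied, committing at
// most batchSize deletions per batch, and returns the new trunc index.
func truncateApplied(s *fakeStore, indices []uint64, applied uint64, batchSize int) uint64 {
	var truncIndex uint64
	batch := make([]uint64, 0, batchSize)
	for _, idx := range indices {
		if idx > applied {
			break // not applied yet; stop truncating
		}
		batch = append(batch, idx)
		truncIndex = idx
		if len(batch) == batchSize {
			s.commit(batch)
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		s.commit(batch) // flush the final partial batch
	}
	return truncIndex
}

func main() {
	indices := []uint64{3, 7, 10, 11, 12}
	s := &fakeStore{nodes: map[uint64]struct{}{}}
	for _, i := range indices {
		s.nodes[i] = struct{}{}
	}
	trunc := truncateApplied(s, indices, 10, 4)
	fmt.Println(trunc, s.commits, len(s.nodes)) // 10 1 2
}
```

With batchSize=1 this degenerates to the old one-commit-per-node behavior; larger batches amortize the commit cost, matching the trend in the benchmark output above.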
Force-pushed from cf21b21 to 681bd67 (compare)
iskettaneh
left a comment
@iskettaneh made 6 comments and resolved 1 discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on pav-kv).
TFTR!

/trunk merge
This PR adds the ability to truncate multiple WAG nodes in a single batch. It adds the cluster setting kv.wag.truncator_batch_size to control the batch size.

Batch sizes benchmark:

References: #167607

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>