Commit e99f26b

roachtest/perturbation: lower ratioOfMax to 0.3 for restart

The restart perturbation runs the workload at 50% of cluster peak, calibrated for steady state. During recovery the target node must absorb raft catch-up, rebalancing snapshots, and re-acquired lease traffic on top of its share of the ongoing workload, which can push it past disk capacity into a Pebble write stall. The stall blocks every leaseholder that lives on the recovering store and produces a recovery-phase p5 throughput dip that trips the 1.25x impact gate.

Drop ratioOfMax to 0.3 for restart only so the cluster runs with ~40% IO headroom; recovery work then fits within sustainable disk capacity.

Fixes: #170849
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
1 parent 0f497cf commit e99f26b

1 file changed

Lines changed: 7 additions & 0 deletions

pkg/cmd/roachtest/tests/perturbation/restart_node.go

@@ -27,6 +27,13 @@ func (r restart) setup() variations {
 	r.cleanRestart = true
 	v := setup(r, defaultThresholds())
 
+	// Run with extra IO headroom relative to the cluster-wide default. When the
+	// target node returns it must absorb raft catch-up, rebalancing snapshots,
+	// and re-acquired lease traffic concurrently; at the default 0.5 the
+	// recovered store sits at the edge of its sustainable write rate and the
+	// flush/compaction pipeline can fall behind into a Pebble write stall.
+	v.ratioOfMax = 0.3
+
 	// TODO(baptist): Remove this setting once #120073 is fixed.
 	v.clusterSettings["kv.lease.reject_on_leader_unknown.enabled"] = "true"
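
The arithmetic behind the change can be sketched in a few lines of Go. This is a standalone illustration, not code from the roachtest framework: `targetRate`, `loadReduction`, and the peak value of 10000 ops/sec are hypothetical, and the sketch assumes the workload rate is simply the measured peak multiplied by `ratioOfMax`. Under that assumption, moving from 0.5 to 0.3 cuts the offered load by 40%, which is one plausible reading of the "~40%" figure in the commit message.

```go
package main

import "fmt"

// targetRate returns the steady-state workload rate for a measured peak
// and a ratio of that peak. Hypothetical helper, for illustration only.
func targetRate(peakOpsPerSec, ratioOfMax float64) float64 {
	return peakOpsPerSec * ratioOfMax
}

// loadReduction returns the fractional drop in offered load when the
// ratio moves from oldRatio to newRatio. Hypothetical helper.
func loadReduction(oldRatio, newRatio float64) float64 {
	return (oldRatio - newRatio) / oldRatio
}

func main() {
	// Assumed peak throughput; not a measured value from any cluster.
	const peak = 10000.0

	for _, ratio := range []float64{0.5, 0.3} {
		fmt.Printf("ratioOfMax=%.1f target=%.0f ops/sec spare=%.0f%% of peak\n",
			ratio, targetRate(peak, ratio), (1-ratio)*100)
	}
	fmt.Printf("load reduction from 0.5 to 0.3: %.0f%%\n",
		loadReduction(0.5, 0.3)*100)
}
```

The spare capacity printed here is relative to the steady-state peak only; how much of it recovery traffic actually consumes depends on the cluster and is not modeled.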
