group/tx/stm: limit local snapshot to max_removable_local_log_offset#30071
group/tx/stm: limit local snapshot to max_removable_local_log_offset#30071bharathv merged 2 commits intoredpanda-data:devfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a compaction/snapshot interaction in group_tx_tracker_stm that could persist “open transaction” state into a local snapshot and later lose the corresponding commit marker to compaction, permanently pinning max_removable_local_log_offset after restart.
Changes:
- Modify
group_tx_tracker_stm::take_local_snapshot()to snapshot atmax_removable_local_log_offset()and omit transaction state. - Bump
supported_local_snapshot_versionto invalidate existing on-disk snapshots on upgrade. - Add a new regression test covering snapshot/compaction/restart behavior, and update test BUILD deps/timeout.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/v/kafka/server/tests/group_tx_compaction_test.cc |
Adds a regression scenario around local snapshots, compaction, and restart recovery for consumer offsets partitions. |
src/v/kafka/server/tests/BUILD |
Updates the group_tx_compaction_test target timeout and adds a dependency for ssx::semaphore_units. |
src/v/kafka/server/group_tx_tracker_stm.h |
Bumps the supported local snapshot version. |
src/v/kafka/server/group_tx_tracker_stm.cc |
Changes local snapshot contents/offset to avoid persisting open transaction state. |
Comments suppressed due to low confidence (2)
src/v/kafka/server/group_tx_tracker_stm.h:146
- After bumping
supported_local_snapshot_versionto 2, snapshots with header version 1 will be rejected inapply_local_snapshot(), so the comment/branch describing reconstruction of a “legacy snapshot from version 1” is now effectively unreachable. Consider updating/removing the legacy wording/branch to avoid confusion for future maintainers.
private:
static constexpr int8_t supported_local_snapshot_version = 2;
struct snapshot
: serde::envelope<snapshot, serde::version<2>, serde::compat_version<0>> {
all_txs_t transactions;
// legacy for version 1 RP to decode version 2+ snapshots
chunked_vector<kafka::group_id> blocked_groups;
src/v/kafka/server/tests/BUILD:742
- The target’s timeout is reduced from "moderate" to "short" even though this test file includes long-running compaction loops with
RPTEST_REQUIRE_EVENTUALLY_CORO(30s, ...)(and the new restart/compaction scenario adds more work). This is likely to cause spurious CI timeouts/flakes; consider keeping this as "moderate" (or reducing the test’s runtime/upper bounds if "short" is required).
redpanda_cc_gtest(
name = "group_tx_compaction_test",
timeout = "short",
srcs = [
"group_tx_compaction_test.cc",
],
cpu = 1,
tx bugs and compaction, what a duo |
:D |
The group_tx_tracker_stm snapshot could capture open transaction state (begin_offsets/producer_states) at an offset where the corresponding commit batch had not yet been written. If compaction later removed that commit batch — which is allowed once max_removable_offset advances past it — the open transaction could never be resolved on restart, permanently blocking max_removable_offset and preventing further compaction. The sequence: 1. Snapshot taken while a tx is open (fence at F, snapshot offset >= F) 2. Tx commits at offset C, max_removable advances past C 3. Compaction removes the commit batch at C 4. On restart, snapshot loads stale open tx at F, replay cannot find the commit -> max_removable stuck at prev(F) forever Fix: snapshot at max_removable_local_log_offset with an empty transactions map. Since this STM's sole purpose is tracking open transactions for max_removable_local_log_offset, and closed transactions leave no state, all meaningful state can be reconstructed from log replay. Open transactions are re-discovered from fence batches in the log, which are guaranteed to be present since compaction is bounded by max_removable while the STM is live. Also adds a regression test that reproduces the scenario by taking a snapshot during an open tx, committing, compacting, re-persisting the stale snapshot, and restarting. To fix existing setups that have stale snapshots, this commit also bumps supported_local_snapshot_version, this invalidates saved snapshots upon upgrade and applies everything from log and reconstructs the correct snapshots the next time with the newer logic.
Retry command for Build#82752please wait until all jobs are finished before running the slash command |
Retry command for Build#82755please wait until all jobs are finished before running the slash command |
|
/ci-repeat 3 |
There was a problem hiding this comment.
I'm wondering why this situation exists if we are already actively clamping the max tx removal offset by the last snapshotted offset here:
redpanda/src/v/storage/log_manager.cc
Lines 790 to 795 in efbcf9a
Answered, this only considers rm_stm, not group_tx_tracker_stm! We unconditionally remove group_commit_tx batches here:
redpanda/src/v/compaction/utils.cc
Lines 33 to 51 in efbcf9a
🤯
Also, is it about time we drop some tx w/ compaction tests in antithesis to try to shake the tree as hard as possible for any remaining bugs?
WillemKauf
left a comment
There was a problem hiding this comment.
LGTM for reasons we discussed in Slack
yes good idea. |
|
/backport v26.1.x |
|
/backport v25.3.x |
|
/backport v25.2.x |
|
Failed to create a backport PR to v25.2.x branch. I tried: |
|
Failed to create a backport PR to v25.3.x branch. I tried: |
The group_tx_tracker_stm snapshot could capture open transaction state
(begin_offsets/producer_states) at an offset where the corresponding
commit batch had not yet been written. If compaction later removed that
commit batch, which is allowed once max_removable_offset advances past
it, the open transaction could never be resolved on restart, permanently
blocking max_removable_offset and preventing further compaction.
The sequence:
the commit -> max_removable stuck at prev(F) forever
Fix: snapshot at max_removable_local_log_offset with an empty
transactions map. Since this STM's sole purpose is tracking open
transactions for max_removable_local_log_offset, and closed transactions
leave no state, all meaningful state can be reconstructed from log
replay. Open transactions are re-discovered from fence batches in the
log, which are guaranteed to be present since compaction is bounded by
max_removable while the STM is live.
Also adds a regression test that reproduces the scenario by taking a
snapshot during an open tx, committing, compacting, re-persisting the
stale snapshot, and restarting.
To fix existing setups that have stale snapshots, this commit also bumps
supported_local_snapshot_version, this invalidates saved snapshots upon
upgrade and applies everything from log and reconstructs the correct
snapshots the next time with the newer logic.
Note: the issue is very rare/hard to reproduce because it requires the broker
to quickly restart between steps 3/4 and the stm doesn't recompute the
local snapshot and the window is very tiny but is possible in a rolling upgrade
test with a tight compaction interval.
Backports Required
Release Notes
Bug Fixes