Skip to content

raftengine: add phase 2 snapshot persistence#467

Merged
bootjp merged 27 commits intomainfrom
codex/replace-raft-phase1
Apr 9, 2026
Merged

raftengine: add phase 2 snapshot persistence#467
bootjp merged 27 commits intomainfrom
codex/replace-raft-phase1

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 6, 2026

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the etcd raft engine's persistence layer by splitting the monolithic state file into separate metadata, entries, and snapshot files. It introduces Snapshot and Restore capabilities to the StateMachine and includes logic for migrating legacy state files to the new format. The review identifies a critical bug in the manual slice growth implementation for entries, which can cause runtime panics if the entry count exceeds the capacity cap. It is recommended to replace this logic with Go's native append for safer incremental allocation.

Comment thread internal/raftengine/etcd/persistence.go Outdated
Comment thread internal/raftengine/etcd/persistence.go Outdated
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 6, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a multi-node etcd/raft backend implementation, replacing the previous single-node prototype. Key changes include the addition of a gRPC-based transport layer, a new migration tool for FSM stores, and a refactored persistence layer that splits state into metadata, entries, and snapshot data files. My feedback highlights concerns regarding the transport handler initialization sequence, the need for atomic file replacement best practices, and interface design improvements for snapshot handling.

I am having trouble creating individual review comments. Click here to see my feedback.

internal/raftengine/etcd/engine.go (192-194)

medium

The transport handler is set after the engine is initialized but before it is started. If a message is received via the transport before the engine is fully initialized or the run loop is active, it might lead to unexpected behavior. Consider ensuring the transport handler is only active when the engine is ready to process messages. Additionally, explicitly document this initialization sequence and its limitations to prevent misuse.

References
  1. For prototype or limited-scope implementations, explicitly document the limitations and add runtime validation to prevent misuse in a general context.

internal/raftengine/etcd/persistence.go (432-466)

medium

The replaceFile function uses os.Rename to perform an atomic update. While this is generally correct, on some filesystems, os.Rename might not be atomic if the source and destination are on different partitions. Ensure that the temporary file and the destination file are on the same filesystem. Also, consider centralizing this atomic file replacement logic into a shared helper function to avoid duplication and improve maintainability.

References
  1. Centralize common complex logic, such as retry/backoff mechanisms or atomic file operations, into shared helper functions to avoid code duplication and improve maintainability.

internal/raftengine/etcd/wal_store.go (214-228)

medium

The stateMachineSnapshotBytes function uses a deferred function to close the snapshot. If this is an input stream, consider using io.Reader instead of io.ReadCloser to keep the interface narrow and leave closure responsibility to the caller. If the function must close it, handle the error from snapshot.Close() explicitly to ensure it does not shadow more relevant errors from WriteTo.

References
  1. When designing interfaces, use io.Reader instead of io.ReadCloser for input streams where the responsibility for closing the underlying resource should lie with the component that opened it.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 7, 2026

Addressed the latest review summary in 571a26ff.

  • Transport startup: inbound gRPC delivery is now gated on startedCh via handleTransportMessage, so the transport can be registered before Open returns without enqueueing messages until local startup completes. I also added a regression test for that sequencing.
  • Atomic replace: replaceFile now uses os.CreateTemp in the destination directory, and the code comment makes the same-filesystem/atomic-rename requirement explicit.
  • Snapshot handling: I kept the owned snapshot handle interface narrow for this package (WriteTo + Close), but changed stateMachineSnapshotBytes to close explicitly so WriteTo and Close errors are combined in a predictable order instead of relying on the deferred helper.

Validation: go test ./internal/raftengine/etcd and golangci-lint run ./internal/raftengine/etcd --timeout=5m both pass.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 7, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request transitions the etcd raft engine from a single-node prototype to a multi-node implementation, introducing gRPC transport, WAL-based persistence, and snapshotting. Feedback identifies critical issues where synchronous message dispatch blocks the main Raft loop and snapshots are triggered too frequently. High-severity concerns were raised regarding potential memory exhaustion during snapshot buffering. Additionally, a logic error in the startup applied index initialization was found, and a more idiomatic implementation for peer counting was suggested.

Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/wal_store.go Outdated
Comment thread internal/raftengine/etcd/grpc_transport.go Outdated
Comment thread internal/raftengine/etcd/engine.go Outdated
Comment thread internal/raftengine/etcd/engine.go
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 7, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a robust etcd-based raft backend, including a new gRPC transport layer, snapshot management, and migration utilities for FSM stores. The implementation includes significant improvements to persistence, such as atomic file replacement and versioned file headers. I have identified two areas for improvement: the file closing logic in the persistence layer is slightly redundant and can be simplified, and the snapshot validation check in the engine can be streamlined by leveraging existing etcd raft helper functions.

Comment thread internal/raftengine/etcd/persistence.go Outdated
Comment thread internal/raftengine/etcd/engine.go
bootjp and others added 2 commits April 8, 2026 02:16
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 7, 2026

/gemini review

@bootjp bootjp closed this Apr 7, 2026
@bootjp bootjp reopened this Apr 7, 2026
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 7, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new etcd-based raft engine backend, including a migration tool for FSM stores, a gRPC-based transport layer, and updated persistence logic. The changes include adding necessary dependencies in go.mod, implementing snapshot spooling, and updating the raft engine to support multi-node configurations. My feedback focuses on improving error handling during snapshot cleanup to ensure all potential errors are properly captured and propagated.

Comment thread internal/raftengine/etcd/wal_store.go Outdated
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new etcd-based Raft engine backend, including support for gRPC-based transport, snapshot management, and migration utilities from existing FSM stores. The implementation adds robust persistence logic and multi-node communication capabilities. My review highlights several critical performance and reliability concerns: synchronous snapshot persistence in the main Raft loop can cause election timeouts, the use of grpc.WaitForReady(true) in a shared worker pool risks head-of-line blocking, and the current message cloning implementation is inefficient. Additionally, I have suggested improvements for error handling in transport dispatch and provided a security recommendation regarding memory allocation in the snapshot spooling logic.

Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/grpc_transport.go Outdated
Comment thread internal/raftengine/etcd/engine.go Outdated
Comment thread internal/raftengine/etcd/engine.go Outdated
Comment thread internal/raftengine/etcd/engine.go Outdated
Comment thread internal/raftengine/etcd/snapshot_spool.go
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 8, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements multi-node support and persistent storage for the etcd-based raft engine, introducing gRPC transport for replication and a migration tool for FSM stores. The review feedback focuses on enhancing the handling of large snapshots, specifically by addressing potential gRPC dispatch timeouts, mitigating OOM risks during migration through file-backed spooling, and optimizing memory by eliminating redundant data copies. Additionally, it is suggested to relocate temporary snapshot files to the data directory to prevent disk space issues on system temporary partitions.

Comment thread internal/raftengine/etcd/grpc_transport.go Outdated
Comment thread internal/raftengine/etcd/migrate.go Outdated
Comment thread internal/raftengine/etcd/grpc_transport.go Outdated
Comment thread internal/raftengine/etcd/snapshot_spool.go Outdated
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 8, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new etcd-based raft engine implementation, including gRPC transport, snapshot persistence, and migration utilities. My review identified several critical and high-severity issues: clearing snapshot data in MemoryStorage will break cluster recovery, the snapshot spooling mechanism risks OOM errors by loading large payloads into memory, and blocking operations like fsm.Restore and fsm.Snapshot within the main raft loop will cause node instability and performance degradation. Additionally, the deep-copying of raft messages in the critical path is inefficient and should be optimized.

Comment thread internal/raftengine/etcd/engine.go Outdated
Comment thread internal/raftengine/etcd/snapshot_spool.go
Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/engine.go
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a robust etcd-based raft engine backend, including support for snapshot persistence, gRPC-based transport, and a migration tool for existing FSM stores. The implementation includes significant improvements to the persistence layer, such as atomic file replacement and split metadata/entry storage. My feedback highlights a redundant channel capacity check in the dispatch logic and a potential issue in the deferred cleanup logic within the file replacement utility, where the temporary file might be incorrectly targeted for removal after a successful rename.

Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/persistence.go
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant refactor of the etcd raft engine to support multi-node clusters, robust persistence using etcd's WAL and snapshotting mechanisms, and gRPC-based transport. Key changes include new interfaces for Snapshot and StateMachine to handle FSM snapshots and restores, and a migration tool to convert existing FSM stores to the new etcd raft data directory format. The persistence layer has been split into separate files for metadata, entries, and snapshot data, with support for migrating from a legacy single-file state. A new gRPC transport is implemented for inter-node communication, including streaming large snapshots. The changes also include comprehensive testing for these new features. Feedback indicates a potential issue with reduced snapshot size limits during legacy migrations, which might break compatibility, and suggests using a more specific error message for duplicate snapshot metadata in the gRPC transport.

Comment thread internal/raftengine/etcd/persistence.go Outdated
Comment thread internal/raftengine/etcd/grpc_transport.go Outdated
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

/gemini review

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

Addressed the latest Gemini review in 6ca0605d.

  • Open now creates and retains the dispatch worker cancel function in the caller-owned setup path, which satisfies gosec without changing runtime behavior.
  • Legacy single-file state loading now uses a dedicated maxPersistedLegacySnapshot budget so older FSM snapshot payloads keep their historical 256 MiB compatibility envelope.
  • Snapshot stream parsing now returns a dedicated duplicate-metadata error instead of reusing the missing-metadata error.

Re-verified with go test ./internal/raftengine/etcd, go test -race ./internal/raftengine/etcd, and golangci-lint run ./internal/raftengine/etcd --timeout=5m.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new etcd-based Raft engine, replacing the previous single-node prototype with a multi-node, production-ready implementation. Key changes include a new etcd-raft-migrate tool for transitioning existing FSM stores, extensive refactoring of the etcd raft engine to support multi-node clusters with gRPC-based transport and snapshot streaming, and the introduction of dedicated worker goroutines for message dispatch and snapshot persistence. The persistence mechanisms have been enhanced with a split file format for metadata, entries, and snapshot data, including backward compatibility for migrating legacy state files. The StateMachine interface now includes Snapshot and Restore methods for FSM state management, and dependencies have been updated. Review comments highlight potential performance issues with synchronous disk I/O in the main Raft loop during snapshot preparation, suggesting offloading this work. Additionally, the use of a file-backed spooler for temporary snapshot storage in snapshotBytes is noted as inefficient, with a bytes.Buffer recommended for better performance. Finally, a backward compatibility concern is raised regarding a reduced snapshot size limit in the legacy state file loader, which could cause migration failures for larger existing installations.

Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/wal_store.go
Comment thread internal/raftengine/etcd/persistence.go
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

Followed up on the latest review in d66b6182.

  • prepareDispatchRequest no longer creates a temp spool file or writes snapshot bytes on the raft loop. The loop now only deep-copies the outbound raftpb.Message, and the dispatch workers send the already-materialized snapshot payload through the transport. That removes the synchronous disk I/O from the main raft critical path.
  • I kept the pre-check before enqueueing outbound dispatch. In Go, the send expression in a select case is evaluated before the case is chosen, so collapsing it to case e.dispatchCh <- prepareDispatchRequest(msg) would still clone large payloads when the queue is already saturated.
  • I also kept the file-backed spool in snapshotBytes. This helper still has to return a final []byte because the current raftpb / MemoryStorage APIs are byte-slice based, but the spool avoids growing a second large heap buffer while Snapshot.WriteTo is serializing the FSM state.
  • The legacy single-file snapshot budget, duplicate snapshot metadata error, and dispatchCancel lifecycle issues were already fixed in 6ca0605d and are unchanged here.

Re-verified with go test ./internal/raftengine/etcd, go test -race ./internal/raftengine/etcd, and golangci-lint run ./internal/raftengine/etcd --timeout=5m.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

/gemini review

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

Follow-up on the latest reviewdog thread: I did not make another code change here because the current head d66b6182 does not reproduce the G118 warning anymore. Re-running golangci-lint run ./internal/raftengine/etcd --timeout=5m on that commit returns 0 issues. The dispatchCancel created in Open is still cleaned up on startup failure via the deferred engine.dispatchCancel() path and on normal shutdown via stopDispatchWorkers(). This looks like a stale analyzer result rather than a live issue.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new etcd-based Raft backend implementation for the engine, including a gRPC-based transport layer for multi-node communication and snapshot management. The changes involve significant updates to the persistence layer, snapshot handling, and the introduction of a migration tool to transition from the previous FSM store. My review highlights several critical areas for improvement, particularly regarding error handling in snapshot streaming, potential OOM risks with large snapshot payloads, and performance concerns related to blocking operations within the main Raft loop.

Comment thread internal/raftengine/etcd/grpc_transport.go
Comment thread internal/raftengine/etcd/grpc_transport.go
Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/engine.go
Comment thread internal/raftengine/etcd/grpc_transport.go Outdated
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

Followed up on the latest review in 2625b846.

  • enqueueStep now fails fast with errStepQueueFull instead of blocking indefinitely when the inbound step queue is saturated, and engine_test.go covers that overload path.
  • GRPCTransport.clientFor now deduplicates concurrent client creation through a per-address singleflight.Group, and grpc_transport_test.go verifies that concurrent callers only invoke the dial path once.
  • I did not change the io.ErrUnexpectedEOF handling in readSnapshotChunk. In this streaming path it is the normal signal for the final short chunk from io.ReadFull, not corruption. I added an explicit code comment there because treating it as a hard error would truncate every snapshot whose size is not an exact multiple of the chunk size.
  • I also kept the final snapshot materialization in buildSnapshotMessage and the synchronous fsm.Restore in applyReadySnapshot. Those are still constrained by the current raftpb.Message / RawNode.Step API: the raft layer requires an in-memory []byte snapshot payload, and the local FSM must match the applied raft snapshot before Ready can safely advance.
  • snapshotMu still spans the full local snapshot publish path intentionally. The new comment in persistSnapshot makes that explicit: a follower restore must not race an older locally-captured snapshot being published into raft storage and persisted metadata.

Re-verified with go test ./internal/raftengine/etcd, go test -race ./internal/raftengine/etcd, and golangci-lint run ./internal/raftengine/etcd --timeout=5m.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

/gemini review

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

/gemini review

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new etcd-based raft engine backend, including support for gRPC transport, snapshot persistence, and migration tools. The implementation adds necessary protobuf definitions and updates the persistence layer to support atomic file operations. My feedback focuses on simplifying the cleanup logic in the engine's Open function and ensuring that performance implications of synchronous file synchronization are considered.

Comment thread internal/raftengine/etcd/engine.go Outdated
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 9, 2026

Followed up on the latest review in c1a8031b.

  • Open now keeps the startup-failure cleanup on the local dispatchCancel variable instead of reaching back through engine.dispatchCancel from the defer block.
  • That makes the ownership of the context.WithCancel result more direct for both readers and analyzers while preserving the same runtime behavior: startup failure still cancels immediately, and steady-state shutdown still goes through stopDispatchWorkers().

Re-verified with go test ./internal/raftengine/etcd and golangci-lint run ./internal/raftengine/etcd --timeout=5m.

@bootjp bootjp enabled auto-merge April 9, 2026 15:39
@bootjp bootjp merged commit 22e1d85 into main Apr 9, 2026
7 checks passed
@bootjp bootjp deleted the codex/replace-raft-phase1 branch April 9, 2026 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant