WIP: The great config bootstrap fix #30396
Open
WillemKauf wants to merge 14 commits into redpanda-data:dev from
Conversation
Force-pushed from 4a97a52 to 4755fa7.
Contributor
Pull request overview
This PR reworks Redpanda’s startup/bootstrap sequencing to prevent early reads of stale cluster configuration during node join/restart. It introduces a “config readiness” gate for config::shard_local_cfg(), splits kvstore recovery from writer startup, and adds an RPC path to fetch a leader-authoritative controller snapshot for restarting nodes.
Changes:
- Add a readiness check to `config::shard_local_cfg()` plus an `*_unsafe()` escape hatch for early bootstrap.
- Split `kvstore` recovery into an explicit phase, and restructure application bootstrap to recover/apply snapshots before marking config "ready".
- Add a `fetch_controller_snapshot` RPC and controller snapshot plumbing to support leader-authoritative bootstrap for restarting nodes.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/v/storage/storage_resources.h | Extend ctor signature and add API to update append chunk size during bootstrap. |
| src/v/storage/storage_resources.cc | Plumb append chunk size into storage_resources initialization and add updater method. |
| src/v/storage/kvstore.h | Make recover() a public entry point and add start(recover_t) mode switch. |
| src/v/storage/kvstore.cc | Implement split recovery/start flow; adjust read APIs to require recovery rather than full start. |
| src/v/storage/api.h | Construct kvstore earlier and add recover_kvstore() bootstrap hook before start(). |
| src/v/redpanda/BUILD | Add Bazel deps needed by updated application/bootstrap compilation units. |
| src/v/redpanda/application.h | Reshape bootstrap APIs into async helpers for applying snapshots/config and discovery. |
| src/v/redpanda/application.cc | Move cluster-config hydration earlier and ensure feature-table metrics are set up during metrics init. |
| src/v/redpanda/application_start.cc | Use stored cluster_discovery instance and remove passing discovery by reference into runtime start. |
| src/v/redpanda/application_config.cc | Use shard_local_cfg_unsafe() for bootstrap-time kvstore config extraction. |
| src/v/redpanda/application_bootstrap.cc | Rework bootstrap sequence: recover kvstore, apply snapshots, discovery/join, then mark config ready. |
| src/v/features/feature_table.h | Add setup_metrics() API. |
| src/v/features/feature_table.cc | Defer probe/metrics setup from ctor to explicit setup_metrics(). |
| src/v/config/node_config.cc | Use shard_local_cfg_unsafe() during node-config YAML loading/ignores. |
| src/v/config/configuration.h | Add readiness flag + document shard_local_cfg() vs shard_local_cfg_unsafe(). |
| src/v/config/configuration.cc | Implement shard_local_cfg_unsafe() and guard shard_local_cfg() with readiness vassert. |
| src/v/cluster/types.h | Add serde RPC request/reply types for fetching a controller snapshot. |
| src/v/cluster/service.h | Add controller service RPC endpoint for fetch_controller_snapshot. |
| src/v/cluster/service.cc | Implement RPC handler to delegate snapshot fetch logic to members manager. |
| src/v/cluster/members_manager.h | Declare handler to produce/forward leader-authoritative controller snapshot replies. |
| src/v/cluster/members_manager.cc | Implement snapshot fetch path (forward to leader if needed; leader builds partial join snapshot). |
| src/v/cluster/controller.json | Register new fetch_controller_snapshot RPC in controller protocol schema. |
| src/v/cluster/controller.cc | Adjust controller start wiring after config_manager ctor signature change. |
| src/v/cluster/controller_stm.h | Generalize join snapshot construction via templated backend selection. |
| src/v/cluster/controller_stm.cc | Reorder snapshot application so config manager applies earlier; remove old join snapshot impl. |
| src/v/cluster/config_manager.h | Remove cluster recovery table dependency from config manager constructor/state. |
| src/v/cluster/config_manager.cc | Switch more bootstrap-time reads to shard_local_cfg_unsafe() and adjust replay/bootstrap flow. |
| src/v/cluster/cluster_discovery.h | Update discovery ctor signature and add controller snapshot fetch API (with stale-comment mismatch). |
| src/v/cluster/cluster_discovery.cc | Implement new controller snapshot fetch RPC path and decouple discovery from storage::api. |
Comments suppressed due to low confidence (1)
src/v/cluster/config_manager.cc:193
`config_manager::apply_local()` still stages updates to `needs_restart` properties via `set_pending_value(...)`, but `config_manager::start()` no longer promotes pending values after STM replay. This means nodes may never apply `needs_restart` config updates even after a restart (pending values remain invisible indefinitely). Reintroduce a promotion point after replay (as before), or otherwise ensure pending values are promoted at least once during startup after replay completes.
```cpp
ss::future<> config_manager::start() {
    if (_seen_version == config_version_unset) {
        vlog(clusterlog.trace, "Starting config_manager... (initial)");
        // Background: wait till we have a leader. If this node is leader
```
Comment on lines +104 to +108:

```cpp
// Flushing background fiber
ssx::spawn_with_gate(_gate, [this] {
    return ss::do_until(
      [this] { return _gate.is_closed(); },
      [this] {
```
Contributor
Author
Meh probably unnecessary
```cpp
// Accessors for the shard local cluster configuration.
// shard_local_cfg() contains a vassert which strictly enforces that the
// configuration has been marked as ready by the bootstrap process after we have
// successfully preloaded from our node local state, and furthermore, received
```
Comment on lines +142 to +144:

```cpp
// Static because the call doesn't need any cluster_discovery
// instance state — it dispatches via a one-shot RPC client and
// reads the seed-server list from global node config.
```
Force-pushed from 72cd608 to 0140ac7.
Force-pushed from 2615bf0 to 5b59c01.
Pull `config_manager::apply_snapshot` out of the `ss::when_all` parallel block and run it serially after `feature_backend` and before `members_manager`. Downstream backends (members, topics, plugin, recovery, security, quota, migrations, cluster_link) may consult `shard_local_cfg` during their own `apply_snapshot` work; sequencing `config_manager` first guarantees they see fresh values.
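The ordering change can be sketched as follows, with hypothetical stand-in backends; in the real code the downstream backends run concurrently under `ss::when_all` on Seastar futures, while here they simply run after `config_manager`:

```cpp
#include <string>
#include <vector>

struct snapshot {};

// Records the order in which backends apply the snapshot.
inline std::vector<std::string> apply_order;

// Hypothetical stand-ins for the real backends.
void feature_backend_apply(const snapshot&) { apply_order.push_back("feature_backend"); }
void config_manager_apply(const snapshot&)  { apply_order.push_back("config_manager"); }
void members_apply(const snapshot&)         { apply_order.push_back("members"); }
void topics_apply(const snapshot&)          { apply_order.push_back("topics"); }

// config_manager runs serially after feature_backend and before any
// downstream backend, so backends that consult shard_local_cfg during
// their own apply_snapshot work see fresh values.
void apply_controller_snapshot(const snapshot& s) {
    feature_backend_apply(s);
    config_manager_apply(s);
    members_apply(s);
    topics_apply(s);
}
```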
Removes the `promote_all_pending()` call from `config_manager::start()`. Pending values that accumulated during the post-bootstrap controller log replay (i.e. `cluster_config_delta_cmd` records applied above the hydrated snapshot offset) now stay pending instead of being promoted when `config_manager::start` runs. This is the strict reading of `needs_restart=yes` semantics: a property change visible to a node only after it restarts. `shard_local_cfg()` is already hydrated from the controller snapshot before bootstrap proceeds (see `application::bootstrap_cluster_config_view`), so any delta replayed afterward is a change the cluster made *after* the hydration point and should not retroactively affect the running node's active values for `needs_restart` properties.
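A minimal sketch of these strict `needs_restart` semantics; the type and method names here (`restart_property`, `simulate_restart_hydration`) are illustrative, not Redpanda's API:

```cpp
#include <optional>

// A needs_restart property: deltas replayed above the hydrated snapshot
// offset stay pending, and only become active when the node restarts and
// re-hydrates from the controller snapshot (which folds them in).
template <typename T>
class restart_property {
public:
    explicit restart_property(T initial) : _active(initial) {}

    // A cluster_config_delta replayed after the hydration point.
    void set_pending_value(T v) { _pending = v; }

    // The active value never changes while the node is running.
    const T& value() const { return _active; }

    bool has_pending() const { return _pending.has_value(); }

    // On restart the node hydrates from the controller snapshot, which
    // already contains the previously-pending value.
    void simulate_restart_hydration() {
        if (_pending) {
            _active = *_pending;
            _pending.reset();
        }
    }

private:
    T _active;
    std::optional<T> _pending;
};
```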
We will want to be able to use the `kvstore` for read only purposes early in the bootstrap process to recover important persisted state. Expose `kvstore::recover()` as a way to allow read-only users access to its state. Also expose an accessor higher up in `storage::api`.
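A rough sketch of the split lifecycle, assuming a simplified synchronous `kvstore` (the real one is Seastar-based and replays persisted segments from disk; `recover_t` and the method shapes here are illustrative):

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// recover() replays persisted state and enables read-only access early in
// bootstrap; start() additionally enables writes.
class kvstore {
public:
    enum class recover_t { yes, no };

    // Replay persisted segments into the in-memory map; reads are legal
    // afterwards, even before start().
    void recover() { _recovered = true; }

    // start() may skip recovery if the bootstrap path already performed it.
    void start(recover_t mode) {
        if (mode == recover_t::yes) {
            recover();
        }
        _started = true;
    }

    std::optional<std::string> get(const std::string& key) const {
        if (!_recovered) {
            throw std::runtime_error("kvstore not recovered");
        }
        auto it = _kv.find(key);
        return it == _kv.end() ? std::nullopt : std::optional{it->second};
    }

    void put(std::string key, std::string value) {
        if (!_started) {
            throw std::runtime_error("kvstore not started");
        }
        _kv[std::move(key)] = std::move(value);
    }

    bool recovered() const { return _recovered; }

private:
    std::map<std::string, std::string> _kv;
    bool _recovered = false;
    bool _started = false;
};
```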
The `kvstore` should be available after constructor is called for ease of access.
…covery` We were only using this in order to access the `node_uuid` and `cluster_uuid`, which are constant after they are first set. Simply pass these values directly instead of passing the entire `storage::api` into the constructor.
We are going to want control over what backends we need to include in a `controller_stm` snapshot in a future commit. Move this function to the header and use a variadic template for this purpose.
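The variadic selection could look roughly like this (hypothetical backend types; the real template lives in `controller_stm.h`). A fold expression applies each selected backend's contribution, so a join snapshot can instantiate the template with only the backends it needs:

```cpp
#include <string>
#include <vector>

struct snapshot {
    std::vector<std::string> parts;
};

// Hypothetical backends; each contributes its slice of the snapshot.
struct config_backend {
    static void fill(snapshot& s) { s.parts.push_back("config"); }
};
struct feature_backend {
    static void fill(snapshot& s) { s.parts.push_back("features"); }
};
struct members_backend {
    static void fill(snapshot& s) { s.parts.push_back("members"); }
};

// Variadic selection of backends via a C++17 fold expression: a join
// snapshot can carry only config + features, while a full snapshot
// instantiates the same template with every backend.
template <typename... Backends>
snapshot make_snapshot() {
    snapshot s;
    (Backends::fill(s), ...);
    return s;
}
```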
A new RPC to fetch a controller snapshot, containing the latest in-memory state of the controller leader's `config_manager` and `feature_backend`. This RPC will be called unconditionally during bootstrap of a `redpanda` broker, whether that node is simply restarting or is joining for the first time. The reason: a node that has been down for a long time will have only stale configurations and features in its local cache/snapshots/log, and the configuration and feature managers are two global pieces of state that should be brought to a consistent view early in a `redpanda` node's lifetime.
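The request/reply shapes might look roughly like this (illustrative fields only; the real serde envelope types are declared in `src/v/cluster/types.h` and registered in `controller.json`):

```cpp
#include <cstdint>
#include <optional>
#include <string>

// The request carries no payload: the leader replies with its current
// in-memory view of config_manager and feature_backend state.
struct fetch_controller_snapshot_request {};

// Hypothetical stand-in for the leader's partial join snapshot.
struct controller_join_snapshot {
    std::string config;   // config_manager state
    std::string features; // feature_backend state
};

enum class errc : int8_t { success = 0, not_leader = 1, timeout = 2 };

struct fetch_controller_snapshot_reply {
    errc error = errc::success;
    // Present only on success; a restarting node tolerates a missing
    // reply and proceeds with its (possibly stale) local cache.
    std::optional<controller_join_snapshot> snap;
};
```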
Force-pushed from 895f588 to 3ec0835.
The start-up process for `redpanda` now looks like this:

1. Cluster configs are hydrated from the local `config_cache` and legacy `node_cfg_yaml`.
2. `storage::api` is constructed (but not started), and the local `kvstore` is recovered.
3. We bootstrap from the `kvstore`:
   1. Apply the local feature table snapshot (potentially stale) from the `kvstore`.
   2. We attempt to load any existing cluster/node UUIDs from the `kvstore`.
4. We perform cluster discovery - if we are a joiner, we will perform our RPC to register with the cluster, and collect the `controller_join_snapshot` this way. If we are simply a restarter, this is mostly a no-op.
5. We bootstrap our view of the controller - if we were a joiner, we already have our required snapshot. If we are a restarter, we utilize the new RPC to fetch the controller snapshot from the leader. Unlike the first-time registration RPC, we do permit this to fail, and will operate with stale configs/feature tables during bootstrap. If we have a controller snapshot (obtained in either aforementioned way), it is applied to the feature table, and the configuration is preloaded.
6. We now have a consistent view with the rest of the cluster of the feature table and cluster configuration for the rest of the bootstrapping process.

raft0 replay still happens _after_ the rest of the system is started.

A big caveat with cluster recovery - `needs_restart` properties fetched can no longer be applied until the recovered node has restarted. There is no clear way to make this _not_ the behavior.
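The sequence above can be condensed into a synchronous sketch; step names and helpers are illustrative, and the real flow is asynchronous and spread across `application_bootstrap.cc`:

```cpp
#include <optional>
#include <string>
#include <vector>

struct controller_snapshot {};

// Records which bootstrap steps ran, in order.
inline std::vector<std::string> boot_steps;

void hydrate_local_config()         { boot_steps.push_back("hydrate_local_config"); }
void recover_kvstore()              { boot_steps.push_back("recover_kvstore"); }
void apply_local_feature_snapshot() { boot_steps.push_back("apply_local_features"); }

// A joiner obtains the snapshot from the registration RPC; a restarter
// gets nothing here and will try the leader fetch next.
std::optional<controller_snapshot> discover_and_maybe_join(bool joiner) {
    boot_steps.push_back("cluster_discovery");
    if (joiner) {
        return controller_snapshot{};
    }
    return std::nullopt;
}

// Allowed to fail: a restarter then proceeds with stale local state.
std::optional<controller_snapshot> fetch_from_leader() {
    boot_steps.push_back("fetch_controller_snapshot");
    return std::nullopt;
}

void mark_config_ready() { boot_steps.push_back("config_ready"); }

void bootstrap(bool joiner) {
    hydrate_local_config();
    recover_kvstore();
    apply_local_feature_snapshot();
    auto snap = discover_and_maybe_join(joiner);
    if (!snap) {
        snap = fetch_from_leader();
    }
    if (snap) {
        boot_steps.push_back("apply_controller_snapshot");
    }
    mark_config_ready();
}
```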
And swap any uses of `shard_local_cfg()` which occur in the bootstrap
process (before it is marked as `ready`) over to the unsafe accessor.
These uses of potentially stale values are all audited and deemed safe,
due to minimal or no impact on the behavior of the system post bootstrap.
Two other relevant changes in this commit:
* Pulling out `feature_table::setup_metrics()` into its own function,
called after the bootstrap process (in order to properly respect
`shard_local_cfg().disable_{public}_metrics`)
* Updating `storage_resources` to _not_ invoke the `internal::chunks()`
singleton constructor, which would in turn invoke `shard_local_cfg()`.
We use `storage_resources` for an early replay of the `kvstore` segments,
so it is fine to use stale configuration values here. However, after
bootstrapping is complete, it is *probably* important to set the
`_append_chunk_size` back to the same value as what `internal::chunks()`
is operating with. We now do so when initializing the storage system in
`wire_up_bootstrap_services()`.
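The readiness gate itself can be sketched as follows; a plain `assert` and a global stand in for Redpanda's `vassert` and per-shard configuration instance:

```cpp
#include <atomic>
#include <cassert>

// Illustrative stand-in for the real configuration class.
struct configuration {
    int append_chunk_size = 16384;
};

inline std::atomic<bool> cfg_ready{false};
inline configuration global_cfg;

// Audited escape hatch for early bootstrap code that tolerates
// potentially stale values.
configuration& shard_local_cfg_unsafe() { return global_cfg; }

// Strict accessor: only legal after the bootstrap process has marked the
// configuration ready (in Redpanda this is a vassert).
configuration& shard_local_cfg() {
    assert(
      cfg_ready.load()
      && "configuration read before bootstrap marked it ready");
    return global_cfg;
}

void mark_config_ready() { cfg_ready = true; }
```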
Force-pushed from 3ec0835 to 2058bd9.
No tests yet, opening for CI.
Backports Required
Release Notes
Bug Fixes
- Fixed early reads of stale cluster configuration on `redpanda` nodes during the bootstrapping process.