Commit 8199fa4

address comments
1 parent 66b0087 commit 8199fa4

1 file changed

Lines changed: 5 additions & 6 deletions

docs/adr/ADR-023-sequencer-recovery.md

@@ -22,12 +22,12 @@ Considered but not chosen for this iteration:
 
 > We will operate **1 active + 1 failover** sequencer at all times, regardless of control plane. Two implementation options are approved:
 
-- **Design A — Rafted Conductor (CFT)**: A sidecar *conductor* runs next to each `ev-node`. Conductors form a **Raft** cluster to elect a single leader and **gate** sequencing so only the leader may produce blocks. For quorum while preserving 1‑active/1‑failover semantics, we will run **2 sequencer nodes + 1 conductor‑only witness** (no sequencer) as the third Raft voter.
+- **Design A — Rafted Conductor (CFT)**: A sidecar *conductor* runs next to each `ev-node`. Conductors form a **Raft** cluster to elect a single leader and **gate** sequencing so only the leader may produce blocks. For quorum while running 1‑active/2‑failover semantics, we will run **1 sequencer nodes + 2 failover** (no sequencer) as the third Raft voter.
 *Note:* OP Stack uses a very similar pattern for its sequencer; see `op-conductor` in References.
 
 - **Design B — 1‑Active / 1‑Failover (Lease/Lock)**: One hot standby promotes itself when the active fails by acquiring a **lease/lock** (e.g., Kubernetes Lease or external KV). Strong **fencing** ensures the old leader cannot keep producing after lease loss.
 
-**Why both assume 1A/1F:** Even with Raft, we intentionally keep only **one** hot standby capable of immediate promotion; additional nodes may exist as **read‑only** or **witness** roles to strengthen quorum without enabling extra leaders.
+**Why both assume 1A/1F:** Even with Raft, we intentionally keep **n** nodes on hot standby capable of immediate promotion; additional nodes may exist as **read‑only** or **witness** roles to strengthen quorum without enabling extra leaders.
 
 Status of this decision: **Proposed** for implementation and test hardening.
 
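The quorum reasoning behind the three-voter topology in Design A can be sketched as follows. This is a minimal illustration and is not part of the ADR or of this commit; the function names `quorum_size`, `tolerated_failures`, and `may_sequence` are hypothetical and do not correspond to any `ev-node` or conductor API.

```python
from typing import Optional


def quorum_size(voters: int) -> int:
    """Smallest strict majority of a Raft cluster with `voters` voting members."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of voters that can fail while a majority is still reachable."""
    return voters - quorum_size(voters)


def may_sequence(node_id: str, leader_id: Optional[str],
                 reachable_voters: int, voters: int) -> bool:
    """Hypothetical gate: produce blocks only as the elected leader with quorum.

    `reachable_voters` counts the local node itself.
    """
    is_leader = leader_id is not None and leader_id == node_id
    return is_leader and reachable_voters >= quorum_size(voters)


if __name__ == "__main__":
    for n in (2, 3):
        print(n, "voters: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

With two voters the majority is two, so losing either node halts leader election; with three voters the majority is still two, so the cluster survives the loss of any single member. That is why both the old and the new wording of Design A call for a third Raft voter.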
@@ -70,7 +70,7 @@ These are additive and should not break existing RPCs.
 
 ### Logging/Monitoring/Observability
 - Metrics: `leader_id`, `raft_term` (A), `lease_owner` (B), `unsafe_head_advance`, `peer_count`, `rpc_error_rate`, `da_publish_latency`, `backlog`.
-- Alerts: no unsafe advance > 3× block time; unexpected leader churn; lease lost but sequencer still active (fencing breach); witness down (A).
+- Alerts: no unsafe advance > 3× block time; unexpected leader churn; lease lost but sequencer still active (fencing breach).
 - Logs: audit all **Start/Stop** decisions and override operations.
 
 ### Security considerations
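Two of the alert conditions listed in the Logging/Monitoring/Observability hunk can be expressed as simple predicates. The sketch below is illustrative only and is not part of the ADR or of this commit; the `Snapshot` fields and alert names are assumptions chosen to mirror the listed metrics, not an existing `ev-node` interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical point-in-time view of one sequencer node."""
    node_id: str
    sequencer_active: bool               # is this node currently producing blocks?
    lease_owner: Optional[str]           # current lease holder (Design B), if any
    seconds_since_unsafe_advance: float  # time since the unsafe head last moved
    block_time_seconds: float            # configured block time


def evaluate_alerts(s: Snapshot) -> List[str]:
    """Return the names of the alerts that fire for this snapshot."""
    fired: List[str] = []
    # "no unsafe advance > 3x block time"
    if s.seconds_since_unsafe_advance > 3 * s.block_time_seconds:
        fired.append("unsafe_head_stalled")
    # "lease lost but sequencer still active (fencing breach)"
    if s.sequencer_active and s.lease_owner != s.node_id:
        fired.append("fencing_breach")
    return fired


if __name__ == "__main__":
    stale = Snapshot("seq-1", True, "seq-2", 10.0, 2.0)
    print(evaluate_alerts(stale))
```

The leader-churn alert is omitted because it needs a history of `leader_id` or `raft_term` values rather than a single snapshot.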
@@ -90,13 +90,12 @@ These are additive and should not break existing RPCs.
 
 ### Change breakdown
 - Phase 1: Implement Admin RPC + health surface in `ev-node`; add sidecar skeletons.
-- Phase 2: Integrate Design A (Raft) in a 2 sequencer + 1 witness topology; build dashboards/runbooks.
+- Phase 2: Integrate Design A (Raft) in a 1 sequencer + 2 failover; build dashboards/runbooks.
 - Phase 3: Add Design B (Lease) profile for small/test clusters; share common health logic.
 - Phase 4: Game days and SLO validation; finalize SRE playbooks.
 
 ### Release/compatibility
 - **Breaking release?** No — Admin RPCs are additive.
-- **Coordination with LazyLedger fork / lazyledger-app?** Not required; DA posting interfaces are unchanged.
 
 ## Status
 
@@ -110,7 +109,7 @@ Proposed
 - Choice of control plane allows right‑sizing ops: Raft for prod; Lease for small/test.
 
 ### Negative
-- Design A adds Raft operational overhead (quorum management, snapshots, witness requirement).
+- Design A adds Raft operational overhead (quorum management, snapshots).
 - Design B has a smaller blast radius but does not generalize to N replicas; stricter reliance on correct fencing.
 - Additional components (sidecars, proxies) increase deployment surface.
 