Commit 3c88176

feat(disk): handle disk maxing out better
1 parent 3cfa47a commit 3c88176

9 files changed

Lines changed: 186 additions & 6 deletions

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -176,6 +176,8 @@ obol sell delete ollama-gated -n llm
 
 Two-stage templating: `values.yaml.gotmpl` with `@enum/@default/@description` annotations → CLI flags → rendered `values.yaml` (Stage 1), then `helmfile sync --state-values-file values.yaml --state-values-set id=<id>` (Stage 2). Unique namespaces: `<network>-<id>` where ID is petname or `--id <name>`. Local Ethereum nodes auto-registered as priority upstream in eRPC via `RegisterERPCUpstream()` (write methods blocked on local → routed to remote).
 
+**Ethereum `--mode full|archive`** (default `full`): controls whether reth runs as a pruned full node (~500 GB mainnet / ~100 GB testnet) or an archive node retaining all historical state (~4 TB+ mainnet / ~300 GB testnet). Archive mode is for state replay (block explorers, historical `eth_call`, indexers); full mode is the right default for everything else. The mode flows through to (a) the reth `--full` arg in `internal/embed/networks/ethereum/helmfile.yaml.gotmpl`, (b) PVC sizing in `templates/pvc.yaml`, and (c) the `helmfile` `persistence.size` request. `obol network install ethereum` runs a disk-space preflight via `internal/network/preflight.go` — it warns when `cfg.DataDir` has less free disk than `(network, mode)` is expected to need, prompts the user, and auto-continues in non-interactive mode (no TTY / JSON output) so scripted installs don't deadlock. Other execution clients (geth, nethermind, besu, erigon) ignore the mode flag for now.
+
 ## Stack Lifecycle
 
 | Command | Action |

docs/getting-started.md

Lines changed: 19 additions & 0 deletions
@@ -216,6 +216,25 @@ obol network sync ethereum/demo
 
 This creates the `ethereum-demo` namespace with an execution client (reth) and a consensus client (lighthouse).
 
+### Full vs archive mode
+
+`obol network install ethereum` defaults to `--mode=full`, which prunes
+historical state and needs ~500 GB on mainnet (~100 GB on testnets). Pass
+`--mode=archive` if you need to replay state across history (block
+explorers, historical `eth_call`, indexers); archive nodes hold the full
+state trie and grow to ~4 TB+ on mainnet.
+
+```bash
+# Default: pruned full node
+obol network install ethereum --network=mainnet
+
+# Archive node for state replay (requires ~4-5 TB free)
+obol network install ethereum --network=mainnet --mode=archive
+```
+
+The installer warns when the data directory has less free disk than the
+chosen mode is likely to need.
+
 Verify:
 
 ```bash

internal/embed/infrastructure/values/erpc.yaml.gotmpl

Lines changed: 4 additions & 1 deletion
@@ -29,7 +29,10 @@ config: |-
         - id: memory-cache
           driver: memory
           memory:
-            maxItems: 10000
+            # LRU cap on cached RPC responses. The connector is memory-only
+            # (no disk), so the upper bound is enforced by the pod memory
+            # limit below; this cap keeps cache RAM well under that limit.
+            maxItems: 5000
       policies:
         - network: "*"
           method: "*"
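
The comment above describes `maxItems` as an LRU cap. As a rough illustration of what such a cap does, here is a minimal Go sketch of a least-recently-used cache bounded by item count. It is not eRPC's connector; `lruCache` and its methods are hypothetical names used only for this example.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached key/value pair stored in the recency list.
type entry struct {
	key   string
	value string
}

// lruCache is a hypothetical, illustration-only cache bounded by item count.
type lruCache struct {
	maxItems int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRUCache(maxItems int) *lruCache {
	return &lruCache{
		maxItems: maxItems,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks the key as most recently used.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a value and evicts the least recently used entry once the
// number of items exceeds maxItems.
func (c *lruCache) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.maxItems {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Len reports how many entries are currently cached.
func (c *lruCache) Len() int { return c.order.Len() }

func main() {
	c := newLRUCache(2)
	c.Put("a", "1")
	c.Put("b", "2")
	c.Get("a")      // "a" becomes the most recently used key
	c.Put("c", "3") // exceeds the cap, so the oldest key ("b") is evicted
	_, hasB := c.Get("b")
	fmt.Println("items:", c.Len(), "b still cached:", hasB)
}
```

A count-based cap bounds the number of entries, not their byte size, which is consistent with the comment's point that the pod memory limit remains the real upper bound.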

internal/embed/infrastructure/values/monitoring.yaml.gotmpl

Lines changed: 6 additions & 1 deletion
@@ -11,7 +11,12 @@ prometheus:
       matchLabels:
         release: monitoring
     podMonitorNamespaceSelector: {}
-    retention: 8d
+    # Time-based retention is still honored, but retentionSize is the hard
+    # cap that prevents the TSDB from filling the k3d node's writable layer
+    # overnight and triggering DiskPressure → cascading pod evictions.
+    # 2GB on emptyDir is plenty for a local single-node stack.
+    retention: 2d
+    retentionSize: 2GB
     resources:
       requests:
         cpu: 100m

internal/embed/networks/ethereum/helmfile.yaml.gotmpl

Lines changed: 28 additions & 2 deletions
@@ -29,6 +29,15 @@ releases:
     needs: [ethereum-pvcs]
     values:
       # Network and checkpoint sync configuration
+      #
+      # global.clientArgs.networks.<network>.execution.reth is the upstream
+      # umbrella chart's per-network reth args list. We APPEND to it rather
+      # than overriding reth.extraArgs directly, because reth.extraArgs in
+      # the umbrella chart is a templated string that looks up this very
+      # list — overriding it would silently drop the --chain=<testnet> arg
+      # that testnets depend on. Mainnet's upstream list is empty; testnets
+      # are ["--chain=<network>"]. We carry those over verbatim and add
+      # --full only when mode != archive.
       - global:
           main:
             network: '{{ .Values.network }}'
@@ -38,8 +47,25 @@ releases:
               mainnet: https://mainnet-checkpoint-sync.attestant.io
               sepolia: https://checkpoint-sync.sepolia.ethpandaops.io
               hoodi: https://checkpoint-sync.hoodi.ethpandaops.io
+          {{- if eq .Values.executionClient "reth" }}
+          clientArgs:
+            networks:
+              {{ .Values.network }}:
+                execution:
+                  reth:
+                    {{- if ne .Values.network "mainnet" }}
+                    - --chain={{ .Values.network }}
+                    {{- end }}
+                    {{- if ne .Values.mode "archive" }}
+                    - --full
+                    {{- end }}
+          {{- end }}
 
       # Execution client (pinned versions — Renovate-tracked)
+      # Reth defaults to archive (~4TB+ mainnet). The --mode flag controls
+      # whether we pass --full to prune historical state down to ~500GB,
+      # wired through global.clientArgs above (NOT reth.extraArgs — see
+      # comment above).
       - {{ .Values.executionClient }}:
           enabled: true
           image:
@@ -61,7 +87,7 @@ releases:
           {{- end }}
           persistence:
             enabled: true
-            size: 500Gi
+            size: {{ if eq .Values.network "mainnet" }}{{ if eq .Values.mode "archive" }}4500Gi{{ else }}500Gi{{ end }}{{ else }}{{ if eq .Values.mode "archive" }}300Gi{{ else }}100Gi{{ end }}{{ end }}
             existingClaim: execution-{{ .Values.executionClient }}-{{ .Values.network }}
 
       # Consensus client (pinned versions — Renovate-tracked)
@@ -87,7 +113,7 @@ releases:
           {{- end }}
           persistence:
             enabled: true
-            size: 200Gi
+            size: {{ if eq .Values.network "mainnet" }}{{ if eq .Values.mode "archive" }}500Gi{{ else }}200Gi{{ end }}{{ else }}{{ if eq .Values.mode "archive" }}100Gi{{ else }}50Gi{{ end }}{{ end }}
             existingClaim: consensus-{{ .Values.consensusClient }}-{{ .Values.network }}
 
   # Metadata ConfigMap for frontend discovery
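
The branching in the `clientArgs` block of this file is easier to check when restated outside the template. The Go sketch below mirrors only the two conditions shown in the diff (`--chain=<network>` for anything other than mainnet, `--full` unless the mode is `archive`). `rethArgs` is a hypothetical helper written for this illustration and is not part of the commit.

```go
package main

import "fmt"

// rethArgs is a hypothetical helper that mirrors the template conditions:
// testnets carry over --chain=<network>, and --full is added unless the
// node runs in archive mode.
func rethArgs(network, mode string) []string {
	args := []string{}
	if network != "mainnet" {
		args = append(args, "--chain="+network)
	}
	if mode != "archive" {
		args = append(args, "--full")
	}
	return args
}

func main() {
	cases := [][2]string{
		{"mainnet", "full"},
		{"mainnet", "archive"},
		{"sepolia", "full"},
		{"hoodi", "archive"},
	}
	for _, c := range cases {
		fmt.Printf("%s/%s -> %q\n", c[0], c[1], rethArgs(c[0], c[1]))
	}
}
```

One case stands out: mainnet in archive mode produces no arguments at all, so the rendered `reth:` key would have no list items. The upstream mainnet list is described as empty, so this is probably harmless, but it may be worth confirming how the chart treats that empty value.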

internal/embed/networks/ethereum/templates/pvc.yaml

Lines changed: 30 additions & 2 deletions
@@ -1,4 +1,32 @@
 {{- if eq .Release.Name "ethereum-pvcs" }}
+{{- /*
+PVC sizing is a function of (network, mode). Sizes are estimates with
+~30% headroom for chain growth. local-path storage does not pre-allocate,
+so these requests primarily document intent and serve as soft caps when
+a sized storage class is swapped in later.
+*/ -}}
+{{- $mode := default "full" .Values.mode -}}
+{{- $isArchive := eq $mode "archive" -}}
+{{- $execSize := "500Gi" -}}
+{{- $consensusSize := "200Gi" -}}
+{{- if eq .Values.network "mainnet" -}}
+  {{- if $isArchive -}}
+    {{- $execSize = "4500Gi" -}}
+    {{- $consensusSize = "500Gi" -}}
+  {{- else -}}
+    {{- $execSize = "500Gi" -}}
+    {{- $consensusSize = "200Gi" -}}
+  {{- end -}}
+{{- else -}}
+  {{- /* sepolia, hoodi and other testnets */ -}}
+  {{- if $isArchive -}}
+    {{- $execSize = "300Gi" -}}
+    {{- $consensusSize = "100Gi" -}}
+  {{- else -}}
+    {{- $execSize = "100Gi" -}}
+    {{- $consensusSize = "50Gi" -}}
+  {{- end -}}
+{{- end -}}
 ---
 # Ethereum Execution Client PVC
 apiVersion: v1
@@ -12,7 +40,7 @@ spec:
   storageClassName: local-path
   resources:
     requests:
-      storage: 500Gi
+      storage: {{ $execSize }}
 ---
 # Ethereum Consensus Client PVC
 apiVersion: v1
@@ -26,5 +54,5 @@ spec:
   storageClassName: local-path
   resources:
     requests:
-      storage: 200Gi
+      storage: {{ $consensusSize }}
 {{- end }}
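
The same `(network, mode)` size table appears twice in this commit: inline in the `persistence.size` expressions of `helmfile.yaml.gotmpl` and as variables in this PVC template. The Go sketch below restates that table as a single lookup so the four combinations can be compared at a glance. `pvcSizes` and `sizesFor` are hypothetical names for this illustration; the commit itself keeps the values in the templates.

```go
package main

import "fmt"

// pvcSizes holds the storage requests for the two PVCs.
type pvcSizes struct {
	Execution string
	Consensus string
}

// sizesFor is a hypothetical lookup mirroring the template: an empty mode
// falls back to "full", and any network other than mainnet is treated as
// a testnet.
func sizesFor(network, mode string) pvcSizes {
	if mode == "" {
		mode = "full"
	}
	archive := mode == "archive"
	if network == "mainnet" {
		if archive {
			return pvcSizes{Execution: "4500Gi", Consensus: "500Gi"}
		}
		return pvcSizes{Execution: "500Gi", Consensus: "200Gi"}
	}
	if archive {
		return pvcSizes{Execution: "300Gi", Consensus: "100Gi"}
	}
	return pvcSizes{Execution: "100Gi", Consensus: "50Gi"}
}

func main() {
	for _, network := range []string{"mainnet", "sepolia"} {
		for _, mode := range []string{"full", "archive"} {
			s := sizesFor(network, mode)
			fmt.Printf("%s/%s: execution=%s consensus=%s\n",
				network, mode, s.Execution, s.Consensus)
		}
	}
}
```

Because the values live in two templates, a future change to one table has to be mirrored in the other by hand.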

internal/embed/networks/ethereum/values.yaml.gotmpl

Lines changed: 5 additions & 0 deletions
@@ -15,3 +15,8 @@ executionClient: {{.ExecutionClient}}
 # @default lighthouse
 # @description Consensus layer client
 consensusClient: {{.ConsensusClient}}
+
+# @enum full,archive
+# @default full
+# @description Node mode. 'full' prunes historical state (~500GB mainnet); 'archive' keeps all state for history replay (~4TB+ mainnet)
+mode: {{.Mode}}

internal/network/network.go

Lines changed: 14 additions & 0 deletions
@@ -125,6 +125,20 @@ func Install(cfg *config.Config, u *ui.UI, network string, overrides map[string]
 		templateData[field.Name] = value
 	}
 
+	// Disk-space preflight (currently only meaningful for ethereum). The
+	// check warns and prompts; in non-interactive mode (no TTY / JSON) it
+	// auto-continues so scripted installs don't deadlock.
+	if network == "ethereum" {
+		netValue := templateData["Network"]
+		modeValue := templateData["Mode"]
+		if modeValue == "" {
+			modeValue = "full"
+		}
+		if err := CheckNetworkDiskSpace(u, cfg.DataDir, netValue, modeValue); err != nil {
+			return err
+		}
+	}
+
 	// Read the embedded values template
 	valuesContent, err := embed.ReadEmbeddedNetworkFile(network, "values.yaml.gotmpl")
 	if err != nil {

internal/network/preflight.go

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+package network
+
+import (
+	"fmt"
+	"syscall"
+
+	"github.com/ObolNetwork/obol-stack/internal/ui"
+)
+
+// diskSpaceRequirementGB returns the recommended free-disk minimum for
+// (network, mode) in gigabytes. Numbers include ~30% headroom for chain
+// growth between releases. Sizes are reth-anchored; other clients are in
+// the same ballpark.
+func diskSpaceRequirementGB(network, mode string) uint64 {
+	archive := mode == "archive"
+	switch network {
+	case "mainnet":
+		if archive {
+			return 5000
+		}
+		return 700
+	default:
+		// sepolia, hoodi, and other testnets
+		if archive {
+			return 400
+		}
+		return 150
+	}
+}
+
+// freeDiskBytes returns the free disk bytes available at path. Used to
+// check whether a network install has room to grow before we let helmfile
+// schedule a 4TB PVC that will silently fill the host overnight.
+func freeDiskBytes(path string) (uint64, error) {
+	var stat syscall.Statfs_t
+	if err := syscall.Statfs(path, &stat); err != nil {
+		return 0, fmt.Errorf("statfs %s: %w", path, err)
+	}
+	// Bavail is reserved-block-aware (vs Bfree); use it for "what a regular
+	// process can actually allocate".
+	return stat.Bavail * uint64(stat.Bsize), nil
+}
+
+// CheckNetworkDiskSpace warns when the data directory has less free disk
+// than the install is expected to need. The default answer is to continue:
+// in non-interactive contexts (no TTY, JSON mode) the prompt auto-accepts
+// so scripted installs don't deadlock. The user only blocks the install by
+// explicitly declining at an interactive prompt.
+func CheckNetworkDiskSpace(u *ui.UI, dataDir, network, mode string) error {
+	requiredGB := diskSpaceRequirementGB(network, mode)
+
+	freeBytes, err := freeDiskBytes(dataDir)
+	if err != nil {
+		// Best-effort: a statfs failure shouldn't block install.
+		u.Warnf("Could not check free disk space at %s: %v", dataDir, err)
+		return nil
+	}
+
+	freeGB := freeBytes / (1024 * 1024 * 1024)
+
+	u.Detail("Disk space", fmt.Sprintf("%d GB free at %s (this network needs ~%d GB)", freeGB, dataDir, requiredGB))
+
+	if freeGB >= requiredGB {
+		return nil
+	}
+
+	u.Warnf("Low disk space: %d GB free, ~%d GB recommended for %s/%s",
+		freeGB, requiredGB, network, mode)
+	if mode != "archive" {
+		u.Dim("  (full mode is the lighter option; archive mode would need ~4-5 TB on mainnet)")
+	}
+
+	if !u.Confirm("Continue with install anyway?", true) {
+		return fmt.Errorf("install cancelled: insufficient disk space (%d GB free, ~%d GB recommended)", freeGB, requiredGB)
+	}
+
+	return nil
+}
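
The threshold comparison in `CheckNetworkDiskSpace` can be exercised without touching the filesystem by separating the arithmetic from the `statfs` call. The Go sketch below does that under stated assumptions: `requirementGB` copies the table from `diskSpaceRequirementGB`, and `needsWarning` is a hypothetical helper that does not exist in the commit.

```go
package main

import "fmt"

// gib is the divisor used by the preflight (1024^3 bytes).
const gib = 1024 * 1024 * 1024

// requirementGB copies the thresholds from diskSpaceRequirementGB.
func requirementGB(network, mode string) uint64 {
	archive := mode == "archive"
	if network == "mainnet" {
		if archive {
			return 5000
		}
		return 700
	}
	// sepolia, hoodi, and other testnets
	if archive {
		return 400
	}
	return 150
}

// needsWarning is a hypothetical helper: it reports whether the preflight
// would warn for the given free byte count. Integer division truncates,
// so one byte under the threshold already counts as a full unit short.
func needsWarning(freeBytes uint64, network, mode string) bool {
	return freeBytes/gib < requirementGB(network, mode)
}

func main() {
	cases := []struct {
		free          uint64
		network, mode string
	}{
		{700 * gib, "mainnet", "full"},
		{700*gib - 1, "mainnet", "full"},
		{4999 * gib, "mainnet", "archive"},
		{150 * gib, "hoodi", "full"},
	}
	for _, c := range cases {
		fmt.Printf("%s/%s free=%d warn=%v\n",
			c.network, c.mode, c.free/gib, needsWarning(c.free, c.network, c.mode))
	}
}
```

Since the divisor is 1024^3, the figures printed as "GB" by the preflight are binary gibibytes, which read slightly lower than the decimal gigabytes most disk tools and vendors report.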
