| 1 | +--- |
| 2 | +name: kurtosis-test |
| 3 | +description: Run a local Kurtosis Ethereum testnet against a locally-built erigon image, monitor EL/CL/assertoor/spamoor health, triage failures with a cross-client comparison methodology, and auto-iterate fix → rebuild → rerun. Use when the user wants to reproduce, debug, or validate erigon against an `ethereum-package` config locally — equivalent to the `test-kurtosis-assertoor` CI workflow but interactive. Handles image build, enclave lifecycle, block-progress + assertoor + log watching, log dumping on failure, and the erigon-source fix loop. |
| 4 | +argument-hint: "<config-yaml-path> [enclave-name] [duration=Nm] [auto=true|false] [max-attempts=N]" |
| 5 | +allowed-tools: Bash, Read, Write, Edit, Glob, Grep, WebFetch, Skill |
| 6 | +--- |
| 7 | + |
| 8 | +# Run a local Kurtosis Ethereum testnet against erigon |
| 9 | + |
| 10 | +This skill mirrors the CI workflow at `.github/workflows/test-kurtosis-assertoor.yml` |
| 11 | +but runs **locally** via the raw `kurtosis` CLI. The CI uses the |
| 12 | +`ethpandaops/kurtosis-assertoor-github-action@v1` wrapper, which is not portable outside |
| 13 | +GitHub Actions; this skill drives `kurtosis run`, `kurtosis enclave inspect`, |
| 14 | +`kurtosis service logs`, and `kurtosis enclave dump` directly. |
| 15 | + |
| 16 | +The skill takes an `ethereum-package` YAML config, builds the local |
| 17 | +`test/erigon:current` Docker image, starts a Kurtosis enclave, monitors |
| 18 | +EL/CL/assertoor/spamoor health, triages failures (challenging peer clients against |
| 19 | +erigon to identify the offender), and iterates a fix → rebuild → rerun loop until the |
| 20 | +testnet is stable or `max-attempts` is reached. |
| 21 | + |
| 22 | +## Inputs |
| 23 | + |
| 24 | +The model parses these arguments and binds them to the shell variables used in the |
| 25 | +bash blocks below: `$1`/`$2` are positional; `duration=Nm` → `duration_secs`, |
| 26 | +`auto=true|false` → `auto`, `max-attempts=N` → `max_attempts`. |
| 27 | + |
| 28 | +| Argument | Default | Notes | |
| 29 | +|---|---|---| |
| 30 | +| `$1` config path | required | Path to an `ethereum-package` args YAML. The reference set lives in `.github/workflows/kurtosis/`, but that directory also contains assertoor playbooks (`id:` / `tasks:` schema) — only the files whose top-level keys are `participants:` or `participants_matrix:` are valid here. To list candidates: `grep -lE '^participants(_matrix)?:' .github/workflows/kurtosis/*.io`. | |
| 31 | +| `$2` enclave name | `kurtosis-test-<unix-ts>` | Used for `kurtosis run --enclave`. Each rerun gets a fresh timestamp. | |
| 32 | +| `duration=Nm` | `20m` | Wall-clock window the monitor watches before declaring "stable" if no failures trip. | |
| 33 | +| `auto=true\|false` | `true` | If `true`, the fix-rebuild-rerun loop runs autonomously up to `max-attempts`. If `false`, pause for user approval before each fix. | |
| 34 | +| `max-attempts=N` | `5` | Cap on fix-loop iterations. After hitting the cap, halt and surface the per-attempt triage history. | |
| 35 | + |
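The argument binding described above could look roughly like the sketch below. `parse_skill_args` is a hypothetical helper name (the skill itself leaves parsing to the model); the defaults are the ones from the table. As a simplifying assumption, any other `key=value` token is rejected, so a config path containing `=` would not be accepted.

```bash
# Sketch: bind skill arguments to the shell variables used by the later blocks.
# parse_skill_args is a hypothetical helper, not something the skill ships.
parse_skill_args() {
  config=""; enclave=""
  duration_secs=1200; auto=true; max_attempts=5   # defaults from the table
  for arg in "$@"; do
    case "$arg" in
      duration=*m)
        mins=${arg#duration=}; mins=${mins%m}
        # Reject empty, zero-leading, or non-numeric values.
        case "$mins" in
          ''|0*|*[!0-9]*) echo "invalid duration: $arg" >&2; return 1 ;;
        esac
        duration_secs=$(( mins * 60 )) ;;
      auto=true|auto=false)
        auto=${arg#auto=} ;;
      max-attempts=*)
        max_attempts=${arg#max-attempts=}
        case "$max_attempts" in
          ''|0*|*[!0-9]*) echo "invalid max-attempts: $arg" >&2; return 1 ;;
        esac ;;
      *=*)
        echo "unrecognized option: $arg" >&2; return 1 ;;
      *)
        # Positional arguments: first is the config path, second the enclave name.
        if [ -z "$config" ]; then config=$arg
        elif [ -z "$enclave" ]; then enclave=$arg
        else echo "unexpected argument: $arg" >&2; return 1
        fi ;;
    esac
  done
  [ -n "$config" ] || { echo "config path is required" >&2; return 1; }
  [ -n "$enclave" ] || enclave="kurtosis-test-$(date +%s)"
}
```

For example, `parse_skill_args .github/workflows/kurtosis/pectra.io duration=5m auto=false` leaves `duration_secs=300`, `auto=false`, `max_attempts=5`, and a timestamped `enclave`.
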
| 36 | +## Prerequisites |
| 37 | + |
| 38 | +1. **Docker** running: `docker info >/dev/null` should succeed. |
| 39 | +2. **Kurtosis CLI** installed: `kurtosis version`. Install from |
| 40 | + https://docs.kurtosis.com/install if missing. CLI **≥ 1.18.1** is only needed when |
| 41 | + the `--package@branch` you run includes the `GpuConfig` Starlark built-in (i.e. |
| 42 | + `ethereum-package` `main` post commit `835dd9b`). The pinned branches in the |
| 43 | + mapping table below — including `glamsterdam`'s `6.1.0` — predate that change, so |
| 44 | + they work on older CLIs. See Troubleshooting if you hit a `GpuConfig` Starlark |
| 45 | + error. |
| 46 | +3. **Erigon source tree** at the cwd: `Makefile` exists and `go.mod` contains |
| 47 | + `module github.com/erigontech/erigon`. |
| 48 | +4. **`curl` and `jq`** on `$PATH` — used by the monitor / triage snippets below to |
| 49 | + poll the EL JSON-RPC endpoint and the assertoor API. |
| 50 | +5. **Fork detection** — skim the YAML for `_fork_epoch` keys. The repo's configs use |
| 51 | + the CL-side fork names: `deneb_fork_epoch` (Cancun on EL), `electra_fork_epoch` |
| 52 | + (Prague), `fulu_fork_epoch` (Osaka), `gloas_fork_epoch` (Amsterdam). Future fork |
| 53 | + keys will follow the same CL-naming convention. Feeds the spec-lookup section below. |
| 54 | + |
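The fork-detection skim in item 5 can be approximated with a small helper. This is only a sketch: `detect_forks` is a hypothetical name, the CL-to-EL mapping is the one listed above, and any other `*_fork_epoch` key is reported as unknown rather than guessed.

```bash
# Sketch: list the *_fork_epoch keys set in an ethereum-package YAML and the
# EL-side fork name for each. detect_forks is a hypothetical helper.
detect_forks() {  # $1 = path to the ethereum-package args YAML
  grep -vE '^[[:space:]]*#' "$1" \
    | grep -oE '[a-z0-9]+_fork_epoch' \
    | sort -u \
    | while read -r key; do
        case "$key" in
          deneb_fork_epoch)   el="Cancun" ;;
          electra_fork_epoch) el="Prague" ;;
          fulu_fork_epoch)    el="Osaka" ;;
          gloas_fork_epoch)   el="Amsterdam" ;;
          *)                  el="unknown (not in the list above)" ;;
        esac
        printf '%s -> %s\n' "$key" "$el"
      done
}
```

Fully commented-out lines are skipped, so a disabled `# gloas_fork_epoch: ...` entry does not trigger a spec lookup.
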
| 55 | +## Spec lookup (when debugging unfamiliar forks/EIPs) |
| 56 | + |
| 57 | +If the YAML enables a fork under development, invoke `/erigon-implement-eip` Steps 2–4 |
| 58 | +to fetch: |
| 59 | + |
| 60 | +- **Step 2** — referenced/dependent EIPs. |
| 61 | +- **Step 3** — the meta EIP enumerating which EIPs the fork includes (CFI/SFI/PFI/DFI lists). |
| 62 | +- **Step 4** — the latest devnet specification at `https://notes.ethereum.org/@ethpandaops/<devnet>`. |
| 63 | + |
| 64 | +Use these as ground truth when triaging. For specific opcodes / state transitions, |
| 65 | +also pull the EIP body via Step 1. If anything in the spec is contradictory or |
| 66 | +ambiguous, **stop and ask the user** rather than guessing — the same rule the EIP skill |
| 67 | +enforces. |
| 68 | + |
| 69 | +## Build the erigon docker image |
| 70 | + |
| 71 | +The image tag must be **exactly** `test/erigon:current` because every |
| 72 | +`.github/workflows/kurtosis/*.io` config references that tag. |
| 73 | + |
| 74 | +```bash |
| 75 | +docker build -t test/erigon:current --build-arg BINARIES="erigon caplin" . |
| 76 | +``` |
| 77 | + |
| 78 | +`caplin` is required in `BINARIES` because some configs (e.g. |
| 79 | +`caplin-minimal-assertoor.io`) use erigon as the CL via the same image. |
| 80 | + |
| 81 | +Always rebuild before each run — the same approach the CI uses. BuildKit's layer cache |
| 82 | +makes the no-op rebuild fast, and the fix → rebuild → rerun loop necessarily picks up |
| 83 | +uncommitted source edits this way (a freshness check against `git log` would miss them |
| 84 | +and silently run a stale image). |
| 85 | + |
| 86 | +If the user asks for a from-scratch binary build instead of docker, point them at |
| 87 | +`/erigon-build`; this skill itself uses docker because the kurtosis configs reference a |
| 88 | +docker image tag. |
| 89 | + |
| 90 | +## Suite → ethereum-package branch mapping (from CI) |
| 91 | + |
| 92 | +The CI matrix pins different package branches per suite. Use the same pinning when the |
| 93 | +config matches a known CI file; for unknown configs, default to `5.0.1` and ask the |
| 94 | +user to confirm. |
| 95 | + |
| 96 | +| Config file | `--package@branch` | |
| 97 | +|---|---| |
| 98 | +| `regular-assertoor.io` | `github.com/ethpandaops/ethereum-package@5.0.1` | |
| 99 | +| `pectra.io` | `github.com/ethpandaops/ethereum-package@5.0.1` | |
| 100 | +| `glamsterdam.io` | `github.com/ethpandaops/ethereum-package@6.1.0` | |
| 101 | +| `caplin-assertoor.io` | `github.com/erigontech/ethereum-package@erigontech/fix-caplin-launcher` | |
| 102 | +| `caplin-minimal-assertoor.io` | `github.com/erigontech/ethereum-package@erigontech/fix-caplin-launcher` | |
| 103 | +| (other / user-supplied) | default `5.0.1`, prompt user if unsure | |
| 104 | + |
| 105 | +Note: `glamsterdam` is pinned to `6.1.0` rather than `main` because `main` introduced |
| 106 | +the `GpuConfig` Starlark built-in which requires kurtosis CLI ≥ 1.18.1. Caplin suites |
| 107 | +(`caplin-assertoor.io`, `caplin-minimal-assertoor.io`) require the `erigontech` fork — |
| 108 | +do not let them fall back to the default `5.0.1`. |
| 109 | + |
| 110 | +## Start the testnet |
| 111 | + |
| 112 | +```bash |
| 113 | +ENCLAVE="${2:-kurtosis-test-$(date +%s)}" |
| 114 | +CONFIG="$1" |
| 115 | + |
| 116 | +# Map config basename → ethereum-package branch (mirrors the table above). |
| 117 | +case "$(basename "$CONFIG")" in |
| 118 | + glamsterdam.io) |
| 119 | + PACKAGE_REF="github.com/ethpandaops/ethereum-package@6.1.0" ;; |
| 120 | + caplin-assertoor.io|caplin-minimal-assertoor.io) |
| 121 | + PACKAGE_REF="github.com/erigontech/ethereum-package@erigontech/fix-caplin-launcher" ;; |
| 122 | + regular-assertoor.io|pectra.io) |
| 123 | + PACKAGE_REF="github.com/ethpandaops/ethereum-package@5.0.1" ;; |
| 124 | + *) |
| 125 | + PACKAGE_REF="github.com/ethpandaops/ethereum-package@5.0.1" ;; |
| 126 | +esac |
| 127 | + |
| 128 | +kurtosis run \ |
| 129 | + "$PACKAGE_REF" \ |
| 130 | + --enclave "$ENCLAVE" \ |
| 131 | + --args-file "$CONFIG" \ |
| 132 | + --verbosity detailed --cli-log-level trace |
| 133 | +``` |
| 134 | + |
| 135 | +Once `kurtosis run` returns, capture service names and host-mapped ports: |
| 136 | + |
| 137 | +```bash |
| 138 | +kurtosis enclave inspect "$ENCLAVE" --full-uuids |
| 139 | + |
| 140 | +# Pick the first erigon EL service. `kurtosis enclave inspect` prints columnar |
| 141 | +# rows (UUID first, then service name), so we match against field 2. |
| 142 | +EL_SERVICE=$(kurtosis enclave inspect "$ENCLAVE" --full-uuids 2>/dev/null \ |
| 143 | + | awk '$2 ~ /^el-[0-9]+-erigon-[a-z]+$/ {print $2; exit}') |
| 144 | +EL_RPC_PORT=$(kurtosis port print "$ENCLAVE" "$EL_SERVICE" rpc 2>/dev/null \ |
| 145 | + | sed -E 's|.*:([0-9]+).*|\1|') |
| 146 | + |
| 147 | +# CL endpoint (whichever client paired with that EL) |
| 148 | +CL_SERVICE=$(kurtosis enclave inspect "$ENCLAVE" --full-uuids 2>/dev/null \ |
| 149 | + | awk '$2 ~ /^cl-[0-9]+-[a-z]+-erigon$/ {print $2; exit}') |
| 150 | +CL_HTTP_PORT=$(kurtosis port print "$ENCLAVE" "$CL_SERVICE" http 2>/dev/null \ |
| 151 | + | sed -E 's|.*:([0-9]+).*|\1|') |
| 152 | + |
| 153 | +# Optional services |
| 154 | +ASSERTOOR_PORT=$(kurtosis port print "$ENCLAVE" assertoor http 2>/dev/null | sed -E 's|.*:([0-9]+).*|\1|') |
| 155 | +DORA_PORT=$(kurtosis port print "$ENCLAVE" dora http 2>/dev/null | sed -E 's|.*:([0-9]+).*|\1|') |
| 156 | +SPAMOOR_PORT=$(kurtosis port print "$ENCLAVE" spamoor http 2>/dev/null | sed -E 's|.*:([0-9]+).*|\1|') |
| 157 | +``` |
| 158 | + |
| 159 | +Print the assertoor / dora URLs so the user can open the dashboards in a browser. |
| 160 | + |
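A minimal sketch of that URL printing, assuming the `ASSERTOOR_PORT` and `DORA_PORT` variables captured above. `print_url` is a hypothetical helper; it treats an empty or non-numeric port as "service not available", which is what a failed `kurtosis port print` lookup is assumed to produce.

```bash
# Sketch: print a dashboard URL only when the port lookup returned a number.
print_url() {  # $1 = label, $2 = host-mapped port (may be empty)
  case "$2" in
    ''|*[!0-9]*) echo "$1: not available" ;;
    *)           echo "$1: http://127.0.0.1:$2/" ;;
  esac
}

print_url assertoor "${ASSERTOOR_PORT:-}"
print_url dora "${DORA_PORT:-}"
```
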
| 161 | +## Monitor |
| 162 | + |
| 163 | +Three checks run on a polling loop until either (a) `duration` elapses, or (b) any |
| 164 | +failure trips. All three are captured in the run history. |
| 165 | + |
| 166 | +### Check A — Block height progress |
| 167 | + |
| 168 | +`duration_secs` is the parsed `duration=Nm` input (default 1200). Three terminal |
| 169 | +outcomes: |
| 170 | + |
| 171 | +- **STABLE** — chain produced blocks and the duration elapsed without a stall. |
| 172 | +- **STALL** — chain produced at least one block, then stopped advancing for |
| 173 | + `>3 × seconds_per_slot`. |
| 174 | +- **NO_PROGRESS** — chain never produced a block within `duration_secs` (e.g. |
| 175 | + validators didn't start). |
| 176 | + |
| 177 | +```bash |
| 178 | +duration_secs=${duration_secs:-1200} |
| 179 | +prev=0 |
| 180 | +slot_secs=$(grep -E '^[[:space:]]*seconds_per_slot:' "$CONFIG" | awk '{print $2; exit}'); slot_secs=${slot_secs:-12} |
| 181 | +poll_interval=$(( slot_secs * 2 )) |
| 182 | +stall_window=$(( slot_secs * 3 )) |
| 183 | +start=$(date +%s) |
| 184 | +end=$(( start + duration_secs )) |
| 185 | +stall_deadline=$(( start + stall_window )) |
| 186 | +outcome="" |
| 187 | + |
| 188 | +while [ "$(date +%s)" -lt "$end" ]; do |
| 189 | + height_hex=$(curl -s --max-time 5 "http://127.0.0.1:${EL_RPC_PORT}" \ |
| 190 | + -H 'Content-Type: application/json' \ |
| 191 | + -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \ |
| 192 | + | jq -r '.result // empty') |
| 193 | + if [[ "$height_hex" =~ ^0x[0-9a-fA-F]+$ ]]; then |
| 194 | + height=$(printf '%d\n' "$height_hex") |
| 195 | + echo "[$(date -u +%H:%M:%S)] height=$height" |
| 196 | + if [ "$height" -gt "$prev" ]; then |
| 197 | + prev=$height |
| 198 | + stall_deadline=$(( $(date +%s) + stall_window )) |
| 199 | + fi |
| 200 | + else |
| 201 | + echo "[$(date -u +%H:%M:%S)] RPC unreachable or invalid response — retrying" |
| 202 | + fi |
| 203 | + # Only declare a stall after seeing at least one block. Many configs set |
| 204 | + # genesis_delay > stall_window (e.g. glamsterdam.io: 20s delay vs 18s window |
| 205 | + # at 6s slots), so the pre-genesis gap would otherwise trip a false stall. |
| 206 | + if [ "$prev" -gt 0 ] && [ "$(date +%s)" -gt "$stall_deadline" ]; then |
| 207 | + outcome="STALL: chain not progressing for >${stall_window}s (last height=$prev)" |
| 208 | + break |
| 209 | + fi |
| 210 | + sleep "$poll_interval" |
| 211 | +done |
| 212 | + |
| 213 | +if [ -z "$outcome" ]; then |
| 214 | + if [ "$prev" -eq 0 ]; then |
| 215 | + outcome="NO_PROGRESS: chain never produced a block within ${duration_secs}s" |
| 216 | + else |
| 217 | + outcome="STABLE: chain progressed for full ${duration_secs}s window (final height=$prev)" |
| 218 | + fi |
| 219 | +fi |
| 220 | +echo "$outcome" |
| 221 | +``` |
| 222 | + |
| 223 | +Pass: height advances ≥1 within every `3 × seconds_per_slot` (the stall window), |
| 224 | +sustained for the full `duration_secs` (poll cadence is `2 × seconds_per_slot`). |
| 225 | +Fail: STALL or NO_PROGRESS. |
| 226 | + |
| 227 | +### Check B — Assertoor results |
| 228 | + |
| 229 | +```bash |
| 230 | +curl -s "http://127.0.0.1:${ASSERTOOR_PORT}/api/v1/test_runs" \ |
| 231 | + | jq '.data[] | {name, status, result}' |
| 232 | +``` |
| 233 | + |
| 234 | +Pass: every test_run has `result=success`. Fail: any `result=failure`, or any test |
| 235 | +stuck `pending` / `running` past 3× its expected duration. The assertoor web UI at |
| 236 | +`http://127.0.0.1:${ASSERTOOR_PORT}/` shows per-step trees; use it for deep dives. |
| 237 | + |
| 238 | +### Check C — Erigon-focused log scan |
| 239 | + |
| 240 | +```bash |
| 241 | +kurtosis service logs "$ENCLAVE" "$EL_SERVICE" 2>&1 \ |
| 242 | + | grep -iE 'panic|fatal|^ERROR|"lvl"="error"|consensus failure|invalid block' \ |
| 243 | + | tail -200 |
| 244 | +``` |
| 245 | + |
| 246 | +For cross-client comparison (used by the triage section), run the same scan across |
| 247 | +every EL/CL/VC service: |
| 248 | + |
| 249 | +```bash |
| 250 | +for svc in $(kurtosis enclave inspect "$ENCLAVE" --full-uuids \ |
| 251 | + | awk '$2 ~ /^(el|cl|vc)-[0-9]+-[a-z]+-[a-z]+$/ {print $2}'); do |
| 252 | + echo "=== $svc ===" |
| 253 | + kurtosis service logs "$ENCLAVE" "$svc" 2>&1 \ |
| 254 | + | grep -iE 'error|panic|fatal' | tail -30 |
| 255 | +done |
| 256 | +``` |
| 257 | + |
| 258 | +If `snooper-engine-*` services exist (when `snooper_enabled: true` in the YAML), pull |
| 259 | +their logs too — they capture the full Engine API request/response trace, invaluable |
| 260 | +when an EL bug is suspected. |
| 261 | + |
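Pulling the snooper logs could be sketched as follows. It assumes the same columnar `kurtosis enclave inspect` layout the blocks above rely on (service name in field 2); `list_snooper_services` is a hypothetical helper, and the 100-line tail is an arbitrary choice.

```bash
# Sketch: tail the log of every snooper-engine-* service in the enclave.
list_snooper_services() {  # reads `kurtosis enclave inspect` output on stdin
  awk '$2 ~ /^snooper-engine-/ {print $2}'
}

# Only query the enclave when the CLI is installed and an enclave is set.
if command -v kurtosis >/dev/null 2>&1 && [ -n "${ENCLAVE:-}" ]; then
  for svc in $(kurtosis enclave inspect "$ENCLAVE" --full-uuids 2>/dev/null \
                 | list_snooper_services); do
    echo "=== $svc ==="
    kurtosis service logs "$ENCLAVE" "$svc" 2>&1 | tail -n 100
  done
fi
```
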
| 262 | +## Issue detection criteria |
| 263 | + |
| 264 | +| User check | Pass | Fail | |
| 265 | +|---|---|---| |
| 266 | +| 1. Block production / height progress | `eth_blockNumber` advances ≥1 within every `3 × seconds_per_slot`, sustained for `duration` | No advance for `>3 × seconds_per_slot`, OR explicit chain reorg / fork-choice loop in CL logs | |
| 267 | +| 2. EL/CL log errors (focus erigon) | No `panic`, `fatal`, or error-level lines in any erigon service | Any erigon-side panic/fatal/consensus failure. Non-erigon errors recorded but informational unless they crash the peer. | |
| 268 | +| 3. Assertoor test failures | All assertoor `test_runs` reach `result=success` | Any `result=failure`, OR a test stuck `pending`/`running` past 3× expected duration | |
| 269 | + |
| 270 | +A single failed check trips the triage section. Block-stall + erigon panic + assertoor |
| 271 | +fail are independent signals — record all three in the run history; do not stop at the |
| 272 | +first. |
| 273 | + |
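One way to keep the three signals independent is to evaluate all of them and collect every failure before deciding. The sketch below assumes each check has already been reduced to a simple summary value; `summarize_checks` and its argument layout are hypothetical.

```bash
# Sketch: evaluate all three signals and report every failing one,
# rather than stopping at the first.
summarize_checks() {
  # $1 = Check A outcome line (starts with STABLE, STALL, or NO_PROGRESS)
  # $2 = number of erigon log lines matched by the Check C scan
  # $3 = number of assertoor test_runs with result=failure
  failures=""
  case "$1" in
    STABLE*) ;;
    *) failures="${failures}block-progress " ;;
  esac
  if [ "$2" -gt 0 ]; then failures="${failures}erigon-logs "; fi
  if [ "$3" -gt 0 ]; then failures="${failures}assertoor "; fi
  if [ -z "$failures" ]; then
    echo "PASS"
  else
    echo "FAIL: ${failures% }"
  fi
}
```

The `FAIL:` line names every tripped check, so the run history captures all of them for the triage step.
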
| 274 | +## Debugging methodology — triage erigon vs peer-client vs network/config |
| 275 | + |
| 276 | +Decision tree: |
| 277 | + |
| 278 | +1. **Reproduce.** A single one-shot failure gets one re-run before triaging. Truly |
| 279 | + intermittent failures still get triaged, but flag them as flaky. |
| 280 | +2. **Classify the symptom.** One of: block-stall, EL panic, EL invalid-payload, CL |
| 281 | + fork-choice mismatch, assertoor opcode/EIP test failure, spamoor tx-submission |
| 282 | + failure. |
| 283 | +3. **Cross-client comparison.** For each erigon-side error, find the equivalent moment |
| 284 | + in the peer-client log at the same slot/block. Three outcomes: |
| 285 | + - **Erigon wrong**: erigon rejects/panics; peer client + assertoor accept the |
| 286 | + block → erigon bug, fix locally. |
| 287 | + - **Peer wrong**: erigon accepts; peer rejects → check peer-client image tag |
| 288 | + against the fork's expected tag (often a stale image). Surface to user; do not |
| 289 | + fix erigon. |
| 290 | + - **Both disagree with spec**: clients produce different "valid" answers from what |
| 291 | + the EIP spec says → escalate to the user. Likely spec ambiguity or a misread. |
| 292 | +4. **Cross-reference the spec.** Pull the relevant EIP (`/erigon-implement-eip` Step 1) |
| 293 | + and the devnet spec (`/erigon-implement-eip` Step 4) for the failing block, opcode, |
| 294 | + or state transition. |
| 295 | +5. **Rule out config drift.** Diff the YAML's `el_extra_params`, `network_params`, and |
| 296 | + fork epochs against the equivalent CI suite under `.github/workflows/kurtosis/`. |
| 297 | + Mismatches there are config bugs, not erigon bugs. |
| 298 | +6. **Rule out enclave plumbing.** `kurtosis service exec <enclave> <svc> "ping -c 3 |
| 299 | + <other_svc>"` to verify network reachability; check JWT mounting via |
| 300 | + `kurtosis service exec <enclave> <el-svc> "ls -la /jwt/"`. The CLI takes the |
| 301 | + command as a single positional arg (multi-word commands must be quoted) — there |
| 302 | + is no `--` separator. |
| 303 | + |
| 304 | +### Triage table |
| 305 | + |
| 306 | +| Symptom | Likely owner | Next action | |
| 307 | +|---|---|---| |
| 308 | +| Erigon panic with stack trace inside `execution/...` | Erigon | Capture stack, find offending call in repo, propose fix | |
| 309 | +| `engine_newPayloadV4` returns INVALID; CL logs say block is valid; assertoor passes elsewhere | Erigon (likely block-validation divergence) | Replay the payload via `debug_traceBlockByNumber` / `debug_traceBlockByHash`; check fork activation timestamp | |
| 310 | +| All EL clients stop progressing after a specific slot | Config (fork epoch wrong) or shared dep | Diff YAML against working CI suite; check ethereum-package branch | |
| 311 | +| Assertoor `block-proposal-check` fails on slot N for `vc-N-erigon-…` | Erigon block builder | Fetch block N body via RPC; replay locally | |
| 312 | +| Assertoor `synchronized-check` fails | Network plumbing | Inspect peer counts; `kurtosis service exec` connectivity test | |
| 313 | +| `caplin` panics but `lighthouse` runs fine on the same EL | Caplin | Edit YAML to swap CL to lighthouse for bisection; report to user | |
| 314 | +| Spamoor reports persistent "insufficient funds" / "nonce too low" | Spamoor config | Increase `funding_gas_limit`; lower `throughput`; check prefunded keys | |
| 315 | +| Snooper shows malformed Engine API request | Erigon RPC layer | Capture the request from snooper logs; inspect erigon engine handler | |
| 316 | +| Erigon accepts a payload that lighthouse + teku both reject | Erigon (single-client divergence) | Almost always an erigon bug — fix locally | |
| 317 | +| `eth_blockNumber` stays at 0x0 after >2 epochs | Validators not running | Check `vc-*` service logs; verify keystore mounting | |
| 318 | + |
| 319 | +## Fix-rebuild-rerun loop |
| 320 | + |
| 321 | +Auto-iterates by default (`auto=true`), capped at `max-attempts=5`. Per attempt: |
| 322 | + |
| 323 | +1. **Tear down**: dump logs first (see Cleanup), then `kurtosis enclave rm -f "$ENCLAVE"`. |
| 324 | +2. **Apply the fix** to erigon source via `Edit`. Only auto-apply when the triage |
| 325 | + classified the issue as "Erigon wrong" with high confidence. If ambiguous (peer |
| 326 | + could be wrong, or spec interpretation unclear), pause and surface to the user |
| 327 | + even before the cap — that overrides `auto=true`. |
| 328 | +3. **Rebuild image**: `docker build -t test/erigon:current --build-arg BINARIES="erigon caplin" .`. |
| 329 | +4. **Re-launch**: same config, fresh timestamped enclave name (so each attempt's dump |
| 330 | + stays separate). |
| 331 | +5. **Re-run monitor**: same three checks. |
| 332 | +6. **Record**: per attempt — symptom, hypothesis, fix applied, outcome. |
| 333 | + |
| 334 | +After `max-attempts` consecutive failures, halt and print the per-attempt history. Do |
| 335 | +not auto-apply more fixes once the cap is hit. Reference: `/autoresearch` follows the |
| 336 | +same iterate-and-record pattern. |
| 337 | + |
| 338 | +If `auto=false`, pause for user approval before steps 2–4 of every attempt. |
| 339 | + |
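The control flow of the loop can be sketched as below. It covers only the attempt cap and the per-attempt record: `run_attempt` is a hypothetical hook standing in for the rebuild, launch, monitor, and cleanup steps, and the `-a<N>` suffix on the enclave name is an assumption added here to keep names distinct. Applying the fix and the `auto=false` approval pause are left out.

```bash
# Hypothetical hook: replace the body with build → launch → monitor → cleanup.
# It must return 0 only when all three checks pass.
run_attempt() { return 1; }

# Sketch: retry up to max_attempts, recording one history line per attempt.
fix_loop() {  # $1 = max attempts (default 5)
  max_attempts=${1:-5}
  attempt_history=""
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    enclave="kurtosis-test-$(date +%s)-a${attempt}"   # fresh name per attempt
    if run_attempt "$enclave"; then
      attempt_history="${attempt_history}attempt ${attempt} (${enclave}): stable
"
      printf '%s' "$attempt_history"
      return 0
    fi
    attempt_history="${attempt_history}attempt ${attempt} (${enclave}): failed
"
    attempt=$(( attempt + 1 ))
  done
  # Cap reached: surface the history and stop applying fixes.
  printf '%s' "$attempt_history"
  return 1
}
```

A real history entry would also carry the symptom, hypothesis, and fix applied, as listed in step 6.
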
| 340 | +## Cleanup (always run) |
| 341 | + |
| 342 | +Run as the final step of every iteration regardless of outcome (success, failure, |
| 343 | +or user interrupt): |
| 344 | + |
| 345 | +```bash |
| 346 | +DUMP_DIR="/tmp/kurtosis-dump-${ENCLAVE}" |
| 347 | +# `kurtosis enclave dump` refuses if the destination already exists — never pre-create it. |
| 348 | +kurtosis enclave dump "$ENCLAVE" "$DUMP_DIR" || true |
| 349 | +kurtosis enclave rm -f "$ENCLAVE" || true |
| 350 | +echo "Logs dumped to: $DUMP_DIR" |
| 351 | +``` |
| 352 | + |
| 353 | +The dump contains per-service logs (`el-*`, `cl-*`, `vc-*`, `assertoor`, `spamoor`, |
| 354 | +`dora`, `snooper-*`). Keep this directory until triage is complete: nothing is |
| 355 | +uploaded to GitHub in a local run, so these logs exist only on the local disk. |
| 356 | + |
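To make the "always run" guarantee hold on a user interrupt as well, the dump and teardown can be wrapped in a shell trap. This is a sketch under the assumption that the whole iteration runs in one shell session with `ENCLAVE` set; the `cleanup` function name is illustrative.

```bash
# Sketch: run the dump + teardown on normal exit, Ctrl-C, or SIGTERM.
cleanup() {
  trap - EXIT INT TERM                  # clear traps so cleanup runs only once
  [ -n "${ENCLAVE:-}" ] || return 0     # nothing to clean if no enclave was set
  DUMP_DIR="/tmp/kurtosis-dump-${ENCLAVE}"
  kurtosis enclave dump "$ENCLAVE" "$DUMP_DIR" || true
  kurtosis enclave rm -f "$ENCLAVE" || true
  echo "Logs dumped to: $DUMP_DIR"
}

trap cleanup EXIT
trap 'cleanup; exit 130' INT
trap 'cleanup; exit 143' TERM
```

Install the traps right after `ENCLAVE` is assigned, before `kurtosis run`, so that a failure during startup is also dumped.
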
| 357 | +After multiple iterations, also prune dangling docker images: |
| 358 | + |
| 359 | +```bash |
| 360 | +docker image prune -f |
| 361 | +``` |
| 362 | + |
| 363 | +## Troubleshooting |
| 364 | + |
| 365 | +| Problem | Solution | |
| 366 | +|---|---| |
| 367 | +| `kurtosis run` fails with Starlark error mentioning `GpuConfig` | Use one of the pinned `--package@branch` values from the mapping table (e.g. `6.1.0` for glamsterdam, `5.0.1` for regular/pectra) — they predate the `GpuConfig` built-in. OR upgrade kurtosis CLI to ≥ 1.18.1 if you need an `ethereum-package` branch that includes it. | |
| 368 | +| `kurtosis enclave dump` errors "destination exists" | Use a fresh dir name or `rm -rf` it first | |
| 369 | +| All `el-*-erigon-*` services missing from `enclave inspect` | Image build failed; check `docker images \| grep test/erigon` and rerun docker build | |
| 370 | +| `eth_blockNumber` returns `0x0` forever | Validators didn't start; check `vc-*` service logs; verify keystore mounting | |
| 371 | +| Connection refused on assertoor port | Service still booting; or `assertoor` not in `additional_services` in the YAML | |
| 372 | +| `kurtosis service logs` truncates very long logs | Use `kurtosis enclave dump` for the full per-service log files | |
| 373 | +| `caplin-minimal` config fails with "binary not found" | Confirm `BINARIES="erigon caplin"` in the docker build args | |
| 374 | +| `eth_blockNumber` advances but assertoor reports timeout | Slot time / preset mismatch; check `seconds_per_slot` and `preset` in YAML | |
| 375 | +| Erigon image stale despite rebuild | `docker image rm test/erigon:current && docker build ...` to force; check BuildKit cache scope | |
| 376 | +| Port already allocated | Another enclave is running — `kurtosis enclave ls` then `kurtosis enclave rm -f <old>` | |
| 377 | +| Engine API JWT mismatch in EL logs | Check `kurtosis service exec <enclave> el-1-erigon-… "ls -la /jwt/"`; restart enclave if missing | |