Commit c235e84

fix: dkg use different ports and adjust timeout (#435)
* fix: dkg use different ports and adjust timeout
* fix: change shutdown delay to 30s
1 parent 1bcec93 commit c235e84

5 files changed

Lines changed: 22 additions & 16 deletions

.github/workflows/dkg-runner.yml

Lines changed: 4 additions & 1 deletion
@@ -105,8 +105,11 @@ jobs:
 CHARON_BIN: ${{ github.workspace }}/bin/charon
 PLUTO_BIN: ${{ github.workspace }}/target/debug/pluto
 TIMEOUT: ${{ inputs.timeout || '120' }}
+SHUTDOWN_DELAY: "30s"
+NODE_EXIT_TIMEOUT: "180"
 RUN_SMOKE_VERIFY: "1"
-SMOKE_SECONDS: "3"
+SMOKE_SECONDS: "8"
+SMOKE_PORT_BASE: "19000"
 run: ./scripts/dkg-runner/run.sh
 
 - name: Upload work dir on failure

scripts/dkg-runner/README.md

Lines changed: 5 additions & 5 deletions
@@ -60,13 +60,13 @@ All variables are optional. Set them in the environment before calling any scrip
 | `FEE_RECIPIENT` | `0xDeaDBeef…` | Fee recipient address for the cluster |
 | `WITHDRAWAL_ADDR` | `0xDeaDBeef…` | Withdrawal address for the cluster |
 | `TIMEOUT` | `120` | Seconds to wait before declaring the ceremony failed |
-| `SHUTDOWN_DELAY` | `30s` | Graceful shutdown delay passed to each node via `--shutdown-delay` |
-| `NODE_EXIT_TIMEOUT` | `90` | Seconds to wait for node processes to exit cleanly after artifacts appear |
+| `SHUTDOWN_DELAY` | `120s` | Graceful shutdown delay passed to each node via `--shutdown-delay` |
+| `NODE_EXIT_TIMEOUT` | `180` | Seconds to wait for node processes to exit cleanly after artifacts appear |
 | `PLUTO_BIN` | `./target/debug/pluto` | Path to the Pluto binary (only required when `PLUTO_NODES > 0`) |
 | `CHARON_BIN` | `charon` | Path to the Charon binary |
 | `RUN_SMOKE_VERIFY` | `1` | Smoke-start the collected node dirs with `charon run` after output collection |
-| `SMOKE_SECONDS` | `8` | Seconds the smoke-started nodes must stay alive |
-| `SMOKE_PORT_BASE` | `39000` | First local port used by smoke verification |
+| `SMOKE_SECONDS` | `8` | Seconds to wait for smoke validator APIs to become ready |
+| `SMOKE_PORT_BASE` | `19000` | First local port used by smoke verification |
 | `WORK_DIR` | `/tmp/dkg-run` | Scratch directory — wiped at the start of every run |
 | `KEEP_NODES` | `0` | Leave node processes running after a successful ceremony when set to `1`/`true`/`yes`/`on` |
 | `CI` | _(unset)_ | When truthy, suppresses per-node tee to stdout; logs go to `WORK_DIR/node-*/node.log` only |
@@ -83,7 +83,7 @@ All variables are optional. Set them in the environment before calling any scrip
 | 4 | `wait-node-exits.sh` | Waits for each node process to exit with status `0` unless `KEEP_NODES` is enabled |
 | 5 | `collect.sh` | Copies keystores and `cluster-lock.json` to `WORK_DIR/output/`; prints a summary |
 | 6 | `ci/verify-output-semantic.sh` | Validates the collected output is internally consistent across nodes |
-| 7 | `ci/verify-run-smoke.sh` | Starts the collected node dirs with `charon run` and checks they stay up through the smoke window |
+| 7 | `ci/verify-run-smoke.sh` | Starts the collected node dirs with `charon run` and checks every validator API reaches readiness |
 
 On success, outputs are under `$WORK_DIR/output/`. On failure or timeout, partial outputs are still collected and `WORK_DIR` is preserved for inspection. `run.sh` never deletes `WORK_DIR`; use `./scripts/dkg-runner/reset.sh` when you're done.
 
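Because every variable in the README table is optional, a caller only needs to set the ones that differ from the defaults. The following is a minimal sketch of that behavior, assuming the `: "${VAR:=default}"` idiom visible in the `config.sh` hunk of this commit. The wrapper function `apply_defaults` is hypothetical and covers only two of the variables.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the default-assignment idiom.
# ":=" assigns the default only when the variable is unset or empty,
# so a value exported by the caller is left untouched.
apply_defaults() {
  : "${SMOKE_PORT_BASE:=19000}"
  : "${NODE_EXIT_TIMEOUT:=180}"
}

apply_defaults
echo "SMOKE_PORT_BASE=${SMOKE_PORT_BASE} NODE_EXIT_TIMEOUT=${NODE_EXIT_TIMEOUT}"
```

Under the same idiom, running `SMOKE_PORT_BASE=29000 ./scripts/dkg-runner/run.sh` would keep `29000` for that variable and fall back to the documented defaults for everything else.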

scripts/dkg-runner/ci/verify-run-smoke.sh

Lines changed: 3 additions & 3 deletions
@@ -8,22 +8,22 @@
 # SMOKE_SECONDS seconds allowed for monitoring endpoints to become ready
 # (default: 8)
 # SMOKE_PORT_BASE
-# first local port used by this check (default: 39000)
+# first local port used by this check (default: 19000)
 #
 # This verifies the generated full node data dirs are loadable by a later
 # Charon/Pluto-style runtime: cluster lock, p2p key, validator keystores, and
 # passwords are all usable enough for the process to start.
 #
 # It does not prove real beacon duties. It uses Charon simnet mocks and kills
-# the processes after the smoke window.
+# the processes after every validator API reaches readiness.
 
 set -euo pipefail
 
 WORK_DIR="${WORK_DIR:-/tmp/dkg-run}"
 NODES="${NODES:-4}"
 CHARON_BIN="${CHARON_BIN:-charon}"
 SMOKE_SECONDS="${SMOKE_SECONDS:-8}"
-SMOKE_PORT_BASE="${SMOKE_PORT_BASE:-39000}"
+SMOKE_PORT_BASE="${SMOKE_PORT_BASE:-19000}"
 SMOKE_DIR="${WORK_DIR}/run-smoke"
 
 fail() {
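The header comment changes the meaning of `SMOKE_SECONDS` from "stay alive this long" to "become ready within this long". The body of `verify-run-smoke.sh` is not part of this diff, so the following is only a sketch of what a readiness wait with a deadline could look like. The helpers `node_port` and `wait_ready`, and the one-port-per-node layout counted up from `SMOKE_PORT_BASE`, are assumptions for illustration rather than the script's actual logic.

```bash
#!/usr/bin/env bash
set -euo pipefail

SMOKE_SECONDS="${SMOKE_SECONDS:-8}"
SMOKE_PORT_BASE="${SMOKE_PORT_BASE:-19000}"

# Assumed layout: node N listens on SMOKE_PORT_BASE + N.
node_port() {
  local index="$1"
  echo $(( SMOKE_PORT_BASE + index ))
}

# Re-run a probe command once per second until it succeeds
# or the timeout (in seconds) has elapsed.
# Usage: wait_ready <timeout-seconds> <command> [args...]
wait_ready() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep 1
  done
}
```

With helpers like these, the probe passed to `wait_ready "$SMOKE_SECONDS" ...` could be any command that exits `0` once a node answers on `$(node_port N)`, so a slow start no longer fails the check as long as readiness arrives inside the window.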

scripts/dkg-runner/config.sh

Lines changed: 3 additions & 3 deletions
@@ -11,15 +11,15 @@
 : "${CHARON_NODES:=2}"
 : "${RELAY_URL:=https://0.relay.obol.tech}"
 : "${TIMEOUT:=120}"
-: "${SHUTDOWN_DELAY:=5s}"
-: "${NODE_EXIT_TIMEOUT:=90}"
+: "${SHUTDOWN_DELAY:=120s}"
+: "${NODE_EXIT_TIMEOUT:=180}"
 : "${PLUTO_BIN:=./target/debug/pluto}"
 : "${CHARON_BIN:=charon}"
 : "${WORK_DIR:=/tmp/dkg-run}"
 : "${KEEP_NODES:=0}"
 : "${RUN_SMOKE_VERIFY:=1}"
 : "${SMOKE_SECONDS:=8}"
-: "${SMOKE_PORT_BASE:=39000}"
+: "${SMOKE_PORT_BASE:=19000}"
 : "${NETWORK:=holesky}"
 : "${FEE_RECIPIENT:=0xDeaDbeefdEAdbeefdEadbEEFdeadbeEFdEaDbeeF}"
 : "${WITHDRAWAL_ADDR:=0xDeaDbeefdEAdbeefdEadbEEFdeadbeEFdEaDbeeF}"

scripts/dkg-runner/run.sh

Lines changed: 7 additions & 4 deletions
@@ -12,18 +12,20 @@
 # RELAY_URL=https://0.relay.obol.tech
 # Relay ENR endpoint used by the DKG nodes.
 # TIMEOUT=120 Seconds to wait for all nodes before aborting.
-# NODE_EXIT_TIMEOUT=90 Seconds to wait for nodes to exit after completion.
+# SHUTDOWN_DELAY=120s Graceful shutdown delay passed to each node.
+# NODE_EXIT_TIMEOUT=180
+# Seconds to wait for nodes to exit after completion.
 # PLUTO_BIN=./target/debug/pluto
 # Path to the Pluto binary.
 # CHARON_BIN=charon Path to the Charon binary.
 # WORK_DIR=/tmp/dkg-run
 # Scratch directory for the run (wiped on every call).
 # KEEP_NODES=0 Leave nodes running after a successful ceremony when
 # set to 1/true/yes/on.
-# RUN_SMOKE_VERIFY=0 Smoke-start generated node dirs with charon run after
+# RUN_SMOKE_VERIFY=1 Smoke-start generated node dirs with charon run after
 # successful output collection.
-# SMOKE_SECONDS=8 Seconds smoke-started nodes must stay alive.
-# SMOKE_PORT_BASE=39000
+# SMOKE_SECONDS=8 Seconds to wait for smoke validator APIs to become ready.
+# SMOKE_PORT_BASE=19000
 # First local port used by runtime smoke verification.
 # NETWORK=holesky Ethereum network for the cluster definition.
 # FEE_RECIPIENT=0xDeaD...
@@ -110,6 +112,7 @@ log_info " CHARON_NODES = ${CHARON_NODES}"
 log_info " RELAY_URL = ${RELAY_URL}"
 log_info " NETWORK = ${NETWORK}"
 log_info " TIMEOUT = ${TIMEOUT}s"
+log_info " SHUTDOWN_DELAY = ${SHUTDOWN_DELAY}"
 log_info " NODE_EXIT_TIMEOUT = ${NODE_EXIT_TIMEOUT}s"
 log_info " PLUTO_BIN = ${PLUTO_BIN}"
 log_info " CHARON_BIN = ${CHARON_BIN}"
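The two timing knobs logged here use different notations: `SHUTDOWN_DELAY` carries a unit suffix (`120s`, `30s`), while `NODE_EXIT_TIMEOUT` is a bare number of seconds (`180`). Presumably the exit timeout needs to exceed the shutdown delay, since each node waits out that delay before exiting, and the values in this commit are consistent with that. The following sketch shows one way to compare the two. It is not part of `run.sh`; `to_seconds` and `check_exit_budget` are hypothetical helpers, and only bare integers and the `s` and `m` suffixes are handled.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Convert "120", "120s" or "2m" to whole seconds (base 10, so "08" is 8).
# Returns non-zero for anything else.
to_seconds() {
  local value="$1"
  if [[ "$value" =~ ^([0-9]+)s?$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} ))
  elif [[ "$value" =~ ^([0-9]+)m$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} * 60 ))
  else
    return 1
  fi
}

# Succeeds when the exit timeout is strictly longer than the shutdown delay.
# Returns 2 if either value cannot be parsed.
check_exit_budget() {
  local delay exit_timeout
  delay="$(to_seconds "$1")" || return 2
  exit_timeout="$(to_seconds "$2")" || return 2
  (( exit_timeout > delay ))
}
```

For example, `check_exit_budget 120s 180` succeeds, while `check_exit_budget 120s 90` fails because a 90-second wait would expire before a node with a 120-second shutdown delay exits.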
