validator-ejector default endpoints break with EL=el-none / CL=cl-none #316

@nickh-obol

🎯 Problem to be solved

In .env.sample.{hoodi,mainnet}, the validator-ejector endpoint defaults are:

VE_BEACON_NODE_URL=http://${CL}:5052
VE_EXECUTION_NODE_URL=http://${EL}:8545

When an operator runs the BN-less / EL-less pattern (CL=cl-none and/or EL=el-none, e.g. when using an external BN or sharing a BN across two stacks on one host), these resolve to http://cl-none:5052 / http://el-none:8545 — hostnames that don't exist. validator-ejector fails to start unless the operator remembers to override VE_BEACON_NODE_URL and VE_EXECUTION_NODE_URL explicitly.
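For reference, the operator-side workaround amounts to something like the fragment below. It is only a sketch: the `external-bn.example` and `external-el.example` hostnames are placeholders invented for illustration, not values taken from the sample files.

```shell
# .env (operator-side override; hostnames below are hypothetical placeholders)
CL=cl-none
EL=el-none
# Point validator-ejector at the external nodes explicitly so it does not
# end up with http://cl-none:5052 / http://el-none:8545.
VE_BEACON_NODE_URL=http://external-bn.example:5052
VE_EXECUTION_NODE_URL=http://external-el.example:8545
```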

This is currently called out as a gotcha in CLAUDE.md (see "Use an external beacon node + execution client" and "Run two DV stacks on one host" sections), with operator-side workarounds. Tracking here for a proper fix.

💡 Discussion: where the fallback should live

Two compose-level options were considered:

(A) Nested fallback in docker-compose.yml:

- CONSENSUS_NODE=${VE_BEACON_NODE_URL:-${CHARON_BEACON_NODE_ENDPOINTS:-http://lighthouse:5052}}
- EXECUTION_NODE=${VE_EXECUTION_NODE_URL:-${CHARON_EXECUTION_CLIENT_RPC_ENDPOINT:-http://nethermind:8545}}

Then .env.sample.* comments out VE_BEACON_NODE_URL / VE_EXECUTION_NODE_URL with a note that they only need to be set if VE should use a different endpoint than Charon.
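The intended precedence of (A) can be sketched in plain POSIX shell, which uses the same `${VAR:-default}` syntax. This assumes Compose resolves the nested default the same way the shell does, which is worth confirming with `docker compose config` before relying on it. The function name and the `*.example` URLs are hypothetical.

```shell
# Mirror of the nested fallback proposed in (A), evaluated by the shell.
# Usage: resolve_consensus_node <VE_BEACON_NODE_URL> <CHARON_BEACON_NODE_ENDPOINTS>
# The body runs in a subshell so the assignments do not leak to the caller.
resolve_consensus_node() (
  VE_BEACON_NODE_URL="$1"
  CHARON_BEACON_NODE_ENDPOINTS="$2"
  printf '%s\n' "${VE_BEACON_NODE_URL:-${CHARON_BEACON_NODE_ENDPOINTS:-http://lighthouse:5052}}"
)

# 1. Explicit VE_* override wins.
resolve_consensus_node "http://ve-bn.example:5052" "http://charon-bn.example:5052"
# 2. VE_* unset or empty: inherit Charon's endpoint.
resolve_consensus_node "" "http://charon-bn.example:5052"
# 3. Neither set: fall back to the local default.
resolve_consensus_node "" ""
```

The three calls print the VE override, the Charon endpoint, and `http://lighthouse:5052` respectively, which lines up with the "explicit VE_* must continue to win" criterion further down.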

(B) Inheritance in .env.sample.*:

VE_BEACON_NODE_URL=${CHARON_BEACON_NODE_ENDPOINTS:-http://${CL}:5052}

Lower blast radius but keeps the http://${CL}:... trap half-alive.
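To make "half-alive" concrete, here is a small shell sketch of how the (B) default would expand. It assumes the `.env` interpolation behaves like the shell's `${VAR:-default}`; the function name and the `external-bn.example` URL are hypothetical.

```shell
# Mirror of the (B) default, evaluated by the shell.
# Usage: option_b_beacon_url <CHARON_BEACON_NODE_ENDPOINTS> <CL>
option_b_beacon_url() (
  CHARON_BEACON_NODE_ENDPOINTS="$1"
  CL="$2"
  printf '%s\n' "${CHARON_BEACON_NODE_ENDPOINTS:-http://${CL}:5052}"
)

# Charon endpoint set: VE inherits it, so cl-none is harmless.
option_b_beacon_url "http://external-bn.example:5052" cl-none
# Charon endpoint not set: the original trap is back.
option_b_beacon_url "" cl-none
```

The second call prints `http://cl-none:5052`, the same non-existent hostname as today.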

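(A) is cleaner: the fallback lives in one place (docker-compose.yml), and .env.sample.* no longer ships a default derived from ${CL} / ${EL}.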

⚠️ Backwards-compat concern (reason this is deferred)

CHARON_BEACON_NODE_ENDPOINTS is comma-separated — Charon accepts a list. validator-ejector's CONSENSUS_NODE is single-valued: parsed via str() in src/services/config/service.ts upstream. So if an operator already runs Charon with multi-BN (CHARON_BEACON_NODE_ENDPOINTS=http://bn1,http://bn2), inheriting that into VE silently produces an invalid config and breaks a previously-working cluster on update.
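A pre-flight guard along these lines could at least make the failure loud instead of silent. It is a sketch only: the function name is hypothetical, and it only checks for a comma rather than validating the URL.

```shell
# Return 0 if the value looks like a comma-separated list, 1 otherwise.
has_multiple_endpoints() {
  case "$1" in
    *,*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Value that VE would inherit from a multi-BN Charon config (example from above).
inherited="http://bn1,http://bn2"

if has_multiple_endpoints "$inherited"; then
  echo "CONSENSUS_NODE would be a list, not a single URL: $inherited" >&2
fi
```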

Multi-BN Charon is uncommon in DV practice (fallback BN latency is bad for DV performance), but "uncommon" is not "impossible" — and avoiding regressions for working clusters is more important to us than fixing a gotcha that's mostly theoretical for current operators.

Possible mitigations if/when this gets picked up:

  • Document the caveat and ship anyway (small audience affected).
  • Add a compose-level shim that picks the first entry of the list (entrypoint override; fragile against image updates).
  • Restructure with a shared STACK_BEACON_NODE_URL single source of truth (bigger diff).
  • Wait for upstream multi-BN support: lidofinance/validator-ejector#94 (open since Oct 2023, no movement).
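For the "pick the first entry" mitigation, the string handling itself is small in POSIX shell, as sketched below. The function name is hypothetical, and how this would be wired into the container (the entrypoint override) is deliberately not shown, since that depends on the upstream image.

```shell
# Print the first entry of a comma-separated endpoint list.
# ${1%%,*} removes the longest suffix starting at the first comma;
# a value without a comma is returned unchanged.
first_endpoint() {
  printf '%s\n' "${1%%,*}"
}

first_endpoint "http://bn1,http://bn2"
first_endpoint "http://bn1"
```

Both calls print `http://bn1`. Note that this silently drops the fallback BN for validator-ejector, so it trades the invalid-config failure for reduced redundancy.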

🔗 Related

  • Open upstream issue requesting list/failover support: lidofinance/validator-ejector#94
  • In-flight branch that relocates the validator-ejector service definition into a new compose-lido.yml: nick/option-to-shut-off-lidodvexit-and-ejector (last activity Feb 2026). Any fix here will need to be applied in whichever file the service ends up in.

👐 Additional acceptance criteria

  • Operator can set EL=el-none and/or CL=cl-none and have validator-ejector start cleanly without needing to set VE_* overrides explicitly.
  • No regression for clusters that currently set VE_* explicitly (those values must continue to win).
  • No regression for clusters running default local BN/EL.
  • .env.sample.* and CLAUDE.md updated together so the gotcha note can be removed.

❌ Out of scope

  • Adding multi-BN support to validator-ejector itself (upstream concern).
