Commit fbf0bd4

Add skills
1 parent 713030b commit fbf0bd4

9 files changed

Lines changed: 680 additions & 0 deletions

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
---
2+
name: add-operators
3+
description: Add new operators to an existing Charon distributed validator cluster
4+
user-invokable: true
5+
---
6+
7+
# Add Operators
8+
9+
> **Warning:** This is an alpha feature and is not yet recommended for production use.
10+
11+
Expand a Charon cluster by adding new operators. This is a coordinated operation involving both existing and new operators.
12+
13+
## Prerequisites
14+
15+
Read `scripts/edit/add-operators/README.md` for full details if needed.
16+
17+
Common prerequisites:
18+
1. `.env` file exists with `NETWORK` and `VC` variables set
19+
2. `.charon` directory with `cluster-lock.json` and `charon-enr-private-key`
20+
3. Docker is running
21+
4. `jq` is installed
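
The prerequisites above can be checked before invoking either script. The following is a minimal sketch, not part of the repository: `check_prereqs` is a hypothetical helper, and it assumes `.env` uses plain `KEY=value` lines.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not shipped with the repo scripts).
# Usage: check_prereqs [repo_root]
check_prereqs() {
  local root="${1:-.}" missing=0 var f
  # .env must define non-empty NETWORK and VC (assumes plain KEY=value lines)
  for var in NETWORK VC; do
    if ! grep -Eq "^${var}=.+" "$root/.env" 2>/dev/null; then
      echo "missing: $var in .env"; missing=1
    fi
  done
  # Required files inside .charon
  for f in cluster-lock.json charon-enr-private-key; do
    [ -f "$root/.charon/$f" ] || { echo "missing: .charon/$f"; missing=1; }
  done
  # Tooling: jq on PATH and a reachable Docker daemon
  command -v jq >/dev/null 2>&1 || { echo "missing: jq"; missing=1; }
  docker info >/dev/null 2>&1 || { echo "missing: running Docker daemon"; missing=1; }
  return "$missing"
}
```

Calling `check_prereqs .` from the repository root prints one line per unmet prerequisite and returns non-zero if any are missing.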
22+
23+
## Role Selection
24+
25+
Ask the user: **"Are you an existing operator in the cluster, or a new operator joining?"**
26+
27+
### If Existing Operator
28+
29+
**Script**: `scripts/edit/add-operators/existing-operator.sh`
30+
31+
**Additional prerequisites**:
32+
- `.charon/cluster-lock.json` and `.charon/validator_keys/` must exist
33+
- The script will automatically stop the VC container for ASDB export
34+
35+
**Arguments to gather**:
36+
- `--new-operator-enrs`: Comma-separated ENRs of the new operators joining
37+
- Whether to use `--dry-run` first
38+
39+
**Run**:
40+
```bash
# [--dry-run] is optional; drop the brackets when passing it.
./scripts/edit/add-operators/existing-operator.sh \
  --new-operator-enrs "enr:-...,enr:-..." \
  [--dry-run]
```
45+
46+
Set the `WORK_DIR` environment variable to override the repository root directory when running from a custom location.
47+
48+
49+
The script will export the anti-slashing database, run the P2P ceremony, update keys, and print commands to start containers manually. After completion, remind the user to **wait ~2 epochs before starting** containers.
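
The "~2 epochs" wait can be translated into wall-clock time. This is a minimal sketch that assumes Ethereum mainnet timing (12-second slots, 32 slots per epoch); networks selected through `NETWORK` may use different values, and the variable names are illustrative rather than taken from the repo scripts.

```bash
# Assumed chain parameters (Ethereum mainnet values); adjust for other networks.
SECONDS_PER_SLOT=12
SLOTS_PER_EPOCH=32
EPOCHS_TO_WAIT=2

wait_seconds=$((EPOCHS_TO_WAIT * SLOTS_PER_EPOCH * SECONDS_PER_SLOT))
# Round up to whole minutes for display
echo "Wait at least ${wait_seconds}s (~$(( (wait_seconds + 59) / 60 )) min) before starting containers"
```

Under these assumptions the wait comes to 768 seconds, roughly 13 minutes.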
50+
51+
### If New Operator
52+
53+
**Script**: `scripts/edit/add-operators/new-operator.sh`
54+
55+
This is a **two-step process**:
56+
57+
#### Step 1: Generate ENR
58+
59+
Ask if the user needs to generate an ENR (first-time setup):
60+
61+
```bash
./scripts/edit/add-operators/new-operator.sh --generate-enr
```
64+
65+
This creates `.charon/charon-enr-private-key` and displays the ENR. Tell the user to **share this ENR with the existing operators**.
66+
The existing operators, in turn, need to share their `cluster-lock.json` with the new operators. It contains the current cluster configuration and is required for the P2P ceremony.
67+
68+
#### Step 2: Join the Ceremony
69+
70+
After the existing operators have the ENR, gather:
71+
- `--new-operator-enrs`: Comma-separated ENRs of ALL new operators (including their own)
72+
- `--cluster-lock`: Path to the `cluster-lock.json` received from existing operators
73+
- Whether to use `--dry-run` first
74+
75+
```bash
# [--dry-run] is optional; drop the brackets when passing it.
./scripts/edit/add-operators/new-operator.sh \
  --new-operator-enrs "enr:-...,enr:-..." \
  --cluster-lock ./received-cluster-lock.json \
  [--dry-run]
```
81+
82+
Set the `WORK_DIR` environment variable to override the repository root directory when running from a custom location.
83+
84+
Remind the user that **all operators (existing AND new) must participate simultaneously** in the P2P ceremony. After completion, the script will print commands to start containers manually. The new operator does NOT have slashing protection history (fresh start).
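
Both roles pass `--new-operator-enrs` as one comma-separated string, so it can help to build that string from the individual ENRs each new operator shared. A minimal sketch follows; `join_enrs` is a hypothetical helper, and the only validation it does is checking the textual `enr:` prefix.

```bash
# Hypothetical helper: join ENRs with commas, rejecting anything without the "enr:" prefix.
join_enrs() {
  local out="" enr
  for enr in "$@"; do
    case "$enr" in
      enr:*) out="${out:+$out,}$enr" ;;                    # append with a comma separator
      *) echo "not an ENR: $enr" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example, `--new-operator-enrs "$(join_enrs "$ENR_A" "$ENR_B")"`, where `ENR_A` and `ENR_B` are placeholder variables holding the shared ENRs.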
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
---
2+
name: add-validators
3+
description: Add new validators to an existing Charon distributed validator cluster
4+
user-invokable: true
5+
---
6+
7+
# Add Validators
8+
9+
> **Warning:** This is an alpha feature and is not yet recommended for production use.
10+
11+
Add new validators to an existing Charon distributed validator cluster. All operators must run this simultaneously as it requires a P2P ceremony.
12+
13+
## Prerequisites
14+
15+
Before running, verify:
16+
1. `.env` file exists with `NETWORK` and `VC` variables set
17+
2. `.charon/cluster-lock.json` and `.charon/deposit-data*.json` exist
18+
3. Docker is running
19+
4. `jq` is installed
20+
21+
Read `scripts/edit/add-validators/README.md` for full details if needed.
22+
23+
## Gather Arguments
24+
25+
Ask the user for the following required arguments using AskUserQuestion:
26+
27+
1. **Number of validators** (`--num-validators`): How many new validators to add (positive integer)
28+
2. **Withdrawal addresses** (`--withdrawal-addresses`): Comma-separated Ethereum withdrawal address(es)
29+
3. **Fee recipient addresses** (`--fee-recipient-addresses`): Comma-separated fee recipient address(es)
30+
31+
Also ask whether they want to:
32+
- Run with `--dry-run` first to preview the operation
33+
- Use the `--unverified` flag (skips key verification; used for remote KeyManager API setups)
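
The gathered values can be sanity-checked before they reach the script. This is a minimal sketch, assuming addresses are standard 20-byte hex strings (`0x` followed by 40 hex digits); the function names are hypothetical and the script may apply its own, stricter validation.

```bash
# Hypothetical input checks (bash-specific: uses [[ =~ ]]).
is_positive_int() { [[ "$1" =~ ^[1-9][0-9]*$ ]]; }
is_eth_address()  { [[ "$1" =~ ^0x[0-9a-fA-F]{40}$ ]]; }

# Validate a comma-separated address list; fails on an empty list or any bad entry.
validate_address_list() {
  local IFS=, addr
  local -a addrs
  read -r -a addrs <<< "$1"
  [ "${#addrs[@]}" -gt 0 ] || return 1
  for addr in "${addrs[@]}"; do
    is_eth_address "$addr" || return 1
  done
}
```

A caller could run `is_positive_int "$N" && validate_address_list "$ADDRS"` and re-prompt the user when either check fails.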
34+
35+
## Execution
36+
37+
Run the script from the repository root:
38+
39+
```bash
# Flags in [brackets] are optional; drop the brackets when passing them.
./scripts/edit/add-validators/add-validators.sh \
  --num-validators <N> \
  --withdrawal-addresses <addrs> \
  --fee-recipient-addresses <addrs> \
  [--unverified] [--dry-run]
```
46+
47+
Set the `WORK_DIR` environment variable to override the repository root directory when running from a custom location.
48+
49+
The script will:
50+
1. Validate prerequisites
51+
2. Display current cluster info (operators, validators)
52+
3. Run a P2P ceremony (all operators must participate simultaneously)
53+
4. Stop containers if they were running
54+
5. Back up `.charon/` to `./backups/`
55+
6. Install new configuration
56+
7. Print commands to start containers manually
57+
58+
Remind the user that **all operators must run this script at the same time** for the P2P ceremony to succeed.
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
---
2+
name: export-asdb
3+
description: Export the anti-slashing database (EIP-3076) from the validator client
4+
user-invokable: true
5+
---
6+
7+
# Export Anti-Slashing Database
8+
9+
> **Warning:** This is an alpha feature and is not yet recommended for production use.
10+
11+
Export the EIP-3076 anti-slashing database from the validator client. The VC container must be stopped before export.
12+
13+
## Prerequisites
14+
15+
1. `.env` file exists with `VC` variable set
16+
2. VC container must be **stopped**
17+
18+
Read `scripts/edit/vc/README.md` for full details if needed.
19+
20+
## Gather Arguments
21+
22+
Ask the user for:
23+
- `--output-file`: Path to write the exported JSON file (e.g., `./asdb-export/slashing-protection.json`)
24+
25+
## Execution
26+
27+
```bash
./scripts/edit/vc/export_asdb.sh --output-file <path>
```
30+
31+
The `VC` variable is read from `.env` automatically. The script routes to the appropriate VC-specific export implementation (lodestar, teku, prysm, or nimbus).
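
After an export, the output file can be checked against the basic shape of the EIP-3076 interchange format. This is a minimal sketch using `jq`; the helper names are hypothetical, and the check covers only the top-level fields defined by EIP-3076, not full schema validation.

```bash
# Hypothetical helpers; field names follow the EIP-3076 interchange format.
# Succeeds only if the metadata fields and the data array are present.
check_interchange() {
  jq -e '
    (.metadata.interchange_format_version | type == "string")
    and (.metadata.genesis_validators_root | type == "string")
    and (.data | type == "array")
  ' "$1" >/dev/null
}

# Print one line per validator with the number of recorded blocks and attestations.
summarize_interchange() {
  jq -r '.data[] | "\(.pubkey) blocks=\(.signed_blocks | length) attestations=\(.signed_attestations | length)"' "$1"
}
```

Running `check_interchange ./asdb-export/slashing-protection.json && summarize_interchange ./asdb-export/slashing-protection.json` gives a quick confirmation that the export is non-empty and well-formed.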
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
---
2+
name: import-asdb
3+
description: Import an anti-slashing database (EIP-3076) into the validator client
4+
user-invokable: true
5+
---
6+
7+
# Import Anti-Slashing Database
8+
9+
> **Warning:** This is an alpha feature and is not yet recommended for production use.
10+
11+
Import an EIP-3076 anti-slashing database into the validator client. The VC container must be stopped.
12+
13+
## Prerequisites
14+
15+
1. `.env` file exists with `VC` variable set
16+
2. VC container must be **stopped**
17+
18+
Read `scripts/edit/vc/README.md` for full details if needed.
19+
20+
## Gather Arguments
21+
22+
Ask the user for:
23+
- `--input-file`: Path to the JSON file to import (e.g., `./asdb-export/slashing-protection.json`)
24+
25+
## Execution
26+
27+
```bash
./scripts/edit/vc/import_asdb.sh --input-file <path>
```
30+
31+
The `VC` variable is read from `.env` automatically. The script routes to the appropriate VC-specific import implementation (lodestar, teku, prysm, or nimbus).
Lines changed: 121 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,121 @@
1+
---
2+
name: local-monitoring
3+
description: Query the local Grafana/Prometheus/Loki stack shipped with this CDVN repo. Use when investigating cluster health, charon/beacon/EL errors, peer connectivity, validator performance, or log patterns against the locally-running monitoring stack (not Obol's hosted Grafana).
4+
user-invokable: true
5+
---
6+
7+
# Local Monitoring
8+
9+
Query the local monitoring stack (Grafana, Prometheus, Loki) that ships with this repo to investigate cluster health and diagnose issues.
10+
11+
For Obol's hosted Grafana (across all clusters), use the `obol-monitoring` skill instead. This skill is for the local stack only.
12+
13+
## Prerequisites
14+
15+
Before running, verify:
16+
1. The monitoring stack is up: `docker compose ps prometheus grafana loki` shows them running
17+
2. Grafana is reachable on the host at `http://localhost:${MONITORING_PORT_GRAFANA:-3000}` (default 3000)
18+
3. The user knows their Grafana admin credentials, or has unauthenticated access enabled (default in this repo's `grafana.ini`)
19+
20+
If the stack isn't up, point the user to `docker compose up -d prometheus grafana loki` first.
21+
22+
## Architecture notes
23+
24+
- **Prometheus** (`:9090`) and **Loki** (`:3100`) are on the docker network only — not exposed to the host by default. Query them through one of:
25+
- **Grafana datasource proxy** (preferred): `http://localhost:3000/api/datasources/proxy/uid/<prometheus|loki>/<path>` — uses Grafana's own connection
26+
- **`docker compose exec`** fallback: `docker compose exec prometheus wget -qO- 'http://localhost:9090/api/v1/query?query=...'`
27+
- Datasource UIDs (from `grafana/datasource.yml`): `prometheus`, `loki`, `tempo`
28+
- Charon metrics are labeled with `cluster_name` and `cluster_peer` — get these from `.env` (`CLUSTER_NAME`, `CLUSTER_PEER`) before querying
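
The label values mentioned above can be read from `.env` and turned into a PromQL selector. A minimal sketch follows, assuming `.env` uses plain `KEY=value` lines with optional surrounding quotes; `env_value` and `cluster_selector` are hypothetical helpers.

```bash
# env_value KEY FILE: last KEY=value entry in a .env-style file, surrounding quotes stripped.
env_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1 \
    | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"
}

# cluster_selector METRIC FILE: METRIC restricted to this cluster and peer.
cluster_selector() {
  local name peer
  name=$(env_value CLUSTER_NAME "$2")
  peer=$(env_value CLUSTER_PEER "$2")
  printf '%s{cluster_name="%s",cluster_peer="%s"}\n' "$1" "$name" "$peer"
}
```

For example, `cluster_selector app_monitoring_readyz .env` prints a selector that can be passed as the `query` parameter of a Prometheus request.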
29+
30+
## Gather Arguments
31+
32+
Use AskUserQuestion to clarify what the user wants to investigate. Common shapes:
33+
34+
1. **What to investigate** — pick one:
35+
- Cluster health snapshot (readyz, peers, active validators)
36+
- Charon error/log search (last N minutes)
37+
- Beacon node performance (latency, sync status)
38+
- Peer connectivity (ping latency, connection types)
39+
- Custom PromQL / LogQL query
40+
2. **Time range** — default last 15m; ask if investigating a specific incident
41+
3. **Cluster scope** — usually their own (`$CLUSTER_NAME` from `.env`); ask only if multiple clusters share this Prometheus
42+
43+
If the request is already specific (e.g. "show me charon errors from the last hour"), skip AskUserQuestion and proceed.
44+
45+
## Execution
46+
47+
### Instant query (Prometheus)
48+
49+
```bash
GRAFANA_URL="http://localhost:${MONITORING_PORT_GRAFANA:-3000}"
curl -sG "$GRAFANA_URL/api/datasources/proxy/uid/prometheus/api/v1/query" \
  --data-urlencode 'query=<PROMQL>'
```
54+
55+
### Range query (Prometheus)
56+
57+
```bash
GRAFANA_URL="http://localhost:${MONITORING_PORT_GRAFANA:-3000}"
END=$(date +%s)
START=$((END - 900))  # 15 minutes ago; portable, unlike the BSD-only `date -v-15M`
curl -sG "$GRAFANA_URL/api/datasources/proxy/uid/prometheus/api/v1/query_range" \
  --data-urlencode 'query=<PROMQL>' \
  --data-urlencode "start=$START" \
  --data-urlencode "end=$END" \
  --data-urlencode 'step=30s'
```
64+
65+
### Log search (Loki)
66+
67+
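```bash
GRAFANA_URL="http://localhost:${MONITORING_PORT_GRAFANA:-3000}"
END=$(date +%s)
START=$((END - 900))  # 15 minutes ago; portable, unlike the BSD-only `date -v-15M`
# Loki takes nanosecond timestamps, hence the nine appended zeros.
curl -sG "$GRAFANA_URL/api/datasources/proxy/uid/loki/loki/api/v1/query_range" \
  --data-urlencode 'query={service_name="charon"} |= "error"' \
  --data-urlencode "start=${START}000000000" \
  --data-urlencode "end=${END}000000000" \
  --data-urlencode 'limit=200'
```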
74+
75+
### Fallback via `docker compose exec`
76+
77+
If the Grafana proxy is unavailable:
78+
```bash
docker compose exec prometheus wget -qO- "http://localhost:9090/api/v1/query?query=<URL_ENCODED_PROMQL>"
docker compose exec loki wget -qO- "http://localhost:3100/loki/api/v1/query_range?query=<...>"
```
82+
83+
For a query cookbook (cluster health, charon errors, peer ping, BN latency, validator effectiveness), see [queries.md](queries.md).
84+
85+
## Output handling
86+
87+
Parse the JSON response and present results clearly:
88+
89+
- **Prometheus instant query** — show metric labels + value, flag anomalies (zeros where non-zero expected, threshold breaches)
90+
- **Prometheus range query** — summarise min/max/avg over the window; call out spikes
91+
- **Loki logs** — group by `cluster_peer` if present; surface error/warn lines verbatim with timestamps; suppress repetitive noise
92+
- Always print the **exact query that was run** so the user can re-run it in Grafana
93+
94+
If the response contains `"status":"error"`, surface the `error` and `errorType` fields and stop — do not invent results.
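
The handling rules above can be sketched with `jq` for the instant-query case. `render_prom` is a hypothetical helper; it assumes the standard Prometheus HTTP API response shape (`status`, `data.result[].metric`, `data.result[].value`) and does not cover range queries or Loki streams.

```bash
# Hypothetical helper: reads a Prometheus instant-query JSON response on stdin.
# Prints "ERROR <errorType>: <error>" on failure, else one "<labels> <value>" line per series.
render_prom() {
  jq -r '
    if .status == "error"
    then "ERROR \(.errorType): \(.error)"
    else .data.result[] | "\(.metric | tojson) \(.value[1])"
    end'
}
```

Piping any instant-query response through `render_prom` gives output that is easy to scan for zeros or unexpected values.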
95+
96+
## Common diagnoses
97+
98+
When showing results, watch for these patterns and call them out:
99+
100+
- **`app_monitoring_readyz != 1`** — node is not ready; explain what readyz state means (1=ready, other=various failure modes documented in charon docs)
101+
- **High `p2p_ping_latency_secs` p90** — peer network is slow; check `p2p_peer_connection_types` for relayed vs direct
102+
- **`p2p_ping_success == 0`** for a peer — that operator is unreachable
103+
- **Charon log `error` spikes** — group by `topic` / `component` to identify which subsystem
104+
- **`core_scheduler_validators_active` lower than `cluster_validators`** — some validators not active (not yet activated, or exited)
105+
- **EL/CL container missing from metrics** — check `docker compose ps` and respective container logs
106+
107+
## Pointers to dashboards
108+
109+
Direct the user to the pre-provisioned dashboards in `grafana/dashboards/` rather than reinventing them:
110+
- `charon_overview_dashboard.json` — readyz, peers, validator activity (start here)
111+
- `cluster_dashboard.json` — full cluster view across operators
112+
- `node_overview_dashboard.json` — host/EL/CL/VC resource usage
113+
- `logs_dashboard.json` — Loki log explorer with charon filters
114+
115+
Open in browser: `http://localhost:${MONITORING_PORT_GRAFANA:-3000}/dashboards`.
116+
117+
## Dependencies
118+
119+
- `curl`, `jq` (for parsing responses cleanly)
120+
- Running `prometheus`, `grafana`, `loki` containers from this compose stack
121+
- `CLUSTER_NAME` and `CLUSTER_PEER` set in `.env` (used as Prometheus label values)
