diff --git a/docs/run-arbitrum-node/02-run-full-node.mdx b/docs/run-arbitrum-node/02-run-full-node.mdx
index 7c67978c62..c522376942 100644
--- a/docs/run-arbitrum-node/02-run-full-node.mdx
+++ b/docs/run-arbitrum-node/02-run-full-node.mdx
@@ -18,6 +18,12 @@ To view the short and long term support policy, visit the [Nitro support policy]
 
 :::
 
+
+
+This page covers running a node with Docker. If you want to deploy on Kubernetes, or you want a more production-ready setup with monitoring, log signals, and network egress guidance, follow [How to run a full node with Helm on Kubernetes](/run-arbitrum-node/run-full-node-with-helm.mdx) instead.
+
+
+
 ## Putting it into practice: run a node
 
 :::warning Caution
diff --git a/docs/run-arbitrum-node/run-full-node-with-helm.mdx b/docs/run-arbitrum-node/run-full-node-with-helm.mdx
new file mode 100644
index 0000000000..3121059eb4
--- /dev/null
+++ b/docs/run-arbitrum-node/run-full-node-with-helm.mdx
@@ -0,0 +1,239 @@
+---
+title: 'How to run a full node with Helm on Kubernetes'
+sidebar_label: 'Run a full node (Helm)'
+description: 'Deploy an Arbitrum chain full node on Kubernetes with the community Helm chart, including memory configuration, monitoring, log signals, and network egress.'
+user_story: 'As a node operator, I want to deploy and reliably operate an Arbitrum chain full node on Kubernetes using the community Helm chart.'
+content_type: how-to
+author: 'Jason-W123'
+sme: 'Jason-W123'
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This guide shows how to deploy a full node for an Arbitrum chain on Kubernetes using the [community Helm chart](https://github.com/OffchainLabs/community-helm-charts/tree/main/charts/nitro). It applies to any Arbitrum chain ([Arbitrum One](https://arbitrum.io), Nova, Sepolia, and your own Arbitrum chain) and covers installation, memory configuration, first sync from a snapshot, monitoring, the log signals that distinguish a healthy node from a broken one, and the outbound endpoints to allow through a firewall.
+
+The chart's defaults target Arbitrum One, so Arbitrum One is used as the worked example throughout. Each step notes what changes for other chains.
+
+
+
+If you want to run a node with Docker instead of Kubernetes, see [How to run a full node for an Arbitrum chain](/run-arbitrum-node/02-run-full-node.mdx). This page assumes you've reviewed the [Start here](/run-arbitrum-node/start-here.mdx) page, which explains the RPC endpoints, Nitro version, and database snapshots required to run a node.
+
+
+
+## Prerequisites
+
+- **Parent chain access:** an execution RPC endpoint for the chain's parent. For Arbitrum One, Nova, and Sepolia the parent is Ethereum, so you also need an **L1 beacon/blob** endpoint (see [L1 Ethereum RPC providers](/run-arbitrum-node/04-l1-ethereum-beacon-chain-rpc-providers.mdx)). For an Arbitrum chain whose parent is another chain, supply that parent chain's RPC; a beacon endpoint is only required when the chain posts blobs to its parent chain.
+- **Cluster:** Kubernetes with Helm installed.
+- **Disk:** size storage for the chain you're running. A node's database size varies widely from chain to chain and grows over time, so check the expected size with your chain operator before provisioning (for Arbitrum One it runs to multiple TB). Raise the chart's default `persistence.size` (`500Gi`) accordingly, and allow roughly 2× the snapshot size of additional temporary disk for extraction on the first run.
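The disk guidance above reduces to simple arithmetic. The following is a minimal sketch, not a sizing tool: the function name and both input figures are hypothetical placeholders, so confirm real database and snapshot sizes with your chain operator.

```python
def required_disk_gib(db_size_gib: int, snapshot_size_gib: int) -> int:
    """Expected database size plus roughly 2x the snapshot size for first-run extraction."""
    return db_size_gib + 2 * snapshot_size_gib


# Hypothetical figures for illustration only; they are not measurements of any chain.
print(required_disk_gib(db_size_gib=3000, snapshot_size_gib=1000))
```

The extra space is only needed during the first run, so the result is an upper bound for provisioning rather than a steady-state figure.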
+
+## Step 1: Deploy with Helm
+
+First add the chart repo:
+
+```shell
+helm repo add offchainlabs https://charts.arbitrum.io
+helm repo update
+```
+
+Then install, selecting the configuration for your chain:
+
+
+
+
+The chart defaults to Arbitrum One (`chain.id` `42161`, `parent-chain.id` `1`), so an Arbitrum One node needs only three values:
+
+```shell
+# Arbitrum One
+helm install arb1-fullnode offchainlabs/nitro \
+  --set configmap.data.parent-chain.connection.url= \
+  --set configmap.data.parent-chain.blob-client.beacon-url= \
+  --set configmap.data.init.latest=pruned # pruned snapshot, recommended for Arbitrum One
+```
+
+For Nova or Sepolia, override the chain and parent-chain IDs. For example, Arbitrum Sepolia:
+
+```shell
+# Arbitrum Sepolia (parent chain is Ethereum Sepolia, 11155111)
+helm install arbsepolia-fullnode offchainlabs/nitro \
+  --set configmap.data.parent-chain.id=11155111 \
+  --set configmap.data.parent-chain.connection.url= \
+  --set configmap.data.parent-chain.blob-client.beacon-url= \
+  --set configmap.data.chain.id=421614
+```
+
+
+
+
+A node for your own Arbitrum chain needs its chain info, sequencer endpoint, and feed URL. Because `chain.info-json` is a large JSON string, supply these through a values file rather than `--set`:
+
+```yaml
+# values-mychain.yaml
+configmap:
+  data:
+    parent-chain:
+      id:
+      connection:
+        url:
+    chain:
+      id:
+      name:
+      info-json: ''
+    execution:
+      forwarding-target:
+    node:
+      feed:
+        input:
+          url:
+```
+
+```shell
+helm install mychain-fullnode offchainlabs/nitro -f values-mychain.yaml
+```
+
+If the parent chain is Ethereum, also set `configmap.data.parent-chain.blob-client.beacon-url`. AnyTrust chains additionally need a Data Availability configuration (`node.da.anytrust.*`, including a `rest-aggregator`); see [Data Availability](/run-arbitrum-node/data-availability.mdx). The older `node.data-availability.*` flags still work but are deprecated in Nitro and will be removed in a future release.
+
+
+
+
+
+
+- **Don't copy the chart README's init example verbatim.** It uses `nitro-genesis.tar`, which syncs from genesis: huge and slow on a chain with a long history such as Arbitrum One. Where a pruned snapshot exists, prefer `configmap.data.init.latest=pruned` (default base `https://snapshot.arbitrum.foundation/`).
+- **The chart changes the RPC path prefix.** Defaults are `http.rpcprefix=/rpc` and `ws.rpcprefix=/ws`, so RPC is served at `http://host:8547/rpc`, not vanilla Nitro's `/`. Clients that omit `/rpc` get a 404.
+- **Metrics are off by default.** Enable `configmap.data.metrics=true` and `serviceMonitor.enabled=true` to scrape them (see [Step 4](#step-4-enable-monitoring)).
+
+
+
+## Step 2: Configure memory management (optional)
+
+For a containerized node, set a memory limit so the chart can size the Go runtime. This is optional but recommended, and applies to every chain:
+
+- **`resources.limits.memory`**: set this to activate the chart's automatic `GOMEMLIMIT`. When a memory limit is present, the chart derives `GOMEMLIMIT` from it (`env.nitro.goMemLimit`, on by default; it subtracts an estimate of non-Go memory and applies a `0.9` multiplier).
+- **`MALLOC_ARENA_MAX=2`**: already set by the chart by default (`env.nitro.mallocArenaMax`, `enabled: true`, `value: 2`), so you don't normally need to configure it. Adjust or disable it under `env.nitro.mallocArenaMax` if needed.
+- **`node.resource-mgmt.mem-free-limit`** (optional): an RPC memory throttle that's disabled by default. Recommended if you expose this node as a public RPC.
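The `GOMEMLIMIT` derivation described above (subtract a non-Go estimate, then apply a `0.9` multiplier) can be sketched as follows. This is an illustration under stated assumptions, not the chart's template logic: the function name is hypothetical, and the 2 GiB non-Go estimate is a placeholder because the chart's actual estimate isn't given here.

```python
GIB = 1024 ** 3


def estimate_gomemlimit(limit_bytes: int, non_go_bytes: int) -> int:
    """Return (memory limit - estimated non-Go memory) * 0.9, in bytes.

    Integer math avoids floating-point rounding on large byte counts.
    """
    if non_go_bytes >= limit_bytes:
        raise ValueError("non-Go estimate must be smaller than the memory limit")
    return (limit_bytes - non_go_bytes) * 9 // 10


# Hypothetical inputs: a 16 GiB container limit and an assumed 2 GiB of non-Go memory.
print(estimate_gomemlimit(16 * GIB, 2 * GIB))
```

The point of the multiplier is headroom: the Go runtime starts collecting more aggressively before the container reaches its hard limit, which lowers the risk of an out-of-memory kill.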
+
+To set any other environment variable, use `extraEnv`, whose entries are spliced directly into the pod's `env`:
+
+```yaml
+extraEnv:
+  - name: SOME_VAR
+    value: 'value'
+```
+
+For the allocator details, the `GOMEMLIMIT` formula, and `MALLOC_ARENA_MAX`, see the [memory management deep-dive](/run-arbitrum-node/nitro/05-nitro-memory-management.mdx) and the [memory management section](/run-arbitrum-node/02-run-full-node.mdx#memory-management) of the Docker full-node guide.
+
+## Step 3: First sync from a snapshot
+
+Whether a snapshot is required depends on the chain:
+
+- **Arbitrum One** requires a snapshot on the first run because of its Classic-era history. Use `configmap.data.init.latest=pruned`.
+- **Nova and Sepolia** have published snapshots that speed up the initial sync but aren't strictly required.
+- **Your Arbitrum chain** typically syncs from genesis with no snapshot, unless you provide one via `init.url`.
+
+When using a snapshot, the default base is `https://snapshot.arbitrum.foundation/`. Initial sync takes a while. The init flag is ignored once a database already exists, so it's safe to leave it in place across restarts. Confirm the current snapshot type and download details in the [Nitro database snapshots](/run-arbitrum-node/nitro/03-nitro-database-snapshots.mdx) guide.
+
+## Step 4: Enable monitoring
+
+Turn on metrics and a `ServiceMonitor` so Prometheus can scrape the node:
+
+```shell
+# arb1-fullnode is the release name from the Step 1 Arbitrum One example; substitute your own release name
+helm upgrade arb1-fullnode offchainlabs/nitro \
+  --reuse-values \
+  --set configmap.data.metrics=true \
+  --set serviceMonitor.enabled=true
+```
+
+Once enabled, the node exposes Prometheus metrics at `http://host:6070/debug/metrics/prometheus`, where `host` is the pod or service address. The metrics server binds to `0.0.0.0:6070` by default (`configmap.data.metrics-server.port`), and the `ServiceMonitor` scrapes that port and path (`serviceMonitor.path` defaults to `/debug/metrics/prometheus`).
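To check that the metrics endpoint is serving data before Prometheus is wired up, you can parse its output by hand. The following is a minimal sketch: the helper names are hypothetical, the sample metric names are invented for illustration (they are not real Nitro metric names), and the parser covers only the basic `name value` and `name{labels} value` forms of the Prometheus text format.

```python
def metrics_url(host: str, port: int = 6070, path: str = "/debug/metrics/prometheus") -> str:
    """Build the scrape URL from the default port and path described above."""
    return f"http://{host}:{port}{path}"


def parse_samples(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}, skipping comments."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        if "}" in line:
            # Labelled series: keep everything up to the closing brace as the key.
            key, _, rest = line.rpartition("}")
            key += "}"
        else:
            key, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[key] = float(fields[0])  # an optional timestamp may follow the value
    return samples


# Hypothetical exposition text; real output comes from the node's metrics endpoint.
SAMPLE = """\
# HELP example_counter A hypothetical counter.
# TYPE example_counter counter
example_counter 42
example_gauge{job="nitro"} 1.5
"""

print(metrics_url("10.0.0.5"))
print(parse_samples(SAMPLE))
```

In a cluster, you would fetch the text with something like `urllib.request.urlopen(metrics_url(pod_ip))` from a pod that can reach the node, then pass the decoded body to `parse_samples`.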
+
+### Grafana dashboard
+
+The repository's README points to the [Releases page](https://github.com/OffchainLabs/community-helm-charts/releases) for the Grafana dashboard. If a release doesn't attach the dashboard JSON, pull the last published copy from the repository's git history and import it:
+
+```shell
+git clone https://github.com/OffchainLabs/community-helm-charts
+git -C community-helm-charts show 430a2cc~1:operations/grafana/dashboards/overview.json > overview.json
+```
+
+Then in Grafana, go to **Dashboards → Import** and paste `overview.json`.
+
+
+
+This exported dashboard has no `__inputs` datasource prompt, and its `Source` variable is hard-pinned to an internal Mimir UID. After importing, you **must** repoint the `Source` variable to your own Prometheus, or every panel reads "No data." Then set `nitronodejob` to this node's scrape job, and leave `sequencerjob`, `validatorjob`, and `relayjob` empty so only the full-node rows render.
+
+
+
+### Probes
+
+The chart wires a built-in **startup probe** by default, but **liveness and readiness probes are off** unless you set `livenessProbe` and `readinessProbe`. The startup probe's long failure window protects the initial sync (so a slow sync won't trigger a restart), but it doesn't catch a hang after the node is up. **Configure a liveness probe yourself** for ongoing hang detection.
+
+## Log signals to watch
+
+The chart sets `log-type=json`. The following INFO/WARN/ERROR strings distinguish a healthy node from a broken one on any Arbitrum chain (the sequencer hostnames in the examples below are Arbitrum One's; your chain's will differ):
+
+| Event | Healthy signal | Broken / warning signal |
+| --- | --- | --- |
+| **New block created** | `INFO created block` with `l2Block` / `l2BlockHash`, advancing continuously while producing | No advancing `created block` line → block production stalled |
+| **User tx forwarded to sequencer** | Success is not logged | `WARN error forwarding transaction trying different target`; `ERROR Failed to publish transaction to any of the forwarding targets` |
+| **Sequencer feed message** | `INFO Feed connected`; `DEBUG received batch item` | `WARN failed connect to sequencer broadcast, waiting and retrying`; `ERROR Server connection timed out without receiving data` |
+| **Inbox messages (parent chain)** | `INFO InboxTracker` with `sequencerBatchCount` / `messageCount` / `l1Block` advancing | `WARN error reading inbox`. Note: `backwards reorg of delayed messages` is logged at INFO level; it's normal on parent-chain reorgs, not an alert. |
+
+### Why a full node can't forward transactions to the sequencer
+
+A full node forwards user transactions to the sequencer. When forwarding fails, the cause falls into one of three buckets. Forwarding failures are **log-based, not metric-based** (the forwarder emits no metrics), so alert by grepping the log strings below.
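Because forwarding failures surface only in logs, alerting comes down to substring matching on the two forwarding strings from the table above. A minimal sketch follows: the function names and labels are hypothetical, and the sample lines assume a JSON shape with `lvl`, `msg`, and `err` keys, which may not match Nitro's exact output. Matching on the raw line keeps the sketch independent of that shape.

```python
from typing import Iterable, Optional

# Message strings taken from the log-signal table above.
ALL_TARGETS_FAILED = "Failed to publish transaction to any of the forwarding targets"
TARGET_WARNING = "error forwarding transaction"


def classify_forwarding_line(line: str) -> Optional[str]:
    """Label a raw log line as a forwarding failure signal, or None if unrelated."""
    if ALL_TARGETS_FAILED in line:
        return "all-targets-unreachable"  # the ERROR emitted after every target failed
    if TARGET_WARNING in line:
        return "per-target-warning"  # a WARN for one target or a sequencer-returned error
    return None


def count_forwarding_events(lines: Iterable[str]) -> dict[str, int]:
    """Count labelled forwarding events across a stream of log lines."""
    counts: dict[str, int] = {}
    for line in lines:
        label = classify_forwarding_line(line)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical JSON log lines; the key names are an assumption, not confirmed Nitro output.
SAMPLE_LINES = [
    '{"lvl":"warn","msg":"error forwarding transaction trying different target","err":"timeout exceeded"}',
    '{"lvl":"error","msg":"Failed to publish transaction to any of the forwarding targets","numTargets":6}',
    '{"lvl":"info","msg":"created block","l2Block":123}',
]

print(count_forwarding_events(SAMPLE_LINES))
```

A reasonable alerting split is to page on `all-targets-unreachable` and only graph `per-target-warning`, since the latter also fires for ordinary client errors that the sequencer hands back.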
+
+| Cause | Signal |
+| --- | --- |
+| **All forwarding targets unreachable** (egress blocked, or sequencer + all fallbacks down) | `ERROR Failed to publish transaction to any of the forwarding targets`, preceded by per-target `WARN` lines |
+| **Sequencer returns a business error** (non-connection, e.g. `nonce too low`) | `WARN error forwarding transaction trying different target` with the sequencer's `err`. Not escalated; the error is returned to the caller |
+| **Memory 429 throttling** (rejected before forwarding) | The client receives `HTTP 429 Too many requests` and the metric `arb/rpc/limitcheck/failure` increments. There is **no per-request log line**. |
+
+Representative log lines (hostnames shown are Arbitrum One's):
+
+```text
+# (a) ALL TARGETS UNREACHABLE: egress blocked, or sequencer + all fallbacks down
+WARN error forwarding transaction trying different target current target=https://arb1-sequencer.arbitrum.io/rpc err="dial tcp: lookup ...: no such host"
+WARN error forwarding transaction to a backup target target=https://arb1-sequencer-fallback-1.arbitrum.io/rpc pos=1 total targets=6 err="timeout exceeded"
+ERROR Failed to publish transaction to any of the forwarding targets numTargets=6
+
+# (b) SEQUENCER RETURNS A BUSINESS ERROR (non-connection): logged once, NOT escalated, returned to caller
+WARN error forwarding transaction trying different target current target=https://arb1-sequencer.arbitrum.io/rpc err="nonce too low: address 0x..., tx: 5 state: 7"
+# (no "Failed to publish..." line follows; the error is handed straight back to the RPC client)
+
+# (c) MEMORY 429 THROTTLING: rejected BEFORE it reaches the forwarder; no per-request log line.
+# The signals are an HTTP 429 to the client and an increment of arb/rpc/limitcheck/failure.
+INFO Cgroups v2 detected, enabling memory limit RPC throttling # startup, confirms throttling is active
+ERROR Error checking memory limit err=... checker=CgroupsMemoryLimitChecker # only if the cgroup read fails
+```
+
+## Network egress allowlist
+
+A full node reaches out to a fixed set of endpoints. If you run on a cloud provider, outbound traffic is often restricted by default, so you'll likely need to allow these destinations explicitly in your cloud security group or firewall rules (for example, AWS security groups, GCP firewall rules, or an egress `NetworkPolicy`). The specific hostnames depend on the chain: for DAO-governed networks they come from the chain's built-in chain info, and for your own Arbitrum chain they come from the `chain.info-json`, `execution.forwarding-target`, and feed URLs you configure. The categories are the same across chains:
+
+| Purpose | Where the endpoint comes from | When |
+| --- | --- | --- |
+| **Sequencer** (tx forwarding) | chain info `sequencer-url` / `execution.forwarding-target` | Always (the node forwards user txs here) |
+| Sequencer fallbacks | chain info `secondary-forwarding-target` | Failover |
+| **Sequencer feed** | chain info `feed-url` / `node.feed.input.url` | Always |
+| Feed fallbacks / delayed | chain info `secondary-feed-url` / `node.feed.input.secondary-url` | Failover |
+| Block metadata | chain info `block-metadata-url` | Only if tracking block metadata / Timeboost |
+| DB snapshot | the `init` source host | Initial sync only |
+| **Parent chain** | your parent-chain RPC + beacon (beacon only for an Ethereum parent) | Always |
+| DA / REST endpoints | `rest-aggregator` URL list | AnyTrust chains only (e.g. Nova) |
+
+For **Arbitrum One**, those resolve to:
+
+| Purpose | Endpoint(s) |
+| --- | --- |
+| Sequencer (tx forwarding) | `https://arb1-sequencer.arbitrum.io/rpc` |
+| Sequencer fallbacks | `https://arb1-sequencer-fallback-{1..5}.arbitrum.io/rpc` |
+| Sequencer feed (primary) | `wss://arb1-feed.arbitrum.io/feed` |
+| Feed fallbacks / delayed | `wss://arb1-delayed-feed.arbitrum.io/feed`, `wss://arb1-feed-fallback-{1..5}.arbitrum.io/feed` |
+| Block metadata | `https://arb1.arbitrum.io/rpc` |
+| DB snapshot | `https://snapshot.arbitrum.foundation/` |
+| Parent chain | Your Ethereum L1 execution RPC + L1 beacon/blob endpoint |
+
+
+
+For a default Arbitrum One full node, allow `*.arbitrum.io` (443 HTTPS + WSS), `snapshot.arbitrum.foundation` (443, init only), and your own L1 RPC + beacon hosts. For other chains, allow that chain's sequencer, feed, and snapshot hosts plus your parent-chain endpoints. **Additional outbound paths exist only if you enable them:** the version alerter (off by default, queries an operator-set endpoint), the classic redirect, or DA/REST endpoints (AnyTrust chains such as Nova).
+
+
+
diff --git a/sidebars.js b/sidebars.js
index b52620b4b7..059ddbd4cc 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -644,6 +644,11 @@ const sidebars = {
           id: 'run-arbitrum-node/run-full-node',
           label: 'Run a full node',
         },
+        {
+          type: 'doc',
+          id: 'run-arbitrum-node/run-full-node-with-helm',
+          label: 'Run a full node (Helm)',
+        },
         {
           type: 'doc',
           id: 'run-arbitrum-node/run-local-full-chain-simulation',