
Commit 9272100

Author: Ignacio Van Droogenbroeck (committed)
docs(operations): document pprof gate + opt-in profiling workflow (v26.06.1+)
Adds operations/profiling.md to both the OSS (docs/) and Enterprise (docs-arc-enterprise/) trees. Covers:

- Why pprof is off by default (pre-26.06.1 unauthenticated /debug/pprof/* on the public API port, GHSA-j93g-rp6m-j32m).
- The three env vars: ARC_DEBUG_PPROF, ARC_DEBUG_PPROF_ADDR, ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK (two-step opt-in for non-loopback exposure).
- Single-node, docker-compose, and Kubernetes enable recipes (with SSH and kubectl port-forward workflows).
- go tool pprof workflows for heap, CPU, goroutines, trace.
- Operational notes: startup logging, shutdown force-close (so a long capture can't hold the cluster's shutdown budget), port-conflict handling.
- 5-item pre-flight security checklist for production use.

Two :::danger admonitions: (1) production exposure is hostile by default; (2) firewall is mandatory in cross-host mode. One :::info naming the v26.06.1 release that ships the gate.

Pages render cleanly under both variants:
https://docs.basekick.net/arc/operations/profiling/
https://docs.basekick.net/arc-enterprise/operations/profiling/

Closes a documentation gap from the 2026-05-19 audit fixes: the gate landed in Arc PR #443 but operators had no docs explaining how to use it.
1 parent 85b13e7 commit 9272100

2 files changed

Lines changed: 486 additions & 0 deletions


Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@
---
sidebar_position: 3
---

# Profiling with pprof

Arc exposes Go's built-in `net/http/pprof` profiler (heap, goroutine, CPU, allocations, blocking, mutex, and execution-trace endpoints) for diagnosing memory pressure, hot CPU paths, goroutine leaks, and deadlocks in production. The endpoints are **opt-in** and bound to loopback (`127.0.0.1:6060`) by default; exposing them anywhere else requires a deliberate two-step configuration.

:::info Available since v26.06.1
The opt-in pprof listener ships in Arc v26.06.1 ([PR #443](https://github.com/Basekick-Labs/arc/pull/443), [GHSA-j93g-rp6m-j32m](https://github.com/Basekick-Labs/arc/security/advisories/GHSA-j93g-rp6m-j32m)). Prior versions registered pprof on the public API port without authentication; upgrade and adopt the env-var gate described below.
:::

:::danger Production exposure is hostile by default
A reachable `/debug/pprof/*` endpoint leaks process internals (in-flight SQL strings and msgpack records via heap dumps, full goroutine stacks, and the process command line via `/debug/pprof/cmdline`) and lets any caller pin a CPU core for an arbitrary number of seconds via `/debug/pprof/profile?seconds=N`. Treat the pprof listener like a root shell: bind to loopback, restrict by firewall, and turn it off when you're done debugging.
:::

## Why pprof Is Off by Default

Pre-v26.06.1, `/debug/pprof/*` was mounted on Arc's public Fiber app with no token and no allowlist. An unauthenticated network caller could fetch heap dumps containing recent query text and ingested records. The hardening PR removed pprof from the public app entirely and moved it to a separate listener that only starts when the `ARC_DEBUG_PPROF` env var is set.

The new design has three properties:

1. **Off by default**: with `ARC_DEBUG_PPROF` unset, no socket is opened, no goroutine is spawned, and the endpoints don't exist in Arc's process.
2. **Loopback-bound by default**: even with `ARC_DEBUG_PPROF=1`, the listener binds to `127.0.0.1:6060` unless you explicitly override it.
3. **Two-step opt-in for non-loopback**: binding to any non-loopback address (`0.0.0.0:6060`, a public IP, etc.) requires both `ARC_DEBUG_PPROF_ADDR` and `ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1`, so a typo in the bind address can't accidentally expose the endpoint cross-host.

## Configuration

All configuration is via environment variables. pprof is a debugging surface, not a runtime feature, so there's no `[debug]` block in `arc.toml`.

| Variable | Default | Description |
|---|---|---|
| `ARC_DEBUG_PPROF` | unset (off) | Set to `1`, `true`, `yes`, or `on` to enable the pprof listener. Any other value (including unset) leaves it off. |
| `ARC_DEBUG_PPROF_ADDR` | `127.0.0.1:6060` | Bind address for the pprof listener. Accepts any form `net.Listen("tcp", …)` accepts: `127.0.0.1:6060`, `localhost:6060`, `[::1]:6060`, `0.0.0.0:6060`, etc. |
| `ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK` | unset (off) | Required when `ARC_DEBUG_PPROF_ADDR` is non-loopback. Set to `1`/`true`/`yes`/`on`. Without it, Arc logs an error and refuses to start the pprof listener. |
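
To make the table concrete, here is a minimal sketch of how such a gate could be evaluated. It is an illustration under assumptions, not Arc's implementation: `pprofConfigFromEnv`, `isLoopback`, and `truthy` are hypothetical names, the enabling values are matched exactly as listed in the table, and treating the literal hostname `localhost` as loopback (without resolving it) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

const defaultPprofAddr = "127.0.0.1:6060"

// truthy reports whether v is one of the enabling values from the table.
func truthy(v string) bool {
	switch v {
	case "1", "true", "yes", "on":
		return true
	}
	return false
}

// isLoopback reports whether a host:port bind address stays on loopback.
// An empty host (":6060") means all interfaces, so it is not loopback.
func isLoopback(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil || host == "" {
		return false
	}
	if host == "localhost" { // assumption: accepted by name, not resolved
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

var errNonLoopback = errors.New("non-loopback ARC_DEBUG_PPROF_ADDR requires ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1")

// pprofConfigFromEnv applies the two-step gate. getenv is injected so the
// logic can be exercised without touching the real environment.
func pprofConfigFromEnv(getenv func(string) string) (addr string, enabled bool, err error) {
	if !truthy(getenv("ARC_DEBUG_PPROF")) {
		return "", false, nil // off: no socket, no goroutine
	}
	addr = getenv("ARC_DEBUG_PPROF_ADDR")
	if addr == "" {
		addr = defaultPprofAddr
	}
	if !isLoopback(addr) && !truthy(getenv("ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK")) {
		return addr, false, errNonLoopback // refuse the listener; the caller keeps running
	}
	return addr, true, nil
}

func main() {
	addr, enabled, err := pprofConfigFromEnv(os.Getenv)
	fmt.Println(addr, enabled, err)
}
```

Injecting `getenv` keeps the gate a pure function of its inputs, which makes the "typo can't expose the endpoint" property easy to test.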

## Enabling pprof on a Single Node

The common case is investigating a single production node from your workstation: enable the listener on the node, then reach it through an SSH port-forward.

```bash
# On the node you want to profile:
ARC_DEBUG_PPROF=1 ./arc
```

Arc emits a startup warning:

```
WARN ARC_DEBUG_PPROF is set — pprof endpoints are exposed on this address.
Restrict access via firewall or unset ARC_DEBUG_PPROF in production.
addr=127.0.0.1:6060
```

From your laptop, SSH-tunnel the port:

```bash
ssh -L 6060:127.0.0.1:6060 user@node
```

Then point `go tool pprof` at `localhost:6060` on your laptop. See [Profiling Workflows](#profiling-workflows) below.

### With docker-compose

```yaml
services:
  arc-writer:
    image: basekick/arc:latest
    environment:
      ARC_DEBUG_PPROF: "1"
    # No host port mapping for 6060: the listener stays inside the container.
    # Use `docker exec` or a sidecar to reach it.
```

To reach the in-container listener (if Compose generated a prefixed container name, substitute the name shown by `docker compose ps` for `arc-writer`):

```bash
# Write to /tmp so the docker cp source path is predictable regardless of the image's working directory:
docker exec arc-writer wget -qO /tmp/heap.pprof http://127.0.0.1:6060/debug/pprof/heap
docker cp arc-writer:/tmp/heap.pprof ./heap.pprof
go tool pprof -http=:8080 heap.pprof
```

### With Kubernetes

Add the variable to the Arc container's `env` list:

```yaml
env:
  - name: ARC_DEBUG_PPROF
    value: "1"
```

Then port-forward:

```bash
kubectl port-forward arc-writer-0 6060:6060
```

By default, `kubectl port-forward` listens only on `127.0.0.1` on your machine and connects to the pod's own loopback interface, so the pprof endpoint stays loopback-bound on the Arc pod and on your laptop at the same time. Nothing is exposed on the cluster network.

## Exposing pprof Cross-Host (Discouraged)

There are cases where loopback isn't enough: for example, a remote profiler that can't open an SSH tunnel, or a multi-tenant environment where operators can't log in to the Arc host itself. Arc supports this with a deliberate two-step opt-in:

```bash
ARC_DEBUG_PPROF=1 \
ARC_DEBUG_PPROF_ADDR=0.0.0.0:6060 \
ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1 \
./arc
```

Without `ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1`, Arc logs an **error** and refuses to start the pprof listener. The rest of Arc continues to run normally, but pprof stays off:

```
ERROR ARC_DEBUG_PPROF=1 with a non-loopback ARC_DEBUG_PPROF_ADDR requires
ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1; refusing to start pprof listener
addr=0.0.0.0:6060
```

When the second opt-in is set and Arc binds to a non-loopback address, the startup log line is escalated to **error** level (instead of warn) so that default alerting policies notice the cross-host exposure on this node:

```
ERROR ARC_DEBUG_PPROF is set — pprof endpoints are exposed on this address.
Restrict access via firewall or unset ARC_DEBUG_PPROF in production.
addr=0.0.0.0:6060
```

:::danger Firewall is mandatory in this mode
The pprof listener has no authentication. Anyone who can reach the bound address (with `0.0.0.0:6060`, that means port 6060 on every interface of the host) can fetch heap dumps containing recent query text and ingested records, dump goroutine stacks, and pin CPU cores. Restrict access by network ACL, security group, or iptables before turning this on. Unset all three env vars the moment you're done.
:::

## Profiling Workflows

Once the listener is reachable at `http://localhost:6060` (either directly or via an SSH/kubectl port-forward), `go tool pprof` does the rest. The recipes below assume Go 1.20+.

### Heap (Memory)

The most common case: Arc's RSS is high and you want to know what's holding it.

```bash
# Live snapshot:
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

# Save for later analysis:
curl -o heap.pprof http://localhost:6060/debug/pprof/heap
go tool pprof -http=:8080 heap.pprof
```

The `-http=:8080` flag launches the interactive web UI at `http://localhost:8080` (flame graph, top callers, source view). Without it you get the CLI prompt.

Common starting commands at the pprof CLI prompt:

```
(pprof) top20          # 20 largest in-use allocations by bytes
(pprof) top20 -cum     # 20 largest by cumulative (function + callees)
(pprof) list <func>    # source-level breakdown of one function
```

### CPU Profile

Capture 30 seconds of CPU activity:

```bash
go tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30'
```

The `seconds` parameter is configurable; 30s is a reasonable default. **Don't go above ~300s** unless you know what you're doing: each in-flight capture holds a connection open and consumes scheduler overhead. Arc's pprof listener has a 10-minute write timeout as the hard ceiling.

### Goroutines

Diagnose a goroutine leak or deadlock:

```bash
# Summary (top goroutine call sites + counts):
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50

# Full stacks for every goroutine (text):
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' > goroutines.txt

# Or via pprof for the UI:
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine
```

A healthy idle Arc writer typically has ~50–200 goroutines (Fiber workers, WAL writer, ingest shards, compaction scheduler, Raft loops). Thousands of goroutines parked on the same `chan receive` or `sync.Mutex.Lock` are the diagnostic signature of a stall.

### Execution Trace

An execution trace captures every scheduler event for `N` seconds, which makes it useful for diagnosing latency spikes:

```bash
curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
go tool trace -http=:8080 trace.out
```

The trace UI shows per-goroutine timelines, GC pauses, and network/syscall waits. Use it sparingly: even 5 seconds of trace produces ~10–50 MB of data on a busy writer.

### Block & Mutex Profiles

By default these profiles are zero-rate (the Go runtime samples nothing). Enabling them requires calling `runtime.SetBlockProfileRate` / `runtime.SetMutexProfileFraction` from inside Arc, which is currently not exposed via an env var. If you need block or mutex profiles, open an issue describing the problem you're chasing and we'll add the knobs.

## Operational Notes

### Startup Logging

When `ARC_DEBUG_PPROF` is unset, Arc emits nothing at startup about pprof. The listener is genuinely absent: no port, no handlers, no log noise.

When set, a single warn-level (loopback) or error-level (non-loopback) line names the bind address and reminds you to restrict access. Grep your logs for `ARC_DEBUG_PPROF is set` to find nodes where it was left on accidentally.

### Shutdown Behavior

Arc registers pprof with the same shutdown priority as the main HTTP server. On `SIGTERM` / `SIGINT`, the pprof listener closes **immediately**, and in-flight captures (especially long `/debug/pprof/profile?seconds=N` requests) are aborted. This is deliberate: a long pprof capture would otherwise hold the cluster's shared shutdown budget and risk skipping downstream hooks (WAL flush, storage close, auth close), which is a data-loss path on what the operator expected to be a graceful exit.

If your capture was killed by shutdown, re-run it after Arc restarts.

### Port Conflicts

If the configured bind address is already in use, Arc logs an **error** and continues without the pprof listener; Arc itself still starts. Look for:

```
ERROR ARC_DEBUG_PPROF=1 but failed to bind pprof listener; continuing without pprof
addr=127.0.0.1:6060 error="listen tcp 127.0.0.1:6060: bind: address already in use"
```

Common causes:
- A previous Arc process didn't release the port (check with `lsof -nP -iTCP:6060 -sTCP:LISTEN`).
- Another Go service on the host already serves pprof on `:6060` (the port used in the `net/http/pprof` documentation examples, and a common convention).
- A non-Arc service grabbed the port.

Resolve the conflict and restart Arc, or set `ARC_DEBUG_PPROF_ADDR` to a different port.

## Security Checklist

Before enabling pprof on a production node:

- [ ] `ARC_DEBUG_PPROF_ADDR` is loopback (default) **or** the host is firewalled to allow only your jumphost / operator workstation.
- [ ] If non-loopback, `ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1` is set deliberately (not by env-var inheritance from a parent process).
- [ ] You have a plan to unset `ARC_DEBUG_PPROF` when the investigation is done; pprof should not be left on indefinitely.
- [ ] On Kubernetes / docker-compose, the pprof port is **not** in the service's port list or compose `ports:` block, so it is only reachable via `kubectl port-forward` or `docker exec`.
- [ ] Heap dumps you save (`heap.pprof`, `goroutines.txt`, `trace.out`) are treated as sensitive: they contain in-flight query text and ingested records. Don't paste them into public issues; share them via your team's secure channel.

## Reference

- Source: [`cmd/arc/debug_pprof.go`](https://github.com/Basekick-Labs/arc/blob/main/cmd/arc/debug_pprof.go), the listener and the two-step gate.
- PR that introduced the gate: [#443](https://github.com/Basekick-Labs/arc/pull/443).
- Advisory: [GHSA-j93g-rp6m-j32m](https://github.com/Basekick-Labs/arc/security/advisories/GHSA-j93g-rp6m-j32m).
- Upstream Go docs: [`net/http/pprof`](https://pkg.go.dev/net/http/pprof) and [`runtime/pprof`](https://pkg.go.dev/runtime/pprof).
