Commit e145365

Merge pull request #5 from rkoliada-cl/docs/add-design-specs
Docs/add design specs
2 parents d2508bf + e0477ad commit e145365

4 files changed

Lines changed: 289 additions & 0 deletions

.gitattributes

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
# AI assistant tooling — must not ship in src.rpm or binary packages
2+
/.agents export-ignore
3+
/.claude export-ignore
4+
/.cursor export-ignore
5+
AGENTS.md export-ignore
6+
CLAUDE.md export-ignore

CLAUDE.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# node_exporter (CloudLinux fork)
2+
3+
This repository is CloudLinux's fork of the upstream
4+
[prometheus/node_exporter](https://github.com/prometheus/node_exporter). It is
5+
packaged as `cl-node-exporter` (both RPM and deb) and is
6+
consumed internally by the `cl_plus` telemetry stack. Upstream `master` is
7+
merged in periodically; all CloudLinux-specific changes live on top of the
8+
upstream history.
9+
10+
## What the fork adds
11+
12+
The fork is deliberately small. It is stock upstream, plus:
13+
14+
1. A unix-socket transport for `/metrics` (`--web.socket-path`,
15+
`--web.socket-permissions`).
16+
2. CloudLinux packaging recipes (`node_exporter.spec`, `debian/`).
17+
3. A versioned tests subpackage at `/opt/node_exporter_tests/` used by the
18+
CloudLinux QA pipeline.
19+
4. A `/usr/share/cloudlinux/cl-node-exporter` version file, read by Sentry
20+
for package-version tagging.
21+
5. A Makefile change that runs `test-e2e` twice (TCP + unix-socket) so the
22+
fork-local feature is exercised on every build.
23+
24+
Everything else in this repo — collectors, metric semantics, command-line
25+
flags, build targets — is upstream and should be understood by reading
26+
upstream documentation, not by treating this repo as authoritative.
27+
28+
## Design Specifications
29+
30+
This project maintains design specs for the features where business rules,
31+
invariants, and CloudLinux-specific decisions are not obvious from source
32+
code. Check the index below before starting work — read any spec that
33+
relates to your task. If your changes affect behavior described in a spec,
34+
update the spec in the same commit.
35+
36+
- [Unix Socket Listener](docs/design/unix-socket-listener.md): `--web.socket-path`, `--web.socket-permissions`, unix domain socket, cl_plus scraping, socket cleanup, SIGTERM shutdown, e2e `-s` flag, `node_exporter.go` main
37+
- [CloudLinux Packaging](docs/design/cloudlinux-packaging.md): `cl-node-exporter` RPM, deb, `node_exporter.spec`, `debian/rules`, `/usr/share/cloudlinux/cl_plus/`, version file, Sentry tagging, tests subpackage, pinned Go toolchain, amd64-only
38+
39+
## Working on this fork
40+
41+
- **Before changing CloudLinux-specific code** (unix socket, RPM/deb
42+
recipes, `/usr/share/cloudlinux/*` layout): read the relevant design
43+
spec first, and update it in the same commit as your code change.
44+
- **Before changing upstream-owned files** (anything under `collector/`,
45+
`node_exporter.go` outside the unix-socket block, Makefile targets not
46+
listed above): prefer forwarding the change upstream. Fork-local diffs
47+
make the next upstream sync harder.
48+
- **Upstream syncs:** history from upstream is merged periodically (see
49+
commits tagged `Sync ... with upstream`). When resolving conflicts,
50+
preserve every CloudLinux-specific invariant listed in the design
51+
specs; if upstream has reimplemented something equivalent (e.g. unix
52+
socket support), prefer deleting the fork-local copy and documenting
53+
the change.

docs/design/cloudlinux-packaging.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
1+
# CloudLinux Packaging — Design Specification
2+
3+
## Overview
4+
5+
This fork is shipped as the `cl-node-exporter` RPM (for CloudLinux OS 7/8/9,
6+
AlmaLinux) and `cl-node-exporter` `.deb` (for Ubuntu 20.04 / 22.04 servers
7+
running CloudLinux components). Packages are built from this repository's
8+
`node_exporter.spec` and `debian/` tree. The binary is installed into the
9+
CloudLinux-private tree (`/usr/share/cloudlinux/cl_plus/`) rather than onto
10+
`$PATH`, because it is an internal component of the `cl_plus` telemetry
11+
stack, not a general-purpose system service. This spec covers only
12+
packaging-level invariants — runtime flags are covered in other specs.
13+
14+
## Package Layout
15+
16+
### Binary package `cl-node-exporter`
17+
18+
| Path | Source | Purpose |
19+
|------|--------|---------|
20+
| `/usr/share/cloudlinux/cl_plus/node_exporter` | `node_exporter` binary, built from source during packaging | The exporter binary. Executed by the external `cl_plus` service; not intended to be invoked by operators directly. |
21+
| `/usr/share/cloudlinux/cl-node-exporter` | Generated during `%install` / `override_dh_auto_install` | Plain-text file containing `<version>-<release>`. Consumed by Sentry for package-version tagging of crash reports. |
22+
23+
The package deliberately omits: a systemd unit, a default config file, a
24+
`/usr/bin/` symlink, any `sysusers.d` entry, and any firewall or SELinux
25+
policy. All lifecycle and configuration concerns are owned by the consumer
26+
package (`cl_plus`).
27+
28+
### Tests subpackage `cl-node-exporter-tests`
29+
30+
| Path | Purpose |
31+
|------|---------|
32+
| `/opt/node_exporter_tests/node_exporter` | Second copy of the built binary, used by the e2e harness. |
33+
| `/opt/node_exporter_tests/end-to-end-test.sh` | E2E harness script. |
34+
| `/opt/node_exporter_tests/collector/` | Fixture data (procfs/sysfs/udev snapshots). Broken symlinks under `fixtures/` are stripped during `%install` because dh on Ubuntu rejects them. |
35+
| `/opt/node_exporter_tests/tools/tools` | Build-tag matcher helper used by the e2e script. |
36+
37+
This subpackage exists so the QA pipeline can run the upstream e2e suite on
38+
the exact binary that ships, including the CloudLinux unix-socket mode (see
39+
`unix-socket-listener.md`).
40+
41+
## Build Mechanism
42+
43+
Both packages download and use a pinned upstream Go toolchain at build time
44+
rather than relying on the distro's `golang` package:
45+
46+
- **Pinned version: `go1.24.0`.** Hard-coded in both `node_exporter.spec`
47+
(`%build` section) and `debian/rules` (`override_dh_auto_build`).
48+
- **Source:** `https://dl.google.com/go/go1.24.0.linux-<arch>.tar.gz`.
49+
- **Location:** extracted to `%{_tmppath}/go` (RPM) or `/tmp/go` (deb).
50+
- The pinned toolchain is prepended to `PATH` for the duration of the build.
51+
52+
The RPM spec also runs 32-bit cross-testing (`make test-32bit`) on x86_64/amd64
53+
builds. The deb rules do not.
54+
55+
### RPM-only conventions (`node_exporter.spec`)
56+
57+
- `Autoreq: 0` and `%define debug_package %{nil}` — auto-dependency scanning
58+
and debuginfo generation are disabled because the binary is a statically
59+
linked Go artifact.
60+
- Version file path is derived from macros: `%{cl_dir}%{name}` resolves to
61+
`/usr/share/cloudlinux/cl-node-exporter`. The file's content is
62+
`%{version}-%{release}` as a single line.
63+
64+
### Debian-only conventions (`debian/rules`)
65+
66+
- After install, `find $buildroot/opt/node_exporter_tests/collector/fixtures
67+
-xtype l -delete` removes broken symlinks produced by the procfs fixture
68+
ttar archive. Without this, `dh_*` fails the build on Ubuntu.
69+
- `override_dh_auto_clean` only removes `debian/tmp` — it does not invoke
70+
`make clean`, so the vendored Go toolchain in `/tmp/go` may persist
71+
between builds on a long-lived worker.
72+
- Release string is hard-coded as `.ubuntu.cloudlinux` (parsed from the
73+
`debian/changelog` version by `dpkg-parsechangelog`).
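
The symlink cleanup described above can also be expressed in Go, which may be useful if the check ever moves into a test helper. This is a minimal sketch, not code from this repository: `removeBrokenSymlinks` and `demoFixtures` are hypothetical names, and treating "`os.Stat` on the link fails" as "broken" is an approximation of `find -xtype l`.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeBrokenSymlinks deletes every symlink under root whose target cannot
// be resolved and returns the paths it removed.
func removeBrokenSymlinks(root string) ([]string, error) {
	var removed []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type()&fs.ModeSymlink == 0 {
			return nil // not a symlink, leave it alone
		}
		// os.Stat follows the link, so an error here means the target is
		// missing or unreachable (for example, a symlink loop).
		if _, statErr := os.Stat(path); statErr != nil {
			if rmErr := os.Remove(path); rmErr != nil {
				return rmErr
			}
			removed = append(removed, path)
		}
		return nil
	})
	return removed, err
}

// demoFixtures builds a throwaway tree with one valid and one dangling
// symlink, runs the cleanup, and reports how many links were removed and
// whether the valid link survived.
func demoFixtures() (int, bool, error) {
	dir, err := os.MkdirTemp("", "fixtures-sketch")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, "real"), []byte("x"), 0o644); err != nil {
		return 0, false, err
	}
	if err := os.Symlink("real", filepath.Join(dir, "ok")); err != nil {
		return 0, false, err
	}
	if err := os.Symlink("missing", filepath.Join(dir, "dangling")); err != nil {
		return 0, false, err
	}

	removed, err := removeBrokenSymlinks(dir)
	if err != nil {
		return 0, false, err
	}
	_, okErr := os.Lstat(filepath.Join(dir, "ok"))
	return len(removed), okErr == nil, nil
}

func main() {
	n, kept, err := demoFixtures()
	fmt.Println("removed:", n, "valid link kept:", kept, "err:", err)
}
```

`filepath.WalkDir` does not follow symlinks, so each link is visited as a link and only dangling ones are deleted.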
74+
75+
## Invariants
76+
77+
- **Install path is stable.** `/usr/share/cloudlinux/cl_plus/node_exporter`
78+
is a contract with the consumer package. Moving the binary requires a
79+
coordinated change in `cl_plus`.
80+
- **Version file is stable.** `/usr/share/cloudlinux/cl-node-exporter`
81+
contains exactly `<rpm-or-deb-version>-<release>` and is consumed by
82+
Sentry tagging. Format change requires coordinating with the reporter.
83+
- **Go toolchain is pinned in the recipe, not the CI image.** The pinned
84+
version lives in `node_exporter.spec` and `debian/rules`. Bumping Go
85+
means editing both files in the same commit.
86+
- **The binary package does not own any runtime config, user, or unit.**
87+
All CloudLinux-specific runtime wiring (socket path, user, scraping
88+
group, startup ordering) is owned by the consumer.
89+
- **Tests subpackage is optional.** The binary package must function
90+
without `cl-node-exporter-tests` installed; the test subpackage is a
91+
QA-only artifact.
92+
- **Both packages are amd64-only today.** Both `node_exporter.spec`
93+
(via the `%ifarch` x86_64/amd64/ia32e branches being the only curl'd Go
94+
archives) and `debian/control` (`Architecture: amd64`) restrict the
95+
package to x86_64. Adding another arch requires touching both recipes.
96+
97+
## Test Coverage
98+
99+
| Aspect | Test | Type | Covers |
100+
|--------|------|------|--------|
101+
| Binary builds and e2e passes on RPM build workers | `%build` section of `node_exporter.spec` runs `make build`, `make test`, `make test-32bit` | RPM build-time | Compilation + unit tests + 32-bit cross-compile + e2e socket/TCP tests (`make test-e2e`) on RPM workers. Failure aborts the build. |
102+
| Binary builds on Ubuntu build workers | `override_dh_auto_build` in `debian/rules` runs `make build`, `make tools`, `make test` | deb build-time | Compilation + unit tests on Ubuntu. (No `test-e2e` is wired in deb.) |
103+
| Fixture ttar archive is extractable | `make test-e2e` depends on `collector/fixtures/sys/.unpacked` and `collector/fixtures/udev/.unpacked` | Build | If the ttar archives are corrupt or missing, the build fails at extraction time. |
104+
105+
### Known gaps
106+
107+
- **No packaging-smoke test.** Nothing verifies post-install that
108+
`/usr/share/cloudlinux/cl_plus/node_exporter --version` returns the
109+
expected version string, or that the version file content matches the
110+
package version. A trivial `%posttrans` or `debian/postinst` smoke check
111+
would close this.
112+
- **Version-file format is not asserted.** If a future change to the spec
113+
accidentally drops the newline, quotes the string, or appends the
114+
architecture, Sentry tagging will silently degrade.
115+
- **Tests subpackage is not smoke-tested after install.** No CI job
116+
installs `cl-node-exporter-tests` on a fresh VM and runs
117+
`/opt/node_exporter_tests/end-to-end-test.sh` against the shipped
118+
binary.
119+
- **No coverage for non-amd64 targets.** Non-x86_64 arches are not built
120+
and therefore not exercised at all for the RPM or deb paths, even
121+
though upstream supports them.
122+
- **Deb does not run e2e.** `override_dh_auto_build` intentionally skips
123+
`make test-e2e`, so the unix-socket listener is not exercised on Ubuntu
124+
build workers.
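
The version-file gap listed above could be closed with a very small assertion. The sketch below is one possible shape, under two assumptions that the spec does not state outright: the file ends with exactly one newline, and the check is given the expected version and release by the caller. `checkVersionFile` and the sample values are hypothetical.

```go
package main

import "fmt"

// checkVersionFile reports an error unless content is exactly
// "<version>-<release>" followed by a single newline. Quoting, a missing
// newline, or an appended architecture all fail the comparison.
func checkVersionFile(content, version, release string) error {
	want := version + "-" + release + "\n"
	if content != want {
		return fmt.Errorf("version file is %q, want %q", content, want)
	}
	return nil
}

func main() {
	// Sample values are made up for illustration only.
	fmt.Println(checkVersionFile("1.0.0-1.el9\n", "1.0.0", "1.el9"))
	fmt.Println(checkVersionFile("1.0.0-1.el9.x86_64\n", "1.0.0", "1.el9"))
}
```

A strict string comparison is used on purpose: the consumer is described as reading the file verbatim, so any tolerance in the check would hide the regressions the gap is about.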

docs/design/unix-socket-listener.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
1+
# Unix Socket Listener — Design Specification
2+
3+
## Overview
4+
5+
This CloudLinux fork adds the ability to expose the `/metrics` endpoint over a
6+
filesystem unix domain socket instead of a TCP port. The feature exists so that
7+
other CloudLinux end-server tooling (the primary consumer being `cl_plus`) can
8+
scrape `node_exporter` locally without opening a network port or relying on
9+
HTTP authentication/TLS. Access control is delegated to filesystem permissions
10+
on the socket file.
11+
12+
This feature is CloudLinux-specific — it does not exist in upstream
13+
`prometheus/node_exporter`.
14+
15+
## Flags
16+
17+
| Flag | Default | Behavior |
18+
|------|---------|----------|
19+
| `--web.socket-path` | `""` (empty — disabled) | Filesystem path of the unix socket to listen on. When non-empty, disables the upstream TCP/TLS listener entirely. |
20+
| `--web.socket-permissions` | `0640` | `chmod` bits applied to the socket file after it is created. Accepts an integer (octal literal recognised by Go's `Int32` parser). |
21+
22+
Flags are parsed by `kingpin` and defined in `node_exporter.go`. Both flags
23+
ship in the fork's main package and are always visible in `--help`, regardless
24+
of OS. Upstream flags (`--web.listen-address`, `--web.config.file`,
25+
`--web.systemd-socket`) are still present but are mutually exclusive with
26+
`--web.socket-path` at runtime (see Invariants below).
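
To illustrate what "octal literal" means for `--web.socket-permissions`, here is a minimal stdlib sketch. It assumes the flag value is parsed with base auto-detection (as `strconv.ParseInt(s, 0, 32)` does); `parseSocketMode` is a hypothetical helper, not a function in `node_exporter.go`, and the range check is an addition of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseSocketMode converts a flag value such as "0640" into an os.FileMode.
// Base 0 lets strconv pick the base from the prefix, so a leading "0" or "0o"
// means octal, while a bare value such as "640" is read as decimal.
func parseSocketMode(s string) (os.FileMode, error) {
	v, err := strconv.ParseInt(s, 0, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid socket permissions %q: %w", s, err)
	}
	if v < 0 || v > 0o777 {
		return 0, fmt.Errorf("socket permissions %q outside 0000-0777", s)
	}
	return os.FileMode(v), nil
}

func main() {
	for _, s := range []string{"0640", "0o600", "640"} {
		mode, err := parseSocketMode(s)
		if err != nil {
			fmt.Println(s, "->", err)
			continue
		}
		fmt.Printf("%s -> %04o\n", s, uint32(mode.Perm()))
	}
}
```

The practical point is the leading zero: under this parsing rule `640` is decimal 640, which is not the mode `0640`.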
27+
28+
## Mechanism
29+
30+
When `--web.socket-path` is non-empty, the exporter:
31+
32+
1. Calls `os.Remove` on the socket path before binding. Any pre-existing file
33+
(stale socket from a previous run, regular file, symlink) is removed
34+
unconditionally.
35+
2. Binds a `net.Listen("unix", path)` listener.
36+
3. `chmod`s the newly created socket to `--web.socket-permissions`. If the
37+
chmod fails, the socket file is removed and the process exits non-zero.
38+
4. Serves HTTP over the unix listener in a goroutine.
39+
5. Installs a `SIGINT` / `SIGTERM` handler. On signal the server is closed and
40+
the socket file is `os.Remove`d before exit (exit code 0).
41+
6. Registers a `defer os.Remove` on the socket path as a secondary cleanup in
42+
case the signal handler path is bypassed.
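
The steps above can be sketched with the standard library alone. This is an illustration of the sequence under stated assumptions, not the code in `node_exporter.go`: `listenUnix`, `serveUntil` and `runDemo` are hypothetical names, the handler is a placeholder, and the demo queues `SIGTERM` on its own channel so that it terminates instead of waiting for a real signal.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"
)

// listenUnix clears path, binds a unix socket there and applies mode before
// returning, so nothing can be served with the creation-time permissions.
func listenUnix(path string, mode os.FileMode) (net.Listener, error) {
	_ = os.Remove(path) // unconditional; a missing file is not an error here
	ln, err := net.Listen("unix", path)
	if err != nil {
		return nil, err
	}
	if err := os.Chmod(path, mode); err != nil {
		ln.Close()
		_ = os.Remove(path)
		return nil, fmt.Errorf("chmod %s: %w", path, err)
	}
	return ln, nil
}

// serveUntil serves h on ln until a value arrives on stop or Serve fails.
// The deferred Remove is the secondary cleanup for paths that skip Close.
func serveUntil(ln net.Listener, path string, h http.Handler, stop <-chan os.Signal) error {
	defer os.Remove(path)
	srv := &http.Server{Handler: h}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()
	select {
	case <-stop:
		return srv.Close()
	case err := <-errCh:
		return err
	}
}

// runDemo binds a socket in a temporary directory, records its mode, then
// stops itself and reports whether the socket file is gone afterwards.
func runDemo() (os.FileMode, bool, error) {
	dir, err := os.MkdirTemp("", "sock-sketch")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "metrics.sock")

	ln, err := listenUnix(path, 0o640)
	if err != nil {
		return 0, false, err
	}
	info, err := os.Stat(path)
	if err != nil {
		ln.Close()
		return 0, false, err
	}

	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
	defer signal.Stop(stop)
	stop <- syscall.SIGTERM // demo only: a real process blocks until signalled

	err = serveUntil(ln, path, http.NotFoundHandler(), stop)
	_, statErr := os.Stat(path)
	return info.Mode().Perm(), os.IsNotExist(statErr), err
}

func main() {
	mode, removed, err := runDemo()
	fmt.Printf("mode=%04o removed=%v err=%v\n", uint32(mode), removed, err)
}
```

Keeping the chmod inside `listenUnix` means the listener is only handed to `Serve` after the mode is set, which is the ordering the permissions invariant relies on.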
43+
44+
When `--web.socket-path` is empty (default), the exporter falls through to the
45+
upstream `web.ListenAndServe(...)` path using `toolkitFlags` (TCP + optional
46+
TLS). The unix-socket branch and the TCP branch are mutually exclusive in the
47+
same process.
48+
49+
## Invariants
50+
51+
- **Exclusive listener.** When `--web.socket-path` is non-empty, no TCP
52+
listener is opened. `--web.listen-address`, TLS config, and systemd socket
53+
activation are ignored for that run.
54+
- **Socket is always removed on startup.** The exporter unconditionally
55+
`os.Remove`s the path before binding. Operators must not point
56+
`--web.socket-path` at a non-socket file they care about.
57+
- **Socket is always removed on clean shutdown.** On `SIGINT`/`SIGTERM`, or
58+
on any error path after successful bind, the socket file must not be left
59+
behind. The e2e test `end-to-end-test.sh -s` asserts this explicitly and
60+
fails the build if the socket file is still present after shutdown.
61+
- **Permissions are applied before first accept.** The chmod step happens
62+
synchronously before the `Serve` goroutine is started, so no client can
63+
connect to an over-permissive socket.
64+
- **Permissions failure is fatal.** If chmod fails, the socket file is
65+
removed and the exporter exits non-zero rather than serving with
66+
unintended permissions.
67+
- **Default `0640` is intentional.** It allows the exporter process (owner)
68+
to write and a scraping group (e.g., the `cl_plus` group) to read, while
69+
denying world access. Operators overriding this value take responsibility
70+
for access control.
71+
72+
## Packaging Integration
73+
74+
The `cl-node-exporter` RPM and deb packages install the binary at
75+
`/usr/share/cloudlinux/cl_plus/node_exporter`. They do **not** ship a
76+
systemd unit or a default socket path — the invoking CloudLinux service
77+
(external to this repo) is responsible for choosing the socket path, owning
78+
its parent directory, and setting the scraping group.
79+
80+
## Test Coverage
81+
82+
| Aspect | Test | Type | Covers |
83+
|--------|------|------|--------|
84+
| Metrics over unix socket match metrics over TCP | `end-to-end-test.sh -s` (invoked by `make test-e2e`) | E2E | Full `/metrics` exposition via `curl --unix-socket` must diff-equal the fixture produced via TCP. |
85+
| Socket file is removed on clean shutdown | `end-to-end-test.sh` finish trap (socket mode) | E2E | After SIGTERM, `ls` on the socket path must fail; test exits non-zero otherwise. |
86+
| Both transports still work after refactors | `Makefile` `test-e2e` target | E2E | Runs the e2e suite twice — once with TCP (`--web.listen-address`) and once with `--web.socket-path`. |
87+
88+
### Known gaps
89+
90+
- **Permission mode semantics are not tested.** No automated test verifies
91+
that `--web.socket-permissions` actually produces the requested mode on
92+
disk, nor that a non-default value (e.g., `0600`, `0660`) is honoured.
93+
- **Concurrent-start / stale-socket scenarios are not tested.** The e2e
94+
suite does not cover the case where a previous process crashed leaving a
95+
socket file behind, nor the case where two exporters race on the same
96+
path.
97+
- **Chmod-failure path is not tested.** Exit behaviour when `chmod` fails
98+
(e.g., socket path on a filesystem that rejects mode changes) is not
99+
exercised.
100+
- **Signal-handling coverage is shallow.** Only the graceful
101+
`SIGINT`/`SIGTERM` path is exercised; `SIGKILL` or panic paths (which
102+
leak the socket file by design) are not asserted anywhere.
103+
- **No assertion that TCP flags are ignored in socket mode.** A user
104+
passing both `--web.listen-address` and `--web.socket-path` gets
105+
socket-only behaviour silently; this is not documented in `--help` or
106+
checked at flag-parse time.
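
For the last gap, a flag-parse-time check could look roughly like the sketch below. It is only an outline: the `listenFlags` struct, its field names, and `checkListenFlags` are hypothetical, and how the real code would detect that `--web.listen-address` was passed explicitly (rather than left at its default) is left open.

```go
package main

import (
	"errors"
	"fmt"
)

// listenFlags holds only the values this check needs. The field names are
// illustrative and are not taken from node_exporter.go.
type listenFlags struct {
	SocketPath       string // --web.socket-path
	ListenAddressSet bool   // true if --web.listen-address was passed explicitly
	WebConfigFile    string // --web.config.file
	SystemdSocket    bool   // --web.systemd-socket
}

// checkListenFlags rejects combinations that would otherwise be silently
// ignored in socket mode.
func checkListenFlags(f listenFlags) error {
	if f.SocketPath == "" {
		return nil // TCP mode: nothing to cross-check
	}
	if f.ListenAddressSet || f.WebConfigFile != "" || f.SystemdSocket {
		return errors.New("--web.socket-path cannot be combined with " +
			"--web.listen-address, --web.config.file or --web.systemd-socket")
	}
	return nil
}

func main() {
	// The socket path below is a made-up example value.
	err := checkListenFlags(listenFlags{SocketPath: "/run/example.sock", ListenAddressSet: true})
	fmt.Println(err)
}
```

Whether to fail hard or only log a warning is a design decision for the fork; the sketch fails hard because the gap describes the silent behaviour as the problem.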
