Skip to content

Commit 9b5cde8

Browse files
committed
Make Rust test run dirs umask-independent
The Rust POSIX test suites created their /tmp run directories with bare create_dir_all, inheriting the process umask. With umask 0002 the directories land at mode 0775, the stale-unlink safety guard correctly refuses to unlink inside group-writable directories, and six stale-recovery tests fail (UDS and SHM). Pin the run directories to mode 0700 after creation, matching the existing C mkdir(dir, 0700) and Go MkdirAll(dir, 0700) test conventions. Test-only change; guard semantics are untouched. Full suite passes under both umask 0002 and 0022. Tracked as SOW-0017.
1 parent 0d0bdce commit 9b5cde8

5 files changed

Lines changed: 232 additions & 0 deletions

File tree

Lines changed: 216 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,216 @@
1+
# SOW-0017 - Rust POSIX tests: umask-independent run directories
2+
3+
## Status
4+
5+
Status: completed
6+
7+
Sub-state: fix implemented, validated under umask 0002 and 0022, committed with this SOW.
8+
9+
## Requirements
10+
11+
### Purpose
12+
13+
Rust netipc tests must pass regardless of the developer/CI process umask, without weakening the stale-unlink safety guard.
14+
15+
### User Request
16+
17+
User shared an external discovery note (`netipc.md`) reporting that `transport::shm::tests::test_stale_recovery` and related tests fail with `umask 0002`, then asked: "fix it and commit please". The fix direction chosen in discussion was the surgical test-setup change (chmod 0700 on the test run dir), not weakening the guard.
18+
19+
### Assistant Understanding
20+
21+
Facts:
22+
23+
- `run_dir_allows_stale_unlink` (`src/crates/netipc/src/transport/shm.rs:976`) refuses stale unlinks when the run dir is group/other-writable (`mode & 0o022 != 0`). The same guard exists for UDS (`src/crates/netipc/src/transport/posix.rs`), in C (`src/libnetdata/netipc/src/transport/posix/`), and in Go (`src/go/pkg/netipc/transport/posix/`).
24+
- Four Rust test files create their run dirs with bare `std::fs::create_dir_all`, which inherits the process umask: `transport/shm_tests.rs`, `transport/posix_tests.rs`, `service/raw_unix_tests.rs`, `service/cgroups_unix_tests.rs`.
25+
- Under `umask 0002` the dirs land at mode `0775`, the guard refuses to unlink stale files, and 6 tests fail (full list in Analysis).
26+
- C tests use `mkdir(TEST_RUN_DIR, 0700)` and Go tests use `os.MkdirAll(dir, 0700)`; `0700` is unaffected by any umask, so C and Go are not exposed.
27+
- The external note also claimed the guard is missing from this upstream repo; that claim was verified false — it compared against a stale clone at `48c74fe` (2026-04-06), 105 commits behind. The guard landed in `8a23810` (2026-06-03).
28+
29+
Inferences:
30+
31+
- Any CI runner or developer machine with a permissive umask (0002 is the default on several distros) silently fails these tests; the guard behavior is correct, the test setup is environmentally fragile.
32+
33+
Unknowns:
34+
35+
- None blocking.
36+
37+
### Acceptance Criteria
38+
39+
- `cargo test` in `src/crates/netipc` passes with `umask 0002` (previously 6 failures).
40+
- `cargo test` in `src/crates/netipc` passes with `umask 0022` (no regression).
41+
- No production code changed; the safety guard semantics are untouched.
42+
43+
## Analysis
44+
45+
Sources checked:
46+
47+
- `src/crates/netipc/src/transport/shm.rs:972-1062` (guard and call sites)
48+
- `src/crates/netipc/src/transport/shm_tests.rs:13-15`, `posix_tests.rs:11-13`, `service/raw_unix_tests.rs:22-24`, `service/cgroups_unix_tests.rs:16-18` (run-dir setup)
49+
- C tests `tests/fixtures/c/*.c` and Go tests `src/go/pkg/netipc/**/*_test.go` (same-failure scan)
50+
- External discovery note `netipc.md` (user-provided, claims cross-checked against git history)
51+
52+
Current state (pre-fix evidence, `umask 0002`, full `cargo test` in `src/crates/netipc`):
53+
54+
- FAILED: `transport::posix::tests::test_stale_recovery`
55+
- FAILED: `transport::shm::tests::test_cleanup_stale_invalid_entries`
56+
- FAILED: `transport::shm::tests::test_cleanup_stale_unlinks_dangling_symlink`
57+
- FAILED: `transport::shm::tests::test_cleanup_stale_unlinks_directory_symlink_when_mmap_fails`
58+
- FAILED: `transport::shm::tests::test_server_create_recovers_invalid_stale_file`
59+
- FAILED: `transport::shm::tests::test_stale_recovery`
60+
- Result: 327 passed; 6 failed.
61+
62+
Risks:
63+
64+
- None significant: change is test-only, 4 files, identical 1-line addition each.
65+
66+
## Pre-Implementation Gate
67+
68+
Status: ready
69+
70+
Problem / root-cause model:
71+
72+
- Test run dirs under `/tmp` inherit the process umask; with `umask 0002` they are group-writable (`0775`), and the stale-unlink safety guard correctly refuses to unlink inside group/other-writable directories. Stale files survive, subsequent `O_EXCL` creates fail with `EEXIST`/`AddrInUse`, and stale-recovery assertions fail.
73+
74+
Evidence reviewed:
75+
76+
- See Analysis. Reproduced locally: 6 failures under `umask 0002`, all pass under `umask 0022`.
77+
78+
Affected contracts and surfaces:
79+
80+
- Rust test setup only. No public API, wire format, transport behavior, or production code.
81+
82+
Existing patterns to reuse:
83+
84+
- C tests: `mkdir(TEST_RUN_DIR, 0700)` (`tests/fixtures/c/test_hardening.c:47`).
85+
- Go tests: `os.MkdirAll(dir, 0700)` (`src/go/pkg/netipc/transport/posix/uds_test.go:36` and others).
86+
- Rust mirrors this by `std::fs::set_permissions(.., from_mode(0o700))` after `create_dir_all`, which also repairs a pre-existing dir left with a permissive mode by an earlier run.
87+
88+
Risk and blast radius:
89+
90+
- Test-only. Worst case: a test environment where `set_permissions` fails — error is ignored like the existing `create_dir_all` error, and the affected test fails exactly as before.
91+
92+
Sensitive data handling plan:
93+
94+
- No secrets, customer data, or personal data involved. SOW cites only repo paths, public commit hashes, and test names.
95+
96+
Implementation plan:
97+
98+
1. Add `set_permissions(0o700)` to `ensure_run_dir()` in the four Rust POSIX test files.
99+
100+
Validation plan:
101+
102+
- Full `cargo test` in `src/crates/netipc` under `umask 0002` and `umask 0022`, with `/tmp/nipc_*` test dirs removed first to exercise fresh creation.
103+
104+
Artifact impact plan:
105+
106+
- AGENTS.md: unaffected — no workflow/process change.
107+
- Runtime project skills: none exist; nothing to update.
108+
- Specs: unaffected — no protocol/API/transport behavior change; guard semantics untouched.
109+
- End-user/operator docs: unaffected — test-internal change; `docs/` documents the product, not test scaffolding.
110+
- End-user/operator skills: unaffected — `docs/netipc-integrator-skill.md` covers integration workflow, not test internals.
111+
- SOW lifecycle: single SOW, completed and committed with the change.
112+
113+
Open-source reference evidence:
114+
115+
- None checked; the failure is local and fully reproducible, no external reference needed.
116+
117+
Open decisions:
118+
119+
- Fix approach (chmod 0700 vs per-test tempdir) was presented to the user before this SOW; user accepted the recommended surgical chmod-0700 variant by requesting the fix.
120+
121+
## Implications And Decisions
122+
123+
1. Fix approach: (A) `chmod 0700` the existing shared run dirs — zero new dependencies, mirrors C/Go; (B) per-test `tempfile::tempdir()` — stronger isolation, new dev-dependency, larger diff. Selected: A (surgical), per user request following the assistant's recommendation.
124+
125+
## Plan
126+
127+
1. Patch `ensure_run_dir()` in the four Rust POSIX test files (low risk, no dependencies).
128+
2. Validate under both umasks; complete and commit.
129+
130+
## Execution Log
131+
132+
### 2026-06-10
133+
134+
- Reproduced 6 failures under `umask 0002` on clean `/tmp` state.
135+
- Patched `ensure_run_dir()` in `src/crates/netipc/src/transport/shm_tests.rs`, `src/crates/netipc/src/transport/posix_tests.rs`, `src/crates/netipc/src/service/raw_unix_tests.rs`, `src/crates/netipc/src/service/cgroups_unix_tests.rs`.
136+
- Verified full Rust suite green under `umask 0002` and `umask 0022`.
137+
138+
## Validation
139+
140+
Acceptance criteria evidence:
141+
142+
- `umask 0002`: `cargo test` in `src/crates/netipc` — 333 passed; 0 failed (previously 327 passed; 6 failed).
143+
- `umask 0022`: `cargo test` in `src/crates/netipc` — 333 passed; 0 failed.
144+
- `git diff --stat` shows only the four test files changed; no production code touched.
145+
146+
Tests or equivalent validation:
147+
148+
- Full `cargo test` run twice (umask 0002 and 0022), `/tmp/nipc_*` dirs removed before each run.
149+
150+
Real-use evidence:
151+
152+
- The tests are the runnable path; both runs above are real executions on this workstation.
153+
154+
Reviewer findings:
155+
156+
- External reviewers not run: 4-file test-only setup change falls under the documented "wasteful reviewer calls" category.
157+
158+
Same-failure scan:
159+
160+
- Grepped all C (`tests/fixtures/c/*.c`) and Go (`**/*_test.go`) test dir creation: all use mode `0700` explicitly; not exposed to umask. Grepped all Rust `create_dir_all` test call sites: the four patched files were the complete set of POSIX run-dir creators (Windows test files are not subject to POSIX modes).
161+
162+
Sensitive data gate:
163+
164+
- No secrets, credentials, customer or personal data in this SOW, the diff, or comments.
165+
166+
Artifact maintenance gate:
167+
168+
- AGENTS.md: no update — no workflow, guardrail, or process change.
169+
- Runtime project skills: none exist (per AGENTS.md index); no update.
170+
- Specs: no update — no behavior, contract, wire-format, or operational-guarantee change; guard semantics untouched.
171+
- End-user/operator docs: no update — change is internal to test scaffolding, not visible to integrators or operators.
172+
- End-user/operator skills: no update — `docs/netipc-integrator-skill.md` triggers (public APIs, examples, workflow, transports, validation commands) are unaffected.
173+
- SOW lifecycle: `Status: completed`, file placed in `.agents/sow/done/`, committed together with the fix in one commit.
174+
175+
Specs update:
176+
177+
- Not needed — see artifact maintenance gate.
178+
179+
Project skills update:
180+
181+
- Not needed — no runtime project skills exist.
182+
183+
End-user/operator docs update:
184+
185+
- Not needed — see artifact maintenance gate.
186+
187+
End-user/operator skills update:
188+
189+
- Not needed — no docs/spec changes occurred.
190+
191+
Lessons:
192+
193+
- POSIX test scaffolding that creates directories must set modes explicitly; inheriting the process umask makes test outcomes environment-dependent, especially when production code legitimately inspects directory permissions. C and Go already followed this; Rust did not.
194+
- External discovery notes must be re-verified against current upstream HEAD before acting: the triggering note compared against a 105-commit-stale clone and drew a false "divergence" conclusion.
195+
196+
Follow-up mapping:
197+
198+
- The note's netdata-side recommendations (CI umask check in `netdata/netdata`, vendor-sync commit trailers) concern the downstream repo, not this one; this fix flows to netdata via the normal vendor sync. No follow-up SOW needed here.
199+
200+
## Outcome
201+
202+
All Rust netipc tests pass independent of the process umask. Test run directories are now created (or repaired) with mode `0700`, matching the existing C and Go test conventions. No production code changed.
203+
204+
## Lessons Extracted
205+
206+
See Lessons under Validation.
207+
208+
## Followup
209+
210+
None.
211+
212+
## Regression Log
213+
214+
None yet.
215+
216+
Append regression entries here only after this SOW was completed or closed and later testing or use found broken behavior. Use a dated `## Regression - YYYY-MM-DD` heading at the end of the file. Never prepend regression content above the original SOW narrative.

src/crates/netipc/src/service/cgroups_unix_tests.rs

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,11 @@ const RESPONSE_BUF_SIZE: usize = 65536;
1414
static SERVICE_COUNTER: AtomicU64 = AtomicU64::new(0);
1515

1616
fn ensure_run_dir() {
17+
use std::os::unix::fs::PermissionsExt;
1718
let _ = std::fs::create_dir_all(TEST_RUN_DIR);
19+
// The stale-unlink guard refuses group/other-writable run dirs; pin the
20+
// mode so the process umask cannot decide test outcomes.
21+
let _ = std::fs::set_permissions(TEST_RUN_DIR, std::fs::Permissions::from_mode(0o700));
1822
}
1923

2024
fn unique_service(prefix: &str) -> String {

src/crates/netipc/src/service/raw_unix_tests.rs

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,11 @@ const RESPONSE_BUF_SIZE: usize = 65536;
2020
static RAW_SERVICE_COUNTER: std::sync::atomic::AtomicU64 = std::sync::atomic::AtomicU64::new(0);
2121

2222
fn ensure_run_dir() {
23+
use std::os::unix::fs::PermissionsExt;
2324
let _ = std::fs::create_dir_all(TEST_RUN_DIR);
25+
// The stale-unlink guard refuses group/other-writable run dirs; pin the
26+
// mode so the process umask cannot decide test outcomes.
27+
let _ = std::fs::set_permissions(TEST_RUN_DIR, std::fs::Permissions::from_mode(0o700));
2428
}
2529

2630
fn cleanup_all(service: &str) {

src/crates/netipc/src/transport/posix_tests.rs

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,11 @@ const TEST_RUN_DIR: &str = "/tmp/nipc_rust_test";
99
const AUTH_TOKEN: u64 = 0xDEADBEEFCAFEBABE;
1010

1111
fn ensure_run_dir() {
12+
use std::os::unix::fs::PermissionsExt;
1213
let _ = std::fs::create_dir_all(TEST_RUN_DIR);
14+
// The stale-unlink guard refuses group/other-writable run dirs; pin the
15+
// mode so the process umask cannot decide test outcomes.
16+
let _ = std::fs::set_permissions(TEST_RUN_DIR, std::fs::Permissions::from_mode(0o700));
1317
}
1418

1519
fn cleanup_socket(service: &str) {

src/crates/netipc/src/transport/shm_tests.rs

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,11 @@ const TEST_RUN_DIR: &str = "/tmp/nipc_shm_rust_test";
1111
const SERVER_READY_TIMEOUT: Duration = Duration::from_secs(2);
1212

1313
fn ensure_run_dir() {
14+
use std::os::unix::fs::PermissionsExt;
1415
let _ = std::fs::create_dir_all(TEST_RUN_DIR);
16+
// The stale-unlink guard refuses group/other-writable run dirs; pin the
17+
// mode so the process umask cannot decide test outcomes.
18+
let _ = std::fs::set_permissions(TEST_RUN_DIR, std::fs::Permissions::from_mode(0o700));
1519
}
1620

1721
fn cleanup_shm(service: &str, session_id: u64) {

0 commit comments

Comments
 (0)