Skip to content

Commit 5287497

Browse files
authored
security(homecore-migrate): redact secret value from malformed secrets.yaml error (ruvnet#1089)
* fix(homecore-migrate): redact secret value from malformed secrets.yaml error (secret-leak) `read_secrets` wrapped serde_yaml's parse error into `MigrateError::YamlParse { source }`. serde_yaml's message for a typed-tag coercion failure embeds the offending scalar verbatim, e.g. `invalid value: string "<the-secret-value>"`. That error propagates out of `read_secrets`, is `?`-returned by the `InspectSecrets` CLI path in main.rs, and printed to stderr by anyhow — leaking a secret value despite the CLI's deliberate `<redacted>` design. Fix: secrets.yaml parse failures now map to a new redacting variant `MigrateError::SecretsParse { path, line, column }` that carries only the file path and a coarse location (from `serde_yaml::Error::location()`), never the scalar content. Other (non-secret) YAML files keep `YamlParse`. Pinned by `secrets::tests::malformed_secrets_error_never_contains_secret_value` (asserts the rendered error AND its full #[source] chain never contain the secret value; fails on the old `YamlParse` path) plus `malformed_secrets_error_reports_location` (still fail-closed + locatable). ADR-165 secret-handling rule: a secret value must never appear in output. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(homecore-migrate): record secret-leak fix in ADR-165 + CHANGELOG Note the secrets.yaml error-redaction fix and the review's clean dimensions (read-only source / no traversal / no panic / fail-closed versioning / no injection) in ADR-165 §2.4, bump the test-evidence count 19→21 in §2.6, and add an [Unreleased] Security entry to CHANGELOG. Co-Authored-By: claude-flow <ruv@ruv.net>
1 parent bf1dfe7 commit 5287497

4 files changed

Lines changed: 105 additions & 5 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
99

1010
### Security
1111
- **`homecore` foundational state-machine review (ADR-127) — one real concurrency bug fixed (state-set TOCTOU dropping/reordering `state_changed` events) + two hardening fixes (entity_id memory-DoS, service-handler panic isolation), each pinned by a fails-on-old test; event-bus lag & lock discipline confirmed clean with evidence.** Beyond-SOTA security+concurrency review of the crate every other HOMECORE module builds on (state store `state.rs`, event bus `bus.rs`, service/entity registries, the `HomeCore` coordinator), un-covered by the ADR-154–159 sweep — a bug here is high-blast-radius. **HC-RACE-01 (state-set TOCTOU, the crux — race/lost-event).** `StateMachine::set` did `get()` (releasing the DashMap shard lock) → compute the next snapshot + the no-op/`last_changed` decision → `insert()` (re-acquiring the lock) → `send()`; the read-modify-write was **not atomic** w.r.t. a concurrent writer on the same entity, contradicting ADR-127 §2.1's promise that "the writer atomically replaces the map entry." A writer that read a **stale `old`** could mis-classify a genuine transition as a no-op and **silently drop its `state_changed` event** (a missed automation trigger) or fire an event whose `new_state` duplicated the previously delivered one (a spurious trigger for any automation keyed on `old_state != new_state`). **Fixed** by holding the shard write-lock across the whole read→decide→insert→fire sequence via `entry()`/`insert_entry()` — `tx.send` is non-blocking, non-async, and never re-enters the map, so firing under the shard lock cannot deadlock and keeps global event order in lock-step with global commit order. Pinned by `concurrent_set_fires_no_duplicate_adjacent_events` (4 writers toggling one entity A/B; asserts no two consecutive fired events carry an identical `new_state` — impossible under correct serialisation; an instrumented probe observed ~93k such duplicate-adjacent events across 200 trials on the racy code, **zero** on the fix; the test fails reliably on the first trial pre-fix). **HC-EID-LEN-01 (unbounded `entity_id`, memory-DoS).** `homecore-api/src/rest.rs` parses untrusted REST path segments straight through `EntityId::parse`; with no length cap an otherwise-valid id (`a.` + many MB of `[a-z0-9_]`) was accepted, and a `POST /api/states/<giant>` would persist it into the DashMap state store (permanent growth across distinct ids). **Fixed** by rejecting ids longer than `MAX_ENTITY_ID_LEN` (255, HA-compatible) up front in `parse()`, before any per-char scan, with a new `EntityIdError::TooLong` — fail-closed at the boundary type protects every caller (REST, registry deserialize, automation). Pinned by `entity_id_length_boundary` (exactly-MAX accepted; MAX+1 and a 4 MiB id rejected — oversized parses `Ok` on old code). **HC-SVC-PANIC-01 (service-handler panic not isolated).** `ServiceRegistry::call` already ran handlers **outside** the registry lock (the `Arc<dyn ServiceHandler>` is cloned out of the read guard first → no `RwLock` poisoning, no blocking of other callers — clean), but a panicking handler unwound through `call()` into the caller's task (the task driving the engine). **Hardened** by wrapping the handler future in `AssertUnwindSafe` + `catch_unwind`, converting a panic to `ServiceError::HandlerPanicked`; the registry stays fully usable (a sibling healthy service still returns, the bad service stays registered). Pinned by `panicking_handler_is_isolated_and_registry_survives` (unwinds through `call` on old code). **Dimensions confirmed clean (with evidence, no invented issues):** (1) **event-bus bounds / lag** (the homecore-api WS lag-DoS class) — both `StateMachine` and `EventBus` use **bounded** `tokio::sync::broadcast` (capacity 4,096); a slow subscriber gets a recoverable `Lagged(n)` (drop-oldest + re-sync) while `fire_*` is non-blocking and never waits on slow receivers, so a lagging subscriber **cannot block the publisher, grow the channel without bound, or kill a fast subscriber** (evidenced by `slow_subscriber_does_not_block_publisher_or_kill_the_bus` — fire 3× capacity at an idle subscriber, publisher unblocked, bus stays live, fresh fast subscriber receives, lagged one recovers); (2) **lock ordering / lock-across-await** (deadlock) — no code path holds two of `{state DashMap, registry RwLock, service RwLock}` simultaneously, so no inconsistent-ordering deadlock can exist; every `tokio::sync::RwLock` guard in `registry.rs`/`service.rs` is used in one synchronous statement and dropped before any `.await` (`call` explicitly scopes the read guard out before awaiting the handler); the only guard held across a send is the DashMap shard lock in `set`, across a **synchronous** broadcast send — safe; (3) **panic-on-input** — no reachable `unwrap`/`expect`/index in non-test code beyond the safe `send().unwrap_or(0)` and the dead-but-harmless `split_once(...).unwrap_or(...)` fallbacks on already-validated ids. `cargo test -p homecore --no-default-features`: **20 → 24 passed, 0 failed** (+4 pins). Workspace green; Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — `homecore` is off the signal proof path). Review notes appended to ADR-127 §9.
12+
- **`homecore-migrate` security review (ADR-165 surfaces) — one real secret-leak fix; traversal / data-loss / panic / injection dimensions confirmed clean with evidence.** Beyond-SOTA review of the Home-Assistant `.storage`/`secrets.yaml`/`automations.yaml` migrator, the two sharp surfaces being secret handling (`secrets.rs`) and untrusted-file parsing. **Finding + fix (secret-leak, `secrets.rs`):** a malformed `secrets.yaml` whose offending scalar fails a typed-tag coercion (e.g. `port: !!int <value>`) produced a `serde_yaml` error whose message **embeds the scalar verbatim** — `invalid value: string "<the-secret-value>"`. The old code wrapped that message into `MigrateError::YamlParse { source }`; the error propagates out of `read_secrets`, is `?`-returned by the `InspectSecrets` CLI path in `main.rs`, and printed to stderr by `anyhow` — **leaking a secret value despite the CLI's deliberate `<redacted>` design** (`main.rs` only ever prints keys as `<key> = <redacted>`). Fix: `secrets.yaml` parse failures now map to a dedicated redacting variant `MigrateError::SecretsParse { path, line, column }` carrying only the file path + a coarse location (from `serde_yaml::Error::location()`), never the scalar; other (non-secret) YAML files keep `YamlParse`. **Pinned** by `secrets::tests::malformed_secrets_error_never_contains_secret_value`, which asserts the rendered error **and its full `#[source]` chain** never contain the secret value and that the error is still the structured `SecretsParse` (fail-closed) — it **fails on the old `YamlParse` path** (observed leak: `... invalid value: string "s3cr3t_TOKEN_VALUE" ...`) and passes on the fix; plus `malformed_secrets_error_reports_location` (still locatable). **Confirmed clean with evidence:** *secret leakage elsewhere* — the only secret sink is the value map; `main.rs` redacts values, and the `MissingField`/`Io` paths surface only the path, never content. *Source mutation / data-loss* — **structurally impossible**: there is no `fs::write`/`fs::remove`/`fs::create`/`File::create`/`OpenOptions` anywhere in the crate; P1 reads source and writes nothing (`import-entities` is in-memory only), so re-runs are trivially idempotent and the HA source is never touched. *Path traversal* — CLI takes a `--config-dir`/`--storage` dir and joins **fixed** filenames (`secrets.yaml`, `core.entity_registry`, …); no user-controlled path component, no `..`/absolute escape beyond the user's own privileges. *Panic-on-input* — probed duplicate-key, bad-indent, tab/control-char, multi-doc, non-mapping-root, unterminated-flow, `!input` blueprint tags, deep nesting, anchors: **every** malformed/typed/truncated input **errors, never panics** (all production code is panic-free; every `unwrap`/`expect` is `#[cfg(test)]`). *Fail-closed versioning* — unknown storage `minor_version` hard-errors (no silent fallback to an older parser). *Injection* — no SQL/shell/path interpolation; the tool emits diagnostics only and persists nothing in P1. `homecore-migrate` **19 → 21** tests (`--no-default-features`), 0 failed. Behaviour otherwise unchanged; Python deterministic proof PASS, hash unchanged (`homecore-migrate` is off the signal proof path).
1213
- **`homecore-recorder` security review (ADR-132 surfaces) — two real bounding fixes; SQL-injection & NaN-index dimensions confirmed clean with evidence.** Beyond-SOTA review of the HA-compat state recorder (DB persistence + history + ruvector semantic search), the crux being its DB-backed SQL-injection surface. **Findings + fixes:** (1) **Memory-DoS — unbounded `get_state_history`.** The history query carried no `LIMIT`, so a wide `[since, until]` window over a high-frequency entity (a per-second sensor ≈ 86k rows/day) would load an unbounded row set into a single in-memory `Vec`. Added a hard `LIMIT MAX_HISTORY_ROWS` (1,000,000 — generous enough never to truncate a realistic history graph, bounded enough to cap the worst case); the sibling search paths were already `k`-bounded. (2) **Disk-DoS / documented-but-missing `purge`.** The README + HA-compat table advertised `Recorder::purge(older_than)` as a capability, but **no such method existed** — i.e. no retention path at all → unbounded disk growth. Implemented a **transactional** `purge` that deletes `states` + `events` strictly **older than** the cutoff (**exclusive** boundary — idempotent, no off-by-one; a row at the cutoff instant is kept) and **garbage-collects** orphaned `state_attributes` blobs (a dedup-shared blob is dropped only once its last referencing state is gone); all three deletes run in one transaction so a mid-purge failure rolls back cleanly (no states-deleted-but-events-kept corruption). **Confirmed clean with evidence:** SQL injection — **every** query in `db.rs` uses bound `?` parameters (no `format!`/string-concat of user data into SQL); the lone `format!` builds the LIKE *pattern*, which is itself bound as a parameter with `ESCAPE '\\'` and metacharacter escaping. Pinned: a state value `'; DROP TABLE states; --` is stored/queried **literally** (table survives), and a `%`/`_` in a search query matches **literally**, not as a wildcard. NaN-index poisoning (the calibration/vitals/geo class) — **structurally impossible** here: embeddings are SHA-256 → `i32` → `f32` (an `i32` cast to `f32` is always finite, never NaN/Inf), with an all-zero-digest norm guard; probed empty-index search, empty-string query, and `k=0` — all return `Ok(0)`, **no panic**. Fail-closed write path — a removal event yields `Ok(None)`, semantic-index failure is logged not propagated (best-effort, never blocks the durable SQLite write), and `EntityId` parsing failures fall back rather than panic. **6 new pinning tests** (SQL-injection literal-storage, LIKE-metacharacter literalness, history `LIMIT`, purge exclusive-boundary, purge attribute-GC-keeps-shared, purge old-events): `homecore-recorder` **19 → 25** (`--no-default-features`) / **25 → 31** (`--features ruvector`), 0 failed; the purge-boundary test is a true pin (fails deleting 2 rows under an inclusive cutoff, passes deleting 1 under the exclusive cutoff). Behaviour otherwise unchanged; Python deterministic proof unchanged (recorder is off the signal proof path).
1314

1415
### Added

docs/adr/ADR-165-homecore-migrate-from-home-assistant.md

Lines changed: 20 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -78,6 +78,23 @@ converts the entity registry; full conversion of the remaining artifacts is defe
7878

7979
- `MigrateError` carries context (`path`, line/field) for I/O, JSON, YAML, missing-field,
8080
unsupported-schema-version, and entity-id parse failures (`src/lib.rs`).
81+
- **Secret-leak hardening (security review, 2026-06).** `secrets.yaml` parse failures must
82+
NOT use the generic `MigrateError::YamlParse { source }` variant: `serde_yaml`'s message
83+
for a typed-tag coercion error (e.g. `port: !!int <value>`) embeds the offending scalar
84+
verbatim (`invalid value: string "<the-secret-value>"`), and that error propagates through
85+
the `InspectSecrets` CLI path to stderr — leaking a secret value despite the CLI's
86+
deliberate `<redacted>` design. `read_secrets` now maps such failures to a dedicated
87+
redacting variant `MigrateError::SecretsParse { path, line, column }` that carries only the
88+
file path and a coarse location (`serde_yaml::Error::location()`), never the scalar content.
89+
Pinned by `secrets::tests::malformed_secrets_error_never_contains_secret_value` (asserts the
90+
rendered error **and its full `#[source]` chain** never contain the secret value).
91+
**Review dimensions confirmed clean with evidence:** source is never mutated (no
92+
`fs::write`/`remove`/`create` anywhere — P1 reads source, writes nothing); paths are
93+
user-supplied dirs joined with fixed filenames (no `..`/absolute traversal beyond the
94+
user's own privileges); malformed/typed/truncated `.storage` JSON and YAML **error, never
95+
panic** (every production `unwrap`/`expect` is test-only); unknown schema `minor_version`
96+
hard-errors fail-closed; no SQL/shell/path injection surface (the tool emits diagnostics
97+
only, persists nothing in P1).
8198

8299
### 2.5 Deferred to P2+ (NOT built — honestly labelled)
83100

@@ -89,7 +106,9 @@ converts the entity registry; full conversion of the remaining artifacts is defe
89106

90107
### 2.6 Test evidence (as shipped)
91108

92-
- 19 tests (`cargo test -p homecore-migrate`), per the crate README badge.
109+
- 21 tests (`cargo test -p homecore-migrate`) — 19 as originally shipped plus 2 added by the
110+
2026-06 security review (`secrets::tests::malformed_secrets_error_never_contains_secret_value`,
111+
`malformed_secrets_error_reports_location`).
93112

94113
## 3. Consequences
95114

v2/crates/homecore-migrate/src/lib.rs

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,25 @@ pub enum MigrateError {
5555
source: serde_yaml::Error,
5656
},
5757

58+
/// Parse failure in a SECRET-bearing file (`secrets.yaml`).
59+
///
60+
/// Unlike [`MigrateError::YamlParse`], this variant deliberately does NOT
61+
/// embed the underlying `serde_yaml::Error` message — that message can quote
62+
/// the offending scalar verbatim (e.g. a typed-tag coercion error renders
63+
/// `invalid value: string "<the-secret-value>"`), which would leak a secret
64+
/// into stderr/logs. We carry only the file path plus a coarse line/column
65+
/// so the user can locate the problem without the value being printed.
66+
/// (ADR-165 secret-handling rule: a secret value must never appear in output.)
67+
#[error(
68+
"secrets.yaml parse error in {path} (line {line}, column {column}): \
69+
malformed YAML (value content redacted)"
70+
)]
71+
SecretsParse {
72+
path: String,
73+
line: usize,
74+
column: usize,
75+
},
76+
5877
/// Fired when the outer `{version, minor_version}` envelope version is
5978
/// known but the `minor_version` is not supported by any compiled parser.
6079
/// Per ADR-165 §6 Q5: hard error on unknown minor_version.

v2/crates/homecore-migrate/src/secrets.rs

Lines changed: 65 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -33,11 +33,19 @@ pub fn read_secrets(path: &Path) -> Result<HashMap<String, String>, MigrateError
3333
return Ok(HashMap::new());
3434
}
3535

36-
let parsed: serde_yaml::Value =
37-
serde_yaml::from_str(&raw).map_err(|e| MigrateError::YamlParse {
36+
// SECURITY: do NOT use `MigrateError::YamlParse` here. serde_yaml error
37+
// messages can quote the offending scalar verbatim (a typed-tag coercion
38+
// error renders `invalid value: string "<the-secret-value>"`), and that
39+
// message would be printed to stderr by the CLI — leaking a secret value.
40+
// `MigrateError::SecretsParse` carries only the path + line/column.
41+
let parsed: serde_yaml::Value = serde_yaml::from_str(&raw).map_err(|e| {
42+
let loc = e.location();
43+
MigrateError::SecretsParse {
3844
path: path.display().to_string(),
39-
source: e,
40-
})?;
45+
line: loc.as_ref().map_or(0, |l| l.line()),
46+
column: loc.as_ref().map_or(0, |l| l.column()),
47+
}
48+
})?;
4149

4250
let map = match parsed {
4351
serde_yaml::Value::Mapping(m) => m,
@@ -94,6 +102,59 @@ mod tests {
94102
assert!(secrets.is_empty());
95103
}
96104

105+
/// SECURITY regression (fails on the pre-fix `YamlParse` path): a malformed
106+
/// `secrets.yaml` whose offending scalar is a secret value must NOT have that
107+
/// value rendered in the returned error. serde_yaml's own error message for a
108+
/// typed-tag coercion failure embeds the scalar verbatim
109+
/// (`invalid value: string "<secret>"`); the old code wrapped that message
110+
/// into `MigrateError::YamlParse { source }`, so `Display` leaked the secret.
111+
#[test]
112+
fn malformed_secrets_error_never_contains_secret_value() {
113+
// `!!int` forces integer coercion of a string scalar; serde_yaml reports
114+
// the scalar text in its message. The scalar here is a stand-in secret.
115+
let yaml = "api_port: !!int s3cr3t_TOKEN_VALUE\n";
116+
let mut f = NamedTempFile::new().unwrap();
117+
f.write_all(yaml.as_bytes()).unwrap();
118+
119+
let err = read_secrets(f.path()).unwrap_err();
120+
let rendered = err.to_string();
121+
122+
// The secret VALUE must never appear in the error output...
123+
assert!(
124+
!rendered.contains("s3cr3t_TOKEN_VALUE"),
125+
"secret value leaked into error: {rendered}"
126+
);
127+
// ...and the full chain (with #[source]) must also be clean, since the
128+
// CLI/anyhow prints the source chain too.
129+
let mut source = std::error::Error::source(&err);
130+
while let Some(s) = source {
131+
assert!(
132+
!s.to_string().contains("s3cr3t_TOKEN_VALUE"),
133+
"secret value leaked into error source chain: {s}"
134+
);
135+
source = s.source();
136+
}
137+
138+
// It should still be a structured, locatable error (fail-closed).
139+
assert!(
140+
matches!(err, MigrateError::SecretsParse { .. }),
141+
"expected SecretsParse, got: {err:?}"
142+
);
143+
}
144+
145+
/// A secret KEY name is non-sensitive context and is fine to surface, but the
146+
/// redacting error must still help the user locate the problem (line/column).
147+
#[test]
148+
fn malformed_secrets_error_reports_location() {
149+
let yaml = "api_port: !!int notanumber\n";
150+
let mut f = NamedTempFile::new().unwrap();
151+
f.write_all(yaml.as_bytes()).unwrap();
152+
let err = read_secrets(f.path()).unwrap_err();
153+
let rendered = err.to_string();
154+
assert!(rendered.contains("line"), "should report a line: {rendered}");
155+
assert!(rendered.contains("redacted"), "should signal redaction: {rendered}");
156+
}
157+
97158
#[test]
98159
fn secret_count_is_correct() {
99160
let yaml = "a: 1\nb: 2\nc: 3\n";

0 commit comments

Comments
 (0)