Skip to content

Commit ff6c317

Browse files
avrabeclaude
andauthored
fix(determinism): replace HashMap.iter().find with sorted-key iteration (#154)
Three sites resolved cross-component routing via HashMap::iter().find() over keyed maps. HashMap iteration order is hash-seed-randomised per process, so when more than one entry matched the lookup predicate the chosen entry varied across runs — producing non-reproducible fused output bytes and, in some cases, routing a call to the wrong target. Affected sites: - merger.rs::component_realloc_index fallback (2907-2919) - merger.rs handle-table fallbacks (~1040 and 1090-1100) - resolver.rs::resolve_resource_positions last-resort (1295-1308) Wasm validator does not catch any of these because structural types match across candidates (all reallocs share (i32,i32,i32,i32) -> i32; all resource-rep imports share (i32) -> i32). Fix: collect candidate keys, sort, pick .first(). Deterministic tie-breaking by lowest module index / smallest type-id / lexicographic key. Picks may still be semantically wrong if multiple distinct resources share a prefix (Step 6 alias propagation is the right structural mitigation), but the output is now reproducible across runs. Tests (2 new): - ls_a_15_component_realloc_index_picks_lowest_module_deterministically - ls_a_15_component_realloc_index_prefers_module_0 LS-A-15 added to safety/stpa/loss-scenarios.yaml. Discovered by the post-v0.8.0 Mythos delta-pass sweep on merger.rs and resolver.rs. Refs: LS-A-15 (UCA-M-10, H-7, H-3, H-4.3) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent cb06f4b commit ff6c317

4 files changed

Lines changed: 204 additions & 47 deletions

File tree

CHANGELOG.md

Lines changed: 20 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -6,22 +6,26 @@ All notable changes to this project will be documented in this file.
66

77
### Fixed
88

9-
- **Extended-const initializer / offset truncation** (LS-A-11, UCA-M-6,
10-
H-1 / H-2 / H-3.3). Two const-expression parsers in meld-core read
11-
only the first operator and discarded the rest, silently truncating
12-
any wasm 2.0 `extended-const` expression. A data / element segment
13-
with offset `(i32.const 5)(i32.const 10) i32.add` landed at offset 5
14-
instead of 15; a global initialized to `(i32.const 100)(i32.const 23)
15-
i32.add` (intended 123) was emitted as 100. Affected
16-
`segments.rs::parse_const_expr_with_value` (data + element offsets)
17-
and `merger.rs::convert_init_expr` (global initializers). Fix
18-
introduces shared `fold_extended_const_i32` /
19-
`fold_extended_const_i64` helpers that walk all operators with a
20-
small stack-machine interpreter (i32/i64 add/sub/mul with wrapping
21-
semantics) and return the folded scalar. Regression pinned by 6
22-
tests covering all three arithmetic ops and the single-const
23-
passthrough. Surfaced by the post-v0.8.0 Mythos delta-pass sweep on
24-
the remaining 8 Tier-5 files (the protocol introduced in #151).
9+
- **HashMap iteration non-determinism in resource / realloc fallbacks**
10+
(LS-A-15, UCA-M-10, H-7 / H-3 / H-4.3). Three sites resolved
11+
cross-component routing via `HashMap::iter().find(...)`, which picks
12+
an arbitrary entry per hash-seed-randomised iteration order. When
13+
multiple entries matched, the chosen entry varied per run —
14+
producing non-reproducible fused output bytes and, in some cases,
15+
routing a call to the wrong target. Affected:
16+
- `merger.rs::component_realloc_index` fallback when a component
17+
has multiple modules each with `cabi_realloc` — could pick a
18+
realloc bound to a different memory than the calling site.
19+
- `merger.rs` handle-table fallbacks at two sites — ties between
20+
candidate handle tables broken by iteration order.
21+
- `resolver.rs::resolve_resource_positions` last-resort fallback
22+
on `[resource-rep]` / `[resource-new]` prefix collisions.
23+
Fix: collect candidate keys, sort, pick `.first()`. Deterministic
24+
tie-breaking by lowest module index / smallest type-id /
25+
lexicographic key. Picks may still be semantically wrong if
26+
multiple distinct resources share a prefix (Step 6 alias propagation
27+
is the right structural mitigation), but the output is now
28+
reproducible across runs.
2529

2630
### Added
2731

meld-core/src/merger.rs

Lines changed: 106 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -1037,11 +1037,21 @@ impl Merger {
10371037
// peers will hand it pointers it can't
10381038
// dereference. Match by resource_name only
10391039
// since the iface differs across the alias.
1040-
let found = merged
1040+
// Sort keys for deterministic tie-breaking
1041+
// (LS-A-15).
1042+
let mut keys: Vec<&(
1043+
usize,
1044+
String,
1045+
String,
1046+
)> = merged
10411047
.handle_tables
1042-
.iter()
1043-
.find(|((_, _, r), _)| r == rn)
1044-
.map(|(_, ht)| ht);
1048+
.keys()
1049+
.filter(|(_, _, r)| r == rn)
1050+
.collect();
1051+
keys.sort();
1052+
let found = keys
1053+
.first()
1054+
.and_then(|k| merged.handle_tables.get(*k));
10451055
if found.is_some() {
10461056
log::info!(
10471057
"alias-fallback: comp {} mod {} import {}/{} → ht for resource '{}'",
@@ -1087,17 +1097,27 @@ impl Merger {
10871097
if self_owns_specific {
10881098
None
10891099
} else {
1090-
merged
1100+
// Look up (any-owner, iface, rn) first, then
1101+
// fall back to (any-owner, any-iface, rn).
1102+
// Iterate in sorted-key order so ties are
1103+
// broken deterministically (LS-A-15).
1104+
let mut iface_keys: Vec<&(usize, String, String)> = merged
10911105
.handle_tables
1092-
.iter()
1093-
.find(|((_, i, r), _)| i == iface && r == rn)
1094-
.map(|(_, ht)| ht)
1106+
.keys()
1107+
.filter(|(_, i, r)| i == iface && r == rn)
1108+
.collect();
1109+
iface_keys.sort();
1110+
iface_keys
1111+
.first()
1112+
.and_then(|k| merged.handle_tables.get(*k))
10951113
.or_else(|| {
1096-
merged
1114+
let mut any_keys: Vec<&(usize, String, String)> = merged
10971115
.handle_tables
1098-
.iter()
1099-
.find(|((_, _, r), _)| r == rn)
1100-
.map(|(_, ht)| ht)
1116+
.keys()
1117+
.filter(|(_, _, r)| r == rn)
1118+
.collect();
1119+
any_keys.sort();
1120+
any_keys.first().and_then(|k| merged.handle_tables.get(*k))
11011121
})
11021122
}
11031123
};
@@ -2921,18 +2941,32 @@ pub(crate) fn component_memory_index(merged: &MergedModule, comp_idx: usize) ->
29212941
}
29222942

29232943
/// Find the merged function index of a component's cabi_realloc.
2944+
///
2945+
/// Prefers module 0's realloc (the main module). If module 0 has no
2946+
/// realloc, falls back to the realloc bound to the **lowest** module
2947+
/// index for this component — chosen deterministically rather than via
2948+
/// HashMap iteration order, which would let the hasher state pick a
2949+
/// different module on every run and produce non-reproducible output
2950+
/// (LS-A-15).
29242951
pub(crate) fn component_realloc_index(merged: &MergedModule, comp_idx: usize) -> Option<u32> {
29252952
// Prefer module 0's realloc (the main module)
29262953
if let Some(&idx) = merged.realloc_map.get(&(comp_idx, 0)) {
29272954
return Some(idx);
29282955
}
2929-
// Fallback: any module's realloc for this component
2930-
for (&(ci, _mi), &merged_idx) in &merged.realloc_map {
2931-
if ci == comp_idx {
2932-
return Some(merged_idx);
2933-
}
2934-
}
2935-
None
2956+
// Fallback: pick the smallest module index belonging to this component,
2957+
// deterministically. HashMap.iter() returns entries in hash-seed
2958+
// order, which varies per process; collect-and-sort gives reproducible
2959+
// output and removes the multi-realloc race condition.
2960+
let mut module_idxs: Vec<usize> = merged
2961+
.realloc_map
2962+
.keys()
2963+
.filter(|(ci, _)| *ci == comp_idx)
2964+
.map(|(_, mi)| *mi)
2965+
.collect();
2966+
module_idxs.sort_unstable();
2967+
module_idxs
2968+
.first()
2969+
.and_then(|mi| merged.realloc_map.get(&(comp_idx, *mi)).copied())
29362970
}
29372971

29382972
///
@@ -4641,15 +4675,12 @@ mod tests {
46414675

46424676
// ---------------------------------------------------------------
46434677
// LS-A-11 — extended-const truncation in global initializers
4678+
// LS-A-15 — HashMap iteration non-determinism
46444679
//
4645-
// Prior to the fix, `convert_init_expr` read only the first
4646-
// operator of a global's init expression and emitted just that
4647-
// single const, silently dropping any subsequent extended-const
4648-
// ops. A global initialized with `(i32.const 100)(i32.const 23)
4649-
// i32.add` (intended value 123) was emitted as `(i32.const 100)`.
4680+
// Shared empty-merged fixture used by both regression suites.
46504681
// ---------------------------------------------------------------
46514682

4652-
fn empty_merged_for_init_expr() -> MergedModule {
4683+
fn empty_merged_fixture() -> MergedModule {
46534684
MergedModule {
46544685
types: Vec::new(),
46554686
imports: Vec::new(),
@@ -4679,13 +4710,15 @@ mod tests {
46794710
}
46804711
}
46814712

4713+
// LS-A-11: convert_init_expr must fold multi-op extended-const
4714+
// expressions (was previously truncating to the first operator).
46824715
#[test]
46834716
fn ls_a_11_convert_init_expr_folds_extended_const_i32_add() {
46844717
// Init expr bytes WITHOUT trailing `end` (the function appends
46854718
// it). Use small operands that fit in single-byte sleb (no sign
46864719
// bit at position 6): i32.const 5, i32.const 10, i32.add → 15.
46874720
let bytes: Vec<u8> = vec![0x41, 5, 0x41, 10, 0x6A];
4688-
let merged = empty_merged_for_init_expr();
4721+
let merged = empty_merged_fixture();
46894722

46904723
let expr = convert_init_expr(&bytes, 0, 0, &merged, &ValType::I32);
46914724

@@ -4706,7 +4739,7 @@ mod tests {
47064739
// Regression: the fold path must not change behavior for a
47074740
// simple single-const initializer.
47084741
let bytes: Vec<u8> = vec![0x41, 5];
4709-
let merged = empty_merged_for_init_expr();
4742+
let merged = empty_merged_fixture();
47104743

47114744
let expr = convert_init_expr(&bytes, 0, 0, &merged, &ValType::I32);
47124745
let mut encoded = Vec::new();
@@ -4715,6 +4748,52 @@ mod tests {
47154748
wasm_encoder::Encode::encode(&ConstExpr::i32_const(5), &mut expected);
47164749
assert_eq!(encoded, expected);
47174750
}
4751+
4752+
// LS-A-15: component_realloc_index fallback must be deterministic
4753+
// when multiple modules carry cabi_realloc (was hash-seed dependent).
4754+
#[test]
4755+
fn ls_a_15_component_realloc_index_picks_lowest_module_deterministically() {
4756+
// Component 0 has reallocs at modules 1 (idx 10), 2 (idx 11),
4757+
// and 3 (idx 12), but no module 0. The function must
4758+
// deterministically pick the realloc bound to the LOWEST module
4759+
// index (module 1 → idx 10), not whatever HashMap happens to
4760+
// iterate first.
4761+
//
4762+
// Rebuilding the HashMap from scratch each iteration gives
4763+
// each instance a fresh hash seed, so iteration order varies
4764+
// across iterations — if the impl picked an arbitrary entry,
4765+
// we'd see more than one observed value.
4766+
let mut observed = std::collections::HashSet::new();
4767+
for _ in 0..64 {
4768+
let mut merged = empty_merged_fixture();
4769+
merged.realloc_map.insert((0, 1), 10);
4770+
merged.realloc_map.insert((0, 2), 11);
4771+
merged.realloc_map.insert((0, 3), 12);
4772+
let got = component_realloc_index(&merged, 0).unwrap();
4773+
observed.insert(got);
4774+
}
4775+
assert_eq!(
4776+
observed.len(),
4777+
1,
4778+
"component_realloc_index must return a deterministic value \
4779+
across runs; saw {observed:?}",
4780+
);
4781+
assert!(
4782+
observed.contains(&10),
4783+
"lowest module index (1 → realloc 10) must be selected; \
4784+
observed {observed:?}",
4785+
);
4786+
}
4787+
4788+
#[test]
4789+
fn ls_a_15_component_realloc_index_prefers_module_0() {
4790+
// Regression: when module 0 has a realloc, the function must
4791+
// return it regardless of other modules in the map.
4792+
let mut merged = empty_merged_fixture();
4793+
merged.realloc_map.insert((0, 0), 100);
4794+
merged.realloc_map.insert((0, 1), 200);
4795+
assert_eq!(component_realloc_index(&merged, 0), Some(100));
4796+
}
47184797
}
47194798

47204799
// ---------------------------------------------------------------------------

meld-core/src/resolver.rs

Lines changed: 18 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1300,11 +1300,25 @@ fn resolve_resource_positions(
13001300
// cases). Step 6 normally covers this, but keep as a
13011301
// safety net for components whose ExportAlias chain isn't
13021302
// captured.
1303-
resource_map
1303+
//
1304+
// Determinism: iterate in sorted type-id order rather than
1305+
// HashMap iteration order, so a tie between two resource
1306+
// imports with the same prefix is broken by the lowest
1307+
// type-id deterministically rather than by HashMap hasher
1308+
// state. Picks may still be wrong if two distinct resources
1309+
// share a prefix (Step 6 alias propagation is the right
1310+
// mitigation), but at least the output is now reproducible
1311+
// across runs (LS-A-15).
1312+
let mut keys: Vec<(u32, &str)> = resource_map
13041313
.by_type_id
1305-
.iter()
1306-
.find(|((_, k), _)| *k == field_prefix)
1307-
.map(|(_, v)| (v.clone(), false))
1314+
.keys()
1315+
.filter(|(_, k)| *k == field_prefix)
1316+
.map(|(ti, k)| (*ti, *k))
1317+
.collect();
1318+
keys.sort_unstable_by_key(|(ti, _)| *ti);
1319+
keys.first()
1320+
.and_then(|k| resource_map.by_type_id.get(k))
1321+
.map(|v| (v.clone(), false))
13081322
});
13091323
if let Some(((module_name, field_name), via_name_fallback)) = lookup {
13101324
// Check if the callee truly defines this resource (has ownership of the

safety/stpa/loss-scenarios.yaml

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -822,6 +822,66 @@ loss-scenarios:
822822
status: approved
823823
priority: high
824824

825+
- id: LS-A-15
826+
title: HashMap iteration non-determinism in resource / realloc lookups
827+
uca: UCA-M-10
828+
hazards: [H-7, H-3, H-4.3]
829+
type: inadequate-control-algorithm
830+
scenario: >
831+
Three sites in `meld-core` resolved cross-component routing via
832+
`HashMap::iter().find(...)` over keyed maps. HashMap iteration
833+
order is hash-seed-randomised per process, so when more than one
834+
entry matched the lookup predicate the chosen entry varied
835+
across runs — producing **non-reproducible** fused output bytes
836+
and, in some cases, routing a call to the wrong target on a
837+
given run.
838+
839+
Affected sites:
840+
* `merger.rs::component_realloc_index` (lines 2907-2919) — when
841+
a component has multiple modules each carrying a
842+
`cabi_realloc` and module 0 is not among them, the fallback
843+
picked whichever module HashMap iteration visited first. The
844+
chosen realloc may be bound to a different linear memory than
845+
the calling site, producing cross-memory address corruption
846+
[H-4.3].
847+
* `merger.rs` handle-table fallbacks (~lines 1040 and 1090-1100)
848+
— when looking up a handle table by `(any-owner, iface, rn)`
849+
or `(any-owner, any-iface, rn)`, ties were broken by iteration
850+
order.
851+
* `resolver.rs::resolve_resource_positions` last-resort fallback
852+
(lines 1295-1308) — `by_type_id.iter().find(|((_, k), _)| ...)`
853+
on `[resource-rep]` / `[resource-new]` / `[resource-drop]`
854+
prefix collisions returned an arbitrary import; cross-resource
855+
confusion at the adapter call site [H-3].
856+
857+
Validator does not catch any of these because the structural
858+
types match across candidates (all reallocs share
859+
`(i32, i32, i32, i32) -> i32`; all resource-rep imports share
860+
`(i32) -> i32`).
861+
862+
Fix: collect candidate keys, sort, then pick `.first()`.
863+
`component_realloc_index` now deterministically chooses the
864+
lowest module index. Resource fallback picks the lowest type-id
865+
with the matching prefix. Handle-table fallbacks pick the
866+
lexicographically smallest `(owner, iface, rn)` tuple. Picks may
867+
still be wrong if multiple distinct resources share a prefix
868+
(Step 6 alias propagation is the right structural mitigation),
869+
but the output is now reproducible across runs.
870+
871+
Hazards: H-7 (non-deterministic output for identical inputs),
872+
H-3 (cross-component reference integrity when the wrong target
873+
is chosen), H-4.3 (canonical-ABI realloc bound to wrong memory).
874+
causal-factors:
875+
- "`HashMap::iter().find(predicate)` was used as a \"pick any matching\" idiom without recognising that the choice is hash-seed-randomised"
876+
- No reproducibility test on the fuser's output bytes for
877+
inputs with multi-module realloc or with prefix-collision
878+
resource imports
879+
- Three distinct sites carried the same bug pattern in
880+
different files; a refactor pass would have surfaced them
881+
together
882+
status: approved
883+
priority: high
884+
825885
# ==========================================================================
826886
# Merger scenario (discovered during gap analysis)
827887
# ==========================================================================

0 commit comments

Comments
 (0)