fix: override random_get with deterministic PRNG in NoDAG WASM execution #1967
The v2/NoDAG WASM execution path uses newWasiLinker, which calls DefineWasi() but never overrides the random_get WASI function. This means the WASM guest receives real OS entropy, causing Go map iteration order to differ across nodes in the DON. When a workflow iterates a map (e.g. response headers) and uses the result in a downstream capability call, each node produces a different request body, leading to divergent requestHash values and quorum failure.

Fix: override random_get in newWasiLinker with a deterministic PRNG seeded from exec.donSeed, which is an FNV hash of the workflow execution ID. Since all nodes share the same execution ID for a given run, they get identical random output and therefore identical map iteration order.

The legacy DAG path (newDagWasiLinker) already had this override gated behind ModuleConfig.Determinism. The NoDAG path was missing it entirely.
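The seeding idea described above can be sketched in plain Go. This is only an illustration: seedFromExecutionID and randomBytes are hypothetical names, and the choice of 64-bit FNV-1a is an assumption, since the description only says that exec.donSeed is an FNV hash of the execution ID.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// seedFromExecutionID is a hypothetical stand-in for how exec.donSeed might be
// derived: a 64-bit FNV-1a hash of the execution ID, reinterpreted as int64.
// The FNV variant used by the real code is an assumption here.
func seedFromExecutionID(executionID string) int64 {
	h := fnv.New64a()
	h.Write([]byte(executionID)) // hash.Hash.Write never returns an error
	return int64(h.Sum64())
}

// randomBytes returns the first n bytes produced by a freshly seeded PRNG.
func randomBytes(seed int64, n int) []byte {
	src := rand.New(rand.NewSource(seed)) // deterministic, not cryptographically secure
	out := make([]byte, n)
	src.Read(out) // (*rand.Rand).Read always fills the slice and returns a nil error
	return out
}

func main() {
	// Two "nodes" that share an execution ID derive the same seed and
	// therefore read the same byte stream.
	nodeA := randomBytes(seedFromExecutionID("exec-123"), 8)
	nodeB := randomBytes(seedFromExecutionID("exec-123"), 8)
	fmt.Println(string(nodeA) == string(nodeB))
}
```

Because every node computes the seed locally from data it already agrees on, no extra coordination is needed to keep the guest's view of randomness identical.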
Pull request overview
This PR fixes non-deterministic behavior in the v2/NoDAG WASM execution linker by overriding the WASI random_get syscall with a deterministic PRNG seeded from the workflow execution ID (exec.donSeed). This ensures Go guest modules (notably runtime.fastrand → map iteration order) behave deterministically across DON nodes and avoids quorum failures caused by divergent downstream request hashes.
Changes:
- Override wasi_snapshot_preview1.random_get in newWasiLinker using a deterministic PRNG seeded from exec.donSeed.
- Add createSeedRandomGet(seed int64) helper to provide a seed-driven random_get implementation for the NoDAG path.
    // Override random_get with a deterministic PRNG seeded from the workflow
    // execution ID (via exec.donSeed). All nodes in the DON share the same
    // execution ID, so they get the same seed and therefore the same random
    // output. This ensures Go map iteration order, which is randomized via
    // runtime.fastrand seeded from WASI random_get, is identical across nodes,
    // preventing quorum failures from non-deterministic map ranging in WASM.
    err = linker.FuncWrap(
        "wasi_snapshot_preview1",
        "random_get",
        createSeedRandomGet(exec.donSeed),
    )
random_get override is applied unconditionally in the NoDAG path, which makes the WASM guest’s crypto/rand (and any other consumer of WASI random_get) deterministic and predictable. If NoDAG execution can run outside strict DON-consensus contexts (e.g., local/dev or NODE-mode operations), consider gating this behind an explicit determinism flag / mode check, or at minimum documenting that cryptographic uses of randomness inside WASM are not safe here and should use a different mechanism.
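One way the suggested gating could look is sketched below. The randomSource function and its deterministic flag are hypothetical and do not correspond to any existing option in the codebase; the sketch only shows a seeded PRNG being used when determinism is required and OS entropy otherwise.

```go
package main

import (
	crand "crypto/rand"
	"fmt"
	"io"
	mrand "math/rand"
)

// randomSource picks the byte source a random_get override could read from.
// Both the function and its deterministic flag are hypothetical.
func randomSource(deterministic bool, seed int64) io.Reader {
	if deterministic {
		// Predictable by design: unsuitable for keys, nonces, or secrets.
		return mrand.New(mrand.NewSource(seed))
	}
	return crand.Reader // real OS entropy for non-consensus contexts
}

// read pulls exactly n bytes from r, panicking on failure to keep the sketch short.
func read(r io.Reader, n int) []byte {
	out := make([]byte, n)
	if _, err := io.ReadFull(r, out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	a := read(randomSource(true, 7), 8)
	b := read(randomSource(true, 7), 8)
	fmt.Println(string(a) == string(b)) // seeded sources agree with each other

	c := read(randomSource(false, 7), 8)
	fmt.Println(len(c)) // OS entropy, contents differ from run to run
}
```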
    func createSeedRandomGet(seed int64) func(caller *wasmtime.Caller, buf, bufLen int32) int32 {
        randSource := rand.New(rand.NewSource(seed)) //nolint:gosec
        return func(caller *wasmtime.Caller, buf, bufLen int32) int32 {
            randOutput := make([]byte, bufLen)
            if _, err := io.ReadAtLeast(randSource, randOutput, int(bufLen)); err != nil {
                return ErrnoFault
            }

            if n := wasmWrite(caller, randOutput, buf, bufLen); n != int64(len(randOutput)) {
                return ErrnoFault
createSeedRandomGet trusts bufLen from the guest: if bufLen is negative (possible due to u32→i32 wrapping) make([]byte, bufLen) will panic; if bufLen is very large it can trigger large allocations/OOM. Add a guard for bufLen < 0 and consider enforcing a reasonable maximum or writing into WASM memory in bounded chunks to avoid allocating bufLen bytes at once.
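The wrapping problem is easy to reproduce outside WASM. The sketch below shows a large unsigned length becoming negative once it is received as an int32, together with a minimal guard. validLen and the maxRandomLen cap are illustrative assumptions, not values taken from the PR.

```go
package main

import "fmt"

// maxRandomLen is an illustrative upper bound, not a value taken from the PR.
const maxRandomLen = 1 << 16

// validLen reports whether a guest-supplied length is safe to pass to make().
func validLen(bufLen int32) bool {
	return bufLen >= 0 && bufLen <= maxRandomLen
}

func main() {
	guestLen := uint32(0xFFFFFFF0) // the length as the guest sees it (u32)
	hostLen := int32(guestLen)     // the same bits received as an int32 parameter

	fmt.Println(hostLen)           // -16
	fmt.Println(validLen(hostLen)) // false: make([]byte, hostLen) would panic
	fmt.Println(validLen(32))      // true
}
```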
    // createSeedRandomGet overrides random_get with a PRNG seeded from the given
    // seed value. Unlike createRandomGet (which reads the seed from ModuleConfig),
    // this accepts the seed directly; it is intended for the NoDAG path where the
    // seed is derived from the workflow execution ID.
    func createSeedRandomGet(seed int64) func(caller *wasmtime.Caller, buf, bufLen int32) int32 {
        randSource := rand.New(rand.NewSource(seed)) //nolint:gosec
        return func(caller *wasmtime.Caller, buf, bufLen int32) int32 {
createSeedRandomGet seeds the PRNG once and advances it across calls, while createRandomGet re-seeds on every random_get invocation (returning the same byte sequence each call). If consumers expect consistent deterministic random_get semantics across DAG vs NoDAG execution paths, consider refactoring to share one implementation and align the per-call behavior.
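The behavioral difference between the two helpers can be sketched with two closures. reseedEachCall and seedOnce are hypothetical names that mirror the behavior described in this comment; they are not the real implementations.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reseedEachCall mirrors the behavior described for createRandomGet:
// a fresh PRNG is built on every call, so every call returns the same bytes.
func reseedEachCall(seed int64) func(n int) []byte {
	return func(n int) []byte {
		out := make([]byte, n)
		rand.New(rand.NewSource(seed)).Read(out)
		return out
	}
}

// seedOnce mirrors the behavior described for createSeedRandomGet:
// the PRNG is built once and its state advances across calls.
func seedOnce(seed int64) func(n int) []byte {
	src := rand.New(rand.NewSource(seed))
	return func(n int) []byte {
		out := make([]byte, n)
		src.Read(out)
		return out
	}
}

func main() {
	a := reseedEachCall(1)
	fmt.Println(string(a(16)) == string(a(16))) // same stream prefix every call

	b := seedOnce(1)
	first, second := b(16), b(16)
	fmt.Println(string(first) == string(second)) // the stream has moved on
}
```

Both variants are deterministic across nodes. They differ in whether a guest that calls random_get repeatedly sees fresh values or the same prefix each time.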
    // Override random_get with a deterministic PRNG seeded from the workflow
    // execution ID (via exec.donSeed). All nodes in the DON share the same
    // execution ID, so they get the same seed and therefore the same random
    // output. This ensures Go map iteration order, which is randomized via
    // runtime.fastrand seeded from WASI random_get, is identical across nodes,
    // preventing quorum failures from non-deterministic map ranging in WASM.
    err = linker.FuncWrap(
        "wasi_snapshot_preview1",
        "random_get",
        createSeedRandomGet(exec.donSeed),
    )
There’s test coverage for random_get determinism in the legacy DAG path (TestModule_Sandbox_RandomGet), but nothing asserting determinism for the NoDAG linker path. Consider adding a NoDAG-focused test that verifies random_get output is stable across runs for a fixed workflow execution ID (and changes when the execution ID changes).
Adds a standard test (TestStandardMapIteration) that creates a 10-entry map inside WASM, iterates it, and returns the key order. The host test verifies that:
- Two executions with the same workflow execution ID produce identical map iteration order (proving random_get seeding is deterministic).
- A different execution ID produces different order (proving the seed actually varies).

This is a regression test for the random_get override added to the NoDAG WASI linker path.
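The shape of that check can be sketched without a WASM runtime. Here keyOrder is a hypothetical stand-in for executing the guest: it shuffles ten keys with a seeded PRNG rather than relying on real map iteration, because native Go map order cannot be seeded from user code.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// keyOrder is a hypothetical stand-in for running the WASM guest and
// collecting the order in which it visited ten map keys. The seed plays
// the role of the value derived from the workflow execution ID.
func keyOrder(seed int64) string {
	keys := []string{"k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8", "k9"}
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(len(keys), func(i, j int) { keys[i], keys[j] = keys[j], keys[i] })
	return strings.Join(keys, ",")
}

func main() {
	first := keyOrder(1)
	second := keyOrder(1)
	other := keyOrder(2)

	fmt.Println("same seed, same order:", first == second)
	fmt.Println("different seed, different order:", first != other)
}
```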
Both createRandomGet and createSeedRandomGet trusted the guest-supplied bufLen without validation. A negative value (from u32→i32 wrapping) would panic in make(), and a very large value could OOM the host.

Extract shared fillWasmMemRand helper that:
- Returns ErrnoSuccess for bufLen == 0 (no-op)
- Returns ErrnoInval for bufLen < 0
- Bounds-checks buf+bufLen against actual WASM linear memory
- Writes PRNG output directly into the WASM memory slice, avoiding any host-side allocation proportional to bufLen
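A minimal sketch of such a helper follows, with a plain byte slice standing in for WASM linear memory. The fillRand function, its signature, and the lowercase errno constants are assumptions for illustration and may not match fillWasmMemRand. Rejecting a negative buf is also an addition of this sketch that the commit message does not mention.

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
)

// Illustrative errno values; the PR's ErrnoSuccess, ErrnoFault, and
// ErrnoInval constants may be defined differently.
const (
	errnoSuccess int32 = 0
	errnoFault   int32 = 21
	errnoInval   int32 = 28
)

// seededReader returns a deterministic byte source for the given seed.
func seededReader(seed int64) io.Reader {
	return rand.New(rand.NewSource(seed))
}

// fillRand writes bufLen bytes from src into mem[buf:buf+bufLen].
// mem stands in for the guest's linear memory.
func fillRand(mem []byte, src io.Reader, buf, bufLen int32) int32 {
	if bufLen == 0 {
		return errnoSuccess // nothing to write
	}
	if buf < 0 || bufLen < 0 {
		return errnoInval // wrapped u32 values arrive as negatives
	}
	// Widen to int64 so buf+bufLen cannot overflow int32.
	end := int64(buf) + int64(bufLen)
	if end > int64(len(mem)) {
		return errnoFault // range falls outside linear memory
	}
	// Read straight into the target range: no allocation sized by bufLen.
	if _, err := io.ReadFull(src, mem[int(buf):int(end)]); err != nil {
		return errnoFault
	}
	return errnoSuccess
}

func main() {
	mem := make([]byte, 16)
	src := seededReader(1)
	fmt.Println(fillRand(mem, src, 0, -1)) // rejected as invalid
	fmt.Println(fillRand(mem, src, 8, 16)) // rejected as out of bounds
	fmt.Println(fillRand(mem, src, 4, 8))  // accepted
}
```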
This reverts commit f2c3cc0.
Problem
The v2/NoDAG WASM execution path (newWasiLinker) calls DefineWasi() but never overrides the random_get WASI function. This means the WASM guest receives real OS entropy, causing Go's runtime.fastrand, and therefore map iteration order, to differ across nodes in the DON.

When a workflow iterates a map (e.g. response headers from a previous step) and uses the result in a downstream capability call, each node produces a different request body, leading to divergent behavior (e.g., in conf. HTTP the requestHash values at the enclave host differ, preventing quorum).

History
- A random_get override was added to the legacy DAG path (newDagWasiLinker) via DeterminismConfig (feat(wasm): override random_get #831).
- The NoDAG path (newWasiLinker) was written from scratch and only shadowed poll_oneoff and clock_time_get. The random_get override was never ported.
- A random_seed env function was added (Seed random for modes #1236), giving workflows an explicit API for deterministic seeds. But this doesn't affect the WASI random_get syscall that Go's runtime uses internally.

Fix
Override random_get in newWasiLinker with a deterministic PRNG seeded from exec.donSeed (FNV hash of the workflow execution ID). Since all nodes share the same execution ID for a given run, they get identical random output and therefore identical map iteration order.

Adds a new createSeedRandomGet(seed) helper that takes the seed directly, unlike the existing createRandomGet(*ModuleConfig), which reads from config.

Testing
All existing tests pass, including TestModule_Sandbox_RandomGet (which tests the legacy DAG path and is unaffected by this change).