fix: override random_get with deterministic PRNG in NoDAG WASM execution #1967

Closed
vreff wants to merge 4 commits into main from
fix/nodag-deterministic-random-get

Conversation

Contributor

@vreff vreff commented Apr 6, 2026

Problem

The v2/NoDAG WASM execution path (newWasiLinker) calls DefineWasi() but never overrides the random_get WASI function. This means the WASM guest receives real OS entropy, causing Go's runtime.fastrand — and therefore map iteration order — to differ across nodes in the DON.

When a workflow iterates a map (e.g. response headers from a previous step) and uses the result in a downstream capability call, each node produces a different request body, leading to divergent behavior (e.g., in conf. HTTP, the requestHash values at the enclave host differ, preventing quorum).
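
To illustrate the failure mode, here is a minimal host-side sketch (not code from this PR). `buildBody` and `requestHash` are hypothetical stand-ins: the explicit `keyOrder` slice plays the role of whatever order a `for k := range headers` loop happens to produce on a given node, and SHA-256 is only an assumed hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// buildBody concatenates headers in the order the keys are visited.
// keyOrder stands in for the iteration order one particular node observes.
func buildBody(headers map[string]string, keyOrder []string) string {
	var sb strings.Builder
	for _, k := range keyOrder {
		sb.WriteString(k)
		sb.WriteString("=")
		sb.WriteString(headers[k])
		sb.WriteString(";")
	}
	return sb.String()
}

// requestHash is a hypothetical stand-in for the hash computed over the
// request body (SHA-256 is an assumption made for this sketch).
func requestHash(body string) string {
	sum := sha256.Sum256([]byte(body))
	return hex.EncodeToString(sum[:])
}

func main() {
	headers := map[string]string{
		"Content-Type": "application/json",
		"ETag":         "abc",
		"Date":         "today",
	}
	nodeA := []string{"Content-Type", "ETag", "Date"}
	nodeB := []string{"Date", "Content-Type", "ETag"}

	// Same headers, different visiting order: the bodies differ, so the
	// hashes differ and the two nodes cannot agree.
	fmt.Println(requestHash(buildBody(headers, nodeA)) == requestHash(buildBody(headers, nodeB)))
}
```

Same map contents, different visiting order, different hash: that is the divergence described above.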

History

  • Oct 2024: The random_get override was added to the legacy DAG path (newDagWasiLinker) via DeterminismConfig (feat(wasm): override random_get #831)
  • ~2025: The NoDAG v2 linker (newWasiLinker) was written from scratch and only shadowed poll_oneoff and clock_time_get. The random_get override was never ported.
  • Jun 2025: An SDK-level random_seed env function was added (Seed random for modes #1236), giving workflows an explicit API for deterministic seeds. But this doesn't affect the WASI random_get syscall that Go's runtime uses internally.

Fix

Override random_get in newWasiLinker with a deterministic PRNG seeded from exec.donSeed (FNV hash of the workflow execution ID). Since all nodes share the same execution ID for a given run, they get identical random output and therefore identical map iteration order.

Adds a new createSeedRandomGet(seed) helper that takes the seed directly, unlike the existing createRandomGet(*ModuleConfig) which reads from config.
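
For readers unfamiliar with the idea, a rough standalone sketch of "hash the execution ID, seed a PRNG, read bytes" follows. It does not reproduce the PR's code: `seedFromExecutionID` and `randomBytes` are hypothetical names, and the choice of 64-bit FNV-1a is an assumption (the description only says "FNV hash").

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
	"math/rand"
)

// seedFromExecutionID is a hypothetical helper that derives a PRNG seed from
// an execution ID. The FNV-1a 64-bit variant is an assumption of this sketch.
func seedFromExecutionID(id string) int64 {
	h := fnv.New64a()
	h.Write([]byte(id)) // hash.Hash's Write never returns an error
	return int64(h.Sum64())
}

// randomBytes returns the first n bytes produced by a math/rand PRNG seeded
// with seed. math/rand is not cryptographically secure.
func randomBytes(seed int64, n int) []byte {
	src := rand.New(rand.NewSource(seed)) //nolint:gosec
	out := make([]byte, n)
	src.Read(out) // (*rand.Rand).Read always fills the slice with a nil error
	return out
}

// sameOutput reports whether two execution IDs lead to identical byte streams.
func sameOutput(idA, idB string, n int) bool {
	a := randomBytes(seedFromExecutionID(idA), n)
	b := randomBytes(seedFromExecutionID(idB), n)
	return bytes.Equal(a, b)
}

func main() {
	fmt.Println(sameOutput("exec-1", "exec-1", 16)) // same ID on two "nodes"
	fmt.Println(sameOutput("exec-1", "exec-2", 16)) // a different run
}
```

Two nodes that agree on the execution ID would read the same bytes; a different run would, with overwhelming probability, read different ones.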

Testing

All existing tests pass, including TestModule_Sandbox_RandomGet (which tests the legacy DAG path and is unaffected by this change).

The v2/NoDAG WASM execution path uses newWasiLinker which calls
DefineWasi() but never overrides the random_get WASI function. This
means the WASM guest receives real OS entropy, causing Go map
iteration order to differ across nodes in the DON.

When a workflow iterates a map (e.g. response headers) and uses the
result in a downstream capability call, each node produces a different
request body, leading to divergent requestHash values and quorum failure.

Fix: override random_get in newWasiLinker with a deterministic PRNG
seeded from exec.donSeed, which is an FNV hash of the workflow execution
ID. Since all nodes share the same execution ID for a given run, they
get identical random output and therefore identical map iteration order.

The legacy DAG path (newDagWasiLinker) already had this override gated
behind ModuleConfig.Determinism. The NoDAG path was missing it entirely.

@vreff vreff requested a review from a team as a code owner April 6, 2026 16:53
Copilot AI review requested due to automatic review settings April 6, 2026 16:53

github-actions Bot commented Apr 6, 2026

👋 vreff, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

github-actions Bot commented Apr 6, 2026

📊 API Diff Results

No changes detected for module github.com/smartcontractkit/chainlink-common

Copilot AI left a comment

Pull request overview

This PR fixes non-deterministic behavior in the v2/NoDAG WASM execution linker by overriding the WASI random_get syscall with a deterministic PRNG seeded from the workflow execution ID (exec.donSeed). This ensures Go guest modules (notably runtime.fastrand → map iteration order) behave deterministically across DON nodes and avoids quorum failures caused by divergent downstream request hashes.

Changes:

  • Override wasi_snapshot_preview1.random_get in newWasiLinker using a deterministic PRNG seeded from exec.donSeed.
  • Add createSeedRandomGet(seed int64) helper to provide a seed-driven random_get implementation for the NoDAG path.

Comment on lines +50 to +60
// Override random_get with a deterministic PRNG seeded from the workflow
// execution ID (via exec.donSeed). All nodes in the DON share the same
// execution ID, so they get the same seed and therefore the same random
// output. This ensures Go map iteration order — which is randomized via
// runtime.fastrand seeded from WASI random_get — is identical across nodes,
// preventing quorum failures from non-deterministic map ranging in WASM.
err = linker.FuncWrap(
	"wasi_snapshot_preview1",
	"random_get",
	createSeedRandomGet(exec.donSeed),
)
Copilot AI Apr 6, 2026

random_get override is applied unconditionally in the NoDAG path, which makes the WASM guest’s crypto/rand (and any other consumer of WASI random_get) deterministic and predictable. If NoDAG execution can run outside strict DON-consensus contexts (e.g., local/dev or NODE-mode operations), consider gating this behind an explicit determinism flag / mode check, or at minimum documenting that cryptographic uses of randomness inside WASM are not safe here and should use a different mechanism.

Contributor Author

Elaborate?
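
One possible reading of the gating suggestion in the review comment above, sketched outside the PR's code: choose the byte source by mode, so that only consensus-bound executions get the predictable stream. The `deterministic` flag and the `randomSource` helper are hypothetical and do not correspond to any existing config in this repository.

```go
package main

import (
	"bytes"
	crand "crypto/rand"
	"fmt"
	"io"
	mrand "math/rand"
)

// randomSource picks where random_get bytes would come from. The
// deterministic flag is a hypothetical mode switch used only in this sketch.
func randomSource(deterministic bool, seed int64) io.Reader {
	if deterministic {
		// Predictable by design: unsuitable for keys, nonces, or tokens.
		return mrand.New(mrand.NewSource(seed)) //nolint:gosec
	}
	return crand.Reader // real OS entropy
}

// reproducible reports whether two independently created sources yield the
// same first n bytes. It returns false if either read fails.
func reproducible(deterministic bool, seed int64, n int) bool {
	a := make([]byte, n)
	b := make([]byte, n)
	if _, err := io.ReadFull(randomSource(deterministic, seed), a); err != nil {
		return false
	}
	if _, err := io.ReadFull(randomSource(deterministic, seed), b); err != nil {
		return false
	}
	return bytes.Equal(a, b)
}

func main() {
	fmt.Println(reproducible(true, 42, 16))  // seeded PRNG: streams match
	fmt.Println(reproducible(false, 42, 16)) // OS entropy: streams should not match
}
```

The trade-off the comment points at is visible here: whatever makes the stream reproducible across nodes also makes it predictable to anyone who knows the seed.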

Comment on lines +287 to +296
func createSeedRandomGet(seed int64) func(caller *wasmtime.Caller, buf, bufLen int32) int32 {
	randSource := rand.New(rand.NewSource(seed)) //nolint:gosec
	return func(caller *wasmtime.Caller, buf, bufLen int32) int32 {
		randOutput := make([]byte, bufLen)
		if _, err := io.ReadAtLeast(randSource, randOutput, int(bufLen)); err != nil {
			return ErrnoFault
		}

		if n := wasmWrite(caller, randOutput, buf, bufLen); n != int64(len(randOutput)) {
			return ErrnoFault
Copilot AI Apr 6, 2026

createSeedRandomGet trusts bufLen from the guest: if bufLen is negative (possible due to u32→i32 wrapping) make([]byte, bufLen) will panic; if bufLen is very large it can trigger large allocations/OOM. Add a guard for bufLen < 0 and consider enforcing a reasonable maximum or writing into WASM memory in bounded chunks to avoid allocating bufLen bytes at once.
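
The wrapping hazard described in this comment can be reproduced in plain Go without wasmtime. The sketch below is illustrative only: `asGuestLen`, `allocPanics`, and `checkLen` are hypothetical helpers, and the `maxLen` cap is an assumed policy, not something the PR defines.

```go
package main

import "fmt"

// sink keeps the allocation observable so it cannot be optimized away.
var sink []byte

// asGuestLen mimics a u32 length from the guest arriving as an int32
// parameter on the host side. Values above math.MaxInt32 wrap to negative.
func asGuestLen(n uint32) int32 {
	return int32(n)
}

// allocPanics reports whether make([]byte, n) panics for the given length.
func allocPanics(n int32) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	sink = make([]byte, n)
	return false
}

// checkLen is the kind of guard the comment asks for: reject negative
// lengths and anything above a cap (maxLen is a hypothetical limit).
func checkLen(n, maxLen int32) bool {
	return n >= 0 && n <= maxLen
}

func main() {
	n := asGuestLen(0xFFFFFFF0)
	fmt.Println(n, allocPanics(n)) // prints: -16 true
	fmt.Println(checkLen(n, 1<<20), checkLen(64, 1<<20))
}
```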

Comment on lines +283 to +289
// createSeedRandomGet overrides random_get with a PRNG seeded from the given
// seed value. Unlike createRandomGet (which reads the seed from ModuleConfig),
// this accepts the seed directly — intended for the NoDAG path where the seed
// is derived from the workflow execution ID.
func createSeedRandomGet(seed int64) func(caller *wasmtime.Caller, buf, bufLen int32) int32 {
	randSource := rand.New(rand.NewSource(seed)) //nolint:gosec
	return func(caller *wasmtime.Caller, buf, bufLen int32) int32 {
Copilot AI Apr 6, 2026

createSeedRandomGet seeds the PRNG once and advances it across calls, while createRandomGet re-seeds on every random_get invocation (returning the same byte sequence each call). If consumers expect consistent deterministic random_get semantics across DAG vs NoDAG execution paths, consider refactoring to share one implementation and align the per-call behavior.
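
The behavioral difference this comment describes can be shown with two small closures. This is a sketch of the two seeding strategies as characterized in the comment, not a copy of createRandomGet or createSeedRandomGet; `reseedEachCall`, `seedOnce`, and `repeats` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// reseedEachCall builds a fresh PRNG on every invocation, so each call
// starts the stream over and returns the same leading bytes.
func reseedEachCall(seed int64) func(n int) []byte {
	return func(n int) []byte {
		out := make([]byte, n)
		rand.New(rand.NewSource(seed)).Read(out) //nolint:gosec
		return out
	}
}

// seedOnce creates the PRNG a single time and keeps advancing it, so
// consecutive calls return successive parts of one stream.
func seedOnce(seed int64) func(n int) []byte {
	src := rand.New(rand.NewSource(seed)) //nolint:gosec
	return func(n int) []byte {
		out := make([]byte, n)
		src.Read(out)
		return out
	}
}

// repeats reports whether two consecutive 16-byte reads are identical.
func repeats(gen func(n int) []byte) bool {
	first := gen(16)
	second := gen(16)
	return bytes.Equal(first, second)
}

func main() {
	fmt.Println(repeats(reseedEachCall(42)), repeats(seedOnce(42)))
}
```

Both strategies are deterministic across nodes; they differ in whether a guest that calls random_get twice sees the same bytes twice.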

Comment on lines +50 to +60
// Override random_get with a deterministic PRNG seeded from the workflow
// execution ID (via exec.donSeed). All nodes in the DON share the same
// execution ID, so they get the same seed and therefore the same random
// output. This ensures Go map iteration order — which is randomized via
// runtime.fastrand seeded from WASI random_get — is identical across nodes,
// preventing quorum failures from non-deterministic map ranging in WASM.
err = linker.FuncWrap(
	"wasi_snapshot_preview1",
	"random_get",
	createSeedRandomGet(exec.donSeed),
)
Copilot AI Apr 6, 2026

There’s test coverage for random_get determinism in the legacy DAG path (TestModule_Sandbox_RandomGet), but nothing asserting determinism for the NoDAG linker path. Consider adding a NoDAG-focused test that verifies random_get output is stable across runs for a fixed workflow execution ID (and changes when the execution ID changes).
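
The property such a test would assert can be approximated on the host without WASM. The sketch below is a simulation only: it shuffles ten keys with a PRNG seeded from the execution ID, whereas a real test would run a guest module and observe actual Go map iteration. `keyOrder` and the FNV-1a seed derivation are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"strings"
)

// keyOrder simulates a seed-dependent iteration order: it shuffles the keys
// k0..k9 using a PRNG seeded from the execution ID and returns them joined
// with commas. This stands in for map iteration inside the guest.
func keyOrder(executionID string) string {
	h := fnv.New64a()
	h.Write([]byte(executionID)) // hash.Hash's Write never returns an error
	r := rand.New(rand.NewSource(int64(h.Sum64()))) //nolint:gosec

	keys := make([]string, 10)
	for i := range keys {
		keys[i] = fmt.Sprintf("k%d", i)
	}
	r.Shuffle(len(keys), func(i, j int) { keys[i], keys[j] = keys[j], keys[i] })
	return strings.Join(keys, ",")
}

func main() {
	// Stable for a fixed execution ID.
	fmt.Println(keyOrder("exec-1") == keyOrder("exec-1"))
	// Expected to change when the execution ID changes.
	fmt.Println(keyOrder("exec-1") == keyOrder("exec-2"))
}
```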

Adds a standard test (TestStandardMapIteration) that creates a 10-entry
map inside WASM, iterates it, and returns the key order. The host test
verifies that:
- Two executions with the same workflow execution ID produce identical
  map iteration order (proving random_get seeding is deterministic).
- A different execution ID produces different order (proving the seed
  actually varies).

This is a regression test for the random_get override added to the NoDAG
WASI linker path.

Both createRandomGet and createSeedRandomGet trusted the guest-supplied
bufLen without validation. A negative value (from u32→i32 wrapping)
would panic in make(), and a very large value could OOM the host.

Extract shared fillWasmMemRand helper that:
- Returns ErrnoSuccess for bufLen == 0 (no-op)
- Returns ErrnoInval for bufLen < 0
- Bounds-checks buf+bufLen against actual WASM linear memory
- Writes PRNG output directly into the WASM memory slice, avoiding
  any host-side allocation proportional to bufLen
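
A simplified picture of the checks listed above, using a plain byte slice in place of WASM linear memory. This is not the PR's fillWasmMemRand: `fillMemRand` and `newSeeded` are hypothetical, the errno constants are local stand-ins using the wasi_snapshot_preview1 numeric values as I understand them, and returning a fault for an out-of-range or negative `buf` is an assumption (the commit message only specifies the `bufLen` cases).

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
)

// Local stand-ins for ErrnoSuccess, ErrnoFault and ErrnoInval.
const (
	errnoSuccess int32 = 0
	errnoFault   int32 = 21
	errnoInval   int32 = 28
)

// newSeeded returns a deterministic byte source for the sketch.
func newSeeded(seed int64) io.Reader {
	return rand.New(rand.NewSource(seed)) //nolint:gosec
}

// fillMemRand validates the guest-supplied range and then writes PRNG
// output straight into mem[buf:buf+bufLen], with no temporary buffer.
func fillMemRand(mem []byte, src io.Reader, buf, bufLen int32) int32 {
	if bufLen == 0 {
		return errnoSuccess // nothing to write
	}
	if bufLen < 0 {
		return errnoInval // wrapped u32 length
	}
	// Widen before adding so buf+bufLen cannot overflow int32.
	end := int64(buf) + int64(bufLen)
	if buf < 0 || end > int64(len(mem)) {
		return errnoFault // assumption: out-of-range pointers map to a fault
	}
	if _, err := io.ReadFull(src, mem[buf:end]); err != nil {
		return errnoFault
	}
	return errnoSuccess
}

func main() {
	mem := make([]byte, 64)
	src := newSeeded(1)
	fmt.Println(
		fillMemRand(mem, src, 8, 16),  // in bounds
		fillMemRand(mem, src, 60, 16), // runs past the end of memory
		fillMemRand(mem, src, 0, -1),  // negative length
	) // prints: 0 21 28
}
```

Because the destination is a sub-slice of the memory itself, the host never allocates an amount proportional to the guest-supplied length.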
@vreff vreff closed this Apr 8, 2026
@vreff vreff deleted the fix/nodag-deterministic-random-get branch April 8, 2026 15:10