Summary
The Loop runner builds a partial-success results array (with nil for failed iterations) but then discards it on the error return path, causing the SDK to receive data: null instead of the per-iteration array.
This breaks the documented contract where a Loop with one failing inner-node iteration and one successful iteration should return [null, {…}] and report partial success.
Reproduction
Test in ava-sdk-js:
// tests/nodes/loopNode.test.ts:2380
test("should return flattened data format for contract_read runner in loop", ...)
The test creates a Loop whose Runner is a ContractRead with two contracts:
- Chainlink ETH/USD price feed (does not have
name()/symbol()) → expected to revert per iteration → expected null entry
- USDC (has
name()/symbol()) → expected to succeed → expected {name: "USDC", symbol: "USDC"} entry
Expected SDK response:
{
success: true, // or partial — see "Open question" below
data: [null, { name: "USDC", symbol: "USDC" }],
...
}
Actual SDK response:
{
success: false,
data: null,
error: 'invalid request: contract call failed: execution reverted',
errorCode: undefined,
metadata: undefined,
executionContext: undefined
}
The test fails deterministically in ~1.4s on every run, both in isolation and as part of yarn test:nodes.
Root cause
In core/taskengine/vm.go, the loop execution path:
// Lines ~4711-4738 (parallel) and ~4771-4796 (sequential)
for i, resultChannel := range resultChannels {
select {
case result := <-resultChannel:
if result.Error != nil {
success = false
if firstError == nil {
firstError = result.Error
}
log.WriteString(fmt.Sprintf("\nError in iteration %d: %s", i, result.Error.Error()))
// ← results[i] is left as nil (intentional partial-success placeholder)
} else {
results[i] = result.Data
}
// ...
case <-time.After(iterationTimeout):
// similar
}
}
After the loop, partial-success aggregation works correctly:
// Lines ~4811-4824
jsonSerializableResults := make([]interface{}, len(results))
for i, result := range results {
if result != nil {
jsonSerializableResults[i] = convertToJSONCompatible(result)
} else {
jsonSerializableResults[i] = nil // ← preserves the per-iteration null
}
}
outputValue, err := structpb.NewValue(jsonSerializableResults)
// ...
loopOutput := &avsproto.LoopNode_Output{Data: outputValue}
s.OutputData = &avsproto.Execution_Step_Loop{Loop: loopOutput}
At this point s.OutputData correctly holds [null, {name: "USDC", symbol: "USDC"}]. But then:
// Lines ~4833-4836
if !success && firstError != nil {
finalizeStep(s, false, nil, firstError.Error(), log.String())
return s, firstError
}
finalizeStep(s, true, nil, "", log.String())
return s, nil
The loop returns firstError whenever any iteration failed, and the runNodeWithInputs / runNodeImmediately response builder sees the non-nil error and discards s.OutputData, replacing the response's data field with null. The partial-success array that was just built is silently thrown away.
Proposed fix
The intent of results[i] = nil on a per-iteration error is clearly partial-success bookkeeping. The fix is to honor that intent on the return path:
- Distinguish between "loop infrastructure failure" (queue submit error, timeout collecting results, internal panic) and "per-iteration runner failure" (the inner ContractRead/ContractWrite reverted).
- For per-iteration runner failures, finalize the step as success or partial-success, preserve
OutputData, and return (s, nil) so downstream sees the data.
- For infrastructure failures, keep the current behavior (return the error).
Sketch:
allFailed := success == false && countNonNilResults(results) == 0
hardInfraFailure := loopSetupErr != nil // not the per-iteration errors
if hardInfraFailure {
// current behavior — propagate
finalizeStep(s, false, nil, firstError.Error(), log.String())
return s, firstError
}
if allFailed {
// every iteration failed — finalize as failed but still preserve the
// results array so the client can see which iterations failed and why
finalizeStep(s, false, nil, firstError.Error(), log.String())
return s, nil // ← key change: return nil error so OutputData is preserved
}
// At least one iteration succeeded → partial or full success
finalizeStep(s, true, nil, "", log.String())
return s, nil
The exact success: true vs success: partial decision is a separate API choice (see "Open question" below), but at minimum the response data field must contain the per-iteration array, not null.
Open question: success vs partial-success vs failure
Three possible response shapes for "one iteration succeeded, one failed":
| Choice |
success field |
data field |
error field |
| A |
true |
[null, {…}] |
"" |
| B |
partial (new tri-state) |
[null, {…}] |
"iteration 0 failed: …" |
| C |
false |
[null, {…}] |
"iteration 0 failed: …" |
Currently the operator effectively does D: success: false, data: null, error: "..." — which loses information.
The test expects A (success: true), suggesting the convention is "if the loop ran to completion, it succeeded; per-iteration outcomes are reflected in the data array, not the top-level success flag." That seems consistent with how loop runners are commonly used (you want to process all items and inspect failures after, not bail on first error). I'd recommend A, but B or C also fix the bug as long as data is preserved.
Impact
ava-sdk-js test loopNode > flattened data format for contract_read runner in loop has been failing deterministically since at least the 720b0a2 commit that added it. Currently the only test exercising the documented per-iteration partial-failure contract.
- Any production workflow that uses
Loop > ContractRead against a heterogeneous list of contracts (e.g., reading metadata for a list of arbitrary tokens, some of which might not implement the requested method) silently loses all results when any one contract reverts. The user has no way to know which contracts succeeded — the entire array is null.
- Symmetric bug likely exists in
Loop > ContractWrite and Loop > GraphQL/Loop > REST runners — anywhere the per-iteration runner can fail individually. Worth grepping for the same pattern in those code paths.
Repro environment
Related code
core/taskengine/vm.go — loop result aggregation (lines ~4690-4840)
core/taskengine/loop_helpers.go — per-iteration template processing
- The downstream
runNodeWithInputs response builder that drops OutputData when the step returns a non-nil error
Summary
The Loop runner builds a partial-success results array (with
nilfor failed iterations) but then discards it on the error return path, causing the SDK to receivedata: nullinstead of the per-iteration array.This breaks the documented contract where a Loop with one failing inner-node iteration and one successful iteration should return
[null, {…}]and report partial success.Reproduction
Test in
ava-sdk-js:The test creates a Loop whose
Runneris aContractReadwith two contracts:name()/symbol()) → expected to revert per iteration → expected null entryname()/symbol()) → expected to succeed → expected{name: "USDC", symbol: "USDC"}entryExpected SDK response:
Actual SDK response:
The test fails deterministically in ~1.4s on every run, both in isolation and as part of
yarn test:nodes.Root cause
In
core/taskengine/vm.go, the loop execution path:After the loop, partial-success aggregation works correctly:
At this point
s.OutputDatacorrectly holds[null, {name: "USDC", symbol: "USDC"}]. But then:The loop returns
firstErrorwhenever any iteration failed, and therunNodeWithInputs/runNodeImmediatelyresponse builder sees the non-nil error and discardss.OutputData, replacing the response'sdatafield withnull. The partial-success array that was just built is silently thrown away.Proposed fix
The intent of
results[i] = nilon a per-iteration error is clearly partial-success bookkeeping. The fix is to honor that intent on the return path:OutputData, and return(s, nil)so downstream sees the data.Sketch:
The exact
success: truevssuccess: partialdecision is a separate API choice (see "Open question" below), but at minimum the responsedatafield must contain the per-iteration array, notnull.Open question: success vs partial-success vs failure
Three possible response shapes for "one iteration succeeded, one failed":
true[null, {…}]""partial(new tri-state)[null, {…}]"iteration 0 failed: …"false[null, {…}]"iteration 0 failed: …"Currently the operator effectively does D:
success: false, data: null, error: "..."— which loses information.The test expects A (
success: true), suggesting the convention is "if the loop ran to completion, it succeeded; per-iteration outcomes are reflected in the data array, not the top-level success flag." That seems consistent with how loop runners are commonly used (you want to process all items and inspect failures after, not bail on first error). I'd recommend A, but B or C also fix the bug as long asdatais preserved.Impact
ava-sdk-jstestloopNode > flattened data format for contract_read runner in loophas been failing deterministically since at least the720b0a2commit that added it. Currently the only test exercising the documented per-iteration partial-failure contract.Loop > ContractReadagainst a heterogeneous list of contracts (e.g., reading metadata for a list of arbitrary tokens, some of which might not implement the requested method) silently loses all results when any one contract reverts. The user has no way to know which contracts succeeded — the entire array isnull.Loop > ContractWriteandLoop > GraphQL/Loop > RESTrunners — anywhere the per-iteration runner can fail individually. Worth grepping for the same pattern in those code paths.Repro environment
staging(verified byte-identical tomainfor the affected file paths)staging, EigenLayer-AVS PR release: staging → main (event trigger fixes, fee classification, JWT API key, sentry logging) #509 mergedava-sdk-js@stagingPR Define migration process and perform the first migration for second to ms #207 test suite (TEST_ENV=dev yarn test:nodes)Related code
core/taskengine/vm.go— loop result aggregation (lines ~4690-4840)core/taskengine/loop_helpers.go— per-iteration template processingrunNodeWithInputsresponse builder that dropsOutputDatawhen the step returns a non-nil error