fix: loop node reports iteration failures as partial success#518
Conversation
…success Previously, per-iteration runner errors (e.g. ERC20 transfer revert) left the loop step as success=true with nil entries in the output array. This caused AnalyzeExecutionResult to report the overall execution as "success" even when all loop iterations failed. Now the loop step counts successful vs failed iterations after completion: - If any iterations failed, the step reports success=false with a message like "2 of 2 iterations failed: ERC20: transfer amount exceeds balance" - Output data is still preserved with nil entries for failed iterations - AnalyzeExecutionResult correctly detects this as partial_success The loop still returns nil error (not a Go error) because the loop itself ran to completion — only the inner iterations failed.
The inner iteration error already contains 'invalid request: ' from the contract write processor. finalizeStep wraps the message again with NewInvalidRequestError, producing double prefixes like: 'invalid request: invalid request: ERC20: transfer amount exceeds balance' Strip the prefix from the inner error before building the loop error message.
Pass the loop error as the err parameter to finalizeStep instead of errorMessage, so it uses err.Error() directly and skips the NewInvalidRequestError wrapper. The step.Error is now simply: '2 of 2 iterations failed: ERC20: transfer amount exceeds balance'
There was a problem hiding this comment.
Pull request overview
Updates loop-node execution semantics so iteration-level runner failures cause the loop step to report success: false (while still preserving per-iteration output with nil placeholders), enabling execution-level partial_success detection.
Changes:
- Mark loop step as failed when any iteration fails, while retaining
OutputDatafor partial inspection. - Build a cleaner aggregated error message summarizing iteration failures.
- Update the partial-failure loop test to assert
step.Success == falseand validate the new error text.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| core/taskengine/vm.go | Changes loop step success/error reporting based on iteration outcomes and formats aggregated error messaging. |
| core/taskengine/manual_trigger_loop_test.go | Updates regression test expectations for loop partial-failure behavior and error message content. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Count successful vs failed iterations to determine step status. | ||
| // Per-iteration runner errors (e.g. a contract call reverting) are reflected | ||
| // as nil entries in the results array. The loop ran to completion, so we | ||
| // always preserve OutputData for the client to inspect partial results. | ||
| // See AvaProtocol/EigenLayer-AVS#511. | ||
| iterationSuccessCount := 0 | ||
| iterationFailCount := 0 | ||
| for _, result := range results { | ||
| if result != nil { | ||
| iterationSuccessCount++ | ||
| } else { |
There was a problem hiding this comment.
iterationSuccessCount is declared but never used, which will fail compilation in Go. Remove it, or use it (e.g., for logging or to build the error message).
| // Count successful vs failed iterations to determine step status. | |
| // Per-iteration runner errors (e.g. a contract call reverting) are reflected | |
| // as nil entries in the results array. The loop ran to completion, so we | |
| // always preserve OutputData for the client to inspect partial results. | |
| // See AvaProtocol/EigenLayer-AVS#511. | |
| iterationSuccessCount := 0 | |
| iterationFailCount := 0 | |
| for _, result := range results { | |
| if result != nil { | |
| iterationSuccessCount++ | |
| } else { | |
| // Count failed iterations to determine step status. | |
| // Per-iteration runner errors (e.g. a contract call reverting) are reflected | |
| // as nil entries in the results array. The loop ran to completion, so we | |
| // always preserve OutputData for the client to inspect partial results. | |
| // See AvaProtocol/EigenLayer-AVS#511. | |
| iterationFailCount := 0 | |
| for _, result := range results { | |
| if result == nil { |
| // Count successful vs failed iterations to determine step status. | ||
| // Per-iteration runner errors (e.g. a contract call reverting) are reflected | ||
| // as nil entries in the results array. The loop ran to completion, so we | ||
| // always preserve OutputData for the client to inspect partial results. | ||
| // See AvaProtocol/EigenLayer-AVS#511. | ||
| iterationSuccessCount := 0 | ||
| iterationFailCount := 0 | ||
| for _, result := range results { | ||
| if result != nil { | ||
| iterationSuccessCount++ | ||
| } else { | ||
| iterationFailCount++ | ||
| } | ||
| } | ||
|
|
||
| if iterationFailCount > 0 && firstError != nil { | ||
| // Some or all iterations failed — mark the loop step as failed so | ||
| // AnalyzeExecutionResult can detect partial_success at the execution level. | ||
| // Pass the error via the `err` parameter (not `errorMessage`) so that | ||
| // finalizeStep uses err.Error() directly without wrapping it in | ||
| // NewInvalidRequestError which adds an "invalid request: " prefix. | ||
| innerMsg := strings.TrimPrefix(firstError.Error(), "invalid request: ") | ||
| loopErr := fmt.Errorf("%d of %d iterations failed: %s", iterationFailCount, len(results), innerMsg) |
There was a problem hiding this comment.
Iteration failures are currently inferred by counting nil entries in results. This can over-count failures when an iteration succeeds but legitimately produces a nil output (e.g., extractResultData can return nil even when step.Success is true). Consider tracking failures using the per-iteration result.Error / step.Success while collecting iteration results, rather than using result == nil as the failure signal.
| // Count successful vs failed iterations to determine step status. | |
| // Per-iteration runner errors (e.g. a contract call reverting) are reflected | |
| // as nil entries in the results array. The loop ran to completion, so we | |
| // always preserve OutputData for the client to inspect partial results. | |
| // See AvaProtocol/EigenLayer-AVS#511. | |
| iterationSuccessCount := 0 | |
| iterationFailCount := 0 | |
| for _, result := range results { | |
| if result != nil { | |
| iterationSuccessCount++ | |
| } else { | |
| iterationFailCount++ | |
| } | |
| } | |
| if iterationFailCount > 0 && firstError != nil { | |
| // Some or all iterations failed — mark the loop step as failed so | |
| // AnalyzeExecutionResult can detect partial_success at the execution level. | |
| // Pass the error via the `err` parameter (not `errorMessage`) so that | |
| // finalizeStep uses err.Error() directly without wrapping it in | |
| // NewInvalidRequestError which adds an "invalid request: " prefix. | |
| innerMsg := strings.TrimPrefix(firstError.Error(), "invalid request: ") | |
| loopErr := fmt.Errorf("%d of %d iterations failed: %s", iterationFailCount, len(results), innerMsg) | |
| // Do not infer iteration failures from nil result entries: a successful | |
| // iteration can legitimately produce a nil output. Infrastructure failures | |
| // are handled above; any remaining firstError here represents one or more | |
| // iteration-level failures while still preserving OutputData for the client | |
| // to inspect partial results. | |
| if firstError != nil { | |
| // Some or all iterations failed — mark the loop step as failed so | |
| // AnalyzeExecutionResult can detect partial_success at the execution level. | |
| // Pass the error via the `err` parameter (not `errorMessage`) so that | |
| // finalizeStep uses err.Error() directly without wrapping it in | |
| // NewInvalidRequestError which adds an "invalid request: " prefix. | |
| innerMsg := strings.TrimPrefix(firstError.Error(), "invalid request: ") | |
| loopErr := fmt.Errorf("one or more iterations failed: %s", innerMsg) |
| // finalizeStep uses err.Error() directly without wrapping it in | ||
| // NewInvalidRequestError which adds an "invalid request: " prefix. | ||
| innerMsg := strings.TrimPrefix(firstError.Error(), "invalid request: ") | ||
| loopErr := fmt.Errorf("%d of %d iterations failed: %s", iterationFailCount, len(results), innerMsg) |
There was a problem hiding this comment.
loopErr is created with fmt.Errorf, so finalizeStep will set step.ErrorCode to ERROR_CODE_UNSPECIFIED (since GetErrorCode only recognizes StructuredError). If clients rely on error codes, consider using NewStructuredError(avsproto.ErrorCode_INVALID_REQUEST, <message>) (or another appropriate code) to keep a clean message without the invalid request: prefix while preserving the error code.
| loopErr := fmt.Errorf("%d of %d iterations failed: %s", iterationFailCount, len(results), innerMsg) | |
| loopErr := NewStructuredError( | |
| avsproto.ErrorCode_INVALID_REQUEST, | |
| "%d of %d iterations failed: %s", | |
| iterationFailCount, | |
| len(results), | |
| innerMsg, | |
| ) |
| step, err := vm.RunNodeWithInputs(node, inputVariables) | ||
|
|
||
| // Per AvaProtocol/EigenLayer-AVS#511, per-iteration runner failures do not | ||
| // fail the loop step. The loop ran to completion, so step.Success is true | ||
| // and the per-iteration outcomes are reflected in the data array (nil for | ||
| // failed iterations). | ||
| // When some iterations fail, the loop step reports success=false with a | ||
| // descriptive error so that AnalyzeExecutionResult detects partial_success. | ||
| // The loop still ran to completion and preserves OutputData with nil entries | ||
| // for failed iterations. | ||
| require.NotNil(t, step, "Execution step should not be nil") | ||
| assert.True(t, step.Success, "Loop step should succeed when it ran to completion, even if an iteration failed") | ||
| assert.Empty(t, step.Error, "Loop step error should be empty when loop ran to completion") | ||
| assert.False(t, step.Success, "Loop step should report failure when iterations failed") | ||
| assert.Contains(t, step.Error, "1 of 2 iterations failed", "Error should describe the partial failure") |
There was a problem hiding this comment.
err is assigned but never used (step, err := vm.RunNodeWithInputs(...)), which will fail compilation. Either assert on err (likely require.NoError(t, err) if infra failures should not occur here) or explicitly ignore it with _.
… assert err - Remove unused iterationSuccessCount variable - Use NewStructuredError with INVALID_REQUEST code instead of fmt.Errorf so error code is preserved for clients - Add require.NoError assertion for loop infrastructure error in test
Co-authored-by: Wei Lin <wei@avaprotocol.org>
Co-authored-by: Wei Lin <wei@avaprotocol.org>
Summary
success: falsewhen any iterations fail, enablingpartial_successat the execution levelsuccess: truewith nil output entries, causing the overall execution to incorrectly reportsuccess2 of 2 iterations failed: ERC20: transfer amount exceeds balance(no redundant prefixes)Test plan
TestLoopNode_ContractWrite_InvalidAddress_PartialFailureupdated and passingpartial_success