Skip to content

fix: loop node reports iteration failures as partial success#518

Merged
chrisli30 merged 4 commits into
stagingfrom
fix/loop-partial-success-status
Apr 10, 2026
Merged

fix: loop node reports iteration failures as partial success#518
chrisli30 merged 4 commits into
stagingfrom
fix/loop-partial-success-status

Conversation

@chrisli30
Copy link
Copy Markdown
Member

Summary

  • Loop node now reports success: false when any iterations fail, enabling partial_success at the execution level
  • Previously, per-iteration runner errors (e.g. ERC20 revert) left the loop step as success: true with nil output entries, causing the overall execution to incorrectly report success
  • Output data is still preserved with nil entries for failed iterations so clients can inspect partial results
  • Error message is clean: 2 of 2 iterations failed: ERC20: transfer amount exceeds balance (no redundant prefixes)

Test plan

  • TestLoopNode_ContractWrite_InvalidAddress_PartialFailure updated and passing
  • All existing loop, contract write, and Tenderly tests pass
  • Manually verified on local aggregator-sepolia — Telegram notification shows clean error, execution status is partial_success

…success

Previously, per-iteration runner errors (e.g. ERC20 transfer revert) left
the loop step as success=true with nil entries in the output array. This
caused AnalyzeExecutionResult to report the overall execution as "success"
even when all loop iterations failed.

Now the loop step counts successful vs failed iterations after completion:
- If any iterations failed, the step reports success=false with a message
  like "2 of 2 iterations failed: ERC20: transfer amount exceeds balance"
- Output data is still preserved with nil entries for failed iterations
- AnalyzeExecutionResult correctly detects this as partial_success

The loop still returns nil error (not a Go error) because the loop itself
ran to completion — only the inner iterations failed.
The inner iteration error already contains 'invalid request: ' from the
contract write processor. finalizeStep wraps the message again with
NewInvalidRequestError, producing double prefixes like:
  'invalid request: invalid request: ERC20: transfer amount exceeds balance'

Strip the prefix from the inner error before building the loop error message.
Pass the loop error as the err parameter to finalizeStep instead of
errorMessage, so it uses err.Error() directly and skips the
NewInvalidRequestError wrapper. The step.Error is now simply:
  '2 of 2 iterations failed: ERC20: transfer amount exceeds balance'
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates loop-node execution semantics so iteration-level runner failures cause the loop step to report success: false (while still preserving per-iteration output with nil placeholders), enabling execution-level partial_success detection.

Changes:

  • Mark loop step as failed when any iteration fails, while retaining OutputData for partial inspection.
  • Build a cleaner aggregated error message summarizing iteration failures.
  • Update the partial-failure loop test to assert step.Success == false and validate the new error text.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
core/taskengine/vm.go Changes loop step success/error reporting based on iteration outcomes and formats aggregated error messaging.
core/taskengine/manual_trigger_loop_test.go Updates regression test expectations for loop partial-failure behavior and error message content.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread core/taskengine/vm.go Outdated
Comment on lines +4849 to +4859
// Count successful vs failed iterations to determine step status.
// Per-iteration runner errors (e.g. a contract call reverting) are reflected
// as nil entries in the results array. The loop ran to completion, so we
// always preserve OutputData for the client to inspect partial results.
// See AvaProtocol/EigenLayer-AVS#511.
iterationSuccessCount := 0
iterationFailCount := 0
for _, result := range results {
if result != nil {
iterationSuccessCount++
} else {
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterationSuccessCount is declared but never used, which will fail compilation in Go. Remove it, or use it (e.g., for logging or to build the error message).

Suggested change
// Count successful vs failed iterations to determine step status.
// Per-iteration runner errors (e.g. a contract call reverting) are reflected
// as nil entries in the results array. The loop ran to completion, so we
// always preserve OutputData for the client to inspect partial results.
// See AvaProtocol/EigenLayer-AVS#511.
iterationSuccessCount := 0
iterationFailCount := 0
for _, result := range results {
if result != nil {
iterationSuccessCount++
} else {
// Count failed iterations to determine step status.
// Per-iteration runner errors (e.g. a contract call reverting) are reflected
// as nil entries in the results array. The loop ran to completion, so we
// always preserve OutputData for the client to inspect partial results.
// See AvaProtocol/EigenLayer-AVS#511.
iterationFailCount := 0
for _, result := range results {
if result == nil {

Copilot uses AI. Check for mistakes.
Comment thread core/taskengine/vm.go Outdated
Comment on lines +4849 to +4871
// Count successful vs failed iterations to determine step status.
// Per-iteration runner errors (e.g. a contract call reverting) are reflected
// as nil entries in the results array. The loop ran to completion, so we
// always preserve OutputData for the client to inspect partial results.
// See AvaProtocol/EigenLayer-AVS#511.
iterationSuccessCount := 0
iterationFailCount := 0
for _, result := range results {
if result != nil {
iterationSuccessCount++
} else {
iterationFailCount++
}
}

if iterationFailCount > 0 && firstError != nil {
// Some or all iterations failed — mark the loop step as failed so
// AnalyzeExecutionResult can detect partial_success at the execution level.
// Pass the error via the `err` parameter (not `errorMessage`) so that
// finalizeStep uses err.Error() directly without wrapping it in
// NewInvalidRequestError which adds an "invalid request: " prefix.
innerMsg := strings.TrimPrefix(firstError.Error(), "invalid request: ")
loopErr := fmt.Errorf("%d of %d iterations failed: %s", iterationFailCount, len(results), innerMsg)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iteration failures are currently inferred by counting nil entries in results. This can over-count failures when an iteration succeeds but legitimately produces a nil output (e.g., extractResultData can return nil even when step.Success is true). Consider tracking failures using the per-iteration result.Error / step.Success while collecting iteration results, rather than using result == nil as the failure signal.

Suggested change
// Count successful vs failed iterations to determine step status.
// Per-iteration runner errors (e.g. a contract call reverting) are reflected
// as nil entries in the results array. The loop ran to completion, so we
// always preserve OutputData for the client to inspect partial results.
// See AvaProtocol/EigenLayer-AVS#511.
iterationSuccessCount := 0
iterationFailCount := 0
for _, result := range results {
if result != nil {
iterationSuccessCount++
} else {
iterationFailCount++
}
}
if iterationFailCount > 0 && firstError != nil {
// Some or all iterations failed — mark the loop step as failed so
// AnalyzeExecutionResult can detect partial_success at the execution level.
// Pass the error via the `err` parameter (not `errorMessage`) so that
// finalizeStep uses err.Error() directly without wrapping it in
// NewInvalidRequestError which adds an "invalid request: " prefix.
innerMsg := strings.TrimPrefix(firstError.Error(), "invalid request: ")
loopErr := fmt.Errorf("%d of %d iterations failed: %s", iterationFailCount, len(results), innerMsg)
// Do not infer iteration failures from nil result entries: a successful
// iteration can legitimately produce a nil output. Infrastructure failures
// are handled above; any remaining firstError here represents one or more
// iteration-level failures while still preserving OutputData for the client
// to inspect partial results.
if firstError != nil {
// Some or all iterations failed — mark the loop step as failed so
// AnalyzeExecutionResult can detect partial_success at the execution level.
// Pass the error via the `err` parameter (not `errorMessage`) so that
// finalizeStep uses err.Error() directly without wrapping it in
// NewInvalidRequestError which adds an "invalid request: " prefix.
innerMsg := strings.TrimPrefix(firstError.Error(), "invalid request: ")
loopErr := fmt.Errorf("one or more iterations failed: %s", innerMsg)

Copilot uses AI. Check for mistakes.
Comment thread core/taskengine/vm.go Outdated
// finalizeStep uses err.Error() directly without wrapping it in
// NewInvalidRequestError which adds an "invalid request: " prefix.
innerMsg := strings.TrimPrefix(firstError.Error(), "invalid request: ")
loopErr := fmt.Errorf("%d of %d iterations failed: %s", iterationFailCount, len(results), innerMsg)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loopErr is created with fmt.Errorf, so finalizeStep will set step.ErrorCode to ERROR_CODE_UNSPECIFIED (since GetErrorCode only recognizes StructuredError). If clients rely on error codes, consider using NewStructuredError(avsproto.ErrorCode_INVALID_REQUEST, <message>) (or another appropriate code) to keep a clean message without the invalid request: prefix while preserving the error code.

Suggested change
loopErr := fmt.Errorf("%d of %d iterations failed: %s", iterationFailCount, len(results), innerMsg)
loopErr := NewStructuredError(
avsproto.ErrorCode_INVALID_REQUEST,
"%d of %d iterations failed: %s",
iterationFailCount,
len(results),
innerMsg,
)

Copilot uses AI. Check for mistakes.
Comment on lines 428 to +436
step, err := vm.RunNodeWithInputs(node, inputVariables)

// Per AvaProtocol/EigenLayer-AVS#511, per-iteration runner failures do not
// fail the loop step. The loop ran to completion, so step.Success is true
// and the per-iteration outcomes are reflected in the data array (nil for
// failed iterations).
// When some iterations fail, the loop step reports success=false with a
// descriptive error so that AnalyzeExecutionResult detects partial_success.
// The loop still ran to completion and preserves OutputData with nil entries
// for failed iterations.
require.NotNil(t, step, "Execution step should not be nil")
assert.True(t, step.Success, "Loop step should succeed when it ran to completion, even if an iteration failed")
assert.Empty(t, step.Error, "Loop step error should be empty when loop ran to completion")
assert.False(t, step.Success, "Loop step should report failure when iterations failed")
assert.Contains(t, step.Error, "1 of 2 iterations failed", "Error should describe the partial failure")
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err is assigned but never used (step, err := vm.RunNodeWithInputs(...)), which will fail compilation. Either assert on err (likely require.NoError(t, err) if infra failures should not occur here) or explicitly ignore it with _.

Copilot uses AI. Check for mistakes.
… assert err

- Remove unused iterationSuccessCount variable
- Use NewStructuredError with INVALID_REQUEST code instead of fmt.Errorf
  so error code is preserved for clients
- Add require.NoError assertion for loop infrastructure error in test
@chrisli30 chrisli30 merged commit 4b3edd9 into staging Apr 10, 2026
18 checks passed
@chrisli30 chrisli30 deleted the fix/loop-partial-success-status branch April 10, 2026 08:59
chrisli30 added a commit that referenced this pull request Apr 10, 2026
Co-authored-by: Wei Lin <wei@avaprotocol.org>
chrisli30 added a commit that referenced this pull request Apr 10, 2026
Co-authored-by: Wei Lin <wei@avaprotocol.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants