Loop runner discards partial-success results array on per-iteration failure, returning data: null instead of [null, {...}]

## Summary

The Loop runner builds a partial-success results array (with `nil` for failed iterations) but then **discards it on the error return path**, causing the SDK to receive `data: null` instead of the per-iteration array.

This breaks the documented contract where a Loop with one failing inner-node iteration and one successful iteration should return `[null, {…}]` and report partial success.

## Reproduction

Test in `ava-sdk-js`:

```ts
// tests/nodes/loopNode.test.ts:2380
test("should return flattened data format for contract_read runner in loop", ...)
```

The test creates a Loop whose `Runner` is a `ContractRead` with two contracts:
1. Chainlink ETH/USD price feed (does **not** have `name()`/`symbol()`) → expected to revert per iteration → expected null entry
2. USDC (has `name()`/`symbol()`) → expected to succeed → expected `{name: "USDC", symbol: "USDC"}` entry

Expected SDK response:
```js
{
  success: true, // or partial — see "Open question" below
  data: [null, { name: "USDC", symbol: "USDC" }],
  ...
}
```

Actual SDK response:
```js
{
  success: false,
  data: null,
  error: 'invalid request: contract call failed: execution reverted',
  errorCode: undefined,
  metadata: undefined,
  executionContext: undefined
}
```

The test fails deterministically in ~1.4s on every run, both in isolation and as part of `yarn test:nodes`.

## Root cause

In `core/taskengine/vm.go`, the loop execution path:

```go
// Lines ~4711-4738 (parallel) and ~4771-4796 (sequential)
for i, resultChannel := range resultChannels {
    select {
    case result := <-resultChannel:
        if result.Error != nil {
            success = false
            if firstError == nil {
                firstError = result.Error
            }
            log.WriteString(fmt.Sprintf("\nError in iteration %d: %s", i, result.Error.Error()))
            // ← results[i] is left as nil (intentional partial-success placeholder)
        } else {
            results[i] = result.Data
        }
        // ...
    case <-time.After(iterationTimeout):
        // similar
    }
}
```

After the loop, **partial-success aggregation works correctly**:

```go
// Lines ~4811-4824
jsonSerializableResults := make([]interface{}, len(results))
for i, result := range results {
    if result != nil {
        jsonSerializableResults[i] = convertToJSONCompatible(result)
    } else {
        jsonSerializableResults[i] = nil  // ← preserves the per-iteration null
    }
}
outputValue, err := structpb.NewValue(jsonSerializableResults)
// ...
loopOutput := &avsproto.LoopNode_Output{Data: outputValue}
s.OutputData = &avsproto.Execution_Step_Loop{Loop: loopOutput}
```

At this point `s.OutputData` correctly holds `[null, {name: "USDC", symbol: "USDC"}]`. But then:

```go
// Lines ~4833-4836
if !success && firstError != nil {
    finalizeStep(s, false, nil, firstError.Error(), log.String())
    return s, firstError
}

finalizeStep(s, true, nil, "", log.String())
return s, nil
```

The loop **returns `firstError` whenever any iteration failed**, and the `runNodeWithInputs` / `runNodeImmediately` response builder sees the non-nil error and discards `s.OutputData`, replacing the response's `data` field with `null`. The partial-success array that was just built is silently thrown away.

## Proposed fix

The intent of `results[i] = nil` on a per-iteration error is clearly **partial-success bookkeeping**. The fix is to honor that intent on the return path:

1. Distinguish between "loop infrastructure failure" (queue submit error, timeout collecting results, internal panic) and "per-iteration runner failure" (the inner ContractRead/ContractWrite reverted).
2. For **per-iteration runner failures**, finalize the step as **success or partial-success**, preserve `OutputData`, and return `(s, nil)` so downstream sees the data.
3. For **infrastructure failures**, keep the current behavior (return the error).

Sketch:

```go
allFailed := success == false && countNonNilResults(results) == 0
hardInfraFailure := loopSetupErr != nil // not the per-iteration errors

if hardInfraFailure {
    // current behavior — propagate
    finalizeStep(s, false, nil, firstError.Error(), log.String())
    return s, firstError
}

if allFailed {
    // every iteration failed — finalize as failed but still preserve the
    // results array so the client can see which iterations failed and why
    finalizeStep(s, false, nil, firstError.Error(), log.String())
    return s, nil // ← key change: return nil error so OutputData is preserved
}

// At least one iteration succeeded → partial or full success
finalizeStep(s, true, nil, "", log.String())
return s, nil
```

The exact `success: true` vs `success: partial` decision is a separate API choice (see "Open question" below), but at minimum the response `data` field must contain the per-iteration array, not `null`.

## Open question: success vs partial-success vs failure

Three possible response shapes for "one iteration succeeded, one failed":

| Choice | success field | data field | error field |
|---|---|---|---|
| A | `true` | `[null, {…}]` | `""` |
| B | `partial` (new tri-state) | `[null, {…}]` | `"iteration 0 failed: …"` |
| C | `false` | `[null, {…}]` | `"iteration 0 failed: …"` |

Currently the operator effectively does **D**: `success: false, data: null, error: "..."` — which loses information.

The test expects A (`success: true`), suggesting the convention is "if the loop ran to completion, it succeeded; per-iteration outcomes are reflected in the data array, not the top-level success flag." That seems consistent with how loop runners are commonly used (you want to process all items and inspect failures after, not bail on first error). I'd recommend **A**, but B or C also fix the bug as long as `data` is preserved.

## Impact

- **`ava-sdk-js` test `loopNode > flattened data format for contract_read runner in loop`** has been failing deterministically since at least the `720b0a2` commit that added it. Currently the only test exercising the documented per-iteration partial-failure contract.
- **Any production workflow** that uses `Loop > ContractRead` against a heterogeneous list of contracts (e.g., reading metadata for a list of arbitrary tokens, some of which might not implement the requested method) silently loses all results when any one contract reverts. The user has no way to know which contracts succeeded — the entire array is `null`.
- **Symmetric bug likely exists** in `Loop > ContractWrite` and `Loop > GraphQL`/`Loop > REST` runners — anywhere the per-iteration runner can fail individually. Worth grepping for the same pattern in those code paths.

## Repro environment

- Branch: `staging` (verified byte-identical to `main` for the affected file paths)
- Chain: Sepolia
- Aggregator: dev build, branch `staging`, EigenLayer-AVS PR #509 merged
- Encountered while running `ava-sdk-js@staging` PR #207 test suite (`TEST_ENV=dev yarn test:nodes`)

## Related code

- `core/taskengine/vm.go` — loop result aggregation (lines ~4690-4840)
- `core/taskengine/loop_helpers.go` — per-iteration template processing
- The downstream `runNodeWithInputs` response builder that drops `OutputData` when the step returns a non-nil error


Choice	success field	data field	error field
A	`true`	`[null, {…}]`	`""`
B	`partial` (new tri-state)	`[null, {…}]`	`"iteration 0 failed: …"`
C	`false`	`[null, {…}]`	`"iteration 0 failed: …"`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Loop runner discards partial-success results array on per-iteration failure, returning data: null instead of [null, {...}] #511

Summary

Reproduction

Root cause

Proposed fix

Open question: success vs partial-success vs failure

Impact

Repro environment

Related code

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Loop runner discards partial-success results array on per-iteration failure, returning data: null instead of [null, {...}] #511

Description

Summary

Reproduction

Root cause

Proposed fix

Open question: success vs partial-success vs failure

Impact

Repro environment

Related code

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions