Feature foreach unordered validation by rushikeshbathe096 · Pull Request #3062 · Netflix/metaflow

rushikeshbathe096 · 2026-03-30T15:28:13Z

PR Type

[ x] Bug fix
New feature
Core Runtime change (higher bar -- see CONTRIBUTING.md)
Docs / tooling
Refactoring

Summary

When a set or frozenset is used to foreach Metaflow silently produces incorrect results— some tasks run on wrong inputs and others are skipped entirely. Now this PR adds early validation in FlowSpec.next() to reject unordered collections (set, frozenset) with a clear error message suggesting list().

Issue

Fixes #469

Reproduction

Runtime: local

Commands to run:

from metaflow import FlowSpec, step

class BugFlow(FlowSpec):
    @step
    def start(self):
        self.items = {"a", "b", "c"}  # set — no guaranteed order
        self.next(self.process, foreach="items")

    @step
    def process(self):
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    BugFlow()

python flow.py run

Where evidence shows up: parent console — tasks silently run on wrong inputs or are skipped with no error.

Before (error / log snippet)

``` # No error raised — tasks execute silently on wrong or duplicate inputs ```

After (evidence that fix works)

``` MetaflowException: Foreach variable *self.items* in step *start* is a set, which has no guaranteed iteration order. This can cause tasks to run on wrong inputs or some inputs to be skipped entirely. Wrap it in list() first, e.g.: self.items = list(self.items) ```

Root Cause

FlowSpec.next() expects the foreach input to have a fixed order, since
tasks are assigned to elements based on their position.Sets and frozensets don’t guarantee a consistent order, so the same data can be processed in a different order across runs. This can cause tasks to receive the wrong inputs or some inputs to be skipped. Since this happens silently without any error, it is hard to notice and debug. Since this failure is silent, it is particularly difficult to debug in practice.

Why This Fix Is Correct

The fix adds _validate_foreach_type() which raises InvalidNextException
if a set or frozenset is used before any tasks are created.Instead of running with wrong results, it stops early and shows a clear message telling the user to convert it to a list. Lists and tuples keep working the same, so existing valid usage is not affected.

Failure Modes Considered

Existing flows using lists/tuples — no change, they work as before.
UnboundedForeachInput — not affected, since validation happens before this logic.
Silent wrong execution — now replaced with a clear error, so users know what went wrong.

Tests

Unit tests added/updated
[ x] Reproduction script provided
CI passes
If tests are impractical: N/A

5 unit tests in test/unit/test_foreach_unordered.py:

test_set_raises — set raises InvalidNextException
test_frozenset_raises — frozenset raises InvalidNextException
test_error_message_contains_fix_hint — error message contains list() hint
test_list_does_not_raise — list passes through cleanly
test_tuple_does_not_raise — tuple passes through cleanly

Non-Goals

Does not change behavior for any other collection types
Does not auto-convert sets to lists — explicit is better than implicit
Does not touch the runtime or metadata layer

AI Tool Usage

AI tools were used (describe below)

I used AI tools to identify relevant code locations and also to draft initial tests.All code changes were reviewed, understood, and tested manually.

Implements serializable dict representation for Run and Task, making the Metaflow client more agent/LLM-friendly. Closes Netflix#1833

…Netflix#469)

greptile-apps · 2026-03-30T15:30:40Z

Greptile Summary

This PR adds _validate_foreach_type to FlowSpec.next() to reject set and frozenset foreach inputs with a clear error message, and also adds unrelated to_dict() methods to Task and Run in the client layer. The core validation logic is correct and well-placed; prior review concerns about the duplicate method definition and tests exercising the real method have been addressed.

Confidence Score: 4/5

Safe to merge for the foreach validation change; unrelated to_dict() additions and the absence of an end-to-end test through next() are the remaining concerns.

The core fix is correct: the method is defined once, placed appropriately in next(), and the tests now call the real method rather than a local copy. Two unresolved concerns keep this from 5/5: the to_dict() changes are unrelated to the PR's stated goal (flagged in a prior thread but still present), and there is no integration test that would catch someone accidentally removing the _validate_foreach_type call from next() — all five tests would still pass since they bypass next() entirely.

test/unit/test_foreach_unordered.py — a single test calling next(foreach=...) would close the coverage gap on the integration path.

Important Files Changed

Filename	Overview
metaflow/flowspec.py	Adds `_validate_foreach_type` helper and calls it inside `next()` after the foreach attribute is resolved; placement is correct (post-existence check, pre-UnboundedForeachInput check) and no duplicate remains.
test/unit/test_foreach_unordered.py	Tests now call the real `_validate_foreach_type` method on a `FlowSpec` instance; no integration test exercises the call site inside `next()`, so removing the call from `next()` would leave all 5 tests green.
metaflow/client/core.py	Adds `to_dict()` to both `Task` and `Run`; changes are unrelated to the foreach-unordered validation and were flagged in a prior review thread but remain in the PR.
test/unit/test_to_dict.py	Unit tests for `Task.to_dict()` and `Run.to_dict()` using heavy `PropertyMock` patching; covers keys, ISO-format datetime serialisation, and None handling adequately.

_{Reviews (3): Last reviewed commit: "Merge branch 'master' into feature/forea..." | Re-trigger Greptile}

greptile-apps · 2026-03-30T15:30:45Z

        meta_dict = self.metadata_dict
        return env.get_client_info(self.path_components[0], meta_dict)

+    def to_dict(self) -> Dict[str, Any]:
+        """
+        Returns a dictionary representation of this Task.
+
+        Useful for agents and external systems that need a serializable
+        summary of the task without fetching full artifact data.
+
+        Returns
+        -------
+        Dict[str, Any]
+            Dictionary containing key metadata about this task.
+        """
+        return {
+            "pathspec": self.pathspec,
+            "id": self.id,
+            "successful": self.successful,
+            "finished": self.finished,
+            "finished_at": self.finished_at.isoformat() if self.finished_at else None,
+            "created_at": self.created_at.isoformat() if self.created_at else None,
+            "tags": list(self.tags),
+            "user_tags": list(self.user_tags),
+            "runtime_name": self.runtime_name,
+            "current_attempt": self.current_attempt,
+            "origin_pathspec": self.origin_pathspec,
+        }
+


Unrelated changes mixed into the PR

The additions of Task.to_dict() and Run.to_dict() (and the corresponding test/unit/test_to_dict.py file) are entirely unrelated to the foreach-unordered validation fix described in the PR description and the linked issue. Mixing unrelated features in one PR makes review harder, complicates bisection if a regression is introduced, and makes the changelog harder to read.

Consider splitting these to_dict additions into a separate PR so each change can be reviewed and landed on its own merit.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

rushikeshbathe096 added 5 commits March 27, 2026 04:42

Generalize AWS Batch username sanitization for all invalid characters

cf62e3a

Add to_dict() to Run and Task client objects

a962049

Implements serializable dict representation for Run and Task, making the Metaflow client more agent/LLM-friendly. Closes Netflix#1833

Fix pre-commit

7584ba8

Revert unrelated batch.py changes from to_dict PR

d0dde9c

Validate foreach target is an ordered collection before fan-out (Fixes …

5552c6e

…Netflix#469)

greptile-apps Bot reviewed Mar 30, 2026

View reviewed changes

rushikeshbathe096 added 2 commits March 31, 2026 09:03

Fix duplicate method and test real FlowSpec method

e68e241

Merge branch 'master' into feature/foreach-unordered-validation

30ca50d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature foreach unordered validation#3062

Feature foreach unordered validation#3062
rushikeshbathe096 wants to merge 7 commits into
Netflix:masterfrom
rushikeshbathe096:feature/foreach-unordered-validation

rushikeshbathe096 commented Mar 30, 2026

Uh oh!

greptile-apps Bot commented Mar 30, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rushikeshbathe096 commented Mar 30, 2026

PR Type

Summary

Issue

Reproduction

Root Cause

Why This Fix Is Correct

Failure Modes Considered

Tests

Non-Goals

AI Tool Usage

Uh oh!

greptile-apps Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented Mar 30, 2026 •

edited

Loading