Skip to content

Feature foreach unordered validation#3062

Open
rushikeshbathe096 wants to merge 7 commits into
Netflix:masterfrom
rushikeshbathe096:feature/foreach-unordered-validation
Open

Feature foreach unordered validation#3062
rushikeshbathe096 wants to merge 7 commits into
Netflix:masterfrom
rushikeshbathe096:feature/foreach-unordered-validation

Conversation

@rushikeshbathe096
Copy link
Copy Markdown

PR Type

  • [ x] Bug fix
  • New feature
  • Core Runtime change (higher bar -- see CONTRIBUTING.md)
  • Docs / tooling
  • Refactoring

Summary

When a set or frozenset is used to foreach Metaflow silently produces incorrect results— some tasks run on wrong inputs and others are skipped entirely. Now this PR adds early validation in FlowSpec.next() to reject unordered collections (set, frozenset) with a clear error message suggesting list().

Issue

Fixes #469

Reproduction

Runtime: local

Commands to run:

from metaflow import FlowSpec, step

class BugFlow(FlowSpec):
    @step
    def start(self):
        self.items = {"a", "b", "c"}  # set — no guaranteed order
        self.next(self.process, foreach="items")

    @step
    def process(self):
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    BugFlow()
python flow.py run

Where evidence shows up: parent console — tasks silently run on wrong inputs or are skipped with no error.

Before (error / log snippet) ``` # No error raised — tasks execute silently on wrong or duplicate inputs ```
After (evidence that fix works) ``` MetaflowException: Foreach variable *self.items* in step *start* is a set, which has no guaranteed iteration order. This can cause tasks to run on wrong inputs or some inputs to be skipped entirely. Wrap it in list() first, e.g.: self.items = list(self.items) ```

Root Cause

FlowSpec.next() expects the foreach input to have a fixed order, since
tasks are assigned to elements based on their position.Sets and frozensets don’t guarantee a consistent order, so the same data can be processed in a different order across runs. This can cause tasks to receive the wrong inputs or some inputs to be skipped. Since this happens silently without any error, it is hard to notice and debug. Since this failure is silent, it is particularly difficult to debug in practice.

Why This Fix Is Correct

The fix adds _validate_foreach_type() which raises InvalidNextException
if a set or frozenset is used before any tasks are created.Instead of running with wrong results, it stops early and shows a clear message telling the user to convert it to a list. Lists and tuples keep working the same, so existing valid usage is not affected.

Failure Modes Considered

  1. Existing flows using lists/tuples — no change, they work as before.
  2. UnboundedForeachInput — not affected, since validation happens before this logic.
  3. Silent wrong execution — now replaced with a clear error, so users know what went wrong.

Tests

  • Unit tests added/updated
  • [ x] Reproduction script provided
  • CI passes
  • If tests are impractical: N/A

5 unit tests in test/unit/test_foreach_unordered.py:

  • test_set_raises — set raises InvalidNextException
  • test_frozenset_raises — frozenset raises InvalidNextException
  • test_error_message_contains_fix_hint — error message contains list() hint
  • test_list_does_not_raise — list passes through cleanly
  • test_tuple_does_not_raise — tuple passes through cleanly

Non-Goals

  • Does not change behavior for any other collection types
  • Does not auto-convert sets to lists — explicit is better than implicit
  • Does not touch the runtime or metadata layer

AI Tool Usage

  • AI tools were used (describe below)

I used AI tools to identify relevant code locations and also to draft initial tests.All code changes were reviewed, understood, and tested manually.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 30, 2026

Greptile Summary

This PR adds _validate_foreach_type to FlowSpec.next() to reject set and frozenset foreach inputs with a clear error message, and also adds unrelated to_dict() methods to Task and Run in the client layer. The core validation logic is correct and well-placed; prior review concerns about the duplicate method definition and tests exercising the real method have been addressed.

Confidence Score: 4/5

Safe to merge for the foreach validation change; unrelated to_dict() additions and the absence of an end-to-end test through next() are the remaining concerns.

The core fix is correct: the method is defined once, placed appropriately in next(), and the tests now call the real method rather than a local copy. Two unresolved concerns keep this from 5/5: the to_dict() changes are unrelated to the PR's stated goal (flagged in a prior thread but still present), and there is no integration test that would catch someone accidentally removing the _validate_foreach_type call from next() — all five tests would still pass since they bypass next() entirely.

test/unit/test_foreach_unordered.py — a single test calling next(foreach=...) would close the coverage gap on the integration path.

Important Files Changed

Filename Overview
metaflow/flowspec.py Adds _validate_foreach_type helper and calls it inside next() after the foreach attribute is resolved; placement is correct (post-existence check, pre-UnboundedForeachInput check) and no duplicate remains.
test/unit/test_foreach_unordered.py Tests now call the real _validate_foreach_type method on a FlowSpec instance; no integration test exercises the call site inside next(), so removing the call from next() would leave all 5 tests green.
metaflow/client/core.py Adds to_dict() to both Task and Run; changes are unrelated to the foreach-unordered validation and were flagged in a prior review thread but remain in the PR.
test/unit/test_to_dict.py Unit tests for Task.to_dict() and Run.to_dict() using heavy PropertyMock patching; covers keys, ISO-format datetime serialisation, and None handling adequately.

Reviews (3): Last reviewed commit: "Merge branch 'master' into feature/forea..." | Re-trigger Greptile

Comment thread metaflow/flowspec.py Outdated
Comment thread test/unit/test_foreach_unordered.py Outdated
Comment thread metaflow/client/core.py
Comment on lines 1772 to +1800
meta_dict = self.metadata_dict
return env.get_client_info(self.path_components[0], meta_dict)

def to_dict(self) -> Dict[str, Any]:
"""
Returns a dictionary representation of this Task.

Useful for agents and external systems that need a serializable
summary of the task without fetching full artifact data.

Returns
-------
Dict[str, Any]
Dictionary containing key metadata about this task.
"""
return {
"pathspec": self.pathspec,
"id": self.id,
"successful": self.successful,
"finished": self.finished,
"finished_at": self.finished_at.isoformat() if self.finished_at else None,
"created_at": self.created_at.isoformat() if self.created_at else None,
"tags": list(self.tags),
"user_tags": list(self.user_tags),
"runtime_name": self.runtime_name,
"current_attempt": self.current_attempt,
"origin_pathspec": self.origin_pathspec,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unrelated changes mixed into the PR

The additions of Task.to_dict() and Run.to_dict() (and the corresponding test/unit/test_to_dict.py file) are entirely unrelated to the foreach-unordered validation fix described in the PR description and the linked issue. Mixing unrelated features in one PR makes review harder, complicates bisection if a regression is introduced, and makes the changelog harder to read.

Consider splitting these to_dict additions into a separate PR so each change can be reviewed and landed on its own merit.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Using an unordered collection for foreach can cause incorrect execution

1 participant