Skip to content

feat: add YAML support for config files#3075

Open
mohitmalviya0707 wants to merge 1 commit into
Netflix:masterfrom
mohitmalviya0707:yaml-support
Open

feat: add YAML support for config files#3075
mohitmalviya0707 wants to merge 1 commit into
Netflix:masterfrom
mohitmalviya0707:yaml-support

Conversation

@mohitmalviya0707
Copy link
Copy Markdown

This PR adds YAML support for config files.

Currently, Metaflow only supports JSON. This change adds a fallback to YAML parsing using yaml.safe_load.

  • JSON-first parsing preserved
  • YAML support added
  • Ensures YAML returns a dictionary

Tested locally using a sample flow with YAML config.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 31, 2026

Greptile Summary

This PR adds YAML parsing as a fallback when a config value cannot be parsed as JSON, using the already-vendored metaflow._vendor.yaml library. The core idea is sound and the vendored dependency is already present, but there is one correctness issue with the error-reporting path and two small style issues that should be addressed before merging.

Key findings:

  • P1 – Misleading error message: The outer except Exception block catches both genuine YAML parse failures and the ValueError("YAML did not produce a dictionary") raised for structurally-valid-but-non-mapping YAML. This causes users who pass a valid YAML list or scalar to see "not valid JSON or YAML", which is factually wrong and hard to debug. The catch should be narrowed to YAMLError only, with a separate, accurate message for the non-mapping case.
  • P2 – Import inside exception handler: from metaflow._vendor import yaml is placed inside the per-call exception handler rather than at the module top level, which is inconsistent with every other file in the codebase that uses the vendored YAML module (e.g. parsers.py, includefile.py). It also means an ImportError would be silently misreported as a parse failure.
  • P2 – Stale TODO comment: The # TODO: Support YAML comment directly below the new YAML block was not removed.

Confidence Score: 4/5

Safe to merge after fixing the misleading error message; the core YAML parsing logic is correct.

One P1 issue exists: valid YAML that is not a dict is reported as 'not valid JSON or YAML', which is incorrect and will confuse users. The two remaining issues are P2 style. Score is 4 to signal the P1 should be addressed first.

metaflow/user_configs/config_options.py — specifically the error-message path in the YAML fallback block (lines 385–399).

Important Files Changed

Filename Overview
metaflow/user_configs/config_options.py Adds YAML fallback parsing after JSON fails; has a misleading error message when YAML is valid-but-not-a-dict, a stale TODO comment, and an in-handler import inconsistent with the rest of the codebase.

Reviews (1): Last reviewed commit: "feat: add YAML support for config files" | Re-trigger Greptile

Comment on lines +385 to +398
except json.JSONDecodeError:
try:
from metaflow._vendor import yaml
read_value = yaml.safe_load(val)

if not isinstance(read_value, dict):
raise ValueError("YAML did not produce a dictionary")

except Exception as e:
msgs.append(
"configuration value for '%s' is not valid JSON or YAML: %s"
% (name, e)
)
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Misleading error message when YAML is structurally valid but not a mapping

The broad except Exception as e: block on line 393 catches both genuine YAML parse errors and the ValueError("YAML did not produce a dictionary") raised two lines above it. This means that when a user supplies valid YAML that isn't a mapping (e.g. a list - item1\n- item2, a scalar 42, or a bare string hello), they'll receive:

configuration value for 'my_config' is not valid JSON or YAML: YAML did not produce a dictionary

This message is factually wrong — the input is valid YAML — and will confuse users who are trying to diagnose why their config is rejected.

A clean fix is to narrow the outer catch to only YAML scanner/parser errors and emit a distinct message for the non-dict case:

except json.JSONDecodeError:
    from metaflow._vendor import yaml
    from metaflow._vendor.yaml import YAMLError
    try:
        read_value = yaml.safe_load(val)
    except YAMLError as e:
        msgs.append(
            "configuration value for '%s' is not valid JSON or YAML: %s"
            % (name, e)
        )
        continue
    if not isinstance(read_value, dict):
        msgs.append(
            "configuration value for '%s' is valid YAML but must be a "
            "mapping (got %s)" % (name, type(read_value).__name__)
        )
        continue

This also surfaces a secondary risk: if the import itself ever fails, the current code swallows the ImportError and reports it as a JSON/YAML parse failure, making that failure very hard to debug.

continue
except json.JSONDecodeError:
try:
from metaflow._vendor import yaml
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Import should be at module level, consistent with codebase convention

The from metaflow._vendor import yaml is placed inside a per-call exception handler. Every failed JSON parse will trigger a new import statement in the hot path. More importantly, every other file in this codebase that uses the vendored YAML module imports it at the top of the file (e.g. metaflow/plugins/parsers.py line 1, metaflow/includefile.py line 13). Moving this import to the top of config_options.py alongside the existing from metaflow._vendor import click would be consistent and cleaner.

% (name, e)
)
continue
# TODO: Support YAML
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Stale TODO comment should be removed

The # TODO: Support YAML comment is left over from before this PR. Since YAML support is now implemented in the block directly above, this comment is no longer accurate and should be deleted.

Suggested change
# TODO: Support YAML

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant