fix(decorators): catch SchemaErrors in Union dispatch (#2325) #2329
Closed
jbbqqf wants to merge 1 commit into
``_check_arg_value_against_union`` only caught ``SchemaError`` (singular). When a strict-mode candidate schema rejects the dataframe up-front (e.g. COLUMN_NOT_IN_SCHEMA), pandera raises ``SchemaErrors`` (plural), which is not a subclass of ``SchemaError``, so the dispatch short-circuited on the first strict candidate instead of trying the next one in the Union.

Also catch ``SchemaErrors`` and extend the accumulated error list with its inner errors. Add a focused regression test that fails on origin/main and passes on this branch.

Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: jbb <jbaptiste.braun@gmail.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff
            main      #2329     +/-
Coverage    83.51%    83.52%
Files       190       190
Lines       16613     16615     +2
Hits        13875     13877     +2
Misses      2738      2738

View full report in Codecov by Sentry.
Collaborator
Already fixed by #2326
Summary
`@pa.check_types` Union dispatch (`Union[DataFrame[A], DataFrame[B]]`) was catching only `errors.SchemaError` (singular). When a strict-mode candidate schema rejects the dataframe up-front (e.g. `COLUMN_NOT_IN_SCHEMA`), pandera raises `errors.SchemaErrors` (plural), which is not a subclass of `SchemaError`, so the dispatch short-circuited on the first strict candidate and bubbled its failure cases instead of trying the next Union member. This PR also catches `SchemaErrors` and merges its inner errors into the accumulator, restoring the documented Union-fallthrough behaviour.

Resolves #2325.
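The sibling-class problem described above can be reproduced without pandera. The following is a minimal sketch using stand-in exception classes (hypothetical, defined here only for illustration); it mirrors the stated relationship between the two error types and is not pandera's own code.

```python
# Stand-in classes (hypothetical): they only mirror the relationship described
# above, where the plural class does not inherit from the singular one.
class SchemaError(Exception):
    """Stand-in for a single validation failure."""


class SchemaErrors(Exception):
    """Stand-in for a bundle of failures; NOT a subclass of SchemaError."""

    def __init__(self, schema_errors):
        super().__init__(f"{len(schema_errors)} schema error(s)")
        self.schema_errors = schema_errors


def raise_plural():
    """Simulate a strict candidate rejecting the input up-front."""
    raise SchemaErrors([SchemaError("COLUMN_NOT_IN_SCHEMA")])


def caught_by_singular_only():
    """Return True if `except SchemaError` alone handles the plural error."""
    try:
        try:
            raise_plural()
        except SchemaError:
            return True
    except SchemaErrors:
        # The plural error escaped the inner handler.
        return False


def caught_by_both():
    """Handle both classes, as the fix in this PR does conceptually."""
    try:
        raise_plural()
    except SchemaError:
        return "singular"
    except SchemaErrors as exc:
        return f"plural with {len(exc.schema_errors)} inner error(s)"
```

With these stand-ins, the singular-only handler lets the plural error escape, while the two-clause version handles it and can read the inner errors.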
Context
The reporter (#2325) pinpointed `_check_arg_value_against_union` at `pandera/decorators.py:729` as the offending site. `errors.SchemaErrors` is a separate exception class (not a subclass of `SchemaError`), so the existing `except errors.SchemaError` clause never sees it.

The bug surfaces specifically when the first schema in the Union has `Config.strict = True`: strict-mode rejections raise `SchemaErrors` (plural) up-front rather than the per-element `SchemaError` (singular) that non-strict schemas raise.

Changes
- `pandera/decorators.py`: add a second `except errors.SchemaErrors` clause in `_check_arg_value_against_union`. The inner `e.schema_errors` are merged into the accumulator, so when no candidate matches, the final raised `SchemaErrors` still contains every collected failure across the Union members.
- `tests/pandas/test_decorators.py`: focused regression test `test_check_types_union_dispatches_past_strict_schema` covering both: … `SchemaErrors`, i.e. catch-and-fall-through doesn't silently accept invalid data).

The 4-line code comment explains why both exceptions need handling; a reviewer reading the diff cold otherwise has to re-derive the SchemaError-vs-SchemaErrors taxonomy.
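The catch-and-fall-through behaviour listed above can be sketched as a small dispatch loop. This is an illustration under stated assumptions, not the code in `pandera/decorators.py`: the exception classes are stand-ins, `check_against_union` and `strict_validator` are hypothetical names, and a plain dict of columns stands in for a dataframe.

```python
# Stand-in exception classes (hypothetical, not imported from pandera).
class SchemaError(Exception):
    pass


class SchemaErrors(Exception):
    def __init__(self, schema_errors):
        super().__init__(f"{len(schema_errors)} schema error(s)")
        self.schema_errors = schema_errors


def check_against_union(value, validators):
    """Try each candidate in order; return the first successful result."""
    collected = []  # one accumulator shared by both except clauses
    for validate in validators:
        try:
            return validate(value)
        except SchemaError as exc:
            collected.append(exc)
        except SchemaErrors as exc:
            # Merge the inner errors so nothing is lost on final failure.
            collected.extend(exc.schema_errors)
    # No candidate matched: report every failure gathered along the way.
    raise SchemaErrors(collected)


def strict_validator(allowed):
    """Build a hypothetical strict validator over a dict of columns."""
    allowed = set(allowed)

    def validate(columns):
        extra = sorted(set(columns) - allowed)
        missing = sorted(allowed - set(columns))
        if extra or missing:
            # Strict rejection raises the plural error up-front.
            raise SchemaErrors(
                [SchemaError(f"unexpected column {c!r}") for c in extra]
                + [SchemaError(f"missing column {c!r}") for c in missing]
            )
        return columns

    return validate


strict = strict_validator({"a"})
extended = strict_validator({"a", "b"})
```

Calling `check_against_union({"a": [1], "b": [2]}, [strict, extended])` falls through the first candidate and returns the value via the second; an input that matches neither raises a single `SchemaErrors` carrying the failures from both candidates.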
Reproduce BEFORE/AFTER yourself (copy-paste)
What I ran locally
- `pytest tests/pandas/test_decorators.py::test_check_types_union_dispatches_past_strict_schema -v` → 1/1 passed (regression test).
- `pytest tests/pandas/test_decorators.py -q` → 71 passed, 1 skipped (the pre-existing `test_check_types_union_args` is still skipped under its `known issue with error propagation` marker; out of scope for this PR).
- … `SchemaErrors` (StrictModel `COLUMN_NOT_IN_SCHEMA`), AFTER returns the dataframe via ExtendedModel.

Edge cases tested
Risk / blast radius
Localised change to `_check_arg_value_against_union`. The new clause only fires when a candidate schema raises `SchemaErrors` (which only happens with strict schemas or `lazy=True`). The accumulator type doesn't change: both clauses extend the same `schema_errors: list` and the final raise constructs the same `errors.SchemaErrors`. No behaviour change when zero of the candidates raise plural errors.

PR drafted with assistance from Claude Code. The change was reviewed manually against pandera's `decorators.py` and the reporter's pinpoint in #2325.