fix(decorators): catch SchemaErrors in Union dispatch (#2325) #2329

Closed
jbbqqf wants to merge 1 commit into unionai-oss:main from jbbqqf:bugfix/2325-union-catches-schema-errors

Conversation

@jbbqqf jbbqqf commented May 9, 2026

Summary

@pa.check_types Union dispatch (Union[DataFrame[A], DataFrame[B]]) was catching only errors.SchemaError (singular). When a strict-mode candidate schema rejects the dataframe up-front (e.g. COLUMN_NOT_IN_SCHEMA), pandera raises errors.SchemaErrors (plural) — which is not a subclass of SchemaError — so the dispatch short-circuited on the first strict candidate and bubbled its failure cases instead of trying the next Union member. This PR also catches SchemaErrors and merges its inner errors into the accumulator, restoring the documented Union-fallthrough behaviour.

Resolves #2325.

Context

The reporter (#2325) pinpointed _check_arg_value_against_union at pandera/decorators.py:729 as the offending site. errors.SchemaErrors is a separate exception class (not a subclass of SchemaError), so the existing except errors.SchemaError clause never sees it.

The bug surfaces specifically when the first schema in the Union has Config.strict = True — strict-mode rejections raise SchemaErrors (plural) up-front rather than the per-element SchemaError (singular) that non-strict schemas raise.
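
To make the taxonomy concrete, here is a minimal sketch of why an `except SchemaError` clause never sees a `SchemaErrors`. The two classes below are hypothetical stand-ins that only mirror the sibling relationship described above; they are not pandera's real classes, and `caught_by_singular_clause` is a helper invented for this illustration.

```python
# Hypothetical stand-ins mirroring the relationship described above;
# these are NOT pandera's real exception classes.
class SchemaError(Exception):
    """Stand-in for a single validation failure."""


class SchemaErrors(Exception):
    """Stand-in for a collection of failures: a sibling, not a subclass."""

    def __init__(self, schema_errors):
        super().__init__(f"{len(schema_errors)} schema error(s)")
        self.schema_errors = schema_errors


def caught_by_singular_clause(exc: Exception) -> bool:
    """Return True if an `except SchemaError` clause would intercept `exc`."""
    try:
        raise exc
    except SchemaError:
        return True
    except Exception:
        # Anything that is not a SchemaError (or subclass) lands here.
        return False


singular_caught = caught_by_singular_clause(SchemaError("bad value"))
plural_caught = caught_by_singular_clause(
    SchemaErrors([SchemaError("extra column")])
)
print(singular_caught, plural_caught)
```

Because Python matches `except` clauses by class hierarchy, not by name similarity, the plural exception passes straight through the singular clause.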

Changes

  • pandera/decorators.py: add a second except errors.SchemaErrors clause in _check_arg_value_against_union. The inner e.schema_errors are merged into the accumulator, so when no candidate matches the final raised SchemaErrors still contains every collected failure across the Union members.
  • tests/pandas/test_decorators.py: focused regression test test_check_types_union_dispatches_past_strict_schema covering both:
    • the green path (Union[Strict, Extended] with input that only matches Extended → returns the dataframe unchanged), and
    • the negative path (input matching neither schema still raises a SchemaErrors, i.e. catch-and-fall-through doesn't silently accept invalid data).

The 4-line code comment explains why both exceptions need handling — a reviewer reading the diff cold otherwise has to re-derive the SchemaError-vs-SchemaErrors taxonomy.
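
The shape of the fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual body of `_check_arg_value_against_union`: the exception classes are stand-ins, plain dicts stand in for dataframes, and `check_against_union` and `strict_validator` are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, List, Sequence, Set

Row = Dict[str, object]            # stand-in for a dataframe
Validator = Callable[[Row], Row]   # stand-in for a schema's validate step


class SchemaError(Exception):
    """Stand-in for a single validation failure (not pandera's class)."""


class SchemaErrors(Exception):
    """Stand-in for an aggregate of failures (not pandera's class)."""

    def __init__(self, schema_errors: List[SchemaError]):
        super().__init__(f"{len(schema_errors)} schema error(s)")
        self.schema_errors = schema_errors


def check_against_union(value: Row, validators: Sequence[Validator]) -> Row:
    """Try each candidate in order; return on the first one that accepts."""
    collected: List[SchemaError] = []
    for validate in validators:
        try:
            return validate(value)
        except SchemaError as exc:
            # Singular failure: record it and try the next candidate.
            collected.append(exc)
        except SchemaErrors as exc:
            # Plural failure: merge its inner errors into the same list.
            collected.extend(exc.schema_errors)
    # No candidate matched: surface every failure that was collected.
    raise SchemaErrors(collected)


def strict_validator(allowed: Set[str]) -> Validator:
    """Hypothetical strict schema: unknown columns raise the plural error."""
    def validate(value: Row) -> Row:
        extra = sorted(set(value) - allowed)
        if extra:
            raise SchemaErrors(
                [SchemaError(f"column {name!r} not in schema") for name in extra]
            )
        missing = sorted(allowed - set(value))
        if missing:
            raise SchemaError(f"missing column {missing[0]!r}")
        return value
    return validate


strict = strict_validator({"timestamp", "value_x"})
extended = strict_validator({"timestamp", "value_x", "Cluster"})

# Green path: the first candidate rejects "Cluster", the second accepts.
good = {"timestamp": "2024-01-01", "value_x": 1.0, "Cluster": 1}
matched = check_against_union(good, [strict, extended])

# Negative path: neither candidate accepts, so both failures are reported.
bad = {"timestamp": "2024-01-01", "unexpected": 0}
try:
    check_against_union(bad, [strict, extended])
    failure_count = 0
except SchemaErrors as exc:
    failure_count = len(exc.schema_errors)
```

Without the second `except` clause, the first strict candidate's `SchemaErrors` would escape the loop and the green-path call would raise instead of falling through.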

Reproduce BEFORE/AFTER yourself (copy-paste)

# --- one-time setup ---
git clone https://github.com/unionai-oss/pandera.git /tmp/repro && cd /tmp/repro
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[pandas]' pytest

# --- BEFORE (origin/main) ---
git checkout origin/main
python3 - <<'PY'
import pandas as pd
import pandera.pandas as pa
from pandera.typing import DataFrame, Series
from typing import Union

class StrictModel(pa.DataFrameModel):
    timestamp: Series[pa.Timestamp]
    feature: Series[float] = pa.Field(alias="value_.+", regex=True)
    class Config:
        strict = True

class ExtendedModel(pa.DataFrameModel):
    timestamp: Series[pa.Timestamp]
    feature: Series[float] = pa.Field(alias="value_.+", regex=True)
    cluster: Series[int] = pa.Field(alias="Cluster")
    class Config:
        strict = True

@pa.check_types()
def process(df: Union[DataFrame[StrictModel], DataFrame[ExtendedModel]]) -> Union[DataFrame[StrictModel], DataFrame[ExtendedModel]]:
    return df

df = pd.DataFrame({"timestamp": [pd.Timestamp("2024-01-01")], "value_x": [1.0], "Cluster": [1]})
try:
    process(df)
    print("OK")
except Exception as e:
    print(f"RAISED {type(e).__name__}: {e}")
PY
# Expected: RAISED SchemaErrors with COLUMN_NOT_IN_SCHEMA on StrictModel,
#           even though ExtendedModel covers Cluster.

# --- AFTER (this PR) ---
git fetch https://github.com/jbbqqf/pandera.git bugfix/2325-union-catches-schema-errors
git checkout FETCH_HEAD
pip install -e '.[pandas]' >/dev/null
python3 - <<'PY'
# (same models as above; no try/except, because the call should now succeed)
import pandas as pd
import pandera.pandas as pa
from pandera.typing import DataFrame, Series
from typing import Union

class StrictModel(pa.DataFrameModel):
    timestamp: Series[pa.Timestamp]
    feature: Series[float] = pa.Field(alias="value_.+", regex=True)
    class Config:
        strict = True

class ExtendedModel(pa.DataFrameModel):
    timestamp: Series[pa.Timestamp]
    feature: Series[float] = pa.Field(alias="value_.+", regex=True)
    cluster: Series[int] = pa.Field(alias="Cluster")
    class Config:
        strict = True

@pa.check_types()
def process(df: Union[DataFrame[StrictModel], DataFrame[ExtendedModel]]) -> Union[DataFrame[StrictModel], DataFrame[ExtendedModel]]:
    return df

df = pd.DataFrame({"timestamp": [pd.Timestamp("2024-01-01")], "value_x": [1.0], "Cluster": [1]})
print("OK -- result shape:", process(df).shape)
PY
# Expected: OK -- result shape: (1, 3) (ExtendedModel matched on fall-through)

What I ran locally

  • pytest tests/pandas/test_decorators.py::test_check_types_union_dispatches_past_strict_schema -v → 1/1 passed (regression test).
  • pytest tests/pandas/test_decorators.py -q → 71 passed, 1 skipped (the pre-existing test_check_types_union_args is still skipped under its "known issue with error propagation" marker, which is out of scope for this PR).
  • BEFORE/AFTER snippet above: BEFORE raises SchemaErrors (StrictModel COLUMN_NOT_IN_SCHEMA), AFTER returns the dataframe via ExtendedModel.

Edge cases tested

| # | Scenario | Expected | Verified by |
|---|----------|----------|-------------|
| 1 | Union[Strict, Extended] with input matching only Extended (Strict rejects the extra column) | falls through to Extended; returns df | green path of new test |
| 2 | Union[Strict, Strict] with input matching neither | raises SchemaErrors collecting both candidates | negative path of new test |
| 3 | Existing non-strict Union behavior (e.g. OnlyZeroes/OnlyOnes pre-existing tests) | unchanged | pre-existing decorator tests still green |

Risk / blast radius

Localised change to _check_arg_value_against_union. The new clause only fires when a candidate schema raises SchemaErrors (which only happens with strict schemas or lazy=True). The accumulator type doesn't change: both clauses extend the same schema_errors: list, and the final raise constructs the same errors.SchemaErrors. There is no behaviour change when none of the candidates raises SchemaErrors.


PR drafted with assistance from Claude Code. The change was reviewed manually against pandera's decorators.py and the reporter's pinpoint in #2325.

``_check_arg_value_against_union`` only caught ``SchemaError`` (singular).
When a strict-mode candidate schema rejects the dataframe up-front (e.g.
COLUMN_NOT_IN_SCHEMA), pandera raises ``SchemaErrors`` (plural) — which
is not a subclass of ``SchemaError`` — so the dispatch short-circuited
on the first strict candidate instead of trying the next one in the
Union.

Also catch ``SchemaErrors`` and extend the accumulated error list with
its inner errors. Add a focused regression test that fails on origin/main
and passes on this branch.

Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: jbb <jbaptiste.braun@gmail.com>

codecov Bot commented May 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.52%. Comparing base (378b2de) to head (e069b50).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2329   +/-   ##
=======================================
  Coverage   83.51%   83.52%           
=======================================
  Files         190      190           
  Lines       16613    16615    +2     
=======================================
+ Hits        13875    13877    +2     
  Misses       2738     2738           


cosmicBboy (Collaborator) commented

Already fixed by #2326

@cosmicBboy cosmicBboy closed this May 21, 2026

Development

Successfully merging this pull request may close these issues.

check_types decorator does not handle SchemaErrors (plural)
