fix(checks): refresh dispatcher for built-in checks with user names (#2042) #2330

Closed
jbbqqf wants to merge 1 commit into unionai-oss:main from jbbqqf:bugfix/2042-check-name-kwarg

jbbqqf commented May 9, 2026

Summary

Built-in check builders accept a name= kwarg used to label the check in error messages (Check.gt(0, name="my_check")). On schema validation, this raised KeyError: <class 'pandas.Series'> because Check.__call__'s dispatcher-refresh guard relied on self.name matching a registered built-in name — which it never does once the user has supplied a custom label. Track the underlying built-in registry key separately and use it for the refresh decision.

Resolves #2042.

Context

Check.gt(0) (default name) and Check.gt(0, name="positive") follow different paths in Check.__call__:

# pandera/api/checks.py, line 250 on origin/main:
if self.name is not None and self.is_builtin_check(self.name):
    self._check_fn = self.get_builtin_check_fn(self.name)

The reload exists because schema validation deep-copies Check instances (via copy.deepcopy(self) in many pandera/api/dataframe/container.py methods). Deep-copying a Dispatcher produces a new instance with a new _function_registry dict — so per-type entries that lazy backend registration (register_pandas_backends) adds to the original dispatcher are invisible to the deep-copy.

The reload re-fetches the live dispatcher from the class-level CHECK_FUNCTION_REGISTRY, papering over the deep-copy. But it gates on self.name, which the user is allowed to override. With a custom name the gate never opens, the stale deep-copied dispatcher stays in place, and the dispatcher's __call__ raises a KeyError when it looks up type(args[0]) in its registry.

Verified by tracing dispatcher object identity: at validation time Check.gt(0, name="my_check")._check_fn has a different id() than BaseCheck.CHECK_FUNCTION_REGISTRY["greater_than"], with a registry containing only {typing.Any} (the base function registered before any pandas-specific overload).
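
The deep-copy interaction described above can be reproduced without pandera. The following is a minimal sketch, assuming a simplified dispatcher: ToyDispatcher and the module-level CHECK_FUNCTION_REGISTRY are hypothetical stand-ins, not pandera's actual classes, and a list plays the role of pandas.Series.

```python
import copy
from typing import Any, Callable, Dict


class ToyDispatcher:
    """Hypothetical stand-in for a type-based dispatcher (not pandera's class)."""

    def __init__(self) -> None:
        self._function_registry: Dict[Any, Callable[..., Any]] = {}

    def register(self, type_: Any, fn: Callable[..., Any]) -> None:
        self._function_registry[type_] = fn

    def __call__(self, *args: Any) -> Any:
        # Plain dict lookup on the exact type of the first argument, so an
        # unregistered type surfaces as a KeyError.
        return self._function_registry[type(args[0])](*args)


# Toy class-level registry keyed by built-in check name.
CHECK_FUNCTION_REGISTRY: Dict[str, ToyDispatcher] = {"greater_than": ToyDispatcher()}

live = CHECK_FUNCTION_REGISTRY["greater_than"]

# Deep-copying creates a new dispatcher with its own _function_registry dict.
stale = copy.deepcopy(live)

# Simulated lazy backend registration: it happens after the copy was taken,
# so only the live dispatcher learns about the new type.
live.register(list, lambda data, min_value: all(x > min_value for x in data))

print(stale is live)         # False
print(live([1.0, 2.0], 0))   # True
try:
    stale([1.0, 2.0], 0)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: <class 'list'>
```

Re-fetching the dispatcher from the registry at call time, as the reload in Check.__call__ does, is what makes the late registration visible again.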

Changes

  • pandera/api/base/checks.py: from_builtin_check_name now sets instance._builtin_check_name = name on the returned check, capturing the registry key independently of the user-supplied label. A 7-line comment explains the deep-copy interaction so a future reader doesn't have to re-derive it.
  • pandera/api/checks.py: Check.__call__ now resolves a builtin_key = getattr(self, "_builtin_check_name", None) or self.name and uses that (not self.name directly) to decide whether to refresh self._check_fn. Falls back to self.name for instances built outside the factory (e.g. user-defined Check subclasses), preserving existing behaviour.
  • tests/pandas/test_checks.py: regression test test_builtin_check_with_user_supplied_name covering both green path (validation passes for valid input) and red path (invalid input still produces a clean SchemaError with the underlying greater_than(0) formatting, not a KeyError).
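
The first two changes can be sketched on a toy class. This is an illustration under stated assumptions rather than the PR's diff: ToyCheck, the user_name parameter, and the dict-based per-type function map are hypothetical simplifications, and only the _builtin_check_name attribute and the builtin_key expression are taken from the description above.

```python
import copy
from typing import Any, Callable, Dict, Optional


class ToyCheck:
    # Toy class-level registry: built-in name -> {input type: implementation}.
    CHECK_FUNCTION_REGISTRY: Dict[str, Dict[type, Callable[..., bool]]] = {
        "greater_than": {},
    }

    def __init__(self, check_fn: Dict[type, Callable[..., bool]],
                 name: Optional[str] = None, **check_kwargs: Any) -> None:
        self._check_fn = check_fn
        self.name = name
        self._check_kwargs = check_kwargs

    @classmethod
    def from_builtin_check_name(cls, name: str, user_name: Optional[str] = None,
                                **check_kwargs: Any) -> "ToyCheck":
        # user_name is a hypothetical parameter standing in for the name= kwarg.
        check = cls(cls.CHECK_FUNCTION_REGISTRY[name],
                    name=user_name or name, **check_kwargs)
        # Remember the registry key independently of the user-facing label.
        check._builtin_check_name = name
        return check

    def __call__(self, data: Any) -> bool:
        # Decide on the registry key, not on the (possibly custom) label.
        builtin_key = getattr(self, "_builtin_check_name", None) or self.name
        if builtin_key is not None and builtin_key in self.CHECK_FUNCTION_REGISTRY:
            self._check_fn = self.CHECK_FUNCTION_REGISTRY[builtin_key]
        return self._check_fn[type(data)](data, **self._check_kwargs)


check = ToyCheck.from_builtin_check_name("greater_than", user_name="positive", min_value=0)
copied = copy.deepcopy(check)  # the copy holds a stale, empty function map

# Simulated lazy registration after the deep copy.
ToyCheck.CHECK_FUNCTION_REGISTRY["greater_than"][list] = (
    lambda data, min_value: all(x > min_value for x in data)
)

print(copied.name)          # positive
print(copied([1.0, 2.0]))   # True
print(copied([-1.0]))       # False
```

Because "positive" is not a key of the registry, a guard based on the label alone would skip the refresh and the call would fail on the stale map.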

Reproduce BEFORE/AFTER yourself (copy-paste)

# --- one-time setup ---
git clone https://github.com/unionai-oss/pandera.git /tmp/repro && cd /tmp/repro
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[pandas]'

# --- BEFORE (origin/main) ---
git checkout origin/main
python3 - <<'PY'
import pandera.pandas as pa
import pandas as pd

schema = pa.DataFrameSchema(
    columns={"test": pa.Column(float, checks=[pa.Check.gt(0, name="positive_check")])},
)
try:
    schema.validate(pd.DataFrame({"test": [1.0]}))
    print("OK")
except Exception as e:
    print(f"RAISED {type(e).__name__}: {str(e)[:90]}")
PY
# Expected: RAISED SchemaError ... KeyError("<class 'pandas.Series'>") ...

# --- AFTER (this PR) ---
git fetch https://github.com/jbbqqf/pandera.git bugfix/2042-check-name-kwarg
git checkout FETCH_HEAD
pip install -e '.[pandas]' >/dev/null
python3 - <<'PY'
import pandera.pandas as pa
import pandas as pd

schema = pa.DataFrameSchema(
    columns={"test": pa.Column(float, checks=[pa.Check.gt(0, name="positive_check")])},
)
print("validation OK; rows:", len(schema.validate(pd.DataFrame({"test": [1.0]}))))
PY
# Expected: validation OK; rows: 1

What I ran locally

  • pytest tests/pandas/test_checks.py::test_builtin_check_with_user_supplied_name -v → 1/1 passed (regression test).
  • pytest tests/pandas/test_checks.py tests/pandas/test_checks_builtin.py -q → 285 passed.
  • Verified the regression test fails on origin/main with the documented KeyError: <class 'pandas.Series'> traceback.

Edge cases tested

| # | Scenario | Expected | Verified by |
|---|----------|----------|-------------|
| 1 | Check.gt(0, name="positive") on a column that satisfies the check | validation passes, returns the dataframe | green path of new test |
| 2 | Check.gt(0, name="positive") on a column that fails the check | clean SchemaError with greater_than(0) failure-case formatting (not a KeyError) | red path of new test |
| 3 | Check.gt(0) (no name=), the pre-existing happy path | unchanged | full tests/pandas/test_checks*.py suite (285 passed) |
| 4 | Custom-defined Check subclasses without _builtin_check_name | falls back to self.name, original behaviour preserved | tests/pandas/test_checks.py (no regressions) |

Risk / blast radius

The new attribute _builtin_check_name is set only on instances created via from_builtin_check_name; everywhere else, getattr(self, "_builtin_check_name", None) returns None and the resolution falls through to self.name — i.e. exactly the prior behaviour. No public API changes. The attribute is set by all existing call sites in pandera/api/checks.py (every cls.from_builtin_check_name(...) builder) and in pandera/api/hypotheses.py.
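
The fallback can be checked in isolation. In this small sketch, resolve_builtin_key is a hypothetical helper wrapping the expression quoted above, and SimpleNamespace objects stand in for check instances.

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_builtin_key(check: Any) -> Optional[str]:
    """Hypothetical helper wrapping the expression described in this PR."""
    return getattr(check, "_builtin_check_name", None) or check.name


# Built through the factory with a custom label.
factory_built = SimpleNamespace(name="positive", _builtin_check_name="greater_than")
# Built outside the factory: the attribute was never set.
hand_built = SimpleNamespace(name="my_custom_check")
# Built outside the factory with no name at all.
unnamed = SimpleNamespace(name=None)

print(resolve_builtin_key(factory_built))  # greater_than
print(resolve_builtin_key(hand_built))     # my_custom_check
print(resolve_builtin_key(unnamed))        # None
```

Only the first case changes the outcome relative to checking self.name directly; the other two resolve to the same value as before.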


PR drafted with assistance from Claude Code. The change was reviewed manually against pandera's check / dispatcher / backend layers and against the reporter's traceback in #2042.

…nionai-oss#2042)

``Check.__call__`` reloads ``self._check_fn`` from the class-level
registry so that per-type entries added by lazy backend registration
(during ``schema.validate``) are visible — schema validation deep-copies
Check instances, leaving the deep-copied ``Dispatcher`` with a stale
per-type function map.

The reload was guarded by ``self.is_builtin_check(self.name)``, but
``self.name`` is the user-supplied label when constructing via
``Check.gt(0, name="my_check")``, so the guard never fires for named
built-in checks. The result was ``KeyError: <class 'pandas.Series'>``
when validating a column with a renamed built-in check.

Track the underlying built-in registry key on the instance via
``_builtin_check_name`` (set by ``BaseCheck.from_builtin_check_name``)
and consult it in addition to ``self.name`` when deciding whether to
refresh the dispatcher. Add a regression test that fails on origin/main
and passes on this branch.

Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: jbb <jbaptiste.braun@gmail.com>

codecov Bot commented May 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.52%. Comparing base (378b2de) to head (0d80018).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2330   +/-   ##
=======================================
  Coverage   83.51%   83.52%           
=======================================
  Files         190      190           
  Lines       16613    16616    +3     
=======================================
+ Hits        13875    13878    +3     
  Misses       2738     2738           


cosmicBboy (Collaborator) commented

Thanks for the contributions @jbbqqf! On this one, we need to make the linters happy with prek run --all-files

Check out https://github.com/unionai-oss/pandera/blob/main/AGENTS.md, which should make Claude more useful on this project.

jbbqqf closed this May 24, 2026

Development

Successfully merging this pull request may close these issues.

Checks with kwarg name specified raise a KeyError on execution.

2 participants