Fixed issue where Pandera cannot handle metadata through Annotated types #2111
Merged
Co-authored-by: Kevin Jijon <Kevinjijon0@gmail.com>
Co-authored-by: Karan Verma <kverma2750@gmail.com>
Codecov Report
❌ Patch coverage is

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##             main    #2111      +/-   ##
    ==========================================
    + Coverage   80.90%   83.54%   +2.63%
    ==========================================
      Files         190      190
      Lines       16621    16654      +33
    ==========================================
    + Hits        13448    13914     +466
    + Misses       3173     2740     -433
Collaborator comment:
Thanks for the contribution @NGHades! It looks like there are a few failing tests. You can reproduce this locally using
ybressler reviewed on Oct 9, 2025
Comment on lines +173 to +174:

    if field_info_list:
        existing_field = field_info_list[0]
Collaborator comment: It would be useful to add an inline comment here explaining why you're retrieving the first item.
Fully resolves unionai-oss#2110 by propagating ``FieldInfo`` metadata embedded in ``typing.Annotated`` annotations to the resulting schema, and addresses several issues with the original patch:

* Extract the embedded ``FieldInfo`` from ``Annotated`` metadata in ``DataFrameModel.__init_subclass__`` so attributes like ``description``, ``title``, ``unique``, and ``ge``/``le`` checks defined via ``Annotated[T, pa.Field(...)]`` are preserved on the schema columns.
* Fix ``BaseFieldInfo.__hash__``/``__eq__`` to use identity for un-named fields so Python's ``typing.Annotated`` cache does not deduplicate ``Annotated[T, pa.Field(...)]`` annotations across distinct model classes (which previously caused the second model to inherit the first model's field configuration).
* Refactor ``get_dtype_kwargs`` to filter out ``FieldInfo`` entries from the annotation metadata via a shared ``_dtype_metadata`` helper. The pandas and pyspark builders use the helper to decide whether to call ``annotation.arg(**kwargs)`` or use the annotated type as-is.

Cleanups carried over from the original patch:

* Remove an unused ``from pandera.utils import F`` import and the dead ``__annotation_infos__`` cache branch.
* Restore the ``column_properties`` docstring and remove trailing whitespace introduced in ``model_components.py``/``typing/common.py``.
* Drop a duplicated ``dtype_kwargs`` check and a leftover ``<-- str`` comment in ``pandas/model.py``.

Adds pandas-side tests covering description/title/unique/check propagation through ``Annotated``, the cross-class non-deduplication guarantee, and that an explicit ``= pa.Field(...)`` assignment continues to take precedence over an embedded ``Annotated`` ``FieldInfo``.

Co-authored-by: Cursor <cursoragent@cursor.com>
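The ``__hash__``/``__eq__`` fix is easier to see with a stdlib-only sketch of the caching pitfall. ``NameKeyedField`` and ``IdentityWhenUnnamed`` are hypothetical stand-ins, not pandera's actual ``BaseFieldInfo``; the sketch assumes, as the commit message describes, that CPython's ``typing`` module caches ``Annotated[...]`` subscriptions by hashing and comparing the metadata objects.

```python
from typing import Annotated


class NameKeyedField:
    """Hypothetical stand-in: equality and hashing depend only on ``name``."""

    def __init__(self, name=None, description=None):
        self.name = name
        self.description = description

    def __eq__(self, other):
        if not isinstance(other, NameKeyedField):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)


class IdentityWhenUnnamed(NameKeyedField):
    """Hypothetical fix: fall back to object identity when a field has no name."""

    def __eq__(self, other):
        if not isinstance(other, NameKeyedField):
            return NotImplemented
        if self.name is None or other.name is None:
            return self is other
        return self.name == other.name

    def __hash__(self):
        # Unnamed fields hash by identity, so two of them never share a cache entry.
        return id(self) if self.name is None else hash(self.name)


# Two distinct un-named fields compare equal, so typing's cache returns the
# alias that was built for the first one.
buggy_price = Annotated[float, NameKeyedField(description="price")]
buggy_qty = Annotated[float, NameKeyedField(description="quantity")]

# With identity semantics each subscription yields its own alias.
fixed_price = Annotated[float, IdentityWhenUnnamed(description="price")]
fixed_qty = Annotated[float, IdentityWhenUnnamed(description="quantity")]

print(buggy_qty.__metadata__[0].description)  # price (leaked from the first alias)
print(fixed_qty.__metadata__[0].description)  # quantity
```

Identity is a safe fallback here because two unnamed field objects cannot meaningfully be "the same column" until a model assigns them a name.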
The polars ``_build_columns`` code path entered the
``annotation.arg(**dtype_kwargs)`` branch when the polars engine could
not resolve an ``Annotated`` type (e.g. ``Annotated[float, pa.Field(...)]``),
causing the model to fail with ``SchemaInitError: Invalid annotation``
because ``float(**{})`` evaluates to ``0.0``, which is not a valid polars dtype.
Apply the same ``_dtype_metadata`` guard used by the pandas builder: if
the ``Annotated`` metadata contains no dtype parameters (only a
``FieldInfo``), use ``annotation.arg`` directly as the dtype instead of
calling it with empty kwargs.
Adds polars-side regression tests covering FieldInfo propagation
through ``Annotated`` (description, title, unique, checks, metadata)
and the cross-class non-deduplication guarantee.
Co-authored-by: Cursor <cursoragent@cursor.com>
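The failure mode and the guard can be reproduced without polars. In this sketch, ``FieldInfo`` is a hypothetical marker class, and ``dtype_metadata``/``resolve_dtype`` are simplified, hypothetical stand-ins for pandera's ``_dtype_metadata`` helper and builder logic; only ``typing.get_args`` and ``typing.get_origin`` are real APIs. Dtype parameters are passed positionally here for brevity.

```python
from typing import Annotated, get_args, get_origin


class FieldInfo:
    """Hypothetical stand-in for the object returned by pa.Field(...)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def dtype_metadata(metadata):
    """Keep only the Annotated metadata that parameterizes the dtype."""
    return tuple(m for m in metadata if not isinstance(m, FieldInfo))


def naive_resolve(annotation):
    # Pre-fix behavior: always instantiate the annotated type.
    arg, *_ = get_args(annotation)
    return arg(**{})


def resolve_dtype(annotation):
    # Guarded behavior: only call the type when dtype parameters are present.
    if get_origin(annotation) is not Annotated:
        return annotation
    arg, *metadata = get_args(annotation)
    params = dtype_metadata(metadata)
    if not params:
        return arg  # e.g. Annotated[float, FieldInfo(...)] -> float
    return arg(*params)


class DecimalDtype:
    """Toy parameterized dtype used only for this illustration."""

    def __init__(self, precision, scale):
        self.precision, self.scale = precision, scale


ann = Annotated[float, FieldInfo(description="price")]
print(naive_resolve(ann))  # 0.0, which is not a dtype
print(resolve_dtype(ann))  # <class 'float'>

dec = resolve_dtype(Annotated[DecimalDtype, 10, 2, FieldInfo(ge=0)])
print(dec.precision, dec.scale)  # 10 2
```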
Add subsections to both the general DataFrame Models guide and the polars guide showing how to embed a ``pa.Field(...)`` directly inside ``typing.Annotated`` — including descriptions, titles, unique flags, checks (``ge``/``le``), and combinations with parameterized dtypes. Also notes that an explicit ``= pa.Field(...)`` assignment continues to take precedence over an embedded ``Annotated`` ``FieldInfo``. Verified via ``sphinx-build``: both pages build cleanly and the ``code-cell``/``testcode`` blocks execute their expected output into the rendered HTML. Co-authored-by: Cursor <cursoragent@cursor.com>
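The behavior these docs describe (an embedded ``FieldInfo`` is picked up, but an explicit ``= pa.Field(...)`` assignment wins) can be sketched with the standard library alone. ``FieldInfo``, ``embedded_field``, and ``collect_fields`` are hypothetical; pandera's real logic lives in ``DataFrameModel.__init_subclass__`` and handles many more cases.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class FieldInfo:
    """Hypothetical stand-in for the object returned by pa.Field(...)."""

    def __init__(self, **props):
        self.props = props


def embedded_field(annotation):
    """Return the first FieldInfo embedded in Annotated metadata, if any."""
    if get_origin(annotation) is Annotated:
        for meta in get_args(annotation)[1:]:
            if isinstance(meta, FieldInfo):
                return meta
    return None


def collect_fields(model_cls):
    """Map column name -> FieldInfo, preferring an explicit class attribute."""
    hints = get_type_hints(model_cls, include_extras=True)  # keep Annotated
    fields = {}
    for name, annotation in hints.items():
        explicit = model_cls.__dict__.get(name)
        if isinstance(explicit, FieldInfo):
            fields[name] = explicit  # `col: T = Field(...)` takes precedence
        else:
            fields[name] = embedded_field(annotation) or FieldInfo()
    return fields


class Sales:
    price: Annotated[float, FieldInfo(description="unit price", ge=0)]
    sku: Annotated[str, FieldInfo(unique=True)] = FieldInfo(title="SKU")


fields = collect_fields(Sales)
print(fields["price"].props)  # {'description': 'unit price', 'ge': 0}
print(fields["sku"].props)    # {'title': 'SKU'}
```

Note that ``include_extras=True`` matters: without it, ``get_type_hints`` strips ``Annotated`` and the embedded metadata is lost.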
Mirror the pandas/polars Annotated[T, pa.Field(...)] handling in the ibis
backend so embedded FieldInfo metadata (description, title, unique,
checks, etc.) propagates to the generated schema.
* api/ibis/model.py: guard the dtype-instantiation path with
``_dtype_metadata`` so ``Annotated[float, pa.Field(...)]`` (and similar
built-in-typed annotations carrying only a FieldInfo) no longer call
``float(**{})`` and instead use the annotated type as-is.
* engines/ibis_engine.py: the ``Engine.dtype`` fallback used to call
``data_type().to_numpy()`` unconditionally, which raised an
``AttributeError`` for annotations that don't resolve as numpy-scalar
dtypes. Wrap the fallback so any failure re-raises the original
TypeError, allowing the DataFrameModel fallback path (above) to take
over.
* tests/ibis/test_ibis_model.py: add regression tests covering metadata
propagation, check application, and the Annotated cross-class dedup
bug that was fixed in BaseFieldInfo.__hash__/__eq__.
* docs/source/ibis.md: document the new functionality with a worked
example.
Co-authored-by: Cursor <cursoragent@cursor.com>
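The ``Engine.dtype`` change follows a common Python pattern: try a fallback, and if it fails for any reason, surface the original error instead of the fallback's. The sketch below uses a hypothetical ``REGISTRY``, hypothetical ``lookup``/``resolve`` functions, and a toy ``to_numpy`` fallback; it is not the ibis engine's actual code.

```python
REGISTRY = {"int64": "Int64", "float64": "Float64"}  # hypothetical dtype table


def lookup(data_type):
    """Primary lookup: raises TypeError for anything not registered."""
    try:
        return REGISTRY[data_type]
    except (KeyError, TypeError):
        raise TypeError(f"Data type {data_type!r} not understood") from None


def resolve(data_type):
    try:
        return lookup(data_type)
    except TypeError as original:
        try:
            # Fallback: treat the annotation as something exposing a numpy dtype.
            return lookup(data_type().to_numpy())
        except Exception:
            # Any fallback failure (AttributeError, TypeError, ...) re-raises
            # the original TypeError, so callers handle a single error type.
            raise original from None


class NumpyBacked:
    """Toy annotation whose instances report a numpy dtype name."""

    def to_numpy(self):
        return "float64"


print(resolve("int64"))      # Int64
print(resolve(NumpyBacked))  # Float64
try:
    resolve(float)           # float().to_numpy() raises AttributeError...
except TypeError as exc:     # ...but the caller only sees the original TypeError
    caught = exc
```

Re-raising the original ``TypeError`` is what lets a caller such as a model builder treat "engine could not resolve this" uniformly and move on to its own fallback path.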
Match the polars docs section by adding "Embedding Field metadata in Annotated" subsections to the pyspark and ibis user guides, with worked examples that combine plain and parameterized dtypes with an embedded FieldInfo.

Also finish the pyspark side of the Annotated FieldInfo fix:

* api/pyspark/model.py: allow ``annotation.is_annotated_type`` to take the column-build path (previously the ``annotation.origin is None`` guard misclassified ``Annotated[T.StringType, pa.Field(...)]`` as an invalid annotation and raised SchemaInitError).
* tests/pyspark/test_pyspark_model.py: add regression tests covering metadata propagation, check application, and the Annotated cross-class dedup bug.

Co-authored-by: Cursor <cursoragent@cursor.com>
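The pyspark guard change amounts to not rejecting ``Annotated[...]`` just because it has a typing origin. This is a stdlib-only sketch of that classification, modelled on the commit message's description: it uses ``typing.get_origin`` rather than pandera's ``AnnotationInfo``, ``classify_before``/``classify_after`` are hypothetical names, and ``str`` stands in for ``T.StringType`` so the snippet runs without pyspark.

```python
from typing import Annotated, get_origin


class FieldInfo:
    """Hypothetical stand-in for pa.Field(...)."""


def classify_before(annotation):
    # Old guard: only annotations without a typing origin build a column.
    return "column" if get_origin(annotation) is None else "invalid"


def classify_after(annotation):
    # New guard: Annotated[dtype, ...] also takes the column-build path.
    origin = get_origin(annotation)
    if origin is None or origin is Annotated:
        return "column"
    return "invalid"


ann = Annotated[str, FieldInfo()]
print(classify_before(ann))        # invalid (the reported SchemaInitError case)
print(classify_after(ann))         # column
print(classify_after(list[str]))   # invalid: other generics are still rejected
```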
Collaborator comment:
I fixed this PR up, @NGHades. Sorry for the long delay getting this in, and thanks for the contribution!
cosmicBboy approved these changes on May 21, 2026
Problem
This PR addresses issue #2110, which we (@kevJ711 and @KViruz2750) identified while working with Annotated types in conjunction with pa.Field(...).
Description
When defining schema models using Annotated along with pa.Field(...), metadata such as description, unique, and title was not being correctly propagated into the resulting DataFrameModel.
Solution
We introduced a more rigorous check for parsing Annotated types to ensure that any AnnotationInfo attached to a type is correctly handled and its metadata extracted. This change allows the model to capture and utilize metadata as expected.
Additionally, we observed that certain built-in types (str, int, float, bool) do not support parameterization. To prevent issues when handling these types, we added a check that safely returns an empty metadata dictionary for them.
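A minimal sketch of that built-in check, assuming that Annotated metadata maps positionally onto the dtype constructor's parameters (the convention pandera's docs describe for parameterized dtypes). Here get_dtype_kwargs is a simplified, hypothetical version of the helper and DatetimeTZ is a toy dtype; the early return means built-in constructors are never introspected at all.

```python
import inspect
from typing import Annotated, get_args

BUILTIN_SCALARS = (str, int, float, bool)  # types that take no dtype parameters


class DatetimeTZ:
    """Toy parameterized dtype used only for this illustration."""

    def __init__(self, unit, tz):
        self.unit, self.tz = unit, tz


def get_dtype_kwargs(annotation):
    """Map Annotated metadata onto constructor keyword arguments (hypothetical)."""
    arg, *metadata = get_args(annotation)
    if arg in BUILTIN_SCALARS:
        # Built-ins cannot be parameterized, so there is nothing to pass on.
        return {}
    names = list(inspect.signature(arg).parameters)
    if len(names) != len(metadata):
        raise TypeError(
            f"{arg.__name__} expects {len(names)} parameters, got {len(metadata)}"
        )
    return dict(zip(names, metadata))


print(get_dtype_kwargs(Annotated[DatetimeTZ, "ns", "UTC"]))  # {'unit': 'ns', 'tz': 'UTC'}
print(get_dtype_kwargs(Annotated[bool, "flag"]))            # {}
```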