Fixed issue where Pandera cannot handle metadata through Annotated types #2111

Merged
cosmicBboy merged 14 commits into unionai-oss:main from NGHades:main
May 21, 2026

Conversation

NGHades (Contributor) commented Aug 11, 2025

Problem

This PR addresses issue #2110, which we (@kevJ711 and @KViruz2750) identified while working with Annotated types in conjunction with pa.Field(...).

Description

When defining schema models using Annotated along with pa.Field(...), metadata such as description, unique, and title was not being correctly propagated into the resulting DataFrameModel.

Solution

We introduced a more rigorous check for parsing Annotated types to ensure that any AnnotationInfo attached to a type is correctly handled and its metadata extracted. This change allows the model to capture and utilize metadata as expected.

Additionally, we observed that certain built-in types (str, int, float, bool) do not support parameterization. To prevent issues when handling these types, we added a check that safely returns an empty metadata dictionary for them.
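The `Annotated` parsing described in the Solution above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `FieldInfo` here is a hypothetical stand-in dataclass rather than pandera's class, and `embedded_field` and `Model` are names invented for the example.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Optional, get_args, get_origin, get_type_hints


@dataclass(eq=False)  # hypothetical stand-in, not pandera's FieldInfo
class FieldInfo:
    description: Optional[str] = None
    title: Optional[str] = None
    unique: bool = False


def embedded_field(annotation: Any) -> Optional[FieldInfo]:
    """Return the first FieldInfo found in Annotated metadata, if any."""
    if get_origin(annotation) is not Annotated:
        # Plain types such as str, int, float, or bool carry no metadata.
        return None
    _base, *metadata = get_args(annotation)
    return next((m for m in metadata if isinstance(m, FieldInfo)), None)


class Model:
    name: Annotated[str, FieldInfo(description="Customer name", unique=True)]
    age: int


# include_extras=True keeps the Annotated wrapper instead of stripping it.
hints = get_type_hints(Model, include_extras=True)
fields = {column: embedded_field(tp) for column, tp in hints.items()}
```

In this sketch `fields["name"]` holds the embedded metadata object, while `fields["age"]` is `None` because a bare built-in type has nothing to extract.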

codecov Bot commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 67.24138% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.54%. Comparing base (76d663c) to head (3a12e4f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pandera/api/pyspark/model.py | 0.00% | 12 Missing ⚠️ |
| pandera/api/base/model_components.py | 40.00% | 3 Missing ⚠️ |
| pandera/api/ibis/model.py | 66.66% | 2 Missing ⚠️ |
| pandera/api/dataframe/model.py | 94.44% | 1 Missing ⚠️ |
| pandera/api/pandas/model.py | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2111      +/-   ##
==========================================
+ Coverage   80.90%   83.54%   +2.63%     
==========================================
  Files         190      190              
  Lines       16621    16654      +33     
==========================================
+ Hits        13448    13914     +466     
+ Misses       3173     2740     -433     

cosmicBboy (Collaborator) commented

Thanks for the contribution, @NGHades! It looks like there are a few failing tests. You can reproduce them locally using nox; see https://pandera.readthedocs.io/en/stable/CONTRIBUTING.html#run-a-specific-test-suite-locally

Comment thread pandera/api/dataframe/model.py Outdated
Comment on lines +173 to +174
if field_info_list:
    existing_field = field_info_list[0]
Collaborator

It would be useful to add an inline comment here explaining why you're retrieving the first item.

cosmicBboy and others added 7 commits December 31, 2025 10:21
Fully resolves unionai-oss#2110 by propagating ``FieldInfo`` metadata embedded in
``typing.Annotated`` annotations to the resulting schema, and addresses
several issues with the original patch:

* Extract the embedded ``FieldInfo`` from ``Annotated`` metadata in
  ``DataFrameModel.__init_subclass__`` so attributes like ``description``,
  ``title``, ``unique``, and ``ge``/``le`` checks defined via
  ``Annotated[T, pa.Field(...)]`` are preserved on the schema columns.
* Fix ``BaseFieldInfo.__hash__``/``__eq__`` to use identity for un-named
  fields so Python's ``typing.Annotated`` cache does not deduplicate
  ``Annotated[T, pa.Field(...)]`` annotations across distinct model
  classes (which previously caused the second model to inherit the first
  model's field configuration).
* Refactor ``get_dtype_kwargs`` to filter out ``FieldInfo`` entries from
  the annotation metadata via a shared ``_dtype_metadata`` helper. The
  pandas and pyspark builders use the helper to decide whether to call
  ``annotation.arg(**kwargs)`` or use the annotated type as-is.

Cleanups carried over from the original patch:

* Remove an unused ``from pandera.utils import F`` import and the dead
  ``__annotation_infos__`` cache branch.
* Restore the ``column_properties`` docstring and remove trailing
  whitespace introduced in ``model_components.py``/``typing/common.py``.
* Drop a duplicated ``dtype_kwargs`` check and a leftover ``<-- str``
  comment in ``pandas/model.py``.

Adds pandas-side tests covering description/title/unique/check
propagation through ``Annotated``, the cross-class non-deduplication
guarantee, and that an explicit ``= pa.Field(...)`` assignment continues
to take precedence over an embedded ``Annotated`` ``FieldInfo``.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
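The `typing.Annotated` deduplication described in the commit message above can be reproduced without pandera. The sketch below assumes CPython's current behavior of caching `Annotated[...]` subscriptions keyed on the hash and equality of their arguments; `ValueField` and `IdentityField` are hypothetical stand-ins, and the real fix applies identity semantics only to un-named fields rather than to every instance.

```python
from dataclasses import dataclass
from typing import Annotated


@dataclass(frozen=True)  # value-based __eq__/__hash__ (hypothetical stand-in)
class ValueField:
    description: str = ""


class IdentityField:  # default object __eq__/__hash__, i.e. identity
    def __init__(self, description: str = "") -> None:
        self.description = description


first, second = ValueField(), ValueField()
a = Annotated[int, first]
b = Annotated[int, second]
# Equal, equally hashed metadata hits the typing cache, so `b` reuses `first`.
shared = b.__metadata__[0] is first

third, fourth = IdentityField(), IdentityField()
c = Annotated[int, third]
d = Annotated[int, fourth]
# Identity-based metadata never collides, so each alias keeps its own object.
separate = d.__metadata__[0] is fourth
```

With value-based equality, the second annotation silently carries the first model's field object, which matches the symptom described above. Identity-based hashing keeps the two annotations independent.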
The polars ``_build_columns`` code path entered the
``annotation.arg(**dtype_kwargs)`` branch when the polars engine could
not resolve an ``Annotated`` type (e.g. ``Annotated[float, pa.Field(...)]``),
causing the model to fail with ``SchemaInitError: Invalid annotation``
because ``float(**{}) == 0.0`` is not a valid polars dtype.

Apply the same ``_dtype_metadata`` guard used by the pandas builder: if
the ``Annotated`` metadata contains no dtype parameters (only a
``FieldInfo``), use ``annotation.arg`` directly as the dtype instead of
calling it with empty kwargs.

Adds polars-side regression tests covering FieldInfo propagation
through ``Annotated`` (description, title, unique, checks, metadata)
and the cross-class non-deduplication guarantee.

Co-authored-by: Cursor <cursoragent@cursor.com>
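The guard described in the commit message above might look roughly like the following. This is a simplified sketch under stated assumptions: `FieldInfo`, `Decimal128`, `dtype_metadata`, and `resolve_dtype` are hypothetical names, and dtype parameters are passed positionally here, whereas the commit describes calling `annotation.arg(**kwargs)` with keyword arguments.

```python
from typing import Annotated, Any, get_args


class FieldInfo:  # hypothetical stand-in for pandera's FieldInfo
    def __init__(self, **options: Any) -> None:
        self.options = options


class Decimal128:  # hypothetical parameterized dtype
    def __init__(self, precision: int, scale: int) -> None:
        self.precision, self.scale = precision, scale


def dtype_metadata(metadata: tuple) -> tuple:
    """Drop FieldInfo entries, keeping only dtype parameters."""
    return tuple(m for m in metadata if not isinstance(m, FieldInfo))


def resolve_dtype(annotation: Any) -> Any:
    arg, *metadata = get_args(annotation)
    params = dtype_metadata(tuple(metadata))
    # Without dtype parameters, calling arg() would build a value such as
    # float() == 0.0 rather than a dtype, so return the type unchanged.
    return arg(*params) if params else arg


plain = resolve_dtype(Annotated[float, FieldInfo(ge=0)])
parameterized = resolve_dtype(Annotated[Decimal128, 10, 2, FieldInfo(unique=True)])
```

Here `plain` stays the `float` type itself, while `parameterized` is a `Decimal128` instance built from the non-`FieldInfo` metadata.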
Add subsections to both the general DataFrame Models guide and the
polars guide showing how to embed a ``pa.Field(...)`` directly inside
``typing.Annotated`` — including descriptions, titles, unique flags,
checks (``ge``/``le``), and combinations with parameterized dtypes.
Also notes that an explicit ``= pa.Field(...)`` assignment continues to
take precedence over an embedded ``Annotated`` ``FieldInfo``.

Verified via ``sphinx-build``: both pages build cleanly and the
``code-cell``/``testcode`` blocks execute their expected output into
the rendered HTML.

Co-authored-by: Cursor <cursoragent@cursor.com>
Mirror the pandas/polars Annotated[T, pa.Field(...)] handling in the ibis
backend so embedded FieldInfo metadata (description, title, unique,
checks, etc.) propagates to the generated schema.

* api/ibis/model.py: guard the dtype-instantiation path with
  ``_dtype_metadata`` so ``Annotated[float, pa.Field(...)]`` (and similar
  built-in-typed annotations carrying only a FieldInfo) no longer call
  ``float(**{})`` and instead use the annotated type as-is.

* engines/ibis_engine.py: the ``Engine.dtype`` fallback used to call
  ``data_type().to_numpy()`` unconditionally, which raised an
  ``AttributeError`` for annotations that don't resolve as numpy-scalar
  dtypes. Wrap the fallback so any failure re-raises the original
  TypeError, allowing the DataFrameModel fallback path (above) to take
  over.

* tests/ibis/test_ibis_model.py: add regression tests covering metadata
  propagation, check application, and the Annotated cross-class dedup
  bug that was fixed in BaseFieldInfo.__hash__/__eq__.

* docs/source/ibis.md: document the new functionality with a worked
  example.

Co-authored-by: Cursor <cursoragent@cursor.com>
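The `Engine.dtype` fallback change described above follows a common pattern: if the fallback itself fails, re-raise the original error so callers still see the exception type they expect. The sketch below is illustrative only; `resolve_with_fallback`, `primary`, and `numpy_fallback` are hypothetical names, not functions from pandera or ibis.

```python
from typing import Any, Callable


def resolve_with_fallback(
    data_type: Any,
    primary: Callable[[Any], Any],
    fallback: Callable[[Any], Any],
) -> Any:
    """Try `primary`; on TypeError try `fallback`, re-raising the original."""
    try:
        return primary(data_type)
    except TypeError as original:
        try:
            return fallback(data_type)
        except Exception:
            # Surface the first error so callers can catch TypeError and
            # apply their own fallback, rather than seeing AttributeError.
            raise original from None


def primary(data_type: Any) -> Any:  # hypothetical registry lookup
    raise TypeError(f"data type {data_type!r} not understood")


def numpy_fallback(data_type: Any) -> Any:  # hypothetical fallback
    return data_type().to_numpy()  # AttributeError for e.g. float


try:
    resolve_with_fallback(float, primary, numpy_fallback)
    raised = None
except TypeError as exc:
    raised = exc
```

The caller catches a `TypeError` even though the fallback failed with an `AttributeError`, which is what lets a model-level fallback path take over.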
Match the polars docs section by adding "Embedding Field metadata in
Annotated" subsections to the pyspark and ibis user guides, with
worked examples that combine plain and parameterized dtypes with an
embedded FieldInfo.

Also finish the pyspark side of the Annotated FieldInfo fix:

* api/pyspark/model.py: allow ``annotation.is_annotated_type`` to take
  the column-build path (previously the ``annotation.origin is None``
  guard misclassified ``Annotated[T.StringType, pa.Field(...)]`` as an
  invalid annotation and raised SchemaInitError).

* tests/pyspark/test_pyspark_model.py: add regression tests covering
  metadata propagation, check application, and the Annotated
  cross-class dedup bug.

Co-authored-by: Cursor <cursoragent@cursor.com>
cosmicBboy (Collaborator) commented

I fixed this PR up, @NGHades. Sorry for the long delay getting this in, and thanks for the contribution!

cosmicBboy merged commit 53e3a50 into unionai-oss:main on May 21, 2026
319 of 320 checks passed
3 participants