Fixed issue where Pandera cannot handle metadata through Annotated types by NGHades · Pull Request #2111 · unionai-oss/pandera

NGHades · 2025-08-11T18:25:39Z

Problem

This PR addresses issue #2110, which we (@kevJ711 and @KViruz2750) identified while working with Annotated types in conjunction with pa.Field(...).

Description

When defining schema models using Annotated along with pa.Field(...), metadata such as description, unique, and title was not being correctly propagated into the resulting DataFrameModel.

Solution

We introduced a more rigorous check for parsing Annotated types to ensure that any AnnotationInfo attached to a type is correctly handled and its metadata extracted. This change allows the model to capture and utilize metadata as expected.
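As a rough illustration of the idea, and not pandera's actual implementation, the sketch below uses only the standard library to pull metadata objects out of `Annotated` hints. `FieldInfoSketch` and `collect_field_info` are hypothetical names invented for this example; the real `pa.Field(...)` object carries many more attributes.

```python
from dataclasses import dataclass
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


@dataclass(eq=False)  # identity-based equality and hash; hypothetical stand-in for pa.Field(...)
class FieldInfoSketch:
    description: Optional[str] = None
    title: Optional[str] = None
    unique: bool = False


def collect_field_info(model: type) -> dict:
    """Map each annotated attribute to the first FieldInfoSketch in its metadata."""
    found = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for name, hint in get_type_hints(model, include_extras=True).items():
        if get_origin(hint) is not Annotated:
            continue  # plain annotation such as `int`: nothing to extract
        _base_type, *metadata = get_args(hint)
        infos = [m for m in metadata if isinstance(m, FieldInfoSketch)]
        if infos:
            # Only one field description per column is meaningful, so take the first.
            found[name] = infos[0]
    return found


class Product:
    sku: Annotated[str, FieldInfoSketch(description="Stock keeping unit", unique=True)]
    price: Annotated[float, FieldInfoSketch(title="Unit price")]
    quantity: int


info = collect_field_info(Product)
print(info["sku"].description, info["sku"].unique)
print(sorted(info))
```

The `quantity` attribute has no `Annotated` wrapper, so it is skipped rather than treated as an error.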

Additionally, we observed that certain built-in types (str, int, float, bool) do not support parameterization. To prevent issues when handling these types, we added a check that safely returns an empty metadata dictionary for them.

Co-authored-by: Kevin Jijon <Kevinjijon0@gmail.com>
Co-authored-by: Karan Verma <kverma2750@gmail.com>

codecov · 2025-08-12T03:37:08Z

Codecov Report

❌ Patch coverage is 67.24138% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.54%. Comparing base (76d663c) to head (3a12e4f).

Files with missing lines	Patch %	Lines
pandera/api/pyspark/model.py	0.00%	12 Missing ⚠️
pandera/api/base/model_components.py	40.00%	3 Missing ⚠️
pandera/api/ibis/model.py	66.66%	2 Missing ⚠️
pandera/api/dataframe/model.py	94.44%	1 Missing ⚠️
pandera/api/pandas/model.py	83.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2111      +/-   ##
==========================================
+ Coverage   80.90%   83.54%   +2.63%     
==========================================
  Files         190      190              
  Lines       16621    16654      +33     
==========================================
+ Hits        13448    13914     +466     
+ Misses       3173     2740     -433


cosmicBboy · 2025-08-12T17:12:10Z

Thanks for the contribution, @NGHades! It looks like there are a few failing tests. You can reproduce them locally using nox; see https://pandera.readthedocs.io/en/stable/CONTRIBUTING.html#run-a-specific-test-suite-locally

ybressler · 2025-10-09T16:56:49Z

+                    if field_info_list:
+                        existing_field = field_info_list[0]

It would be useful to add an inline comment here explaining why you're retrieving the first item.

Fully resolves unionai-oss#2110 by propagating ``FieldInfo`` metadata embedded in ``typing.Annotated`` annotations to the resulting schema, and addresses several issues with the original patch:

* Extract the embedded ``FieldInfo`` from ``Annotated`` metadata in ``DataFrameModel.__init_subclass__`` so attributes like ``description``, ``title``, ``unique``, and ``ge``/``le`` checks defined via ``Annotated[T, pa.Field(...)]`` are preserved on the schema columns.
* Fix ``BaseFieldInfo.__hash__``/``__eq__`` to use identity for un-named fields so Python's ``typing.Annotated`` cache does not deduplicate ``Annotated[T, pa.Field(...)]`` annotations across distinct model classes (which previously caused the second model to inherit the first model's field configuration).
* Refactor ``get_dtype_kwargs`` to filter out ``FieldInfo`` entries from the annotation metadata via a shared ``_dtype_metadata`` helper. The pandas and pyspark builders use the helper to decide whether to call ``annotation.arg(**kwargs)`` or use the annotated type as-is.

Cleanups carried over from the original patch:

* Remove an unused ``from pandera.utils import F`` import and the dead ``__annotation_infos__`` cache branch.
* Restore the ``column_properties`` docstring and remove trailing whitespace introduced in ``model_components.py``/``typing/common.py``.
* Drop a duplicated ``dtype_kwargs`` check and a leftover ``<-- str`` comment in ``pandas/model.py``.

Adds pandas-side tests covering description/title/unique/check propagation through ``Annotated``, the cross-class non-deduplication guarantee, and that an explicit ``= pa.Field(...)`` assignment continues to take precedence over an embedded ``Annotated`` ``FieldInfo``.

Co-authored-by: Cursor <cursoragent@cursor.com>
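The cross-class deduplication mentioned in this commit message can be reproduced with the standard library alone. The sketch below assumes CPython's `typing` module, which caches `Annotated[...]` subscriptions keyed on the hash and equality of their arguments. `NameKeyedInfo` and `IdentityKeyedInfo` are hypothetical stand-ins and do not reproduce pandera's `BaseFieldInfo`.

```python
from typing import Annotated, get_args


class NameKeyedInfo:
    """Hypothetical field object whose hash/eq depend only on `name`."""

    def __init__(self, description):
        self.name = None  # not yet bound to a class attribute
        self.description = description

    def __eq__(self, other):
        return isinstance(other, NameKeyedInfo) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class IdentityKeyedInfo:
    """Hypothetical fix: un-named instances compare and hash by identity."""

    def __init__(self, description):
        self.name = None
        self.description = description

    def __eq__(self, other):
        if self.name is None:
            return self is other
        return isinstance(other, IdentityKeyedInfo) and self.name == other.name

    def __hash__(self):
        return id(self) if self.name is None else hash(self.name)


first, second = NameKeyedInfo("price in USD"), NameKeyedInfo("units in stock")
_ = Annotated[float, first]  # primes the typing cache
collided = get_args(Annotated[float, second])[1]

third, fourth = IdentityKeyedInfo("price in USD"), IdentityKeyedInfo("units in stock")
_ = Annotated[float, third]
distinct = get_args(Annotated[float, fourth])[1]

print(collided is first)   # the second annotation silently reuses the first object
print(distinct is fourth)  # identity-based hashing keeps the two apart
```

With name-keyed equality, every un-named field looks identical to the cache, so a later model can end up with an earlier model's field configuration.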


The polars ``_build_columns`` code path entered the ``annotation.arg(**dtype_kwargs)`` branch when the polars engine could not resolve an ``Annotated`` type (e.g. ``Annotated[float, pa.Field(...)]``), causing the model to fail with ``SchemaInitError: Invalid annotation`` because ``float(**{}) == 0.0`` is not a valid polars dtype.

Apply the same ``_dtype_metadata`` guard used by the pandas builder: if the ``Annotated`` metadata contains no dtype parameters (only a ``FieldInfo``), use ``annotation.arg`` directly as the dtype instead of calling it with empty kwargs.

Adds polars-side regression tests covering FieldInfo propagation through ``Annotated`` (description, title, unique, checks, metadata) and the cross-class non-deduplication guarantee.

Co-authored-by: Cursor <cursoragent@cursor.com>
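The guard described in this commit message can be sketched without any dataframe library. Everything named here (`FieldInfoSketch`, `Decimal128Sketch`, `dtype_metadata`, `resolve_dtype`, and the `param_names` argument) is a hypothetical stand-in chosen for illustration; pandera's `_dtype_metadata` helper may differ in signature and behavior.

```python
from typing import Annotated, get_args


class FieldInfoSketch:
    """Hypothetical stand-in for the FieldInfo produced by pa.Field(...)."""


class Decimal128Sketch:
    """Hypothetical parameterized dtype that needs kwargs to be instantiated."""

    def __init__(self, precision: int, scale: int):
        self.precision = precision
        self.scale = scale


def dtype_metadata(hint) -> list:
    """Return the Annotated metadata with FieldInfo entries filtered out."""
    _base, *metadata = get_args(hint)
    return [m for m in metadata if not isinstance(m, FieldInfoSketch)]


def resolve_dtype(hint, param_names=()):
    """Use the annotated type as-is unless real dtype parameters are present."""
    base = get_args(hint)[0]
    params = dtype_metadata(hint)
    if not params:
        return base  # avoids calling e.g. float(**{}), which yields the value 0.0
    return base(**dict(zip(param_names, params)))


plain = resolve_dtype(Annotated[float, FieldInfoSketch()])
parameterized = resolve_dtype(
    Annotated[Decimal128Sketch, 28, 10, FieldInfoSketch()],
    param_names=("precision", "scale"),
)

print(plain)                                       # <class 'float'>
print(parameterized.precision, parameterized.scale)  # 28 10
print(float(**{}))                                 # 0.0, an instance rather than a type
```

Returning the type itself leaves the engine free to map `float` to its own dtype, whereas the unguarded call hands it the value `0.0`.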

Add subsections to both the general DataFrame Models guide and the polars guide showing how to embed a ``pa.Field(...)`` directly inside ``typing.Annotated``, including descriptions, titles, unique flags, checks (``ge``/``le``), and combinations with parameterized dtypes. Also notes that an explicit ``= pa.Field(...)`` assignment continues to take precedence over an embedded ``Annotated`` ``FieldInfo``.

Verified via ``sphinx-build``: both pages build cleanly and the ``code-cell``/``testcode`` blocks execute their expected output into the rendered HTML.

Co-authored-by: Cursor <cursoragent@cursor.com>
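The precedence rule noted above (an explicit assignment wins over an embedded field) might look like the following in miniature. This is a standard-library sketch under assumed names: `FieldInfoSketch` and `resolve_field` are hypothetical and are not part of pandera.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class FieldInfoSketch:
    """Hypothetical stand-in for the object returned by pa.Field(...)."""

    def __init__(self, description=None):
        self.description = description


def resolve_field(model: type, name: str):
    """Prefer an explicitly assigned field; fall back to one embedded in Annotated."""
    explicit = vars(model).get(name)
    if isinstance(explicit, FieldInfoSketch):
        return explicit
    hint = get_type_hints(model, include_extras=True)[name]
    if get_origin(hint) is Annotated:
        for item in get_args(hint)[1:]:
            if isinstance(item, FieldInfoSketch):
                return item
    return None  # no field information attached to this attribute


class Inventory:
    sku: Annotated[str, FieldInfoSketch("embedded only")]
    price: Annotated[float, FieldInfoSketch("embedded")] = FieldInfoSketch("explicit")
    quantity: int


sku_field = resolve_field(Inventory, "sku")
price_field = resolve_field(Inventory, "price")
quantity_field = resolve_field(Inventory, "quantity")

print(sku_field.description)    # embedded only
print(price_field.description)  # explicit
print(quantity_field)           # None
```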

Mirror the pandas/polars Annotated[T, pa.Field(...)] handling in the ibis backend so embedded FieldInfo metadata (description, title, unique, checks, etc.) propagates to the generated schema.

* api/ibis/model.py: guard the dtype-instantiation path with ``_dtype_metadata`` so ``Annotated[float, pa.Field(...)]`` (and similar built-in-typed annotations carrying only a FieldInfo) no longer call ``float(**{})`` and instead use the annotated type as-is.
* engines/ibis_engine.py: the ``Engine.dtype`` fallback used to call ``data_type().to_numpy()`` unconditionally, which raised an ``AttributeError`` for annotations that don't resolve as numpy-scalar dtypes. Wrap the fallback so any failure re-raises the original TypeError, allowing the DataFrameModel fallback path (above) to take over.
* tests/ibis/test_ibis_model.py: add regression tests covering metadata propagation, check application, and the Annotated cross-class dedup bug that was fixed in BaseFieldInfo.__hash__/__eq__.
* docs/source/ibis.md: document the new functionality with a worked example.

Co-authored-by: Cursor <cursoragent@cursor.com>
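The engine-side change in this commit message follows a common pattern: try a fallback, and if the fallback itself fails, surface the original error so callers that catch it can take their own recovery path. Below is a minimal sketch of that pattern; `resolve_with_fallback`, `primary`, and `numpy_fallback` are hypothetical names and not the ibis engine's real functions.

```python
def resolve_with_fallback(data_type, primary, fallback):
    """Try `primary`; on TypeError try `fallback`; if that fails too, re-raise the TypeError."""
    try:
        return primary(data_type)
    except TypeError as original:
        try:
            return fallback(data_type)
        except Exception:
            # Hide the fallback's own failure so callers see the error they expect.
            raise original from None


def primary(data_type):
    # Stand-in for an engine lookup that does not recognize the type.
    raise TypeError(f"data type {data_type!r} not understood")


def numpy_fallback(data_type):
    # float() has no to_numpy attribute, so this raises AttributeError.
    return data_type().to_numpy()


try:
    resolve_with_fallback(float, primary, numpy_fallback)
except TypeError as exc:
    caught = exc

print(type(caught).__name__)  # TypeError
```

Without the inner `try`, the `AttributeError` would escape and bypass any caller that handles only `TypeError`.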

Match the polars docs section by adding "Embedding Field metadata in Annotated" subsections to the pyspark and ibis user guides, with worked examples that combine plain and parameterized dtypes with an embedded FieldInfo.

Also finish the pyspark side of the Annotated FieldInfo fix:

* api/pyspark/model.py: allow ``annotation.is_annotated_type`` to take the column-build path (previously the ``annotation.origin is None`` guard misclassified ``Annotated[T.StringType, pa.Field(...)]`` as an invalid annotation and raised SchemaInitError).
* tests/pyspark/test_pyspark_model.py: add regression tests covering metadata propagation, check application, and the Annotated cross-class dedup bug.

Co-authored-by: Cursor <cursoragent@cursor.com>

cosmicBboy · 2026-05-21T13:06:21Z

I fixed this PR up, @NGHades. Sorry for the long delay getting this in, and thanks for the contribution!

NGHades added 7 commits August 5, 2025 15:46

Fixed Annotated types issue with metadata

0a827d7

Fixed handling of Annotated types

260f3ac

Co-authored-by: Kevin Jijon <Kevinjijon0@gmail.com>
Co-authored-by: Karan Verma <kverma2750@gmail.com>

Fixed issue where metadata was not being correctly added to FieldInfo

58c9adf

Deleted print statements in DataFrameModel files and annotated.py

259bb41

Deleted the get_origin in import statements

77223b7

Deleted print statements in dataframe model_components

9c73a84

Deleted print statements in typing common.py

af4c344

cosmicBboy mentioned this pull request Aug 19, 2025

Pandera cannot handle metadata through Annotated types #2110

Closed

3 tasks

ybressler reviewed Oct 9, 2025


cosmicBboy and others added 7 commits December 31, 2025 10:21

Merge branch 'main' into main

b874075

Merge branch 'main' into main

599ecb5

Co-authored-by: Cursor <cursoragent@cursor.com>

cosmicBboy approved these changes May 21, 2026


cosmicBboy merged commit 53e3a50 into unionai-oss:main May 21, 2026
319 of 320 checks passed

cosmicBboy merged 14 commits into unionai-oss:main from NGHades:main

3 participants
