Skip to content

refactor: Testing constructors with default namespace#3685

Open
FBruzzesi wants to merge 7 commits into
feat/testing-constructorsfrom
feat/testing-constructors-with-default
Open

refactor: Testing constructors with default namespace#3685
FBruzzesi wants to merge 7 commits into
feat/testing-constructorsfrom
feat/testing-constructors-with-default

Conversation

@FBruzzesi

Copy link
Copy Markdown
Member

@dangotbanned this branch has two commits, one for typing one to add the default namespace in frame_constructor._call__.

Let me know what you think

@FBruzzesi FBruzzesi requested a review from dangotbanned June 12, 2026 16:14

@dangotbanned dangotbanned left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting to this so quickly @FBruzzesi!

I have a few suggestions and 1 follow-up issue that we might wanna address on main instead

Comment on lines 25 to 29
# NOTE: the pandas-like-only `.columns.name` accesses below go unchecked because
# `.columns` is loosely typed on the native protocols (`main` typed `constructor`
# as `Callable[..., pd.DataFrame]` in this test).
df_native = constructor(data).to_native()
df_native.columns.name = "foo"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have quite a few pandas-specific tests, it may be worth adding a Protocol we can use for them.

Show FrameConstructor

from __future__ import annotations

from typing import Any, Protocol, TypeAlias, TypeVar

from narwhals import DataFrame, LazyFrame

FrameT_co = TypeVar(
    "FrameT_co", bound=nw.DataFrame[Any] | nw.LazyFrame[Any], covariant=True
)
Incomplete: TypeAlias = Any
"""Fill these in @FBruzzesi!"""

Data: TypeAlias = Incomplete
NarwhalsNamespace: TypeAlias = Incomplete


class FrameConstructor(Protocol[FrameT_co]):
    """Pick whatever name you like!"""

    # NOTE: Include any attributes that are referenced in a test
    # exclude any that are part of the setup
    is_eager: bool
    nan_is_null: bool
    needs_gpu: bool
    default_include: bool

    # NOTE: Would add just `__hash__` but that could be an issue w/ inheritance
    def __hash__(self) -> int: ...
    def __eq__(self, other: object) -> bool: ...

    def __call__(
        self, obj: Data, /, namespace: NarwhalsNamespace | None = None, **kwds: Any
    ) -> FrameT_co: ...

    # NOTE: Defining the docs here means:
    # - `frame_constructor` can inherit them
    # - this protocol can be used in an annotation and still provide docs
    @property
    def identifier(self) -> str:
        """Instance-level string identifier for test IDs."""
        ...

    @property
    def is_lazy(self) -> bool:
        """Whether this constructor produces a lazy native frame."""
        ...

    @property
    def is_pandas(self) -> bool:
        """Whether this is one of the pandas constructors."""
        ...

    @property
    def is_modin(self) -> bool:
        """Whether this is one of the modin constructors."""
        ...

    @property
    def is_cudf(self) -> bool:
        """Whether this is the cudf constructor."""
        ...

    @property
    def is_pandas_like(self) -> bool:
        """Whether this constructor produces a pandas-like dataframe (pandas, modin, cudf)."""
        ...

    @property
    def is_polars(self) -> bool:
        """Whether this is one of the polars constructors."""
        ...

    @property
    def is_pyarrow(self) -> bool:
        """Whether this is the pyarrow table constructor."""
        ...

    @property
    def is_dask(self) -> bool:
        """Whether this is the dask constructor."""
        ...

    @property
    def is_duckdb(self) -> bool:
        """Whether this is the duckdb constructor."""
        ...

    @property
    def is_pyspark(self) -> bool:
        """Whether this is one of the pyspark constructors."""
        ...

    @property
    def is_sqlframe(self) -> bool:
        """Whether this is the sqlframe constructor."""
        ...

    @property
    def is_ibis(self) -> bool:
        """Whether this is the ibis constructor."""
        ...

    @property
    def is_spark_like(self) -> bool:
        """Whether this constructor uses a spark-like backend (pyspark, sqlframe)."""
        ...

    @property
    def needs_pyarrow(self) -> bool:
        """Whether this constructor requires `pyarrow` to be installed."""
        ...

    @property
    def is_available(self) -> bool:
        """Whether every package this constructor needs is importable."""
        ...

Then we can add aliases like this:

import pandas as pd

PandasConstructor: TypeAlias = FrameConstructor[nw.DataFrame[pd.DataFrame]]

And fix the typing like this 😄

def test_ops_preserve_column_index_name(
-    constructor: Constructor, request: pytest.FixtureRequest
+    constructor: PandasConstructor, request: pytest.FixtureRequest
) -> None:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nice thing about the separate Protocol is that you have the flexibility of defining both the narwhals & native types in one swoop 😎

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with the "quick win" in 75a581c

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangotbanned sorry how does this differ from frame_constructor?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangotbanned sorry how does this differ from frame_constructor?

(1)

It uses forward references, and they won't link on the API reference 1

AFAICT, frame_constructor isn't on the API reference either, - so maybe there isn't a way to display the docstrings currently?

(2)

A separate Protocol avoids needing to resolve the @overloads in frame_constructor.__call__.

This is cheaper for a type checker 2, because they're given FrameT_co upfront and it is the only return type

    def __call__(
        self, obj: Data, /, namespace: NarwhalsNamespace | None = None, **kwds: Any
    ) -> FrameT_co: ...

(3)

The idea was that frame_constructor inherits from the protocol.
In (#3685 (comment)) I left out the method impls, but you could just as easily implement them there if extra code is a concern?

The protocol also isn't too far off from being able to support Series 😉

Footnotes

  1. That was my 2nd most common issue in (https://github.com/narwhals-dev/narwhals/compare/expr-ir/docs/are-we-docs-yet...expr-ir/docs/make-mkdocstrings-happy#files_bucket)

  2. pyright is great at understanding @overloads, but they can be expensive when you use inlay hints

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining @dangotbanned ! If you fancy adding a commit to address this, please go ahead! I won't have time to make code changes before mid next week most likely (but at the same time I would like to see the testing constructors moving forward 😅)

Comment on lines +47 to +49
# NOTE: `type-var` because v1 `from_native` typing does not admit `NativeIbis`,
# which `IntoLazyFrame` (hence the constructor return) includes.
nw_v1_frame = nw_v1.from_native(native_frame) # type: ignore[type-var]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell if the ibis one should be raising or not?

Suggested change
# NOTE: `type-var` because v1 `from_native` typing does not admit `NativeIbis`,
# which `IntoLazyFrame` (hence the constructor return) includes.
nw_v1_frame = nw_v1.from_native(native_frame) # type: ignore[type-var]
from narwhals.dependencies import is_ibis_table
if is_ibis_table(native_frame):
nw_v1.from_native(native_frame) # type: ignore[type-var]
else:
nw_v1_frame = nw_v1.from_native(native_frame)

@dangotbanned dangotbanned Jun 15, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

Feel free to defer this to an issue to be resolved on main first 👍

Tbh it might be easier to do this with the tests split up some more.

I missed it while reviewing (#3613) - but there isn't much typing coverage being exercised.

What?

We check this input type:

Constructor: TypeAlias = Callable[[Any], "NativeLazyFrame | IntoDataFrame"]

against things that depend the same alias:

def is_into_dataframe(native_dataframe: Any | IntoDataFrameT) -> TypeIs[IntoDataFrameT]:

IntoDataFrameT = TypeVar("IntoDataFrameT", bound=IntoDataFrame)

IntoDataFrame: TypeAlias = NativeDataFrame

IntoLazyFrame: TypeAlias = NativeLazyFrame | NativeIbis

Why is that bad?

We can change the definition of IntoDataFrame and the typing for the tests will still stay green.

Fix?

We should be testing the actual native types, similar to:

tests.implementation_test._TYPING_ONLY_TESTS

_TYPING_ONLY_TESTS = "_"
"""Exhaustive checks for overload matching native -> implementation.
## Arrange
Each test defines a function accepting a `native` frame.
Important:
We *must* use the concrete types and therefore the type checker *needs* the package installed.
Next, wrap `native` as a `nw.{Data,Lazy}Frame`.
Note:
If we support multiple native types, use `native` to generate `nw.{LazyFrame,Series}` as well.
Finally, look-up `.implementation` on all wrapped objects.
## Act
Try passing every result (`*_impl`) to functions that *only* accept a **subset** of `Implementation`.
This step *may require* a `# (type|pyright): ignore` directive, which defines the `# [... Negative]` result.
Otherwise, results are labelled with `# [... Positive]`.
If this *static* label matches *runtime* we use `# [True ...]`, otherwise `# [False ...]`.
Tip:
`# [False Negative]`s are the most frustrating for users.
Always try to minimize warning on safe code.
## Assert
The action determined whether or not our typing warns on an `@overload` match.
We still need to use [`assert_type`] to verify which `Implementation`(s) were returned as a result.
[`assert_type`]: https://typing-extensions.readthedocs.io/en/latest/#typing_extensions.assert_type
"""
if TYPE_CHECKING:
import dask.dataframe as dd
import duckdb
import ibis
import pandas as pd
import polars as pl
import pyarrow as pa
from sqlframe.base.dataframe import BaseDataFrame
from typing_extensions import assert_type
import tests._modin_stub as mpd
any_df: nw.DataFrame[Any] = cast("nw.DataFrame[Any]", "")
any_ldf: nw.LazyFrame[Any] = cast("nw.LazyFrame[Any]", "")
any_ser: nw.Series[Any] = cast("nw.Series[Any]", "")
bound_df: nw.DataFrame[IntoDataFrame] = cast("nw.DataFrame[IntoDataFrame]", "")
bound_ldf: nw.LazyFrame[IntoLazyFrame] = cast("nw.LazyFrame[IntoLazyFrame]", "")
bound_ser: nw.Series[IntoSeries] = cast("nw.Series[IntoSeries]", "")
def test_polars_typing(native: pl.DataFrame) -> None:
df = nw.from_native(native)
ldf = nw.from_native(native.lazy())
ser = nw.from_native(native.to_series(), series_only=True)
df_impl = df.implementation
ldf_impl = ldf.implementation
ser_impl = ser.implementation
# [True Positive]
any_df.lazy(df_impl)
any_df.lazy(ldf_impl)
any_df.lazy(ser_impl)
any_ldf.collect(df_impl)
any_ldf.collect(ldf_impl)
any_ldf.collect(ser_impl)
assert_type(df_impl, _PolarsImpl)
assert_type(ldf_impl, _PolarsImpl)
assert_type(ser_impl, _PolarsImpl)
def test_pandas_typing(native: pd.DataFrame) -> None:
df = nw.from_native(native)
ldf = nw.from_native(native).lazy()
ser = nw.from_native(native.iloc[0], series_only=True)
df_impl = df.implementation
ldf_impl = ldf.implementation
ser_impl = ser.implementation
# [True Negative]
any_df.lazy(df_impl) # type: ignore[arg-type]
# [False Positive]
any_df.lazy(ldf_impl)
# [True Negative]
any_df.lazy(ser_impl) # pyright: ignore[reportArgumentType] # pyrefly: ignore[bad-argument-type]
# [True Positive]
any_ldf.collect(df_impl)
any_ldf.collect(ldf_impl)
any_ldf.collect(ser_impl)
assert_type(df_impl, _PandasImpl)
# NOTE: Would require adding overloads to `DataFrame.lazy`
assert_type(ldf_impl, _PandasImpl) # pyright: ignore[reportAssertTypeFailure] # pyrefly: ignore[assert-type]
assert_type(ser_impl, _PandasImpl)
def test_arrow_typing(native: pa.Table) -> None:
df = nw.from_native(native)
ldf = nw.from_native(native).lazy()
ser = nw.from_native(native.column(0), series_only=True)
df_impl = df.implementation
ldf_impl = ldf.implementation
ser_impl = ser.implementation
# [True Negative]
any_df.lazy(df_impl) # type: ignore[arg-type]
# [False Positive]
any_df.lazy(ldf_impl)
# [True Negative]
any_df.lazy(ser_impl) # pyright: ignore[reportArgumentType] # pyrefly: ignore[bad-argument-type]
# [True Positive]
any_ldf.collect(df_impl)
any_ldf.collect(ldf_impl)
any_ldf.collect(ser_impl)
assert_type(df_impl, _ArrowImpl)
# NOTE: Would require adding overloads to `DataFrame.lazy`
assert_type(ldf_impl, _ArrowImpl) # pyright: ignore[reportAssertTypeFailure] # pyrefly: ignore[assert-type]
assert_type(ser_impl, _ArrowImpl)
def test_duckdb_typing(native: duckdb.DuckDBPyRelation) -> None:
ldf = nw.from_native(native)
ldf_impl = ldf.implementation
# [True Positive]
any_df.lazy(ldf_impl)
# [True Negative]
any_ldf.collect(ldf_impl) # type: ignore[arg-type]
assert_type(ldf.implementation, _DuckDBImpl)
def test_sqlframe_typing(native: BaseDataFrame[Any, Any, Any, Any, Any]) -> None:
ldf = nw.from_native(native)
ldf_impl = ldf.implementation
# [True Positive]
any_df.lazy(ldf_impl)
# [True Negative]
any_ldf.collect(ldf_impl) # pyright: ignore[reportArgumentType] # pyrefly: ignore[bad-argument-type]
assert_type(ldf.implementation, _SQLFrameImpl)
def test_ibis_typing(native: ibis.Table) -> None:
ldf = nw.from_native(native)
ldf_impl = ldf.implementation
# [True Positive]
any_df.lazy(ldf_impl)
# [True Negative]
any_ldf.collect(ldf_impl) # pyright: ignore[reportArgumentType]
assert_type(ldf.implementation, _IbisImpl) # pyrefly: ignore[assert-type] (todo)
def test_dask_typing(native: dd.DataFrame) -> None:
ldf = nw.from_native(native)
ldf_impl = ldf.implementation
# [True Positive]
any_df.lazy(ldf_impl)
# [True Negative]
any_ldf.collect(ldf_impl) # pyright: ignore[reportArgumentType] # pyrefly: ignore[bad-argument-type]
assert_type(ldf.implementation, _DaskImpl)
def test_modin_typing(native: mpd.DataFrame) -> None:
df = nw.from_native(native)
# NOTE: Arbitrary method that returns a `Series`
ser = nw.from_native(native.duplicated(), series_only=True)
df_impl = df.implementation
ser_impl = ser.implementation
# [True Negative]
any_df.lazy(df_impl) # pyright: ignore[reportArgumentType] # pyrefly: ignore[bad-argument-type]
any_df.lazy(ser_impl) # pyright: ignore[reportArgumentType] # pyrefly: ignore[bad-argument-type]
any_ldf.collect(df_impl) # pyright: ignore[reportArgumentType] # pyrefly: ignore[bad-argument-type]
any_ldf.collect(ser_impl) # pyright: ignore[reportArgumentType] # pyrefly: ignore[bad-argument-type]
assert_type(df_impl, _ModinImpl)
assert_type(ser_impl, _ModinImpl)
def test_any_typing() -> None:
df_impl = any_df.implementation
ldf_impl = any_ldf.implementation
ser_impl = any_ser.implementation
# [False Positive]
any_df.lazy(df_impl)
any_df.lazy(ldf_impl)
any_df.lazy(ser_impl)
any_ldf.collect(df_impl)
any_ldf.collect(ldf_impl)
any_ldf.collect(ser_impl)
assert_type(df_impl, _EagerAllowedImpl) # pyright: ignore[reportAssertTypeFailure] # pyrefly: ignore[assert-type]
assert_type(ldf_impl, _LazyAllowedImpl) # pyright: ignore[reportAssertTypeFailure] # pyrefly: ignore[assert-type]
assert_type(ser_impl, _EagerAllowedImpl) # pyright: ignore[reportAssertTypeFailure] # pyrefly: ignore[assert-type]
# Fallback, matches the first overload `_PolarsImpl`
assert_type(df_impl, _PolarsImpl)
assert_type(ldf_impl, _PolarsImpl)
assert_type(ser_impl, _PolarsImpl)
def test_bound_typing() -> None:
df_impl = bound_df.implementation
ldf_impl = bound_ldf.implementation
ser_impl = bound_ser.implementation
# [True Negative]
any_df.lazy(df_impl) # type: ignore[arg-type]
# [True Positive]
any_df.lazy(ldf_impl)
# [True Negative]
any_df.lazy(ser_impl) # type: ignore[arg-type]
any_ldf.collect(df_impl) # type: ignore[arg-type]
any_ldf.collect(ldf_impl) # type: ignore[arg-type]
any_ldf.collect(ser_impl) # type: ignore[arg-type]
assert_type(df_impl, _EagerAllowedImpl)
assert_type(ldf_impl, _LazyAllowedImpl)
assert_type(ser_impl, _EagerAllowedImpl)
def test_mixed_eager_typing(
*args: nw.DataFrame[pl.DataFrame | pd.DataFrame | pa.Table]
| nw.Series[pl.Series | pd.Series[Any] | pa.ChunkedArray[Any]],
) -> None:
# NOTE: Any combination of eager objects that **does not** include `cuDF`, `modin` should
# preserve that detail
mix_impl = args[0].implementation
# [True Negative]
any_df.lazy(mix_impl) # type: ignore[arg-type]
# [True Positive]
any_ldf.collect(mix_impl)
assert_type(mix_impl, _PolarsImpl | _PandasImpl | _ArrowImpl)

We can still do the short versions for runtime tests, and this would be if TYPE_CHECKING:-only (free for pytest)

Comment on lines +45 to +46
# NOTE: `type-var` because v1 `from_native` typing does not admit `NativeIbis`,
# which `IntoLazyFrame` (hence the constructor return) includes.

@dangotbanned dangotbanned Jun 15, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as is_into_dataframe

Comment thread tests/frame/schema_test.py Outdated
Comment thread tests/frame/schema_test.py Outdated
Comment thread docs/api-reference/testing.md Outdated
Comment thread docs/api-reference/testing.md Outdated
Comment thread tests/conftest.py
Comment on lines +119 to +126
# The following fixtures are short-name aliases of those registered in
# `narwhals/testing/pytest_plugin.py`. Calling a constructor without an explicit
# `namespace` defaults to the main `narwhals` namespace; tests can still pass
# `nw_v1` / `nw_v2` explicitly to opt in to a stable namespace. The legacy pattern
# `nw.from_native(constructor(data))` keeps working because `nw.from_native` is
# idempotent on narwhals objects.
# TODO(FBruzzesi): Drop these aliases once every test requests `nw_frame` /
# `nw_dataframe` / `nw_pandas_like_frame` directly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make this a sub-issue of #3653 and save yourself from the merge conflict pains 😂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants