refactor: Testing constructors with default namespace#3685
Conversation
dangotbanned
left a comment
There was a problem hiding this comment.
Thanks for getting to this so quickly @FBruzzesi!
I have a few suggestions and 1 follow-up issue that we might wanna address on main instead
| # NOTE: the pandas-like-only `.columns.name` accesses below go unchecked because | ||
| # `.columns` is loosely typed on the native protocols (`main` typed `constructor` | ||
| # as `Callable[..., pd.DataFrame]` in this test). | ||
| df_native = constructor(data).to_native() | ||
| df_native.columns.name = "foo" |
There was a problem hiding this comment.
Since we have quite a few pandas-specific tests, it may be worth adding a Protocol we can use for them.
Show FrameConstructor
from __future__ import annotations
from typing import Any, Protocol, TypeAlias, TypeVar
from narwhals import DataFrame, LazyFrame
FrameT_co = TypeVar(
"FrameT_co", bound=nw.DataFrame[Any] | nw.LazyFrame[Any], covariant=True
)
Incomplete: TypeAlias = Any
"""Fill these in @FBruzzesi!"""
Data: TypeAlias = Incomplete
NarwhalsNamespace: TypeAlias = Incomplete
class FrameConstructor(Protocol[FrameT_co]):
"""Pick whatever name you like!"""
# NOTE: Include any attributes that are referenced in a test
# exclude any that are part of the setup
is_eager: bool
nan_is_null: bool
needs_gpu: bool
default_include: bool
# NOTE: Would add just `__hash__` but that could be an issue w/ inheritance
def __hash__(self) -> int: ...
def __eq__(self, other: object) -> bool: ...
def __call__(
self, obj: Data, /, namespace: NarwhalsNamespace | None = None, **kwds: Any
) -> FrameT_co: ...
# NOTE: Defining the docs here means:
# - `frame_constructor` can inherit them
# - this protocol can be used in an annotation and still provide docs
@property
def identifier(self) -> str:
"""Instance-level string identifier for test IDs."""
...
@property
def is_lazy(self) -> bool:
"""Whether this constructor produces a lazy native frame."""
...
@property
def is_pandas(self) -> bool:
"""Whether this is one of the pandas constructors."""
...
@property
def is_modin(self) -> bool:
"""Whether this is one of the modin constructors."""
...
@property
def is_cudf(self) -> bool:
"""Whether this is the cudf constructor."""
...
@property
def is_pandas_like(self) -> bool:
"""Whether this constructor produces a pandas-like dataframe (pandas, modin, cudf)."""
...
@property
def is_polars(self) -> bool:
"""Whether this is one of the polars constructors."""
...
@property
def is_pyarrow(self) -> bool:
"""Whether this is the pyarrow table constructor."""
...
@property
def is_dask(self) -> bool:
"""Whether this is the dask constructor."""
...
@property
def is_duckdb(self) -> bool:
"""Whether this is the duckdb constructor."""
...
@property
def is_pyspark(self) -> bool:
"""Whether this is one of the pyspark constructors."""
...
@property
def is_sqlframe(self) -> bool:
"""Whether this is the sqlframe constructor."""
...
@property
def is_ibis(self) -> bool:
"""Whether this is the ibis constructor."""
...
@property
def is_spark_like(self) -> bool:
"""Whether this constructor uses a spark-like backend (pyspark, sqlframe)."""
...
@property
def needs_pyarrow(self) -> bool:
"""Whether this constructor requires `pyarrow` to be installed."""
...
@property
def is_available(self) -> bool:
"""Whether every package this constructor needs is importable."""
...Then we can add aliases like this:
import pandas as pd
PandasConstructor: TypeAlias = FrameConstructor[nw.DataFrame[pd.DataFrame]]And fix the typing like this 😄
def test_ops_preserve_column_index_name(
- constructor: Constructor, request: pytest.FixtureRequest
+ constructor: PandasConstructor, request: pytest.FixtureRequest
) -> None:There was a problem hiding this comment.
The nice thing about the separate Protocol is that you have the flexibility of defining both the narwhals & native types in one swoop 😎
There was a problem hiding this comment.
@dangotbanned sorry how does this differ from frame_constructor?
There was a problem hiding this comment.
@dangotbanned sorry how does this differ from
frame_constructor?
(1)
It uses forward references, and they won't link on the API reference 1
AFAICT, frame_constructor isn't on the API reference either, - so maybe there isn't a way to display the docstrings currently?
(2)
A separate Protocol avoids needing to resolve the @overloads in frame_constructor.__call__.
This is cheaper for a type checker 2, because they're given FrameT_co upfront and it is the only return type
def __call__(
self, obj: Data, /, namespace: NarwhalsNamespace | None = None, **kwds: Any
) -> FrameT_co: ...(3)
The idea was that frame_constructor inherits from the protocol.
In (#3685 (comment)) I left out the method impls, but you could just as easily implement them there if extra code is a concern?
The protocol also isn't too far off from being able to support Series 😉
Footnotes
-
That was my 2nd most common issue in (https://github.com/narwhals-dev/narwhals/compare/expr-ir/docs/are-we-docs-yet...expr-ir/docs/make-mkdocstrings-happy#files_bucket) ↩
-
pyrightis great at understanding@overloads, but they can be expensive when you use inlay hints ↩
There was a problem hiding this comment.
Thanks for explaining @dangotbanned ! If you fancy adding a commit to address this, please go ahead! I won't have time to make code changes before mid next week most likely (but at the same time I would like to see the testing constructors moving forward 😅)
| # NOTE: `type-var` because v1 `from_native` typing does not admit `NativeIbis`, | ||
| # which `IntoLazyFrame` (hence the constructor return) includes. | ||
| nw_v1_frame = nw_v1.from_native(native_frame) # type: ignore[type-var] |
There was a problem hiding this comment.
I can't tell if the ibis one should be raising or not?
| # NOTE: `type-var` because v1 `from_native` typing does not admit `NativeIbis`, | |
| # which `IntoLazyFrame` (hence the constructor return) includes. | |
| nw_v1_frame = nw_v1.from_native(native_frame) # type: ignore[type-var] | |
| from narwhals.dependencies import is_ibis_table | |
| if is_ibis_table(native_frame): | |
| nw_v1.from_native(native_frame) # type: ignore[type-var] | |
| else: | |
| nw_v1_frame = nw_v1.from_native(native_frame) |
There was a problem hiding this comment.
Note
Feel free to defer this to an issue to be resolved on main first 👍
Tbh it might be easier to do this with the tests split up some more.
I missed it while reviewing (#3613) - but there isn't much typing coverage being exercised.
What?
We check this input type:
Line 48 in 8f6b947
against things that depend the same alias:
narwhals/src/narwhals/dependencies.py
Line 657 in 8f6b947
narwhals/src/narwhals/_native.py
Line 354 in 8f6b947
narwhals/src/narwhals/_native.py
Line 297 in 8f6b947
narwhals/src/narwhals/_native.py
Line 310 in 8f6b947
Why is that bad?
We can change the definition of IntoDataFrame and the typing for the tests will still stay green.
Fix?
We should be testing the actual native types, similar to:
tests.implementation_test._TYPING_ONLY_TESTS
narwhals/tests/implementation_test.py
Lines 76 to 319 in 8f6b947
We can still do the short versions for runtime tests, and this would be if TYPE_CHECKING:-only (free for pytest)
| # NOTE: `type-var` because v1 `from_native` typing does not admit `NativeIbis`, | ||
| # which `IntoLazyFrame` (hence the constructor return) includes. |
| # The following fixtures are short-name aliases of those registered in | ||
| # `narwhals/testing/pytest_plugin.py`. Calling a constructor without an explicit | ||
| # `namespace` defaults to the main `narwhals` namespace; tests can still pass | ||
| # `nw_v1` / `nw_v2` explicitly to opt in to a stable namespace. The legacy pattern | ||
| # `nw.from_native(constructor(data))` keeps working because `nw.from_native` is | ||
| # idempotent on narwhals objects. | ||
| # TODO(FBruzzesi): Drop these aliases once every test requests `nw_frame` / | ||
| # `nw_dataframe` / `nw_pandas_like_frame` directly. |
There was a problem hiding this comment.
Just make this a sub-issue of #3653 and save yourself from the merge conflict pains 😂
@dangotbanned this branch has two commits, one for typing one to add the default namespace in
frame_constructor._call__.Let me know what you think