feat(bigframes): Add numpy ufunc support to col expressions#16554
feat(bigframes): Add numpy ufunc support to col expressions#16554TrevorBergeron wants to merge 1 commit intomainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for NumPy universal functions (ufuncs) in BigFrames by implementing the __array_ufunc__ method in the Expression class. It also refactors binary operation logic into a helper function _as_bf_expr and adds unit tests to verify the new functionality. Feedback was provided regarding the use of a non-standard type hint for the method parameter and an issue in the unit tests where non-standard pandas API calls were used to compute expected results.
| return strings.StringMethods(self) | ||
|
|
||
| def __array_ufunc__( | ||
| self, ufunc: numpy.ufunc, method: __builtins__.str, *inputs, **kwargs |
There was a problem hiding this comment.
Using __builtins__.str as a type hint is non-standard and potentially fragile. It is recommended to use the built-in str type directly.
| self, ufunc: numpy.ufunc, method: __builtins__.str, *inputs, **kwargs | |
| self, ufunc: numpy.ufunc, method: str, *inputs, **kwargs |
References
- Standard Python type hinting practices (PEP 484) recommend using built-in types like 'str' directly instead of accessing them through 'builtins'. (link)
| pd_kwargs = { | ||
| "sqrt": np.sqrt(pd.col("float64_col")), # type: ignore | ||
| "add_const": np.add(pd.col("float64_col"), 2.4), # type: ignore | ||
| "radd_const": np.add(2.4, pd.col("float64_col")), # type: ignore | ||
| "add_cols": np.add(pd.col("float64_col"), pd.col("int64_col")), # type: ignore | ||
| } |
There was a problem hiding this comment.
The pd_kwargs dictionary uses pd.col, which is not a standard pandas API. To correctly verify the BigFrames implementation against pandas, the expected results should be computed using standard pandas column access on scalars_pandas_df. Additionally, standard pandas assign does not support BigFrames Expression objects. To ensure dictionary keys remain sorted without manual effort, the dictionary should be programmatically sorted.
| pd_kwargs = { | |
| "sqrt": np.sqrt(pd.col("float64_col")), # type: ignore | |
| "add_const": np.add(pd.col("float64_col"), 2.4), # type: ignore | |
| "radd_const": np.add(2.4, pd.col("float64_col")), # type: ignore | |
| "add_cols": np.add(pd.col("float64_col"), pd.col("int64_col")), # type: ignore | |
| } | |
| pd_kwargs = dict(sorted({ | |
| "sqrt": np.sqrt(scalars_pandas_df["float64_col"]), | |
| "add_const": np.add(scalars_pandas_df["float64_col"], 2.4), | |
| "radd_const": np.add(2.4, scalars_pandas_df["float64_col"]), | |
| "add_cols": np.add(scalars_pandas_df["float64_col"], scalars_pandas_df["int64_col"]), | |
| }.items())) |
References
- To ensure dictionary keys remain sorted without manual effort, programmatically sort the dictionary instead of relying on manual ordering in the code.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕