Skip to content

feat(bigframes): Add numpy ufunc support to col expressions#16554

Draft
TrevorBergeron wants to merge 1 commit intomainfrom
tbergeron_col_numpy
Draft

feat(bigframes): Add numpy ufunc support to col expressions#16554
TrevorBergeron wants to merge 1 commit intomainfrom
tbergeron_col_numpy

Conversation

@TrevorBergeron
Copy link
Copy Markdown
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for NumPy universal functions (ufuncs) in BigFrames by implementing the __array_ufunc__ method in the Expression class. It also refactors binary operation logic into a helper function _as_bf_expr and adds unit tests to verify the new functionality. Feedback was provided regarding the use of a non-standard type hint for the method parameter and an issue in the unit tests where non-standard pandas API calls were used to compute expected results.

return strings.StringMethods(self)

def __array_ufunc__(
self, ufunc: numpy.ufunc, method: __builtins__.str, *inputs, **kwargs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using __builtins__.str as a type hint is non-standard and potentially fragile. It is recommended to use the built-in str type directly.

Suggested change
self, ufunc: numpy.ufunc, method: __builtins__.str, *inputs, **kwargs
self, ufunc: numpy.ufunc, method: str, *inputs, **kwargs
References
  1. Standard Python type hinting practices (PEP 484) recommend using built-in types like 'str' directly instead of accessing them through 'builtins'. (link)

Comment on lines +261 to +266
pd_kwargs = {
"sqrt": np.sqrt(pd.col("float64_col")), # type: ignore
"add_const": np.add(pd.col("float64_col"), 2.4), # type: ignore
"radd_const": np.add(2.4, pd.col("float64_col")), # type: ignore
"add_cols": np.add(pd.col("float64_col"), pd.col("int64_col")), # type: ignore
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pd_kwargs dictionary uses pd.col, which is not a standard pandas API. To correctly verify the BigFrames implementation against pandas, the expected results should be computed using standard pandas column access on scalars_pandas_df. Additionally, standard pandas assign does not support BigFrames Expression objects. To ensure dictionary keys remain sorted without manual effort, the dictionary should be programmatically sorted.

Suggested change
pd_kwargs = {
"sqrt": np.sqrt(pd.col("float64_col")), # type: ignore
"add_const": np.add(pd.col("float64_col"), 2.4), # type: ignore
"radd_const": np.add(2.4, pd.col("float64_col")), # type: ignore
"add_cols": np.add(pd.col("float64_col"), pd.col("int64_col")), # type: ignore
}
pd_kwargs = dict(sorted({
"sqrt": np.sqrt(scalars_pandas_df["float64_col"]),
"add_const": np.add(scalars_pandas_df["float64_col"], 2.4),
"radd_const": np.add(2.4, scalars_pandas_df["float64_col"]),
"add_cols": np.add(scalars_pandas_df["float64_col"], scalars_pandas_df["int64_col"]),
}.items()))
References
  1. To ensure dictionary keys remain sorted without manual effort, programmatically sort the dictionary instead of relying on manual ordering in the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant