[SPARK-57679][PYTHON] Fix numpy type checking #56757

Closed
gaogaotiantian wants to merge 4 commits into apache:master from gaogaotiantian:fix-numpy-type-checking

Conversation

@gaogaotiantian

Contributor

What changes were proposed in this pull request?

Add a branch to detect NDArray for numpy >= 2.5.0.

Why are the changes needed?

Our detection for NDArray is a bit fragile. numpy changed the type alias in 2.5.0 so we need another way to detect it.

CI is failing - https://github.com/apache/spark/actions/runs/28064134730/job/83087583742
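
For readers unfamiliar with this kind of detection, here is a minimal sketch of an origin-and-arguments check of the sort the review thread quotes (`len(tpe.__args__) > 1`). It is not the code from this PR. `FakeNDArray`, `FakeDType`, and `ndarray_scalar_type` are hypothetical names, and the stand-in classes exist only so the snippet runs without numpy installed. The sketch assumes the alias has the form `ndarray[shape, dtype[scalar]]`; it does not show what the alias looks like in numpy >= 2.5.0 or how the new branch handles it.

```python
import typing
from typing import Any, Generic, List, Optional, TypeVar

ShapeT = TypeVar("ShapeT")
DTypeT = TypeVar("DTypeT")
ScalarT = TypeVar("ScalarT")


class FakeDType(Generic[ScalarT]):
    """Hypothetical stand-in for numpy.dtype."""


class FakeNDArray(Generic[ShapeT, DTypeT]):
    """Hypothetical stand-in for numpy.ndarray (parameters: shape, dtype)."""


def ndarray_scalar_type(tpe: Any, array_cls: type) -> Optional[Any]:
    """Return the scalar type of an ndarray-style alias, or None.

    Assumption: the alias has the form array_cls[shape, dtype[scalar]].
    """
    # The alias must be a parametrized form of the array class.
    if typing.get_origin(tpe) is not array_cls:
        return None
    args = typing.get_args(tpe)
    # Same guard as the quoted fragment: more than one type argument.
    if len(args) <= 1:
        return None
    # The second argument is expected to look like dtype[scalar].
    dtype_args = typing.get_args(args[1])
    return dtype_args[0] if dtype_args else None


alias = FakeNDArray[Any, FakeDType[float]]
print(ndarray_scalar_type(alias, FakeNDArray))      # <class 'float'>
print(ndarray_scalar_type(List[int], FakeNDArray))  # None
```

A check like this depends on the exact shape of the alias, which is why a change in how the library defines it can break detection.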

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Local test passed

Was this patch authored or co-authored using generative AI tooling?

Yes, Claude Code (Opus 4.8 high)

@gaogaotiantian gaogaotiantian changed the title Fix numpy type checking [SPARK-57679][PYTHON][PANDAS] Fix numpy type checking Jun 24, 2026

@uros-b uros-b left a comment

Member

  and len(tpe.__args__) > 1
  ):
- # numpy.typing.NDArray
+ # numpy.typing.NDArray for numpy < 2.5

Member

Suggested change
- # numpy.typing.NDArray for numpy < 2.5
+ # numpy.typing.NDArray for numpy < 2.5.0

Member

(uber nit)

Contributor Author

Technically, < 2.5.0 is the same as < 2.5. I believe we check the minor (instead of the micro) version of numpy in a few other places.
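
To illustrate the point about minor versus micro versions, here is a rough sketch of a minor-version comparison. `major_minor` and `is_before` are hypothetical helpers written for this note, not functions from Spark or numpy. The sketch assumes plain `X.Y.Z` release strings and ignores pre-release ordering.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse the first two dot-separated components as integers."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_before(version: str, major: int, minor: int) -> bool:
    """True if `version` sorts before `major.minor`, ignoring the micro part."""
    return major_minor(version) < (major, minor)


# For final releases, "< 2.5" and "< 2.5.0" select the same versions.
for v in ["2.4.9", "2.5.0", "2.5.1"]:
    print(v, is_before(v, 2, 5))
# 2.4.9 True
# 2.5.0 False
# 2.5.1 False
```

One caveat of this simplification: a pre-release string such as `2.5.0rc1` is treated as 2.5 here, whereas PEP 440 ordering places it before 2.5.0.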

@HyukjinKwon HyukjinKwon changed the title [SPARK-57679][PYTHON][PANDAS] Fix numpy type checking [SPARK-57679][PYTHON] Fix numpy type checking Jun 25, 2026
gaogaotiantian added a commit that referenced this pull request Jun 30, 2026
### What changes were proposed in this pull request?

Add a branch to detect `NDArray` for `numpy >= 2.5.0`.

### Why are the changes needed?

Our detection for `NDArray` is a bit fragile. `numpy` changed the type alias in `2.5.0` so we need another way to detect it.

CI is failing - https://github.com/apache/spark/actions/runs/28064134730/job/83087583742

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Local test passed

### Was this patch authored or co-authored using generative AI tooling?

Yes, Claude Code (Opus 4.8 high)

Closes #56757 from gaogaotiantian/fix-numpy-type-checking.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Tian Gao <gaogaotiantian@hotmail.com>
(cherry picked from commit 5f7570c)
Signed-off-by: Tian Gao <gaogaotiantian@hotmail.com>
gaogaotiantian added a commit that referenced this pull request Jun 30, 2026
gaogaotiantian added a commit that referenced this pull request Jun 30, 2026
gaogaotiantian added a commit that referenced this pull request Jun 30, 2026
@gaogaotiantian

Contributor Author

Merged to master, branch-4.x, branch-4.2, branch-4.1 and branch-4.0. Thank you all.
