[SEDONA-690] Set default metric to use Haversine for KNN join and code refactoring #1909

Merged
jiayuasu merged 3 commits into apache:master from zhangfengcdt:sedona-690-fixKNNJoinPerformance
Apr 10, 2025

@zhangfengcdt
Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

  • Yes, and the PR name follows the format [SEDONA-XXX] my subject.

What changes were proposed in this PR?

This PR changes the default distance metric for the KNN join to Haversine distance when use_spheroid = true. It also refactors the join index judgement code to make it more readable.
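
For readers unfamiliar with the metric, here is a minimal sketch of the Haversine great-circle distance in plain Java. It is not the code from this PR: the class name, method name, and Earth radius constant are assumptions chosen for illustration, and Sedona's own implementation may differ in its constants and argument order.

```java
public class HaversineSketch {
    // Mean Earth radius in meters. This value is an assumption for the sketch,
    // not necessarily the constant Sedona uses.
    static final double EARTH_RADIUS_M = 6371008.8;

    /** Great-circle distance in meters between two (lat, lon) points given in degrees. */
    public static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);

        double sinPhi = Math.sin(dPhi / 2);
        double sinLambda = Math.sin(dLambda / 2);

        // Haversine term: sin^2(dPhi/2) + cos(phi1) * cos(phi2) * sin^2(dLambda/2)
        double a = sinPhi * sinPhi + Math.cos(phi1) * Math.cos(phi2) * sinLambda * sinLambda;

        // Clamp against floating-point rounding so asin never receives a value above 1.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }
}
```

Unlike a planar Euclidean distance on raw degrees, this accounts for longitude lines converging toward the poles, so neighbor rankings on lon/lat data can differ between the two metrics.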

How was this patch tested?

KNNTestSuit

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

@zhangfengcdt zhangfengcdt marked this pull request as ready for review April 3, 2025 19:01
@zhangfengcdt zhangfengcdt requested a review from jiayuasu as a code owner April 3, 2025 19:01
@jiayuasu jiayuasu requested a review from Copilot April 10, 2025 05:26
Contributor

Copilot AI left a comment

Copilot reviewed 3 out of 7 changed files in this pull request and generated no comments.

Files not reviewed (4)
  • spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/BroadcastObjectSideKNNJoinExec.scala: Language not supported
  • spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/BroadcastQuerySideKNNJoinExec.scala: Language not supported
  • spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/KNNJoinExec.scala: Language not supported
  • spark/common/src/test/scala/org/apache/sedona/sql/KnnJoinSuite.scala: Language not supported
Comments suppressed due to low confidence (3)

spark/common/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java:830

  • Ensure that switching from 'indexedRDD' to 'spatialPartitionedRDD' here is intentional and that it does not adversely impact the join performance or expected partitioning behavior.
queryRDD.spatialPartitionedRDD.zipPartitions(objectRDD.spatialPartitionedRDD, judgement);

spark/common/src/main/java/org/apache/sedona/core/joinJudgement/KnnJoinIndexJudgement.java:48

  • The change in generic types alters the method signature; please verify that all downstream usages of KnnJoinIndexJudgement align with this new type ordering to prevent potential type mismatches.
implements FlatMapFunction2<Iterator<T>, Iterator<U>, Pair<T, U>>

spark/common/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java:786

  • The updated type for 'broadcastQueryObjects' improves type safety. Please confirm that all its usages are updated accordingly to handle 'UniqueGeometry' objects consistently.
final Broadcast<List<UniqueGeometry<U>>> broadcastQueryObjects;
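
The signature quoted above takes two partition iterators and emits (query, object) pairs. The following is a rough brute-force sketch of that shape in plain Java, with no Spark or Sedona types. The names `KnnPartitionSketch`, `Match`, and `knnPairs` are hypothetical, and this is not the logic of `KnnJoinIndexJudgement`; it only illustrates what a per-partition KNN judgement has to produce.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class KnnPartitionSketch {

    /** Holder for one (query, neighbor) result pair. */
    public static final class Match<T, U> {
        public final T query;
        public final U neighbor;

        Match(T query, U neighbor) {
            this.query = query;
            this.neighbor = neighbor;
        }
    }

    /** For every query in one partition, emit its k nearest objects from the paired partition. */
    public static <T, U> List<Match<T, U>> knnPairs(
            Iterator<T> queries, Iterator<U> objects, int k,
            ToDoubleBiFunction<T, U> distance) {
        // Materialize the object side once so it can be scanned for each query.
        List<U> objectList = new ArrayList<>();
        objects.forEachRemaining(objectList::add);

        List<Match<T, U>> out = new ArrayList<>();
        while (queries.hasNext()) {
            T q = queries.next();
            // Brute force: sort a copy of the objects by distance to q, keep the first k.
            List<U> sorted = new ArrayList<>(objectList);
            sorted.sort(Comparator.comparingDouble(o -> distance.applyAsDouble(q, o)));
            for (U o : sorted.subList(0, Math.min(k, sorted.size()))) {
                out.add(new Match<>(q, o));
            }
        }
        return out;
    }
}
```

The distance function is passed in as a parameter, so a planar or a Haversine metric can be swapped without touching the pairing logic.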

@jiayuasu jiayuasu added this to the sedona-1.8.0 milestone Apr 10, 2025
@jiayuasu jiayuasu merged commit 1fd3b86 into apache:master Apr 10, 2025
39 checks passed
jiayuasu pushed a commit that referenced this pull request May 30, 2025
…e refactoring (#1909)

* [SEDONA-690] Set default metric to use Haversine for KNN join and some code refactor

* fix unit tests

* clean up join params