GROOVY-11649: add partitionPoint to Groovy Array and List(missed in java and groovy) by dongwq · Pull Request #2210 · apache/groovy

dongwq · 2025-05-03T12:45:39Z

partition_point missed in java and groovy like in

paulk-asert · 2025-05-03T23:23:38Z

I created this issue to track this potential enhancement:
https://issues.apache.org/jira/browse/GROOVY-11649

codecov-commenter · 2025-05-03T23:48:03Z

Codecov Report

❌ Patch coverage is 92.85714% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.9784%. Comparing base (71ca3e0) to head (826bc98).
⚠️ Report is 763 commits behind head on master.

Files with missing lines	Patch %	Lines
...rg/codehaus/groovy/runtime/ArrayGroovyMethods.java	92.8571%	0 Missing and 7 partials ⚠️
.../codehaus/groovy/runtime/DefaultGroovyMethods.java	92.8571%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@                Coverage Diff                 @@
##               master      #2210        +/-   ##
==================================================
+ Coverage     68.9576%   68.9784%   +0.0208%     
- Complexity      29591      29625        +34     
==================================================
  Files            1423       1423                
  Lines          114102     114217       +115     
  Branches        19793      19818        +25     
==================================================
+ Hits            78682      78785       +103     
- Misses          28800      28803         +3     
- Partials         6620       6629         +9

Files with missing lines	Coverage Δ
.../codehaus/groovy/runtime/DefaultGroovyMethods.java	`74.4435% <92.8571%> (+0.0768%)`	⬆️
...rg/codehaus/groovy/runtime/ArrayGroovyMethods.java	`88.7597% <92.8571%> (+0.2849%)`	⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

paulk-asert · 2025-05-04T00:33:59Z

This is an interesting proposal.

One immediate point of feedback is I would change the IntRange semantics to follow normal Groovy idioms. I know that means it might be slightly different to rust and other languages, but it follows what we do for all other Groovy methods. So, basically you'd get rid of the "- 1" when accessing getToInt() and change your examples from "0..arr.size()" to "0..<arr.size()" or "0..-1". Groovy let's you pick between (l, r), [l, r), [l, r] etc by using 0<..<4, 1..<4, 1..3 etc. I also haven't checked yet whether reverse ranges are handled properly, but perhaps you've already considered that.

Also, Java doesn't have a sorted list but offers Collections.sort(). Groovy doesn't have partition (currently) but does have groupBy. So, I think it makes sense to discuss in more detail how this would typically be used. Is it less useful than in other languages given the JDK collection semantics or is other functionality needed (like a partition extension method) to reap the full benefits. I think discussing such things is best on the mailing list or the issue.

…ew, method name to binaryPartitionPoint

dongwq · 2025-05-04T08:15:09Z

This is an interesting proposal.

One immediate point of feedback is I would change the IntRange semantics to follow normal Groovy idioms. I know that means it might be slightly different to rust and other languages, but it follows what we do for all other Groovy methods. So, basically you'd get rid of the "- 1" when accessing getToInt() and change your examples from "0..arr.size()" to "0..<arr.size()" or "0..-1". Groovy let's you pick between (l, r), [l, r), [l, r] etc by using 0<..<4, 1..<4, 1..3 etc. I also haven't checked yet whether reverse ranges are handled properly, but perhaps you've already considered that.

Also, Java doesn't have a sorted list but offers Collections.sort(). Groovy doesn't have partition (currently) but does have groupBy. So, I think it makes sense to discuss in more detail how this would typically be used. Is it less useful than in other languages given the JDK collection semantics or is other functionality needed (like a partition extension method) to reap the full benefits. I think discussing such things is best on the mailing list or the issue.

@paulk-asert Thank you

the range arg: have adjusted to normal Groovy idioms.
reverse ranges: with an assert check
the usage: here partition_point can be view as a modern/general vision api of binary search with the help of closure/lambda. old binarySearch in java , lack of the ability to process these cases:
1. directly provide the postion for insert like: lowerBound/uppperBound
2. the result of ordered arr/list with one val occured more than once is uncertain.

after all, so I change the method name to binaryPartitionPoint to be more precise.

paulk-asert · 2025-05-05T10:35:18Z

I actually think i prefer just partitionPoint to match Rust and C++, but let me put my thoughts into GROOVY-11649 because I am not sure all of the variants are equally compelling.

paulk-asert · 2025-05-05T10:38:14Z

Also, was there anything in particular you used from AbstractList. Wondering why not to use the more general List interface?

dongwq · 2025-05-05T16:10:43Z

Also, was there anything in particular you used from AbstractList. Wondering why not to use the more general List interface?

cause partition_point is O(logn), suppose that element.get(i) is O(1). so here we need a "random access" data store (such as an array).

the java doc says for AbstractList:

This class provides a skeletal implementation of the {@link List}
interface to minimize the effort required to implement this interface
backed by a "random access" data store (such as an array)

though there no actual limit in java collection framework.

dongwq · 2025-05-05T16:12:30Z

I actually think i prefer just partitionPoint to match Rust and C++, but let me put my thoughts into GROOVY-11649 because I am not sure all of the variants are equally compelling.

partitionPoint is ok for me.

dongwq · 2025-05-05T16:14:35Z

what about the usage, my q: why rust vec.partition_point is useful if there is binary_search already?
and copilot give below:

Let me help you understand the difference between partition_point and binary_search in Rust's Vec implementation. I'll use semantic code search to find relevant implementations and examples in the Rust codebase.

Let me explain why partition_point is useful even though binary_search exists:

Different Return Types and Use Cases:
- binary_search returns a Result<usize, usize>:
  - Ok(index) if the element is found
  - Err(index) where the element should be inserted to maintain sorting
- partition_point returns a single usize that represents the first position where the predicate returns false
Finding Ranges:
One of the most important use cases for partition_point is finding ranges of equal elements in a sorted collection. For example, if you have multiple equal elements:

let s = [0, 1, 1, 1, 1, 2, 3, 5, 8];

// To find the range of all 1s:
let low = s.partition_point(|x| x < &1);   // Returns 1
let high = s.partition_point(|x| x <= &1); // Returns 5

// Now s[low..high] contains all the 1s
assert_eq!(&s[low..high], &[1, 1, 1, 1]);

More Flexible Predicates:
- partition_point works with any predicate that divides the slice into true/false regions
- binary_search is specifically for finding exact matches or insertion points
Cleaner Code for Insertion:
- When you want to insert an element while maintaining sort order, partition_point gives you the position directly
- With binary_search you need to handle both Ok and Err cases, even though you only care about the position
Implementation Detail:
- Interestingly, binary_search is actually implemented in terms of partition_point internally in Rust, showing how fundamental the partition operation is

Here's a practical example showing the difference:

let vec = vec![1, 2, 2, 2, 3, 4];

// Using binary_search to find a 2:
match vec.binary_search(&2) {
    Ok(idx) => println!("Found a 2 at {}", idx), // Will find *some* 2, but which one?
    Err(_) => println!("Not found")
}

// Using partition_point to find ALL 2s:
let start = vec.partition_point(|x| x < &2);
let end = vec.partition_point(|x| x <= &2);
println!("All 2s are in positions {}..{}", start, end); // Gives you the exact range

So while both functions are useful, partition_point is particularly valuable when you need to:

Find ranges of equal elements
Work with more flexible predicates
Get clean insertion points
Need a simpler API that doesn't require match statements

paulk-asert · 2025-05-05T21:32:20Z

I was thinking a bit more about usage overnight.

For arrays, the "clean insertion" point argument is less compelling but somewhat useful none-the-less.

Given that it isn't easy to partition primitive arrays, I think partitionPoint is slightly less compelling. We could get by with both bisectLeft and bisectRight for instance. But even for natural ordering scenarios, the convenience of a single method is nicer than the two separate methods in my view.

I looked at the grades example from Python bisect: https://docs.python.org/3/library/bisect.html#examples

def scores = [33, 99, 77, 70, 89, 90, 100] // convert these to A grade (90 and above),
                                           // B grade (80 up to but not including 90), etc.

With binary search we have something like this:

def grade(score) {
  int[] breakpoints = [59, 69, 79, 89]
  def idx = Arrays.binarySearch(breakpoints, score)
  if (idx < 0) {
    idx = (idx + 1).abs()
  }
  'FDCBA'[idx]
}
assert scores.collect(this.&grade) == ['F', 'A', 'C', 'C', 'B', 'A', 'A']

We have to artificially subtract one from each of the breakpoints compared to the Python bisect example.

With partitionPoint it would become:

int[] breakpoints = [60, 70, 80, 90]
assert scores.collect { sc ->
    'FDCBA'[breakpoints.partitionPoint{ it <= sc }]
} == ['F', 'A', 'C', 'C', 'B', 'A', 'A']

This is much cleaner.

paulk-asert · 2025-05-05T21:43:11Z

Also, was there anything in particular you used from AbstractList. Wondering why not to use the more general List interface?

cause partition_point is O(logn), suppose that element.get(i) is O(1). so here we need a "random access" data store (such as an array).

the java doc says for AbstractList:

This class provides a skeletal implementation of the {@link List}

interface to minimize the effort required to implement this interface

backed by a "random access" data store (such as an array)

though there no actual limit in java collection framework.

In general, we would use List here. That makes it work for all lists. The default is ArrayList which should have good performance but if the user wishes to supply some other kind of list, they will be responsible for ensuring they pick a list matching their performance requirements.

paulk-asert · 2025-05-05T21:48:26Z

Can you rename back to partitionPoint, and use List instead of AbstractList. I am still pondering the benefits of Predicate vs Closure for the DGM List and AGM T[] variants. Did you make them different for a reason?

…artitionPoint, prefer Predicate<T> to Closure<?>, DGM prefer List to AbstractList

dongwq · 2025-05-06T00:37:31Z

Can you rename back to partitionPoint, and use List instead of AbstractList. I am still pondering the benefits of Predicate vs Closure for the DGM List and AGM T[] variants. Did you make them different for a reason?

@paulk-asert ok, 3 have done:

revert method name to partitionPoint,
prefer Predicate to Closure<?>,
DGM prefer List to AbstractList

paulk-asert · 2025-05-06T00:41:30Z

I am not sure if "prefer Predicate to Closure<?>" is correct yet. I think that will be the case for primitive arrays, still thinking about List and T[]. It might be best to align those with how we'd sort those two things but I am just thinking through the cases whether it turns out to be the same. I'll writer another comment in a few hours.

paulk-asert · 2025-05-06T01:27:36Z

Predicate looks fine in all cases.

paulk-asert · 2025-05-06T01:28:30Z

Merged! Thanks!

paulk-asert · 2025-05-06T01:30:35Z

Apologies, I meant to squash when merging.

dongwq added 2 commits May 3, 2025 20:09

add partitionPoint to Groovy Array and List

3d050cb

add partitionPoint to Groovy Array and List - fix comment

8926950

dongwq marked this pull request as draft May 3, 2025 13:23

dongwq changed the title ~~add partitionPoint to Groovy Array and List~~ add partitionPoint to Groovy Array and List(missed in java and groovy) May 3, 2025

paulk-asert reviewed May 4, 2025

View reviewed changes

Comment thread src/main/java/org/codehaus/groovy/runtime/ArrayGroovyMethods.java Outdated

add partitionPoint to Groovy Array and List - refactor after 1st revi…

ec8fbf5

…ew, method name to binaryPartitionPoint

dongwq changed the title ~~add partitionPoint to Groovy Array and List(missed in java and groovy)~~ GROOVY-11649: add binaryPartitionPoint to Groovy Array and List(missed in java and groovy) May 4, 2025

paulk-asert self-assigned this May 6, 2025

add partitionPoint to Groovy Array and List - revert method name to p…

826bc98

…artitionPoint, prefer Predicate<T> to Closure<?>, DGM prefer List to AbstractList

dongwq changed the title ~~GROOVY-11649: add binaryPartitionPoint to Groovy Array and List(missed in java and groovy)~~ GROOVY-11649: add partitionPoint to Groovy Array and List(missed in java and groovy) May 6, 2025

dongwq marked this pull request as ready for review May 6, 2025 00:37

paulk-asert merged commit 0c48fa2 into apache:master May 6, 2025
21 checks passed

Conversation

dongwq commented May 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paulk-asert commented May 3, 2025

Uh oh!

codecov-commenter commented May 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

paulk-asert commented May 4, 2025

Uh oh!

Uh oh!

dongwq commented May 4, 2025

Uh oh!

paulk-asert commented May 5, 2025

Uh oh!

paulk-asert commented May 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dongwq commented May 5, 2025

Uh oh!

dongwq commented May 5, 2025

Uh oh!

dongwq commented May 5, 2025

Uh oh!

paulk-asert commented May 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paulk-asert commented May 5, 2025

Uh oh!

paulk-asert commented May 5, 2025

Uh oh!

dongwq commented May 6, 2025

Uh oh!

paulk-asert commented May 6, 2025

Uh oh!

paulk-asert commented May 6, 2025

Uh oh!

Uh oh!

paulk-asert commented May 6, 2025

Uh oh!

paulk-asert commented May 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dongwq commented May 3, 2025 •

edited

Loading

codecov-commenter commented May 3, 2025 •

edited

Loading

paulk-asert commented May 5, 2025 •

edited

Loading

paulk-asert commented May 5, 2025 •

edited

Loading