Skip to content

GROOVY-11649: add partitionPoint to Groovy Array and List(missed in java and groovy)#2210

Merged
paulk-asert merged 4 commits into
apache:masterfrom
dongwq:master
May 6, 2025
Merged

GROOVY-11649: add partitionPoint to Groovy Array and List(missed in java and groovy)#2210
paulk-asert merged 4 commits into
apache:masterfrom
dongwq:master

Conversation

@dongwq
Copy link
Copy Markdown
Contributor

@dongwq dongwq commented May 3, 2025

@dongwq dongwq marked this pull request as draft May 3, 2025 13:23
@dongwq dongwq changed the title add partitionPoint to Groovy Array and List add partitionPoint to Groovy Array and List(missed in java and groovy) May 3, 2025
@paulk-asert
Copy link
Copy Markdown
Contributor

I created this issue to track this potential enhancement:
https://issues.apache.org/jira/browse/GROOVY-11649

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 3, 2025

Codecov Report

❌ Patch coverage is 92.85714% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.9784%. Comparing base (71ca3e0) to head (826bc98).
⚠️ Report is 763 commits behind head on master.

Files with missing lines Patch % Lines
...rg/codehaus/groovy/runtime/ArrayGroovyMethods.java 92.8571% 0 Missing and 7 partials ⚠️
.../codehaus/groovy/runtime/DefaultGroovyMethods.java 92.8571% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@                Coverage Diff                 @@
##               master      #2210        +/-   ##
==================================================
+ Coverage     68.9576%   68.9784%   +0.0208%     
- Complexity      29591      29625        +34     
==================================================
  Files            1423       1423                
  Lines          114102     114217       +115     
  Branches        19793      19818        +25     
==================================================
+ Hits            78682      78785       +103     
- Misses          28800      28803         +3     
- Partials         6620       6629         +9     
Files with missing lines Coverage Δ
.../codehaus/groovy/runtime/DefaultGroovyMethods.java 74.4435% <92.8571%> (+0.0768%) ⬆️
...rg/codehaus/groovy/runtime/ArrayGroovyMethods.java 88.7597% <92.8571%> (+0.2849%) ⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@paulk-asert
Copy link
Copy Markdown
Contributor

This is an interesting proposal.

One immediate point of feedback is I would change the IntRange semantics to follow normal Groovy idioms. I know that means it might be slightly different to rust and other languages, but it follows what we do for all other Groovy methods. So, basically you'd get rid of the "- 1" when accessing getToInt() and change your examples from "0..arr.size()" to "0..<arr.size()" or "0..-1". Groovy let's you pick between (l, r), [l, r), [l, r] etc by using 0<..<4, 1..<4, 1..3 etc. I also haven't checked yet whether reverse ranges are handled properly, but perhaps you've already considered that.

Also, Java doesn't have a sorted list but offers Collections.sort(). Groovy doesn't have partition (currently) but does have groupBy. So, I think it makes sense to discuss in more detail how this would typically be used. Is it less useful than in other languages given the JDK collection semantics or is other functionality needed (like a partition extension method) to reap the full benefits. I think discussing such things is best on the mailing list or the issue.

Comment thread src/main/java/org/codehaus/groovy/runtime/ArrayGroovyMethods.java Outdated
@dongwq
Copy link
Copy Markdown
Contributor Author

dongwq commented May 4, 2025

This is an interesting proposal.

One immediate point of feedback is I would change the IntRange semantics to follow normal Groovy idioms. I know that means it might be slightly different to rust and other languages, but it follows what we do for all other Groovy methods. So, basically you'd get rid of the "- 1" when accessing getToInt() and change your examples from "0..arr.size()" to "0..<arr.size()" or "0..-1". Groovy let's you pick between (l, r), [l, r), [l, r] etc by using 0<..<4, 1..<4, 1..3 etc. I also haven't checked yet whether reverse ranges are handled properly, but perhaps you've already considered that.

Also, Java doesn't have a sorted list but offers Collections.sort(). Groovy doesn't have partition (currently) but does have groupBy. So, I think it makes sense to discuss in more detail how this would typically be used. Is it less useful than in other languages given the JDK collection semantics or is other functionality needed (like a partition extension method) to reap the full benefits. I think discussing such things is best on the mailing list or the issue.

@paulk-asert Thank you

  • the range arg: have adjusted to normal Groovy idioms.
  • reverse ranges: with an assert check
  • the usage: here partition_point can be view as a modern/general vision api of binary search with the help of closure/lambda. old binarySearch in java , lack of the ability to process these cases:
    1. directly provide the postion for insert like: lowerBound/uppperBound
    2. the result of ordered arr/list with one val occured more than once is uncertain.

after all, so I change the method name to binaryPartitionPoint to be more precise.

@dongwq dongwq changed the title add partitionPoint to Groovy Array and List(missed in java and groovy) GROOVY-11649: add binaryPartitionPoint to Groovy Array and List(missed in java and groovy) May 4, 2025
@paulk-asert
Copy link
Copy Markdown
Contributor

I actually think i prefer just partitionPoint to match Rust and C++, but let me put my thoughts into GROOVY-11649 because I am not sure all of the variants are equally compelling.

@paulk-asert
Copy link
Copy Markdown
Contributor

paulk-asert commented May 5, 2025

Also, was there anything in particular you used from AbstractList. Wondering why not to use the more general List interface?

@dongwq
Copy link
Copy Markdown
Contributor Author

dongwq commented May 5, 2025

Also, was there anything in particular you used from AbstractList. Wondering why not to use the more general List interface?

cause partition_point is O(logn), suppose that element.get(i) is O(1). so here we need a "random access" data store (such as an array).

the java doc says for AbstractList:

  • This class provides a skeletal implementation of the {@link List}
  • interface to minimize the effort required to implement this interface
  • backed by a "random access" data store (such as an array)

though there no actual limit in java collection framework.

@dongwq
Copy link
Copy Markdown
Contributor Author

dongwq commented May 5, 2025

I actually think i prefer just partitionPoint to match Rust and C++, but let me put my thoughts into GROOVY-11649 because I am not sure all of the variants are equally compelling.

partitionPoint is ok for me.

@dongwq
Copy link
Copy Markdown
Contributor Author

dongwq commented May 5, 2025

what about the usage, my q: why rust vec.partition_point is useful if there is binary_search already?
and copilot give below:

Let me help you understand the difference between partition_point and binary_search in Rust's Vec implementation. I'll use semantic code search to find relevant implementations and examples in the Rust codebase.

Let me explain why partition_point is useful even though binary_search exists:

  1. Different Return Types and Use Cases:

    • binary_search returns a Result<usize, usize>:
      • Ok(index) if the element is found
      • Err(index) where the element should be inserted to maintain sorting
    • partition_point returns a single usize that represents the first position where the predicate returns false
  2. Finding Ranges:
    One of the most important use cases for partition_point is finding ranges of equal elements in a sorted collection. For example, if you have multiple equal elements:

let s = [0, 1, 1, 1, 1, 2, 3, 5, 8];

// To find the range of all 1s:
let low = s.partition_point(|x| x < &1);   // Returns 1
let high = s.partition_point(|x| x <= &1); // Returns 5

// Now s[low..high] contains all the 1s
assert_eq!(&s[low..high], &[1, 1, 1, 1]);
  1. More Flexible Predicates:

    • partition_point works with any predicate that divides the slice into true/false regions
    • binary_search is specifically for finding exact matches or insertion points
  2. Cleaner Code for Insertion:

    • When you want to insert an element while maintaining sort order, partition_point gives you the position directly
    • With binary_search you need to handle both Ok and Err cases, even though you only care about the position
  3. Implementation Detail:

    • Interestingly, binary_search is actually implemented in terms of partition_point internally in Rust, showing how fundamental the partition operation is

Here's a practical example showing the difference:

let vec = vec![1, 2, 2, 2, 3, 4];

// Using binary_search to find a 2:
match vec.binary_search(&2) {
    Ok(idx) => println!("Found a 2 at {}", idx), // Will find *some* 2, but which one?
    Err(_) => println!("Not found")
}

// Using partition_point to find ALL 2s:
let start = vec.partition_point(|x| x < &2);
let end = vec.partition_point(|x| x <= &2);
println!("All 2s are in positions {}..{}", start, end); // Gives you the exact range

So while both functions are useful, partition_point is particularly valuable when you need to:

  1. Find ranges of equal elements
  2. Work with more flexible predicates
  3. Get clean insertion points
  4. Need a simpler API that doesn't require match statements

@paulk-asert
Copy link
Copy Markdown
Contributor

paulk-asert commented May 5, 2025

I was thinking a bit more about usage overnight.

For arrays, the "clean insertion" point argument is less compelling but somewhat useful none-the-less.

Given that it isn't easy to partition primitive arrays, I think partitionPoint is slightly less compelling. We could get by with both bisectLeft and bisectRight for instance. But even for natural ordering scenarios, the convenience of a single method is nicer than the two separate methods in my view.

I looked at the grades example from Python bisect: https://docs.python.org/3/library/bisect.html#examples

def scores = [33, 99, 77, 70, 89, 90, 100] // convert these to A grade (90 and above),
                                           // B grade (80 up to but not including 90), etc.

With binary search we have something like this:

def grade(score) {
  int[] breakpoints = [59, 69, 79, 89]
  def idx = Arrays.binarySearch(breakpoints, score)
  if (idx < 0) {
    idx = (idx + 1).abs()
  }
  'FDCBA'[idx]
}
assert scores.collect(this.&grade) == ['F', 'A', 'C', 'C', 'B', 'A', 'A']

We have to artificially subtract one from each of the breakpoints compared to the Python bisect example.

With partitionPoint it would become:

int[] breakpoints = [60, 70, 80, 90]
assert scores.collect { sc ->
    'FDCBA'[breakpoints.partitionPoint{ it <= sc }]
} == ['F', 'A', 'C', 'C', 'B', 'A', 'A']

This is much cleaner.

@paulk-asert
Copy link
Copy Markdown
Contributor

Also, was there anything in particular you used from AbstractList. Wondering why not to use the more general List interface?

cause partition_point is O(logn), suppose that element.get(i) is O(1). so here we need a "random access" data store (such as an array).

the java doc says for AbstractList:

  • This class provides a skeletal implementation of the {@link List}
  • interface to minimize the effort required to implement this interface
  • backed by a "random access" data store (such as an array)

though there no actual limit in java collection framework.

In general, we would use List here. That makes it work for all lists. The default is ArrayList which should have good performance but if the user wishes to supply some other kind of list, they will be responsible for ensuring they pick a list matching their performance requirements.

@paulk-asert
Copy link
Copy Markdown
Contributor

Can you rename back to partitionPoint, and use List instead of AbstractList. I am still pondering the benefits of Predicate vs Closure for the DGM List and AGM T[] variants. Did you make them different for a reason?

@paulk-asert paulk-asert self-assigned this May 6, 2025
…artitionPoint, prefer Predicate<T> to Closure<?>, DGM prefer List to AbstractList
@dongwq dongwq changed the title GROOVY-11649: add binaryPartitionPoint to Groovy Array and List(missed in java and groovy) GROOVY-11649: add partitionPoint to Groovy Array and List(missed in java and groovy) May 6, 2025
@dongwq
Copy link
Copy Markdown
Contributor Author

dongwq commented May 6, 2025

Can you rename back to partitionPoint, and use List instead of AbstractList. I am still pondering the benefits of Predicate vs Closure for the DGM List and AGM T[] variants. Did you make them different for a reason?

@paulk-asert ok, 3 have done:

  • revert method name to partitionPoint,
  • prefer Predicate to Closure<?>,
  • DGM prefer List to AbstractList

@dongwq dongwq marked this pull request as ready for review May 6, 2025 00:37
@paulk-asert
Copy link
Copy Markdown
Contributor

I am not sure if "prefer Predicate to Closure<?>" is correct yet. I think that will be the case for primitive arrays, still thinking about List and T[]. It might be best to align those with how we'd sort those two things but I am just thinking through the cases whether it turns out to be the same. I'll writer another comment in a few hours.

@paulk-asert
Copy link
Copy Markdown
Contributor

Predicate looks fine in all cases.

@paulk-asert paulk-asert merged commit 0c48fa2 into apache:master May 6, 2025
21 checks passed
@paulk-asert
Copy link
Copy Markdown
Contributor

Merged! Thanks!

@paulk-asert
Copy link
Copy Markdown
Contributor

Apologies, I meant to squash when merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants