Skip to content

Add extractMatches() to RegExUtils#1382

Closed
komatzu1337 wants to merge 6 commits intoapache:masterfrom
komatzu1337:add-matchfinder
Closed

Add extractMatches() to RegExUtils#1382
komatzu1337 wants to merge 6 commits intoapache:masterfrom
komatzu1337:add-matchfinder

Conversation

@komatzu1337
Copy link
Copy Markdown

Provides simple way to get all regex matches as list without manual Matcher handling.

Copy link
Copy Markdown
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this one. This is a pretty narrow use case. Some people will want a List, others an array; some will want String input, others will say it should be a CharSequence. What should happen if the string is null? What should happen if the pattern is null? Should an empty list (or array) be returned? A null? Should the List be immutable? Can this even be reused with Lang?

Supports both String and CharSequence
Provides both List<String> and String[] return
Added regex-string variant
All returned collections are unmodifiable
Never returns null (empty collections for null)
Improved documentation
@komatzu1337
Copy link
Copy Markdown
Author

komatzu1337 commented May 15, 2025

TY for your feedback! I implemented the following improvements:

Now throws IllegalArgumentException with clear messages for null
Supports both String and CharSequence
Provides both List and String[] return
Added regex-string variant
All returned collections are unmodifiable
Never returns null (empty collections for null)
Improved documentation

I will add tests later if you find this class worthwhile

Copy link
Copy Markdown
Member

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @komatzu1337,

Thanks for the PR, could you add some unit tests please?

Note also that you enabled "Vigilant mode" on your GitHub account, but you didn't add the GPG key you are using to your GitHub account. For this reason all your commits appear as "Unverified". Can you fix that?

Comment thread src/main/java/org/apache/commons/lang3/RegExUtils.java Outdated
Comment thread src/main/java/org/apache/commons/lang3/RegExUtils.java Outdated
Comment thread src/main/java/org/apache/commons/lang3/RegExUtils.java Outdated
@garydgregory
Copy link
Copy Markdown
Member

I'm still not convinced this belongs here.

@ppkarwasz
Copy link
Copy Markdown
Member

I'm still not convinced this belongs here.

@komatzu1337, I appreciate the effort you've put into this contribution. I also have some reservations about the utility of findMatches, so please view my comments as suggestions for improvement rather than a final decision on acceptance.

Just as a reminder, per ASF guidelines, all maintainers need to reach consensus before any changes are merged.

@komatzu1337
Copy link
Copy Markdown
Author

I'm still not convinced this belongs here.

why?

@komatzu1337
Copy link
Copy Markdown
Author

Hi @komatzu1337,

Thanks for the PR, could you add some unit tests please?

hi
i will add unit tests later today

Note also that you enabled "Vigilant mode" on your GitHub account, but you didn't add the GPG key you are using to your GitHub account. For this reason all your commits appear as "Unverified". Can you fix that?

i'll figure it out, thanks

class MatchFinder was removed
now returns empty List if pattern or text null
{} added on if, so RegExUtils will pass checks
@ppkarwasz
Copy link
Copy Markdown
Member

Regarding the utility of this method, with the help of Copilot, I found some projects that have a similar method that returns List<String>:

@ppkarwasz
Copy link
Copy Markdown
Member

ppkarwasz commented May 16, 2025

I'm still not convinced this belongs here.

why?

When adding a new method to Commons Lang, we need to consider the impact on the entire ecosystem. Even small additions increase the library's size for thousands of downstream projects. Given the wide range of potential utility methods, we prioritize those likely to benefit a significant portion of users—let's say, at least 1%.

While your proposed method may be useful in some contexts, it might not offer enough broad utility to justify inclusion in the core library.

tests added
public List<String> findMatches(String text, Pattern pattern) is removed
tests added
public List<String> findMatches(String text, Pattern pattern) is removed
@komatzu1337 komatzu1337 requested a review from ppkarwasz May 17, 2025 18:19
Copy link
Copy Markdown
Member

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@komatzu1337,

As noted earlier, we're still not convinced that even a single findAll method is broadly useful. Including three separate implementations feels like more than we can justify at this point. Could you please limit the pull request to the findAll(CharSequence, Pattern) method, and optionally a second variant that delegates to it?

* @param pattern the pattern to search for
* @return an unmodifiable list of found matches; returns an empty List if text or pattern is null
*/
public List<String> findMatches(CharSequence text, Pattern pattern) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public List<String> findMatches(CharSequence text, Pattern pattern) {
public static List<String> findAll(CharSequence text, Pattern pattern) {

This is an utility class, so all the methods must be static.

Nitpick: Naming is certainly on of the hardest tasks. Since all the methods in this class deal with matching and have names like replaceAll, removeAll, I would call this method findAll.

* @param regex the regular expression pattern
* @return an array of found matches; returns an empty array if text or regex is null
*/
public String[] findMatchesAsArray(CharSequence text, String regex) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the remaining methods of this class, each method with signature (CharSequence, Pattern) has a corresponding method with signature (String, String). Therefore I would define a method:

public static List<String> findAll(String text, String regex)

I don't think it is useful to have both a method returning List<String> and String[]. The return type should be coherent.

* @param pattern the pattern to search for
* @return a modifiable ArrayList of found matches; returns an empty List if text or pattern is null
*/
public List<String> findMatchesModifiable(CharSequence text, Pattern pattern){
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1

We should not start multiplying methods.

Comment on lines +386 to +389
@BeforeEach
void setUp() {
utils = new RegExUtils();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utility classes are never instantiated.

}

@Test
public void testFindMultipleMatches() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use parameterized logging instead of multiple methods with the same logic?

Copy link
Copy Markdown
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are so many possibilities of combining Pattern, Matcher, and MatchResult that I don't see why this specific use case needs to be here. For example, where would this be reused in Commons Lang or from other Commons libraries?

@ppkarwasz
Copy link
Copy Markdown
Member

There are so many possibilities of combining Pattern, Matcher, and MatchResult that I don't see why this specific use case needs to be here. For example, where would this be reused in Commons Lang or from other Commons libraries?

Good point! @komatzu1337, could you share a few concrete examples where this method would be especially useful—either in real-world projects or potential usage within Commons libraries?

The current unit tests illustrate some interesting scenarios, but they can be addressed using existing utilities:

  • Splitting a sentence into words can be done with StringUtils.split.
  • To extract numbers from a sentence, you could split into words and then filter those that are numeric.

Understanding your intended use cases would help us evaluate whether this method brings enough added value and general applicability to warrant inclusion here.

@komatzu1337
Copy link
Copy Markdown
Author

@ppkarwasz @garydgregory Thank you for your patience and valuable feedback.After considering your feedback, i agree that this method doesn’t align well with Commons Lang’s scope. I apologize for the time spent reviewing this proposal - in hindsight, I should have more thoroughly vetted the use case before submitting it.

This experience has helped me better understand Commons Lang's design philosophy. I will be sure to:

  1. Review the contribution guidelines more carefully next time
  2. Get feedback in advance through discussion questions for future changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants