Add extractMatches() to RegExUtils#1382
Conversation
There was a problem hiding this comment.
I'm not sure how this one. This is a pretty narrow use case. Some people will want a List, others an array; some will want String input, others will say it should be a CharSequence. What should happen if the string is null? What should happen if the pattern is null? Should an empty list (or array) be returned? A null? Should the List be immutable? Can this even be reused with Lang?
Supports both String and CharSequence Provides both List<String> and String[] return Added regex-string variant All returned collections are unmodifiable Never returns null (empty collections for null) Improved documentation
|
TY for your feedback! I implemented the following improvements: Now throws IllegalArgumentException with clear messages for null I will add tests later if you find this class worthwhile |
ppkarwasz
left a comment
There was a problem hiding this comment.
Hi @komatzu1337,
Thanks for the PR, could you add some unit tests please?
Note also that you enabled "Vigilant mode" on your GitHub account, but you didn't add the GPG key you are using to your GitHub account. For this reason all your commits appear as "Unverified". Can you fix that?
|
I'm still not convinced this belongs here. |
@komatzu1337, I appreciate the effort you've put into this contribution. I also have some reservations about the utility of Just as a reminder, per ASF guidelines, all maintainers need to reach consensus before any changes are merged. |
why? |
hi
i'll figure it out, thanks |
class MatchFinder was removed
now returns empty List if pattern or text null
{} added on if, so RegExUtils will pass checks
|
Regarding the utility of this method, with the help of Copilot, I found some projects that have a similar method that returns
|
When adding a new method to Commons Lang, we need to consider the impact on the entire ecosystem. Even small additions increase the library's size for thousands of downstream projects. Given the wide range of potential utility methods, we prioritize those likely to benefit a significant portion of users—let's say, at least 1%. While your proposed method may be useful in some contexts, it might not offer enough broad utility to justify inclusion in the core library. |
tests added public List<String> findMatches(String text, Pattern pattern) is removed
tests added public List<String> findMatches(String text, Pattern pattern) is removed
ppkarwasz
left a comment
There was a problem hiding this comment.
As noted earlier, we're still not convinced that even a single findAll method is broadly useful. Including three separate implementations feels like more than we can justify at this point. Could you please limit the pull request to the findAll(CharSequence, Pattern) method, and optionally a second variant that delegates to it?
| * @param pattern the pattern to search for | ||
| * @return an unmodifiable list of found matches; returns an empty List if text or pattern is null | ||
| */ | ||
| public List<String> findMatches(CharSequence text, Pattern pattern) { |
There was a problem hiding this comment.
| public List<String> findMatches(CharSequence text, Pattern pattern) { | |
| public static List<String> findAll(CharSequence text, Pattern pattern) { |
This is an utility class, so all the methods must be static.
Nitpick: Naming is certainly on of the hardest tasks. Since all the methods in this class deal with matching and have names like replaceAll, removeAll, I would call this method findAll.
| * @param regex the regular expression pattern | ||
| * @return an array of found matches; returns an empty array if text or regex is null | ||
| */ | ||
| public String[] findMatchesAsArray(CharSequence text, String regex) { |
There was a problem hiding this comment.
Looking at the remaining methods of this class, each method with signature (CharSequence, Pattern) has a corresponding method with signature (String, String). Therefore I would define a method:
public static List<String> findAll(String text, String regex)I don't think it is useful to have both a method returning List<String> and String[]. The return type should be coherent.
| * @param pattern the pattern to search for | ||
| * @return a modifiable ArrayList of found matches; returns an empty List if text or pattern is null | ||
| */ | ||
| public List<String> findMatchesModifiable(CharSequence text, Pattern pattern){ |
There was a problem hiding this comment.
-1
We should not start multiplying methods.
| @BeforeEach | ||
| void setUp() { | ||
| utils = new RegExUtils(); | ||
| } |
There was a problem hiding this comment.
Utility classes are never instantiated.
| } | ||
|
|
||
| @Test | ||
| public void testFindMultipleMatches() { |
There was a problem hiding this comment.
Can you use parameterized logging instead of multiple methods with the same logic?
garydgregory
left a comment
There was a problem hiding this comment.
There are so many possibilities of combining Pattern, Matcher, and MatchResult that I don't see why this specific use case needs to be here. For example, where would this be reused in Commons Lang or from other Commons libraries?
Good point! @komatzu1337, could you share a few concrete examples where this method would be especially useful—either in real-world projects or potential usage within Commons libraries? The current unit tests illustrate some interesting scenarios, but they can be addressed using existing utilities:
Understanding your intended use cases would help us evaluate whether this method brings enough added value and general applicability to warrant inclusion here. |
|
@ppkarwasz @garydgregory Thank you for your patience and valuable feedback.After considering your feedback, i agree that this method doesn’t align well with Commons Lang’s scope. I apologize for the time spent reviewing this proposal - in hindsight, I should have more thoroughly vetted the use case before submitting it. This experience has helped me better understand Commons Lang's design philosophy. I will be sure to:
|
Provides simple way to get all regex matches as list without manual Matcher handling.