Skip to content

Fix Inefficient regular expression#1718

Merged
laeubi merged 1 commit intoeclipse-pde:masterfrom
laeubi:Inefficient_regular_expression
May 3, 2025
Merged

Fix Inefficient regular expression#1718
laeubi merged 1 commit intoeclipse-pde:masterfrom
laeubi:Inefficient_regular_expression

Conversation

@laeubi
Copy link
Copy Markdown
Contributor

@laeubi laeubi commented Apr 8, 2025

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

This replaces the problematic regular expression by a little parser that checks for the same results as the regular expression.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 8, 2025

Test Results

   285 files  ±0     285 suites  ±0   46m 12s ⏱️ - 4m 23s
 3 609 tests ±0   3 533 ✅ ±0   76 💤 ±0  0 ❌ ±0 
11 019 runs  ±0  10 788 ✅ ±0  231 💤 ±0  0 ❌ ±0 

Results for commit ed9afa1. ± Comparison against base commit ea1089d.

♻️ This comment has been updated with latest results.

Copy link
Copy Markdown
Member

@HannesWell HannesWell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems reasonable to me. Especially as the Pattern is currently very difficult to understand, it's probably easier to read the sequential code (therefore I haven't compared the logi ...).

I just have two nitpicks below.

// Determine whether the given attribute list matches the pattern
return matcher.find();
private static boolean isValidAttributeKeyChar(char c) {
return (c >= 'A' && c <= 'Z') || //
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be something like Character.isUpperCase/isLowerCase/isDigit() etc. be more readable?

Suggested change
return (c >= 'A' && c <= 'Z') || //
return Character.isUpperCase(c) || //

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to replicate what the RegEx is doing here, Character.isUpperCase/isLowerCase/ and alike might include more characters as before and I haven really checked why we test this particular things.

Comment on lines +310 to +317
int length = key.length();
for (int i = 0; i < length; i++) {
char c = key.charAt(i);
if (!isValidAttributeKeyChar(c)) {
return false;
}
}
return true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a stream-pipeline fit here?

Suggested change
int length = key.length();
for (int i = 0; i < length; i++) {
char c = key.charAt(i);
if (!isValidAttributeKeyChar(c)) {
return false;
}
}
return true;
return key.chars().allMatch(PDETextHelper::isValidAttributeKeyChar);

With the suggestions below one could maybe even inline the isValidAttributeKeyChar method?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this part?

@laeubi laeubi force-pushed the Inefficient_regular_expression branch from 9b56aff to fb6e5c9 Compare April 22, 2025 07:54
Some regular expressions take a long time to match certain input strings
to the point where the time it takes to match a string of length n is
proportional to nk or even 2n. Such regular expressions can negatively
affect performance, or even allow a malicious user to perform a Denial
of Service ("DoS") attack by crafting an expensive input string for the
regular expression to match.

This replaces the problematic regular expression by a little parser that
checks for the same results as the regular expression.
@laeubi laeubi force-pushed the Inefficient_regular_expression branch from fb6e5c9 to ed9afa1 Compare May 3, 2025 04:49
@laeubi laeubi merged commit 98a5134 into eclipse-pde:master May 3, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants