Add a check for the leading and trailing spaces#2504
Add a check for the leading and trailing spaces#2504amieiro wants to merge 9 commits intoWordPress:developfrom
Conversation
jrfnl
left a comment
There was a problem hiding this comment.
Hi @amieiro, thanks for this PR.
I haven't tested it yet, but based on a code review alone, this is looking good.
I wonder if the tests can be made a little more varied ? I.e. not just use leading/trailing spaces, but also have some leading/trailing tabs, new lines and combinations of spaces, tabs, new lines.
What do you think ?
|
@amieiro Just checking in as you never responded to the question I posed in my initial review.... Do you still intend to finish off this PR ? |
|
Sorry, I don't have the time to finish it at this time. |
| array( $param_info['clean'] ) | ||
| ); | ||
| } | ||
|
|
| array( $param_info['clean'] ) | ||
| ); | ||
| } | ||
|
|
| array( $param_info['clean'] ) | ||
| ); | ||
| } | ||
|
|
| array( $param_info['clean'] ) | ||
| ); | ||
| } | ||
|
|
| array( $param_info['clean'] ) | ||
| ); | ||
| } | ||
|
|
| $pattern_trailing_spaces = '/[\x20]+$/u'; | ||
| $pattern_leading_tabs = '/^\x09+/u'; | ||
| $pattern_trailing_tabs = '/\x09+$/u'; | ||
| $pattern_leading_vtabs = '/^\x0B+/u'; |
There was a problem hiding this comment.
| $pattern_leading_vtabs = '/^\x0B+/u'; | |
| $pattern_leading_vtabs = '/^\x0B+/u'; |
| $pattern_leading_spaces = '/^[\x20]+/u'; | ||
| $pattern_trailing_spaces = '/[\x20]+$/u'; | ||
| $pattern_leading_tabs = '/^\x09+/u'; | ||
| $pattern_trailing_tabs = '/\x09+$/u'; |
There was a problem hiding this comment.
| $pattern_trailing_tabs = '/\x09+$/u'; | |
| $pattern_trailing_tabs = '/\x09+$/u'; |
| // Define regex patterns. | ||
| $pattern_leading_spaces = '/^[\x20]+/u'; | ||
| $pattern_trailing_spaces = '/[\x20]+$/u'; | ||
| $pattern_leading_tabs = '/^\x09+/u'; |
There was a problem hiding this comment.
| $pattern_leading_tabs = '/^\x09+/u'; | |
| $pattern_leading_tabs = '/^\x09+/u'; |
|
|
||
| // Define regex patterns. | ||
| $pattern_leading_spaces = '/^[\x20]+/u'; | ||
| $pattern_trailing_spaces = '/[\x20]+$/u'; |
There was a problem hiding this comment.
| $pattern_trailing_spaces = '/[\x20]+$/u'; | |
| $pattern_trailing_spaces. = '/[\x20]+$/u'; |
| $first_non_empty = $this->phpcsFile->findNext( Tokens::$emptyTokens, $param_info['start'], ( $param_info['end'] + 1 ), true ); | ||
|
|
||
| // Define regex patterns. | ||
| $pattern_leading_spaces = '/^[\x20]+/u'; |
There was a problem hiding this comment.
| $pattern_leading_spaces = '/^[\x20]+/u'; | |
| $pattern_leading_spaces = '/^[\x20]+/u'; |
|
@amieiro Seeing as you've been working on this again - could you please explain why you have set things up to have separate errors for spaces vs tabs vs vtabs vs newlines ? My previous question was only about having tests in place with these and the sniff handling all types of whitespace, not about splitting that up into multiple errors. I cannot think of any situation where one type of (leading/trailing) whitespace would be acceptable, but another would not be, so having different error codes doesn't make sense to me. Is there a reason for this distinction based on your experiences in the Polyglots team ? I'd be interested to hear your reasoning for this. |
|
@jrfnl Sorry, this PR is not yet ready for review and I forgot to convert it to draft. As we return to contribute with WordPress.org, I resumed work on this PR. The horizontal tabulations are not often seen at the beginning or end of sentences, but they are seen in the middle, so I wanted to prevent its use, since I don't think it is correct to use it. You can see a use case inside a string in the core here. The same applies to the new lines. You can see an example in the core here. The most common problem is the U+0020 space problem (in fact it is the only one I usually see), so the others are really edge situations. I can continue this PR in two different approaches:
Which approach do you prefer? |
Okay, thanks for letting me know. I've converted it to draft now and won't do an in-detail review until it has been marked as "Ready to review".
I think that this PR should probably focus on the original problem it was trying to solve - leading and trailing whitespace. Expanding the scope to mid-string whitespace characters feels like scope-creep and may have different implications (i.e., would involve a different decision point, potentially delaying the merging of the code related to the original problem). I suggest opening a follow-up issue for that to discuss if this needs flagging and if so, what would need flagging. (I mean, looking at the examples, these seem like perfectly valid use-cases for mid string vertical whitespace, especially if you also consider RTL and BTT languages).
For leading and trailing whitespace, IMO the new errors should flag all whitespace characters, but without differentiating in the error message or code.
Does that help ? |
|
@amieiro Just checking in - will you have time to finish this PR in the near future ? It's been over a year since you opened it and it's still not marked as ready for review, while the base premise of the PR seems like a valid and valuable addition, which would help WPCS users. |
|
Thanks for the feedback. I just pushed a refactor that addresses all of it. What changed:
Out of scope here, happy to open follow-up issues: mid-string whitespace and Unicode whitespace characters (U+00A0, U+2000, U+200B, etc.). |
|
@amieiro, thanks for updating this PR! Could you please rebase it to fix the conflicts? |
… two error codes - replace eight per-character error codes (Leading/Trailing x Spaces/Tabs/VTabs/NewLines) with two generic codes: LeadingWhiteSpace and TrailingWhiteSpace - reduce the eight hex-code regexes to a single `\s` match on each side (covers space, tab, vertical tab, newline, CR, form feed in one pass) - reword error messages to use the generic term "whitespace" - bump @SInCE annotation from 3.2.0 to 3.4.0 - add four combination test cases (leading tab+space, trailing space+tab, mixed leading/trailing, multi-line with newline+tab on each side) to verify strings with multiple whitespace types produce at most one leading and one trailing error
d03a56f to
0f6a099
Compare
Done! |
rodrigoprimo
left a comment
There was a problem hiding this comment.
Thanks for updating this PR, @amieiro. I confirmed that you addressed the points raised in previous reviews. Besides that, I also left some inline comments for us to discuss.
| $content_without_quotes = TextStrings::stripQuotes( $param_info['clean'] ); | ||
| $first_non_empty = $this->phpcsFile->findNext( Tokens::$emptyTokens, $param_info['start'], ( $param_info['end'] + 1 ), true ); | ||
|
|
||
| if ( preg_match( '/^\s+/u', $content_without_quotes ) ) { |
There was a problem hiding this comment.
Nitpick: I believe the + quantifier is redundant here and in the regex below. A single surrounding whitespace character is enough to trigger these errors.
| if ( preg_match( '/^\s+/u', $content_without_quotes ) ) { | |
| if ( preg_match( '/^\s/u', $content_without_quotes ) ) { |
| $content_without_quotes = TextStrings::stripQuotes( $param_info['clean'] ); | ||
| $first_non_empty = $this->phpcsFile->findNext( Tokens::$emptyTokens, $param_info['start'], ( $param_info['end'] + 1 ), true ); | ||
|
|
||
| if ( preg_match( '/^\s+/u', $content_without_quotes ) ) { |
There was a problem hiding this comment.
I wonder about the use of the u modifier. I assume it was added so that \s also matches non-ASCII whitespace characters? I worry a bit about unintended consequences of this modifier, as PHP's UTF-8 handling can be subtle, but I'm not sure whether any such edge cases apply here.
If it is important to match non-ASCII whitespace, I'd suggest adding tests to cover the cases the sniff should catch (presumably the ones you mentioned here). Currently, the tests pass if the u modifier is removed.
| * | ||
| * @return void | ||
| */ | ||
| private function check_string_has_no_leading_trailing_whitespace( $matched_content, $param_name, $param_info ) { |
There was a problem hiding this comment.
This method accepts $matched_content and $param_name but doesn't use either. The sibling check_string_has_no_html_wrapper() and other related methods (introduced in c9e8dbd) have the same pattern.
@jrfnl, is there a reason for the uniform signature with unused parameters that I'm missing?
If not, I lean toward dropping the unused parameters for check_string_has_no_leading_trailing_whitespace().
There was a problem hiding this comment.
@jrfnl, is there a reason for the uniform signature with unused parameters that I'm missing?
IIRC, in the original sniff (before extensive refactoring), most, if not all, the "sub-functions" were called via call_user_func(), which meant the expected parameters of these "sub-functions" needed to be the same.
You may very well be correct in that the various refactoring done since that time have made that requirement redundant. This would need to be verified and confirmed and if so, any method signatures for private methods can be cleaned up.
| * | ||
| * @return void | ||
| */ | ||
| private function check_string_has_no_leading_trailing_whitespace( $matched_content, $param_name, $param_info ) { |
There was a problem hiding this comment.
Nitpick: maybe check_string_has_no_surrounding_whitespace() is a clearer name?
| namespace\esc_html_e( 'foo', '' ); // The sniff should start flagging this once it can resolve relative namespaces. | ||
| namespace\Sub\translate( 'foo', 'my-slug' ); // Ok. | ||
|
|
||
| __( ' string', 'my-slug' ); // Bad: leading space. |
There was a problem hiding this comment.
Looking at the new test cases for this check, I wonder if there is a reason to have duplicate tests that differ only in function name? They seem redundant to me because the method's logic doesn't vary based on the function name.
To me, the tests should cover different whitespace positions, whitespace types, and potentially parameter names, plus a few variations in whitespace quantity to guard against future regressions. What do you think?
As I described in #2501, at translate.w.org, sometimes we get translations that start or end with one or more empty spaces, and we want to avoid this.
This PR adds a new check to detect if a translatable string has leading or trailing spaces.
Fixes #2501.