Skip to content

Add a check for the leading and trailing spaces#2504

Open
amieiro wants to merge 9 commits intoWordPress:developfrom
amieiro:leading-trailing-spaces
Open

Add a check for the leading and trailing spaces#2504
amieiro wants to merge 9 commits intoWordPress:developfrom
amieiro:leading-trailing-spaces

Conversation

@amieiro
Copy link
Copy Markdown

@amieiro amieiro commented Nov 4, 2024

As I described in #2501, at translate.w.org, sometimes we get translations that start or end with one or more empty spaces, and we want to avoid this.

This PR adds a new check to detect if a translatable string has leading or trailing spaces.

Fixes #2501.

Copy link
Copy Markdown
Member

@jrfnl jrfnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @amieiro, thanks for this PR.

I haven't tested it yet, but based on a code review alone, this is looking good.

I wonder if the tests can be made a little more varied ? I.e. not just use leading/trailing spaces, but also have some leading/trailing tabs, new lines and combinations of spaces, tabs, new lines.

What do you think ?

@jrfnl
Copy link
Copy Markdown
Member

jrfnl commented Dec 24, 2024

@amieiro Just checking in as you never responded to the question I posed in my initial review.... Do you still intend to finish off this PR ?

@amieiro
Copy link
Copy Markdown
Author

amieiro commented May 27, 2025

Sorry, I don't have the time to finish it at this time.

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
array( $param_info['clean'] )
);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
array( $param_info['clean'] )
);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
array( $param_info['clean'] )
);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
array( $param_info['clean'] )
);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
array( $param_info['clean'] )
);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
$pattern_trailing_spaces = '/[\x20]+$/u';
$pattern_leading_tabs = '/^\x09+/u';
$pattern_trailing_tabs = '/\x09+$/u';
$pattern_leading_vtabs = '/^\x0B+/u';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_leading_vtabs = '/^\x0B+/u';
$pattern_leading_vtabs = '/^\x0B+/u';

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
$pattern_leading_spaces = '/^[\x20]+/u';
$pattern_trailing_spaces = '/[\x20]+$/u';
$pattern_leading_tabs = '/^\x09+/u';
$pattern_trailing_tabs = '/\x09+$/u';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_trailing_tabs = '/\x09+$/u';
$pattern_trailing_tabs = '/\x09+$/u';

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
// Define regex patterns.
$pattern_leading_spaces = '/^[\x20]+/u';
$pattern_trailing_spaces = '/[\x20]+$/u';
$pattern_leading_tabs = '/^\x09+/u';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_leading_tabs = '/^\x09+/u';
$pattern_leading_tabs = '/^\x09+/u';

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated

// Define regex patterns.
$pattern_leading_spaces = '/^[\x20]+/u';
$pattern_trailing_spaces = '/[\x20]+$/u';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_trailing_spaces = '/[\x20]+$/u';
$pattern_trailing_spaces. = '/[\x20]+$/u';

Comment thread WordPress/Sniffs/WP/I18nSniff.php Outdated
$first_non_empty = $this->phpcsFile->findNext( Tokens::$emptyTokens, $param_info['start'], ( $param_info['end'] + 1 ), true );

// Define regex patterns.
$pattern_leading_spaces = '/^[\x20]+/u';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_leading_spaces = '/^[\x20]+/u';
$pattern_leading_spaces = '/^[\x20]+/u';

@jrfnl
Copy link
Copy Markdown
Member

jrfnl commented Jun 15, 2025

@amieiro Seeing as you've been working on this again - could you please explain why you have set things up to have separate errors for spaces vs tabs vs vtabs vs newlines ?

My previous question was only about having tests in place with these and the sniff handling all types of whitespace, not about splitting that up into multiple errors.

I cannot think of any situation where one type of (leading/trailing) whitespace would be acceptable, but another would not be, so having different error codes doesn't make sense to me.

Is there a reason for this distinction based on your experiences in the Polyglots team ? I'd be interested to hear your reasoning for this.

@amieiro
Copy link
Copy Markdown
Author

amieiro commented Jun 16, 2025

@jrfnl Sorry, this PR is not yet ready for review and I forgot to convert it to draft. As we return to contribute with WordPress.org, I resumed work on this PR.

The horizontal tabulations are not often seen at the beginning or end of sentences, but they are seen in the middle, so I wanted to prevent its use, since I don't think it is correct to use it. You can see a use case inside a string in the core here. The same applies to the new lines. You can see an example in the core here.
And we have another thing to check: the different whitespace characters. The regular space characters is U+0020, but there are a lot of different whitespace characters: U+00A0, U+2000, U+2001, U+2002,...

The most common problem is the U+0020 space problem (in fact it is the only one I usually see), so the others are really edge situations. I can continue this PR in two different approaches:

  • Contemplate only the U+0020 case and add the other cases to the tests.
  • Add all the commented situations.

Which approach do you prefer?

@jrfnl jrfnl marked this pull request as draft June 16, 2025 14:29
@jrfnl
Copy link
Copy Markdown
Member

jrfnl commented Jun 16, 2025

@jrfnl Sorry, this PR is not yet ready for review and I forgot to convert it to draft. As we return to contribute with WordPress.org, I resumed work on this PR.

Okay, thanks for letting me know. I've converted it to draft now and won't do an in-detail review until it has been marked as "Ready to review".

The horizontal tabulations are not often seen at the beginning or end of sentences, but they are seen in the middle, so I wanted to prevent its use, since I don't think it is correct to use it. You can see a use case inside a string in the core here. The same applies to the new lines. You can see an example in the core here.

I think that this PR should probably focus on the original problem it was trying to solve - leading and trailing whitespace.

Expanding the scope to mid-string whitespace characters feels like scope-creep and may have different implications (i.e., would involve a different decision point, potentially delaying the merging of the code related to the original problem).

I suggest opening a follow-up issue for that to discuss if this needs flagging and if so, what would need flagging. (I mean, looking at the examples, these seem like perfectly valid use-cases for mid string vertical whitespace, especially if you also consider RTL and BTT languages).

And we have another thing to check: the different whitespace characters. The regular space characters is U+0020, but there are a lot of different whitespace characters: U+00A0, U+2000, U+2001, U+2002,...

The most common problem is the U+0020 space problem (in fact it is the only one I usually see), so the others are really edge situations. I can continue this PR in two different approaches:

  • Contemplate only the U+0020 case and add the other cases to the tests.
  • Add all the commented situations.

Which approach do you prefer?

For leading and trailing whitespace, IMO the new errors should flag all whitespace characters, but without differentiating in the error message or code.
What I mean by that is:

  • Change the error message(s) to use the term "whitespace" instead of explicitly saying what type of whitespace you're flagging.
  • Change the error code(s) to also use the term "whitespace" - this would mean you'd end up with two error codes: LeadingWhiteSpace and TrailingWhiteSpace.

Does that help ?

@jrfnl jrfnl removed this from the 3.2.0 milestone Jun 23, 2025
@jrfnl
Copy link
Copy Markdown
Member

jrfnl commented Mar 19, 2026

@amieiro Just checking in - will you have time to finish this PR in the near future ? It's been over a year since you opened it and it's still not marked as ready for review, while the base premise of the PR seems like a valid and valuable addition, which would help WPCS users.

@amieiro
Copy link
Copy Markdown
Author

amieiro commented Apr 22, 2026

Thanks for the feedback. I just pushed a refactor that addresses all of it.

What changed:

  • Collapsed the eight error codes (Leading/Trailing × Spaces/Tabs/VTabs/NewLines) into just two: LeadingWhiteSpace and TrailingWhiteSpace.
  • Reworded both error messages to use the generic term "whitespace".
  • Replaced the eight hex-code regexes with a single \s match on each side. It already covers space, tab, vertical tab, newline, CR, and form feed in one pass.
  • Added four combination test cases (leading tab + space, trailing space + tab, mixed leading/trailing, and multi-line with newline + tab on each side).
  • Bumped the @since annotation to 3.4.0.

Out of scope here, happy to open follow-up issues: mid-string whitespace and Unicode whitespace characters (U+00A0, U+2000, U+200B, etc.).

@amieiro amieiro marked this pull request as ready for review April 22, 2026 16:11
@amieiro amieiro requested review from GaryJones and jrfnl April 22, 2026 16:13
@rodrigoprimo
Copy link
Copy Markdown
Contributor

@amieiro, thanks for updating this PR! Could you please rebase it to fix the conflicts?

amieiro added 9 commits April 23, 2026 16:55
… two error codes

- replace eight per-character error codes (Leading/Trailing x Spaces/Tabs/VTabs/NewLines)
  with two generic codes: LeadingWhiteSpace and TrailingWhiteSpace
- reduce the eight hex-code regexes to a single `\s` match on each side (covers space,
  tab, vertical tab, newline, CR, form feed in one pass)
- reword error messages to use the generic term "whitespace"
- bump @SInCE annotation from 3.2.0 to 3.4.0
- add four combination test cases (leading tab+space, trailing space+tab, mixed
  leading/trailing, multi-line with newline+tab on each side) to verify strings with
  multiple whitespace types produce at most one leading and one trailing error
@amieiro amieiro force-pushed the leading-trailing-spaces branch from d03a56f to 0f6a099 Compare April 23, 2026 15:04
@amieiro
Copy link
Copy Markdown
Author

amieiro commented Apr 23, 2026

@amieiro, thanks for updating this PR! Could you please rebase it to fix the conflicts?

Done!

Copy link
Copy Markdown
Contributor

@rodrigoprimo rodrigoprimo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this PR, @amieiro. I confirmed that you addressed the points raised in previous reviews. Besides that, I also left some inline comments for us to discuss.

$content_without_quotes = TextStrings::stripQuotes( $param_info['clean'] );
$first_non_empty = $this->phpcsFile->findNext( Tokens::$emptyTokens, $param_info['start'], ( $param_info['end'] + 1 ), true );

if ( preg_match( '/^\s+/u', $content_without_quotes ) ) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I believe the + quantifier is redundant here and in the regex below. A single surrounding whitespace character is enough to trigger these errors.

Suggested change
if ( preg_match( '/^\s+/u', $content_without_quotes ) ) {
if ( preg_match( '/^\s/u', $content_without_quotes ) ) {

$content_without_quotes = TextStrings::stripQuotes( $param_info['clean'] );
$first_non_empty = $this->phpcsFile->findNext( Tokens::$emptyTokens, $param_info['start'], ( $param_info['end'] + 1 ), true );

if ( preg_match( '/^\s+/u', $content_without_quotes ) ) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about the use of the u modifier. I assume it was added so that \s also matches non-ASCII whitespace characters? I worry a bit about unintended consequences of this modifier, as PHP's UTF-8 handling can be subtle, but I'm not sure whether any such edge cases apply here.

If it is important to match non-ASCII whitespace, I'd suggest adding tests to cover the cases the sniff should catch (presumably the ones you mentioned here). Currently, the tests pass if the u modifier is removed.

*
* @return void
*/
private function check_string_has_no_leading_trailing_whitespace( $matched_content, $param_name, $param_info ) {
Copy link
Copy Markdown
Contributor

@rodrigoprimo rodrigoprimo Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method accepts $matched_content and $param_name but doesn't use either. The sibling check_string_has_no_html_wrapper() and other related methods (introduced in c9e8dbd) have the same pattern.

@jrfnl, is there a reason for the uniform signature with unused parameters that I'm missing?

If not, I lean toward dropping the unused parameters for check_string_has_no_leading_trailing_whitespace().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrfnl, is there a reason for the uniform signature with unused parameters that I'm missing?

IIRC, in the original sniff (before extensive refactoring), most, if not all, the "sub-functions" were called via call_user_func(), which meant the expected parameters of these "sub-functions" needed to be the same.

You may very well be correct in that the various refactoring done since that time have made that requirement redundant. This would need to be verified and confirmed and if so, any method signatures for private methods can be cleaned up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional details, @jrfnl.

@amieiro, considering the above, I think it is better to remove the unused parameters from check_string_has_no_leading_trailing_whitespace(). No need to change the other methods.

*
* @return void
*/
private function check_string_has_no_leading_trailing_whitespace( $matched_content, $param_name, $param_info ) {
Copy link
Copy Markdown
Contributor

@rodrigoprimo rodrigoprimo Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: maybe check_string_has_no_surrounding_whitespace() is a clearer name?

namespace\esc_html_e( 'foo', '' ); // The sniff should start flagging this once it can resolve relative namespaces.
namespace\Sub\translate( 'foo', 'my-slug' ); // Ok.

__( ' string', 'my-slug' ); // Bad: leading space.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the new test cases for this check, I wonder if there is a reason to have duplicate tests that differ only in function name? They seem redundant to me because the method's logic doesn't vary based on the function name.

To me, the tests should cover different whitespace positions, whitespace types, and potentially parameter names, plus a few variations in whitespace quantity to guard against future regressions. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

I18n: Report the translatable strings with leading and trailing spaces

4 participants