Fix handling of word boundaries in Filter cog #6725

Merged
Jackenmen merged 2 commits into Cog-Creators:V3/develop from Evanroby:filter-boundary
May 23, 2026

Conversation

@Evanroby Evanroby commented Apr 1, 2026

Contributor

Description of the changes

Fixes #3682

Refactors word list pattern generation to use a dedicated _build_word_pattern helper for more flexible and accurate word boundary handling.

Have the changes in this PR been tested?

Yes

@github-actions github-actions Bot added the "Category: Cogs - Filter" label Apr 1, 2026

@Jackenmen Jackenmen left a comment

Member

Using negative lookbehind/lookahead with \w for word boundaries seems like an improvement over \b since, unlike \b, it always checks that the characters immediately before and after the word are not word characters instead of determining the behaviour based on the first/last character of the word.

For example, the current implementation (as seen in stable releases) for a word such as <h> has:

  • false positives: "x<h>y"
  • false negatives: " <h> "

This PR handles both of these correctly.
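
The two failure cases above can be reproduced with a short sketch. It assumes the stable pattern wraps the escaped word in \b on both sides; that is an assumption about the existing code, not a quote from it, and the name boundary_pattern is purely illustrative.

```python
import re

# Assumption: the stable implementation wraps the escaped word in \b anchors.
word = "<h>"
boundary_pattern = re.compile(rf"\b{re.escape(word)}\b")

# "x" -> "<" and ">" -> "y" are both word/non-word transitions, so \b
# succeeds on each side and the word matches inside "x<h>y".
false_positive = boundary_pattern.search("x<h>y") is not None

# " " -> "<" and ">" -> " " are non-word/non-word pairs, so \b fails on
# each side and the standalone word is missed.
false_negative = boundary_pattern.search(" <h> ") is None

print(false_positive, false_negative)
```

Both flags come out true because \b only asserts a transition between a word and a non-word character, so its meaning flips when the filtered word itself starts or ends with a symbol.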

With that said, this case is also handled fine by a simpler:

rf"(?<!\w){re.escape(w)}(?!\w)"
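
As a minimal sketch of how that expression behaves, the snippet below joins it over a small word list and checks a few inputs. The helper names build_pattern and hits are hypothetical and are not taken from the PR; the real cog may compile its pattern differently (for example, with extra flags).

```python
import re

def build_pattern(words):
    """Hypothetical helper: one alternation, each word guarded by lookarounds."""
    return re.compile(
        "|".join(rf"(?<!\w){re.escape(w)}(?!\w)" for w in words)
    )

pattern = build_pattern(["<h>", "cat"])

def hits(text):
    """Return every filtered word found in text."""
    return [m.group(0) for m in pattern.finditer(text)]

print(hits(" <h> "))        # standalone symbol word is found
print(hits("x<h>y"))        # glued to word characters, so no match
print(hits("a cat sat"))    # ordinary word still matches on its own
print(hits("concatenate"))  # but not inside a longer word
```

The lookarounds only ask whether the neighbouring character is a word character, so the same rule applies whether the filtered word starts with a letter or a symbol.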

I don't think we should really be deciding whether a word boundary should be checked based on whether the first/last character is a word character. Do you have some concrete cases where this would actually be an improvement rather than just make things possibly more confusing?

@Jackenmen Jackenmen added this to the 3.5.x milestone May 21, 2026
@Jackenmen Jackenmen added the "Type: Bug" label May 21, 2026
@Evanroby

Contributor Author

Applied change!

@Evanroby Evanroby requested a review from Jackenmen May 21, 2026 05:57
@Jackenmen Jackenmen modified the milestones: 3.5.x, 3.5.25 May 23, 2026
@Jackenmen Jackenmen changed the title from "[Filter]: Fix for unicode symbols and more" to "Fix handling of word boundaries in Filter cog" May 23, 2026
@Jackenmen Jackenmen merged commit 17fea2f into Cog-Creators:V3/develop May 23, 2026
18 checks passed
@red-githubbot red-githubbot Bot added the "Changelog Entry: Pending" label May 23, 2026

Labels

Category: Cogs - Filter (This is related to the Filter cog.)
Changelog Entry: Pending (Changelog entry for this PR hasn't been added by repo maintainers yet.)
Type: Bug (Unexpected behavior, result, or exception. In case of PRs, it is a fix for the foregoing.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filter Inconsistencies

2 participants