Commit f304e18

Merge pull request #155 from 2Toad/jp-issue-154
Fixes #154: Unicode characters cause wholeWord to break
2 parents 43a9402 + 44d9f5b commit f304e18

13 files changed

Lines changed: 1323 additions & 752 deletions

.nvmrc

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-20.17.0
+22.19.0

README.md

Lines changed: 15 additions & 0 deletions
@@ -135,6 +135,21 @@ profanity.censor('I like big butts and I cannot lie', CensorType.AllVowels);
 // I like big b$tts and I cannot lie
 ```
 
+### unicodeWordBoundaries
+
+Controls whether word boundaries are Unicode-aware. By default this is set to `false` due to the performance impact.
+
+- When `false` (default), whole-word matching uses ASCII-style boundaries (similar to `\b`) plus underscore `_` as a separator. This is fastest and ideal for ASCII inputs.
+- When `true`, whole-word matching uses Unicode-aware boundaries so words with diacritics (e.g., `vehículo`, `horário`) and compound separators are handled correctly.
+
+```JavaScript
+// Enable Unicode-aware boundaries when processing non-ASCII input
+const profanity = new Profanity({ unicodeWordBoundaries: true });
+
+profanity.exists('vehículo horario');
+// false (does not match on "culo" inside "vehículo")
+```
+
 ## Customize the word list
 
 Add words:
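
The false positive described in the `unicodeWordBoundaries` README section can be reproduced with plain regular expressions. The following is a minimal sketch and not the library's actual implementation: the variable names and the lookaround-based pattern are assumptions chosen for illustration. It shows why an ASCII-style `\b` finds a boundary inside `vehículo`, and how a boundary built from Unicode property escapes avoids that.

```javascript
// Illustrative sketch only; the library's internal patterns may differ.
const word = "culo";

// "veh\u00edculo" is "vehículo" with a precomposed í (U+00ED).
const input = "veh\u00edculo";

// ASCII-style boundary: \b only knows [A-Za-z0-9_] as word characters,
// so "í" counts as a non-word character and a boundary appears before "c".
const asciiBoundary = new RegExp(`\\b${word}\\b`, "i");

// Unicode-aware boundary (assumed approach): reject a match that is adjacent
// to any letter, number, or underscore. \p{...} requires the "u" flag.
const unicodeBoundary = new RegExp(
  `(?<![\\p{L}\\p{N}_])${word}(?![\\p{L}\\p{N}_])`,
  "iu"
);

const asciiMatch = asciiBoundary.test(input);              // true (false positive)
const unicodeMatch = unicodeBoundary.test(input);          // false
const standalone = unicodeBoundary.test("un culo grande"); // true
```

The lookarounds do more work than `\b`, which may partly explain why the option is described as having a performance impact and is off by default.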

contribute.md

Lines changed: 0 additions & 4 deletions
@@ -65,10 +65,6 @@ The Profanity project includes an .nvmrc file, so you can run `nvm use` to switc
 
 The Profanity project includes Husky for running Git Hooks. Running `git commit` will trigger `lint-staged` which will lint all files currently staged in Git. If linting fails, the commit will be cancelled
 
-### Dependencies
-
-- `chai`: we must use v4.x because v5.x is pure ESM, and we require CommonJS modules
-
 ### Translations
 
 We utilize a self-hosted instance of the Open Source [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate) lib to translate the core English list of profane words.

eslint.config.mjs

Lines changed: 9 additions & 0 deletions
@@ -11,6 +11,15 @@ export default [
   js.configs.recommended,
   ...ts.configs.recommended,
   security.configs.recommended,
+  {
+    // Disable noisy security warnings that are intentional in this codebase
+    rules: {
+      // Dynamic regex construction is required for the profanity alternation
+      "security/detect-non-literal-regexp": "off",
+      // Indexed access in benchmarks/translate is safe in our context
+      "security/detect-object-injection": "off",
+    },
+  },
   {
     // These file-matching rules will be processed after the above configs
     files: ["**/*.{js,ts}"],
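
The config comment above says dynamic regex construction is required for the profanity alternation, which is what `security/detect-non-literal-regexp` flags. As a rough sketch of that pattern, the word list can be escaped and joined into a single alternation. The helper names `escapeRegExp` and `buildAlternation` are hypothetical and are not taken from this repository.

```javascript
// Hypothetical helpers for illustration; not the project's actual code.

// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive whole-word pattern from a word list.
// The pattern is non-literal by nature, which is what the lint rule reports.
function buildAlternation(words) {
  const escaped = words.map(escapeRegExp);
  return new RegExp(`\\b(?:${escaped.join("|")})\\b`, "i");
}

const pattern = buildAlternation(["butt", "a.b"]);

const matchesWord = pattern.test("big butt");   // true
const matchesLiteralDot = pattern.test("a.b");  // true
const matchesWildcard = pattern.test("axb");    // false, "." was escaped
```

Escaping each word before joining means a list entry containing regex metacharacters cannot change the meaning of the pattern, which is the main risk the lint rule is meant to surface.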
