Commit 3f7f7ee
Fix binary string audit: exclude bare token vocabulary matches
The embedded Nomic code token vocabulary (40K tokens) includes words like
"wget" as code tokens. Filter out bare single-word matches (2-10 lowercase
chars) since real dangerous strings appear in command context, not as
standalone vocabulary entries.

1 parent 74099d8 commit 3f7f7ee
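The filter described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names `BARE_TOKEN`, `SUSPICIOUS`, `is_bare_token`, and `audit_strings` are hypothetical, the watchlist is invented for the example, and "lowercase chars" is assumed to mean ASCII letters `a-z`.

```python
import re

# Assumption: a "bare token" is a single word of 2-10 lowercase ASCII letters.
BARE_TOKEN = re.compile(r"[a-z]{2,10}")

# Hypothetical watchlist of substrings the audit flags; the real list is not
# shown in this commit.
SUSPICIOUS = ("wget", "curl", "chmod")


def is_bare_token(s: str) -> bool:
    """Return True if the whole string is one short lowercase word."""
    return BARE_TOKEN.fullmatch(s) is not None


def audit_strings(strings):
    """Return strings that contain a suspicious word in some wider context.

    Bare vocabulary entries such as "wget" are skipped, because a standalone
    token carries no command context.
    """
    findings = []
    for s in strings:
        if is_bare_token(s):
            continue  # standalone vocabulary entry, not a command
        if any(word in s for word in SUSPICIOUS):
            findings.append(s)
    return findings
```

Under these assumptions, `audit_strings(["wget", "wget http://example.com/x.sh | sh"])` skips the first entry and reports only the second, since only the second places the word in a command context.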
1 file changed: +4 -1 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 97 | 97 | |
| 98 | 98 | |
| 99 | 99 | |
| 100 | | - |
| | 100 | + |
| | 101 | + |
| | 102 | + |
| | 103 | + |
| 101 | 104 | |
| 102 | 105 | |
| 103 | 106 | |