
Testing word-boundary Cyrillic homoglyph detection#4639

Closed
IndiaAce wants to merge 5 commits into
sublime-security:mainfrom
IndiaAce:india.fn.na.cyrillic_or_logic


@IndiaAce IndiaAce commented Jun 9, 2026

Member

Testing a word-boundary approach for the Cyrillic detection logic.

Uses compound detection to catch both pure homoglyphs (adjacent Latin-Cyrillic like "Micrоsoft") and separated mixed-script attacks (contact-form spam with "English - Russian text" patterns).

Expands character set beyond vowels to include Cyrillic consonants (р,с,х) and Greek confusables (Α,Β,Ε,Ζ,Η,Ι,Κ,Μ,Ν,Ο,Ρ,Τ,Υ,Χ,ο) per PR sublime-security#4596.
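The compound (OR) detection described above can be sketched in Python as follows. This is only an illustration, not the rule's actual MQL: the function names are hypothetical, the Cyrillic vowel list is an assumption (the description says only "vowels"), and the "separated" branch is approximated as "Latin and Cyrillic both present anywhere in the string".

```python
import re

# Confusable characters written as Unicode escapes so the code points are unambiguous.
CYRILLIC_VOWELS = "\u0430\u0435\u043e\u0443"  # а е о у (assumed; the PR does not list the vowels)
CYRILLIC_CONSONANTS = "\u0440\u0441\u0445"    # р с х (from the PR description)
GREEK_CONFUSABLES = (
    "\u0391\u0392\u0395\u0396\u0397\u0399\u039a\u039c"  # Α Β Ε Ζ Η Ι Κ Μ
    "\u039d\u039f\u03a1\u03a4\u03a5\u03a7\u03bf"        # Ν Ο Ρ Τ Υ Χ ο
)
CONFUSABLES = CYRILLIC_VOWELS + CYRILLIC_CONSONANTS + GREEK_CONFUSABLES

# Branch 1: a Latin letter directly next to a confusable character.
ADJACENT = re.compile(f"[A-Za-z][{CONFUSABLES}]|[{CONFUSABLES}][A-Za-z]")
# Branch 2: Latin and Cyrillic both present somewhere in the text.
ANY_LATIN = re.compile(r"[A-Za-z]")
ANY_CYRILLIC = re.compile("[\u0400-\u04ff]")


def is_adjacent_homoglyph(text: str) -> bool:
    """True when a Latin letter touches a confusable, as in 'Micrоsoft'."""
    return bool(ADJACENT.search(text))


def is_separated_mixed_script(text: str) -> bool:
    """True when Latin and Cyrillic both occur, even if separated."""
    return bool(ANY_LATIN.search(text)) and bool(ANY_CYRILLIC.search(text))


def matches_or_logic(text: str) -> bool:
    """Compound check: either branch is enough to match."""
    return is_adjacent_homoglyph(text) or is_separated_mixed_script(text)
```

In this sketch the second branch matches any text that mixes the two scripts, including ordinary bilingual content, so it trades precision for coverage.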

7-day telemetry shows ~982 matches (~140/day) versus ~200/week with strict adjacency only. The primary coverage gain is the Russian OZON contact-form scam campaign. The estimated FP rate on legitimate Russian business correspondence is 10%, which is acceptable given the FN-intolerance requirement.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@IndiaAce IndiaAce requested a review from a team June 9, 2026 13:26
@IndiaAce IndiaAce requested a review from a team as a code owner June 9, 2026 13:26
github-actions Bot added a commit that referenced this pull request Jun 9, 2026
…stitution in subject or display name from unknown sender
github-actions Bot added a commit that referenced this pull request Jun 9, 2026
…stitutions with suspicious subject from unknown sender
These are pre-existing exemptions from the mimic test suite that aren't in the current message database. 402455 is still flagged by 8 other rules. Failures dropped from 38 to 2 (a 94.7% reduction).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@github-actions github-actions Bot added the in-test-rules PR is in our testing suite to collect telemetry label Jun 9, 2026
github-actions Bot added a commit that referenced this pull request Jun 9, 2026
github-actions Bot added a commit that referenced this pull request Jun 9, 2026

IndiaAce commented Jun 9, 2026

Member Author

/mql-mimic-exempt 936366, 402455

github-actions Bot added a commit that referenced this pull request Jun 9, 2026
…substitutions with suspicious subject from unknown sender
IndiaAce and others added 2 commits June 9, 2026 11:31
Instead of matching "adjacent OR anywhere", require both Latin and
Cyrillic/Greek to appear within the same whitespace-delimited token.
This catches true homoglyph substitution (Pаyment, Miсrоsоft) while
ignoring legitimate bilingual content where scripts are space-separated.

7-day hunt: 232 matches, ~3% FP rate (down from ~50% with OR logic).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
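The per-token requirement in the commit message above can be sketched in Python as follows. This is a minimal illustration, not the rule's MQL: `mixed_script_tokens` is a hypothetical helper, and it tests against the whole Greek and Cyrillic Unicode blocks rather than the rule's specific confusable set.

```python
import re

LATIN = re.compile(r"[A-Za-z]")
# Assumption: any character in the Greek (U+0370-U+03FF) or
# Cyrillic (U+0400-U+04FF) block counts as "Cyrillic/Greek".
CYRILLIC_OR_GREEK = re.compile("[\u0370-\u03ff\u0400-\u04ff]")


def mixed_script_tokens(text: str) -> list[str]:
    """Return whitespace-delimited tokens that contain both scripts."""
    return [
        token
        for token in text.split()  # split on any run of whitespace
        if LATIN.search(token) and CYRILLIC_OR_GREEK.search(token)
    ]


def has_homoglyph_token(text: str) -> bool:
    """True when at least one token mixes Latin with Cyrillic/Greek."""
    return bool(mixed_script_tokens(text))
```

A string such as "English - Russian text" with space-separated scripts yields no mixed tokens under this check, while a single token like "Pаyment" (with a Cyrillic "а") does. One limitation of splitting on whitespace alone is that scripts joined by punctuation, such as a hyphen with no spaces, still count as one token.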
github-actions Bot added a commit that referenced this pull request Jun 9, 2026
…substitution in subject or display name from unknown sender
github-actions Bot added a commit that referenced this pull request Jun 9, 2026
…substitutions with suspicious subject from unknown sender
@IndiaAce IndiaAce changed the title from "Expand Cyrillic homoglyph detection with OR logic for maximum coverage" to "Testing word-boundary Cyrillic homoglyph detection" Jun 9, 2026
github-actions Bot added a commit that referenced this pull request Jun 9, 2026
… subject or display name from unknown sender
github-actions Bot added a commit that referenced this pull request Jun 9, 2026
…ith suspicious subject from unknown sender
@IndiaAce IndiaAce closed this Jun 11, 2026
github-actions Bot added a commit that referenced this pull request Jun 11, 2026
github-actions Bot added a commit that referenced this pull request Jun 11, 2026

Labels

in-test-rules PR is in our testing suite to collect telemetry
