
Commit 865b472

rofe and claude committed
fix: tighten TLD-glue lookahead so .com at end-of-string isn't split
The previous lookahead only required a letter to follow the TLD. When .com appeared at end-of-string (e.g. the last email in a flattened bullet list), the lookahead after .com failed, so the engine backtracked through the alternation to .co, whose lookahead was satisfied by the trailing m. This produced addresses like rofe@adobe.co.

Require the following letter(s) to lead into another local part and an @, so the split only fires when a second email really is glued on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
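The backtracking described in the message can be reproduced directly. This is a minimal sketch: `COMMON_TLDS` and the old pattern are copied from src/parser.js, while the `'.$1 '` replacement used to split the string is an assumption, since the commit does not show how the regex is applied.

```javascript
// Sketch only: reproduces the end-of-string bug with the OLD lookahead.
// COMMON_TLDS and the pattern are copied from src/parser.js; the
// '.$1 ' replacement is a hypothetical stand-in for the real split step.
const COMMON_TLDS = 'com|org|net|edu|gov|mil|io|co|us|uk|de|fr|jp|cn|au|in|br|ca|me|tv|info|biz|app|dev|ai|cloud';
const OLD_TLD_GLUE_RE = new RegExp(`\\.(${COMMON_TLDS})(?=[a-zA-Z])`, 'gi');

const input = 'rofe@adobe.com';

// .com fails its lookahead at end-of-string, so the alternation falls back
// to .co, whose lookahead is satisfied by the trailing "m".
const matches = [...input.matchAll(OLD_TLD_GLUE_RE)].map((m) => m[1]);
console.log(matches); // [ 'co' ]

// Hypothetical split step: put a space after each matched TLD.
const split = input.replace(OLD_TLD_GLUE_RE, '.$1 ');
console.log(split); // rofe@adobe.co m
```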
1 parent 74a0fc4 commit 865b472

1 file changed

Lines changed: 4 additions & 1 deletion


src/parser.js

```diff
@@ -7,7 +7,10 @@ const INTENT_WORDS = /\b(add|invite|include|onboard|grant|give\s+access|join)\b/
 // immediately followed by another letter (which can only happen when two
 // emails were concatenated).
 const COMMON_TLDS = 'com|org|net|edu|gov|mil|io|co|us|uk|de|fr|jp|cn|au|in|br|ca|me|tv|info|biz|app|dev|ai|cloud';
-const TLD_GLUE_RE = new RegExp(`\\.(${COMMON_TLDS})(?=[a-zA-Z])`, 'gi');
+// Lookahead requires the following letter(s) to lead into another local
+// part and @ — otherwise the engine would happily backtrack from .com
+// (failing its lookahead at end-of-string) to .co followed by a lone m.
+const TLD_GLUE_RE = new RegExp(`\\.(${COMMON_TLDS})(?=[a-zA-Z][a-zA-Z0-9._%+\\-]*@)`, 'gi');
 
 const HTML_ENTITIES = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'", nbsp: ' ' };
 
```
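The new lookahead can be checked against both the failing case and a genuinely glued pair. In this sketch the pattern is copied from the diff, but `splitGluedEmails` and its `'.$1 '` replacement are hypothetical, standing in for however src/parser.js actually uses `TLD_GLUE_RE`.

```javascript
// Sketch only: the NEW lookahead from src/parser.js applied through a
// hypothetical helper; splitGluedEmails is illustrative, not part of the commit.
const COMMON_TLDS = 'com|org|net|edu|gov|mil|io|co|us|uk|de|fr|jp|cn|au|in|br|ca|me|tv|info|biz|app|dev|ai|cloud';
const TLD_GLUE_RE = new RegExp(`\\.(${COMMON_TLDS})(?=[a-zA-Z][a-zA-Z0-9._%+\\-]*@)`, 'gi');

// Insert a space after every TLD that is immediately followed by another
// local part and @, then split on whitespace.
function splitGluedEmails(text) {
  return text.replace(TLD_GLUE_RE, '.$1 ').split(/\s+/).filter(Boolean);
}

// End-of-string .com: neither .com nor .co can satisfy the lookahead,
// because no @ follows, so the address is left whole.
console.log(splitGluedEmails('rofe@adobe.com')); // [ 'rofe@adobe.com' ]

// Genuinely glued addresses are still separated.
console.log(splitGluedEmails('alice@example.combob@example.org'));
// [ 'alice@example.com', 'bob@example.org' ]
```

Because `@` is not in the lookahead's character class, the greedy run stops at the next `@` or at any separator such as a comma or space, so a lone trailing `m` can no longer satisfy it.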
