
Commit cf7839c

fix(wikipedia): handle branded TLDs in extractBrandFromUrl
hdfc.bank.in was resolving to companyName=bank instead of hdfc because .bank was not recognized as a multi-part TLD prefix. Add bank, firm, gen, ind, res, nic to the prefix set (India ccSLD patterns).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a25729b commit cf7839c

2 files changed

Lines changed: 12 additions & 0 deletions
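The failure in the commit message can be reproduced with a simplified heuristic. What follows is a hypothetical reconstruction, not the code in src/wikipedia-analysis/handler.js. It assumes the brand is the label just before the TLD, unless that label is a known multi-part prefix sitting in front of a two-letter ccTLD. naiveBrand and OLD_PREFIXES are illustrative names; OLD_PREFIXES copies the entries that were in the set before this commit.

```javascript
// Hypothetical reproduction of the bug; the real extractBrandFromUrl
// may differ in detail. These are the prefixes present before this commit.
const OLD_PREFIXES = new Set(['co', 'com', 'org', 'net', 'ac', 'gov', 'edu', 'mil']);

// Assumed heuristic: if the label before a two-letter ccTLD is a known
// multi-part prefix, the brand is one label further left; otherwise the
// brand is the label immediately before the TLD.
function naiveBrand(url, prefixes) {
  const labels = new URL(url).hostname.split('.');
  const tld = labels[labels.length - 1];
  const second = labels[labels.length - 2];
  if (labels.length >= 3 && tld.length === 2 && prefixes.has(second)) {
    return labels[labels.length - 3];
  }
  return second;
}

console.log(naiveBrand('https://hdfc.co.in', OLD_PREFIXES));   // hdfc
console.log(naiveBrand('https://hdfc.bank.in', OLD_PREFIXES)); // bank (the reported bug)
```

Under this assumed rule, 'co' is recognized, so hdfc.co.in works, while 'bank' is not, so bank.in is not treated as a single suffix and 'bank' is returned as the brand.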


src/wikipedia-analysis/handler.js

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ const REGION_SUFFIXES_RE = /(?:usa|us|uk|eu|de|fr|es|it|nl|be|at|ch|au|ca|jp|kr|
 
 const MULTI_PART_TLD_PREFIXES = new Set([
   'co', 'com', 'org', 'net', 'ac', 'gov', 'edu', 'mil',
+  'bank', 'firm', 'gen', 'ind', 'res', 'nic',
 ]);
 
 /**
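With the six new entries, the same kind of lookup resolves the India ccSLD hosts used in the new tests. This is again a sketch under assumptions: brandFromUrl is a hypothetical stand-in for extractBrandFromUrl, and the rule that only a prefix directly before a two-letter ccTLD counts as part of the suffix is inferred from the tests, not taken from the handler. The prefix set itself is copied from the diff.

```javascript
// Prefix set as of this commit (copied from the diff).
const MULTI_PART_TLD_PREFIXES = new Set([
  'co', 'com', 'org', 'net', 'ac', 'gov', 'edu', 'mil',
  'bank', 'firm', 'gen', 'ind', 'res', 'nic',
]);

// Hypothetical brand lookup (assumption, not the handler's actual code):
// treat "<prefix>.<two-letter ccTLD>" as one public suffix and take the
// label to its left; otherwise take the label just before the TLD.
function brandFromUrl(url) {
  const labels = new URL(url).hostname.split('.');
  const last = labels.length - 1;
  const isMultiPart =
    labels.length >= 3 &&
    labels[last].length === 2 &&
    MULTI_PART_TLD_PREFIXES.has(labels[last - 1]);
  return isMultiPart ? labels[last - 2] : labels[last - 1];
}

for (const url of [
  'https://hdfc.bank.in',
  'https://icici.bank.in',
  'https://acme.firm.in',
  'https://iitd.res.in',
  'https://example.nic.in',
  'https://netbanking.hdfc.bank.in', // hypothetical subdomain in front of the ccSLD
  'https://a.b.walmart.com',
]) {
  console.log(url, '->', brandFromUrl(url));
}
```

Because the check only inspects the last two labels, a subdomain in front (the illustrative netbanking.hdfc.bank.in) still resolves to hdfc, mirroring the existing a.b.walmart.com case. The two-letter ccTLD condition is also an assumption; it keeps a host like example.bank.com from being treated as having a multi-part suffix.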

test/audits/wikipedia-analysis/handler.test.js

Lines changed: 11 additions & 0 deletions
@@ -807,5 +807,16 @@ describe('Wikipedia Analysis Handler', () => {
       expect(extractBrandFromUrl('https://a.b.walmart.com')).to.equal('walmart');
       expect(extractBrandFromUrl('https://dev.blog.google.com')).to.equal('google');
     });
+
+    it('should handle branded TLDs like .bank.in', () => {
+      expect(extractBrandFromUrl('https://hdfc.bank.in')).to.equal('hdfc');
+      expect(extractBrandFromUrl('https://icici.bank.in')).to.equal('icici');
+    });
+
+    it('should handle other industry-specific ccSLDs', () => {
+      expect(extractBrandFromUrl('https://acme.firm.in')).to.equal('acme');
+      expect(extractBrandFromUrl('https://iitd.res.in')).to.equal('iitd');
+      expect(extractBrandFromUrl('https://example.nic.in')).to.equal('example');
+    });
   });
 });
