Skip to content

Commit 0ff0c06

Browse files
authored
feat: path-based scanning of urls (MetaMask#8662)
## Explanation Why: Dapp scanning now supports path-level dapp scanning. Without this client-side change, the API never receives paths and the path-scanning capability goes unused. <!-- Thanks for your contribution! Take a moment to answer these questions so that reviewers have the information they need to properly understand your changes: * What is the current state of things and why does it need to change? * What is the solution your changes offer and how does it work? * Are there any changes whose purpose might not obvious to those unfamiliar with the domain? * If your primary goal was to update one package but you found you had to update another one along the way, why did you do so? * If you had to upgrade a dependency, why did you do so? --> ## References <!-- Are there any issues that this pull request is tied to? Are there other links that reviewers should consult to understand these changes better? Are there client or consumer pull requests to adopt any breaking changes? For example: * Fixes #12345 * Related to #67890 --> Fixes: https://consensyssoftware.atlassian.net/jira/software/c/projects/PSAFE/boards/1950?selectedIssue=PSAFE-419 Extension PR: MetaMask/metamask-extension#42311 ## Screenshots I've ran MetaMask Extension locally with these changes. Paths are now included in the API request. <img width="483" height="119" alt="image" src="https://github.com/user-attachments/assets/ce278da6-fa90-4e7c-9ac2-90e4ecfd671f" /> ## Checklist - [ ] I've updated the test suite for new or updated code as appropriate - [ ] I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate - [ ] I've communicated my changes to consumers by [updating changelogs for packages I've changed](https://github.com/MetaMask/core/tree/main/docs/processes/updating-changelogs.md) - [ ] I've introduced [breaking changes](https://github.com/MetaMask/core/tree/main/docs/processes/breaking-changes.md) in this PR and have prepared draft pull requests for clients and consumer packages to resolve them <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Modifies `scanUrl` request/caching semantics to sometimes key on `hostname+pathname`, which can change phishing detection outcomes and cache behavior for gateway domains and could affect API load if misclassified. > > **Overview** > **Adds path-aware phishing URL scanning for shared gateway hosts.** `PhishingController.scanUrl` now sends `hostname+pathname` (instead of hostname-only) for a curated set of gateway root domains and subdomains, and caches results by this scan parameter. > > Introduces new utilities/constants (`PHISHING_DETECTION_PATH_BASED_ROOT_DOMAINS`, `isPhishingDetectionPathBasedHostname`, `getPhishingDetectionScanUrlParam`), exports them from `index.ts`, and updates tests/changelog to cover the new request format and per-path caching behavior. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 74ef4dc. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->
1 parent 4d2e89f commit 0ff0c06

7 files changed

Lines changed: 178 additions & 10 deletions

File tree

packages/phishing-controller/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10+
### Added
11+
12+
- Support path-based phishing lists (`blocklistPaths`, `whitelistPaths`) and path-aware URL scanning for shared gateways (for example IPFS gateways and `sites.google.com`) via `getPhishingDetectionScanUrlParam`, `isPhishingDetectionPathBasedHostname`, and `PHISHING_DETECTION_PATH_BASED_ROOT_DOMAINS` ([#8662](https://github.com/MetaMask/core/pull/8662))
13+
1014
### Changed
1115

1216
- Bump `@metamask/controller-utils` from `^12.0.0` to `^12.1.0` ([#8774](https://github.com/MetaMask/core/pull/8774))

packages/phishing-controller/src/PhishingController-method-action-types.ts

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -59,8 +59,9 @@ export type PhishingControllerBypassAction = {
5959
};
6060

6161
/**
62-
* Scan a URL for phishing. It will only scan the hostname of the URL. It also only supports
63-
* web URLs.
62+
* Scan a URL for phishing. For most hosts only the hostname is sent to the API; for known
63+
* shared gateways the pathname is included (see `PHISHING_DETECTION_PATH_BASED_ROOT_DOMAINS`).
64+
* Only supports web URLs (`http:` / `https:`).
6465
*
6566
* @param url - The URL to scan.
6667
* @returns The phishing detection scan result.

packages/phishing-controller/src/PhishingController.test.ts

Lines changed: 51 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2813,7 +2813,7 @@ describe('PhishingController', () => {
28132813
it('should return a PhishingDetectionScanResult with a fetchError on timeout', async () => {
28142814
const scope = nock(PHISHING_DETECTION_BASE_URL)
28152815
.get(`/${PHISHING_DETECTION_SCAN_ENDPOINT}`)
2816-
.query({ url: testUrl })
2816+
.query({ url: 'example.com' })
28172817
.delayConnection(10000)
28182818
.reply(200, {});
28192819

@@ -2935,6 +2935,56 @@ describe('PhishingController', () => {
29352935
expect(response).toMatchObject(mockResponse);
29362936
expect(scope.isDone()).toBe(true);
29372937
});
2938+
2939+
it('should send hostname and path for path-based gateways and cache per path', async () => {
2940+
const urlA = 'https://ipfs.io/ipfs/QmAAA';
2941+
const urlB = 'https://ipfs.io/ipfs/QmBBB';
2942+
2943+
const scopeA = nock(PHISHING_DETECTION_BASE_URL)
2944+
.get(`/${PHISHING_DETECTION_SCAN_ENDPOINT}`)
2945+
.query({ url: 'ipfs.io/ipfs/QmAAA' })
2946+
.reply(200, {
2947+
recommendedAction: RecommendedAction.Warn,
2948+
});
2949+
2950+
const scopeB = nock(PHISHING_DETECTION_BASE_URL)
2951+
.get(`/${PHISHING_DETECTION_SCAN_ENDPOINT}`)
2952+
.query({ url: 'ipfs.io/ipfs/QmBBB' })
2953+
.reply(200, {
2954+
recommendedAction: RecommendedAction.Block,
2955+
});
2956+
2957+
const fetchSpy = jest.spyOn(global, 'fetch');
2958+
2959+
const resultA1 = await rootMessenger.call(
2960+
'PhishingController:scanUrl',
2961+
urlA,
2962+
);
2963+
const resultB = await rootMessenger.call(
2964+
'PhishingController:scanUrl',
2965+
urlB,
2966+
);
2967+
const resultA2 = await rootMessenger.call(
2968+
'PhishingController:scanUrl',
2969+
urlA,
2970+
);
2971+
2972+
expect(resultA1).toMatchObject({
2973+
hostname: 'ipfs.io',
2974+
recommendedAction: RecommendedAction.Warn,
2975+
});
2976+
expect(resultB).toMatchObject({
2977+
hostname: 'ipfs.io',
2978+
recommendedAction: RecommendedAction.Block,
2979+
});
2980+
expect(resultA2).toStrictEqual(resultA1);
2981+
2982+
expect(scopeA.isDone()).toBe(true);
2983+
expect(scopeB.isDone()).toBe(true);
2984+
expect(fetchSpy).toHaveBeenCalledTimes(2);
2985+
2986+
fetchSpy.mockRestore();
2987+
});
29382988
});
29392989

29402990
describe('bulkScanUrls', () => {

packages/phishing-controller/src/PhishingController.ts

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,7 @@ import {
4848
getHostnameFromUrl,
4949
roundToNearestMinute,
5050
getHostnameFromWebUrl,
51+
getPhishingDetectionScanUrlParam,
5152
buildCacheKey,
5253
splitCacheHits,
5354
resolveChainName,
@@ -910,31 +911,34 @@ export class PhishingController extends BaseController<
910911
}
911912

912913
/**
913-
* Scan a URL for phishing. It will only scan the hostname of the URL. It also only supports
914-
* web URLs.
914+
* Scan a URL for phishing. For most hosts only the hostname is sent to the API; for known
915+
* shared gateways the pathname is included (see `PHISHING_DETECTION_PATH_BASED_ROOT_DOMAINS`).
916+
* Only supports web URLs (`http:` / `https:`).
915917
*
916918
* @param url - The URL to scan.
917919
* @returns The phishing detection scan result.
918920
*/
919921
async scanUrl(url: string): Promise<PhishingDetectionScanResult> {
920-
const [hostname, ok] = getHostnameFromWebUrl(url);
921-
if (!ok) {
922+
const [scanUrlParam, scanParamOk] = getPhishingDetectionScanUrlParam(url);
923+
if (!scanParamOk) {
922924
return {
923925
hostname: '',
924926
recommendedAction: RecommendedAction.None,
925927
fetchError: 'url is not a valid web URL',
926928
};
927929
}
928930

929-
const cachedResult = this.#urlScanCache.get(hostname);
931+
const [hostname] = getHostnameFromWebUrl(url);
932+
933+
const cachedResult = this.#urlScanCache.get(scanUrlParam);
930934
if (cachedResult) {
931935
return cachedResult;
932936
}
933937

934938
const apiResponse = await safelyExecuteWithTimeout(
935939
async () => {
936940
const res = await fetch(
937-
`${PHISHING_DETECTION_BASE_URL}/${PHISHING_DETECTION_SCAN_ENDPOINT}?url=${encodeURIComponent(hostname)}`,
941+
`${PHISHING_DETECTION_BASE_URL}/${PHISHING_DETECTION_SCAN_ENDPOINT}?url=${encodeURIComponent(scanUrlParam)}`,
938942
{
939943
method: 'GET',
940944
headers: {
@@ -974,7 +978,7 @@ export class PhishingController extends BaseController<
974978
recommendedAction: apiResponse.recommendedAction,
975979
};
976980

977-
this.#urlScanCache.set(hostname, result);
981+
this.#urlScanCache.set(scanUrlParam, result);
978982

979983
return result;
980984
}

packages/phishing-controller/src/index.ts

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,11 @@ export {
2929
ApprovalFeatureType,
3030
} from './types';
3131
export type { CacheEntry } from './CacheManager';
32+
export {
33+
PHISHING_DETECTION_PATH_BASED_ROOT_DOMAINS,
34+
getPhishingDetectionScanUrlParam,
35+
isPhishingDetectionPathBasedHostname,
36+
} from './utils';
3237

3338
export type {
3439
PhishingControllerMaybeUpdateStateAction,

packages/phishing-controller/src/utils.test.ts

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,8 @@ import {
1010
getHostnameAndPathComponents,
1111
getHostnameFromUrl,
1212
getHostnameFromWebUrl,
13+
getPhishingDetectionScanUrlParam,
14+
isPhishingDetectionPathBasedHostname,
1315
matchPartsAgainstList,
1416
processConfigs,
1517
processDomainList,
@@ -981,6 +983,54 @@ describe('getHostnameFromWebUrl', () => {
981983
);
982984
});
983985

986+
describe('isPhishingDetectionPathBasedHostname', () => {
987+
it('returns true for registered roots and subdomains', () => {
988+
expect(isPhishingDetectionPathBasedHostname('ipfs.io')).toBe(true);
989+
expect(isPhishingDetectionPathBasedHostname('gateway.ipfs.io')).toBe(true);
990+
expect(isPhishingDetectionPathBasedHostname('dweb.link')).toBe(true);
991+
expect(isPhishingDetectionPathBasedHostname('sites.google.com')).toBe(true);
992+
});
993+
994+
it('is case-insensitive', () => {
995+
expect(isPhishingDetectionPathBasedHostname('IPFS.IO')).toBe(true);
996+
expect(isPhishingDetectionPathBasedHostname('Gateway.IPFS.IO')).toBe(true);
997+
});
998+
999+
it('returns false for unrelated hosts', () => {
1000+
expect(isPhishingDetectionPathBasedHostname('example.com')).toBe(false);
1001+
expect(isPhishingDetectionPathBasedHostname('evil-ipfs.io')).toBe(false);
1002+
});
1003+
});
1004+
1005+
describe('getPhishingDetectionScanUrlParam', () => {
1006+
it('returns hostname only for non-gateway hosts', () => {
1007+
expect(
1008+
getPhishingDetectionScanUrlParam('https://example.com/path?q=1#h'),
1009+
).toStrictEqual(['example.com', true]);
1010+
});
1011+
1012+
it('returns hostname plus path for path-based gateway hosts', () => {
1013+
expect(
1014+
getPhishingDetectionScanUrlParam(
1015+
'https://ipfs.io/ipfs/QmAAA/foo?x=1#frag',
1016+
),
1017+
).toStrictEqual(['ipfs.io/ipfs/QmAAA/foo', true]);
1018+
});
1019+
1020+
it('does not append path when pathname is /', () => {
1021+
expect(
1022+
getPhishingDetectionScanUrlParam('https://dweb.link/'),
1023+
).toStrictEqual(['dweb.link', true]);
1024+
});
1025+
1026+
it('returns ok false for invalid web URLs', () => {
1027+
expect(getPhishingDetectionScanUrlParam('not-a-url')).toStrictEqual([
1028+
'',
1029+
false,
1030+
]);
1031+
});
1032+
});
1033+
9841034
/**
9851035
* Extracts the domain name (e.g., example.com) from a given hostname.
9861036
*

packages/phishing-controller/src/utils.ts

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -365,6 +365,60 @@ export const getHostnameFromWebUrl = (url: string): [string, boolean] => {
365365
return [hostname || '', Boolean(hostname)];
366366
};
367367

368+
/**
369+
* Hosts where PDS single-URL scans include the URL path (shared gateways / hosts where many sites
370+
* share one origin). For all other hosts, only the hostname is sent.
371+
*/
372+
export const PHISHING_DETECTION_PATH_BASED_ROOT_DOMAINS = [
373+
'ipfs.io',
374+
'dweb.link',
375+
'cf-ipfs.com',
376+
'cloudflare-ipfs.com',
377+
'irys.xyz',
378+
'sites.google.com',
379+
] as const;
380+
381+
/**
382+
* @param hostname - Lowercase normalization is applied for matching registered roots and subdomains.
383+
* @returns Whether {@link getPhishingDetectionScanUrlParam} appends pathname for this hostname.
384+
*/
385+
export function isPhishingDetectionPathBasedHostname(
386+
hostname: string,
387+
): boolean {
388+
const normalizedHost = hostname.toLowerCase();
389+
return PHISHING_DETECTION_PATH_BASED_ROOT_DOMAINS.some(
390+
(root) => normalizedHost === root || normalizedHost.endsWith(`.${root}`),
391+
);
392+
}
393+
394+
/**
395+
* Builds the `url` query parameter for {@link PhishingController.scanUrl}. For hosts in
396+
* {@link PHISHING_DETECTION_PATH_BASED_ROOT_DOMAINS} (and their subdomains), the value is hostname
397+
* plus pathname, without protocol, query, or fragment. For all other hosts, only hostname is used.
398+
*
399+
* @param url - A web URL string (must use `http:` or `https:` — same rules as {@link getHostnameFromWebUrl}).
400+
* @returns A tuple of `[scanUrlParam, ok]` where `ok` is false when the URL is not a valid web URL.
401+
*/
402+
export const getPhishingDetectionScanUrlParam = (
403+
url: string,
404+
): [scanUrlParam: string, ok: boolean] => {
405+
const [hostname, ok] = getHostnameFromWebUrl(url);
406+
if (!ok) {
407+
return ['', false];
408+
}
409+
410+
if (!isPhishingDetectionPathBasedHostname(hostname)) {
411+
return [hostname, true];
412+
}
413+
414+
// `getHostnameFromWebUrl` already required a successful `new URL(url)` parse.
415+
const { pathname } = new URL(url);
416+
const pathSuffix = pathname === '/' ? '' : pathname;
417+
const scanUrlParam = pathSuffix ? `${hostname}${pathSuffix}` : hostname;
418+
419+
return [scanUrlParam, true];
420+
};
421+
368422
export const getPathnameFromUrl = (url: string): string => {
369423
try {
370424
const { pathname } = new URL(url);

0 commit comments

Comments
 (0)