
Commit cd2679d

NiallJoeMaher and claude committed
fix(feed): anchor unwrapDoubledUrl to the host boundary
Code review caught that the naive lastIndexOf scan would truncate a URL whose path legitimately contains http:// or https:// deeper down (path-based image proxies, or a slug literally containing a scheme). Match only an absolute URL sitting immediately after the host (the exact shape of the upstream doubled-origin bug) and leave everything else untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ba31d94 commit cd2679d
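To see what the review caught, here is a minimal sketch that runs both strategies side by side. `unwrapLastScheme` mirrors the removed `lastIndexOf` logic and `unwrapAnchored` mirrors the new `DOUBLED_ORIGIN` regex. Both bodies are copied from the diff, but the function names and the literal-scheme slug sample are hypothetical, chosen only for illustration.

```typescript
// Hypothetical side-by-side harness; both function bodies are taken from the diff.

// Pre-fix strategy (removed in this commit): slice from the LAST scheme
// found before any `?`.
function unwrapLastScheme(url: string | null | undefined): string | null {
  if (!url) return null;
  const path = url.split("?")[0];
  const i = Math.max(path.lastIndexOf("http://"), path.lastIndexOf("https://"));
  return i > 0 ? url.slice(i) : url;
}

// Post-fix strategy: unwrap only when an absolute URL starts right after
// `scheme://host[:port]/`.
const DOUBLED_ORIGIN = /^https?:\/\/[^/]+\/(https?:\/\/.+)$/i;

function unwrapAnchored(url: string | null | undefined): string | null {
  if (!url) return null;
  return DOUBLED_ORIGIN.exec(url)?.[1] ?? url;
}

const samples: string[] = [
  // The upstream doubled-origin bug: both versions unwrap this.
  "https://hackernoon.com/https://cdn.hackernoon.com/x.png",
  // Path-based proxy: the old scan truncates it to "https://b/x.png".
  "https://cdn.site.com/a/https://b/x.png",
  // Made-up slug with a literal scheme: the old scan truncates it too.
  "https://blog.com/posts/http://example",
];

for (const url of samples) {
  console.log(url);
  console.log("  old:", unwrapLastScheme(url));
  console.log("  new:", unwrapAnchored(url));
}
```

Only the HackerNoon shape is unwrapped by both versions. The nested-path and literal-scheme samples are truncated by the old scan but left intact by the anchored regex. A hand trace suggests that the `http-vs-https` slug in the new test contains no `http://` or `https://` after index 0, so it would pass under either version. The `nested` case is the one that actually distinguishes them.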

3 files changed

Lines changed: 19 additions & 9 deletions


.claude/scheduled_tasks.lock

Lines changed: 0 additions & 1 deletion
This file was deleted.

utils/url.test.ts

Lines changed: 9 additions & 0 deletions
@@ -29,6 +29,15 @@ describe("unwrapDoubledUrl", () => {
     ).toBe("https://img.proxy/optimize?url=https://cdn.site/x.png");
   });
 
+  it("leaves a scheme that appears deeper in the path untouched", () => {
+    // A slug or nested path segment that contains `http://` is NOT a doubled
+    // origin — only a scheme immediately after the host counts.
+    const slug = "https://blog.com/2021/http-vs-https/cover.png";
+    expect(unwrapDoubledUrl(slug)).toBe(slug);
+    const nested = "https://cdn.site.com/a/https://b/x.png";
+    expect(unwrapDoubledUrl(nested)).toBe(nested);
+  });
+
   it("passes through null/empty", () => {
     expect(unwrapDoubledUrl(null)).toBeNull();
     expect(unwrapDoubledUrl(undefined)).toBeNull();

utils/url.ts

Lines changed: 10 additions & 8 deletions
@@ -29,22 +29,24 @@ export function safeExternalHref(
   return parsed.protocol === "https:" ? url.trim() : undefined;
 }
 
+// `https://host[:port]/<absolute-url>` — an upstream feed prefixed its own
+// origin onto an already-absolute URL. Anchored so the embedded scheme must sit
+// immediately after the host's first slash: this is the precise shape of the
+// bug and avoids mangling a scheme that legitimately appears deeper in the path
+// (path-based image proxies, a slug containing `http://`) or in a query string.
+const DOUBLED_ORIGIN = /^https?:\/\/[^/]+\/(https?:\/\/.+)$/i;
+
 /**
  * Unwraps a URL that an upstream feed prefixed with its own origin, leaving an
  * already-absolute URL doubled up — e.g. HackerNoon's `media:thumbnail` serves
- * `https://hackernoon.com/https://cdn.hackernoon.com/x.png`, which 404s. We
- * slice from the LAST embedded scheme so the inner, real URL wins. A normal URL
- * (only scheme at index 0) and a `?url=https://...` proxy param are left intact.
+ * `https://hackernoon.com/https://cdn.hackernoon.com/x.png`, which 404s. Returns
+ * the inner URL; a normal URL is returned unchanged.
  */
 export function unwrapDoubledUrl(
   url: string | null | undefined,
 ): string | null {
   if (!url) return null;
-  // Ignore a scheme that appears inside the query string — only unwrap when the
-  // embedded scheme sits in the path (before any `?`).
-  const path = url.split("?")[0];
-  const i = Math.max(path.lastIndexOf("http://"), path.lastIndexOf("https://"));
-  return i > 0 ? url.slice(i) : url;
+  return DOUBLED_ORIGIN.exec(url)?.[1] ?? url;
 }
 
 /** Upgrades an `http://` URL to `https://`. Returns null for empty input. */
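A small standalone sketch can check how the anchored pattern behaves on the shapes its comment mentions. The regex and function body are copied from the diff. The sample URLs, including the port and upper-case variants, are made up to exercise the `[:port]`, `https?` and `i`-flag parts of the pattern.

```typescript
// Standalone copy of the new pattern and function from utils/url.ts.
const DOUBLED_ORIGIN = /^https?:\/\/[^/]+\/(https?:\/\/.+)$/i;

function unwrapDoubledUrl(url: string | null | undefined): string | null {
  if (!url) return null;
  // Capture group 1 is the inner absolute URL; fall back to the input.
  return DOUBLED_ORIGIN.exec(url)?.[1] ?? url;
}

// [input, expected] pairs; all URLs are illustrative.
const cases: Array<[string, string]> = [
  // Host with a port: `[^/]+` takes "feed.example:8080" as one segment.
  ["https://feed.example:8080/https://cdn.example/a.png", "https://cdn.example/a.png"],
  // An inner http:// URL is unwrapped as well (`https?` in the group).
  ["https://feed.example/http://cdn.example/a.png", "http://cdn.example/a.png"],
  // The `i` flag makes the scheme match case-insensitive.
  ["HTTPS://feed.example/HTTPS://cdn.example/a.png", "HTTPS://cdn.example/a.png"],
  // Scheme only in the query string: left alone.
  [
    "https://img.proxy/optimize?url=https://cdn.site/x.png",
    "https://img.proxy/optimize?url=https://cdn.site/x.png",
  ],
  // Scheme deeper in the path: left alone.
  ["https://cdn.site.com/a/https://b/x.png", "https://cdn.site.com/a/https://b/x.png"],
];

for (const [input, expected] of cases) {
  const actual = unwrapDoubledUrl(input);
  console.log(actual === expected ? "ok  " : "FAIL", input, "->", actual);
}
```

Because `[^/]+` cannot match a slash, the captured URL must begin right after the first `/` that follows `scheme://`. A scheme later in the path, or in a query string that follows a path segment, therefore never reaches the capture group.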

0 commit comments

Comments
 (0)