Commit 377c9eb

Merge pull request #483 from dahlia/feature/media-proxy
Media proxy
2 parents: 0ebf4ad + 856e0df; commit 377c9eb

36 files changed

Lines changed: 2045 additions & 223 deletions

AGENTS.md

Lines changed: 2 additions & 0 deletions
@@ -426,6 +426,8 @@ STORAGE_URL_BASE=https://your-bucket.s3.amazonaws.com
 | `REMOTE_REPLIES_SCRAPE_INTERVAL_SECONDS` | 5 | Delay between scrape requests per origin |
 | `REMOTE_REPLIES_SCRAPE_BACKOFF_SECONDS` | 300 | Backoff for 429 without `Retry-After` |
 | `REMOTE_REPLIES_SCRAPE_COOLDOWN_SECONDS` | 300 | Completed scrape deduplication window |
+| `MEDIA_PROXY` | off | Remote media proxy: `off`, `proxy`, `cache` (booleans accepted: `true` → `proxy`, `false` → `off`) |
+| `REMOTE_MEDIA_THUMBNAILS` | on | Generate local sharp thumbnails for remote attachments (boolean) |


 Adding new environment variables
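
The `MEDIA_PROXY` row above lists three modes plus boolean aliases. As a rough illustration of how such a value could be normalized, here is a minimal sketch. The function name `parseMediaProxyMode`, the case-insensitive matching, the treatment of an unset variable as `off`, and the decision to throw on unknown input are all assumptions made for this example; the commit's actual parser is not shown on this page.

```typescript
// Hypothetical normalizer for the MEDIA_PROXY environment variable.
// The accepted values follow the documentation in this commit; the
// error handling and case folding are assumptions of this sketch.
type MediaProxyMode = "off" | "proxy" | "cache";

function parseMediaProxyMode(raw: string | undefined): MediaProxyMode {
  const value = (raw ?? "").trim().toLowerCase();
  switch (value) {
    case "": // unset: fall back to the documented default
    case "off":
    case "false":
    case "0":
      return "off";
    case "proxy":
    case "on":
    case "true":
    case "1":
      return "proxy";
    case "cache": // disk caching has no boolean alias; it must be explicit
      return "cache";
    default:
      throw new RangeError(`Invalid MEDIA_PROXY value: ${String(raw)}`);
  }
}
```

Under these assumptions, `parseMediaProxyMode(process.env.MEDIA_PROXY)` would yield `"off"` when the variable is unset and `"proxy"` for `true`, `on`, or `1`.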

CHANGES.md

Lines changed: 42 additions & 0 deletions
@@ -6,6 +6,47 @@ Version 0.9.0

 To be released.

+- Added a media proxy that re-serves remote avatars, headers, post
+  attachments, custom emojis, and preview-card images from Hollo's own
+  origin. This sidesteps CORS configurations on remote object stores
+  and prevents the visitor's browser from talking directly to the
+  source server. Controlled by a new `MEDIA_PROXY` environment
+  variable with three levels: [[#481]]
+
+  - `off` (default): the Mastodon API and web UI hand the original
+    remote URL to clients, matching the historical behaviour.
+  - `proxy`: every remote media URL is rewritten to a signed
+    `/proxy/<sig>/<b64url>` path served by Hollo itself. The proxy
+    runs SSRF checks on the upstream URL and on every redirect
+    target, allows only image/video/audio Content-Types (image/svg+xml
+    is explicitly blocked to avoid same-origin XSS), caps the body at
+    32 MiB, and serves the response with
+    `Cache-Control: public, max-age=2592000, immutable` and
+    `X-Content-Type-Options: nosniff`. No on-disk cache.
+  - `cache`: same URL rewriting, but the streamed body is persisted
+    to the configured storage backend as `proxy/<sha256>.bin`, with
+    a content-type sidecar alongside it at `proxy/<sha256>.json`.
+    Subsequent requests skip the upstream fetch. The admin
+    dashboard at */thumbnail_cleanup* can purge the cache on demand.
+
+  `MEDIA_PROXY` also accepts the Boolean synonyms `true`/`on`/`1`
+  (as aliases for `proxy`) and `false`/`off`/`0` (as aliases for
+  `off`). Disk caching is opt-in only via the explicit `cache`
+  value.
+
+  Outbound federation is unaffected: Hollo still publishes the
+  original remote URLs in ActivityPub `icon`, `image`, `attachment`,
+  and emoji `Tag` references.
+
+- Added a `REMOTE_MEDIA_THUMBNAILS` environment variable that controls
+  whether Hollo downloads incoming remote attachments to generate a
+  local WebP thumbnail. Set to `off` to skip the upstream fetch and
+  Sharp pipeline entirely, storing the remote URL itself as the
+  thumbnail URL—useful in combination with `MEDIA_PROXY=proxy` or
+  `cache` to free up the disk space the local thumbnails would
+  otherwise occupy. Defaults to `on` (the historical behavior).
+  [[#481]]
+
 - Added [FEP-044f] quote authorization and policy support on top of the
   Mastodon-compatible quote APIs. [[#457], [#459], [#460]]

@@ -156,6 +197,7 @@ To be released.
 [#466]: https://github.com/fedify-dev/hollo/pull/466
 [#467]: https://github.com/fedify-dev/hollo/pull/467
 [#479]: https://github.com/fedify-dev/hollo/issues/479
+[#481]: https://github.com/fedify-dev/hollo/issues/481
 [#482]: https://github.com/fedify-dev/hollo/pull/482

161203

compose-fs.yaml

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@ services:
       DRIVE_DISK: fs
       STORAGE_URL_BASE: http://localhost:3000/assets/
       FS_STORAGE_PATH: /var/lib/hollo
+      # MEDIA_PROXY: off # or "proxy" / "cache" — see docs/install/env
+      # REMOTE_MEDIA_THUMBNAILS: "on" # set to "off" to skip local thumbnails
     depends_on:
       - postgres
     volumes:

compose.yaml

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ services:
       S3_FORCE_PATH_STYLE: "true"
       AWS_ACCESS_KEY_ID: minioadmin
       AWS_SECRET_ACCESS_KEY: minioadmin
+      # MEDIA_PROXY: off # or "proxy" / "cache" — see docs/install/env
+      # REMOTE_MEDIA_THUMBNAILS: "on" # set to "off" to skip local thumbnails
     depends_on:
       - postgres
       - minio

docs/src/content/docs/install/env.mdx

Lines changed: 45 additions & 0 deletions
@@ -143,6 +143,51 @@ duplicate jobs for the same replies collection.

 `300` by default.

+### `MEDIA_PROXY` <Badge text="Optional" />
+
+Controls how Hollo serves media that lives on remote servers (avatars,
+headers, attachments, custom emojis, preview-card images). Valid values
+are:
+
+- `off` (default): the Mastodon API and web UI hand the original remote
+  URL to clients, matching the historical behaviour.
+- `proxy`: every remote media URL is rewritten to a signed
+  `/proxy/<sig>/<b64url>` path served by Hollo itself. Hollo streams
+  the upstream bytes through on each request and does not write them
+  to disk. Clients see only the Hollo origin, sidestepping remote
+  CORS configuration and leaks of the visitor's IP address.
+- `cache`: same URL rewriting as `proxy`, but the streamed body is
+  persisted to the configured storage backend as `proxy/<sha256>.bin`
+  alongside a content-type sidecar at `proxy/<sha256>.json`.
+  Subsequent requests skip the upstream fetch. The admin dashboard
+  at */thumbnail_cleanup* can purge the cache on demand.
+
+The boolean synonyms `true` / `on` / `1` are accepted as aliases for
+`proxy`, and `false` / `off` / `0` as aliases for `off`. Disk caching
+must be requested explicitly with `cache`.
+
+In `proxy` and `cache` modes, Hollo refuses non-HTTP(S) schemes, runs
+SSRF checks on each upstream URL and every redirect target, enforces a
+32 MiB body cap, and never proxies image/svg+xml — SVG could carry
+inline scripts that execute under the Hollo origin.
+
+`off` by default.
+
+### `REMOTE_MEDIA_THUMBNAILS` <Badge text="Optional" />
+
+Controls whether Hollo downloads remote media attachments to generate a
+local WebP thumbnail when it ingests a post. Accepts `on` / `true` /
+`1` (the historical behaviour) or `off` / `false` / `0`.
+
+When set to `off`, Hollo skips the upstream fetch and Sharp pipeline
+entirely for incoming attachments, storing the remote URL itself as the
+thumbnail URL. Combined with `MEDIA_PROXY=proxy` or `cache`, clients
+still see a same-origin URL at render time; with `MEDIA_PROXY=off`,
+they receive the upstream URL directly. This frees up significant
+disk space on instances that ingest many media-heavy posts.
+
+`on` by default.
+
 ### `REMOTE_ACTOR_STALENESS_DAYS` <Badge text="Optional" />

 The number of days after which a remote actor's cached data is considered stale.
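
The documentation above describes remote media URLs being rewritten to a signed `/proxy/<sig>/<b64url>` path, without saying how the signature is computed. One plausible construction is an HMAC over the base64url-encoded URL, sketched below. The choice of HMAC-SHA256, the `secret` parameter, and the helper names `signProxyPath` and `verifyProxyPath` are all assumptions made for illustration, not the scheme this commit actually uses.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical: encode the remote URL as base64url and sign the encoded form.
function signProxyPath(remoteUrl: string, secret: string): string {
  const encoded = Buffer.from(remoteUrl, "utf8").toString("base64url");
  const sig = createHmac("sha256", secret).update(encoded).digest("base64url");
  return `/proxy/${sig}/${encoded}`;
}

// Hypothetical: return the original URL if the signature matches, else null.
function verifyProxyPath(path: string, secret: string): string | null {
  const match = /^\/proxy\/([^/]+)\/([^/]+)$/.exec(path);
  if (match == null) return null;
  const sig = match[1] ?? "";
  const encoded = match[2] ?? "";
  const expected = createHmac("sha256", secret).update(encoded).digest();
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length) return null;
  if (!timingSafeEqual(given, expected)) return null;
  return Buffer.from(encoded, "base64url").toString("utf8");
}
```

Signing the path means the server only fetches URLs it issued itself, so the proxy endpoint cannot be pointed at arbitrary targets by a client who lacks the secret.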

docs/src/content/docs/ja/install/env.mdx

Lines changed: 49 additions & 0 deletions
@@ -141,6 +141,55 @@ HolloがL7ロードバランサーの後ろにある場合(通常はそうす

 デフォルトは`300`です。

+### `MEDIA_PROXY` <Badge text="オプション" />
+
+リモートサーバー上のメディア(アバター、ヘッダー、添付メディア、
+カスタム絵文字、プレビューカードの画像)をHolloがクライアントに対して
+どう配信するかを制御します。指定可能な値:
+
+- `off`(デフォルト):MastodonクライアントAPIとウェブUIは
+  リモートのURLをそのままクライアントに渡します。これまでと同じ
+  動作です。
+- `proxy`:すべてのリモートメディアURLを署名付きの
+  `/proxy/<sig>/<b64url>` パスに書き換えます。Holloはリクエストごとに
+  アップストリームのバイト列をストリーミングして応答し、ディスクには
+  書き込みません。クライアントからはHolloのオリジンしか見えないため、
+  リモートサーバーのCORS設定の影響を受けず、訪問者のIPアドレスも
+  外部に漏れません。
+- `cache`:URLの書き換えは`proxy`と同じですが、ストリーミングした
+  本文を設定済みのストレージバックエンドの`proxy/<sha256>.bin`に
+  保存し、Content-Type情報を持つサイドカーを`proxy/<sha256>.json`に
+  一緒に保存します。以降のリクエストはアップストリームを再取得
+  しません。管理ダッシュボードの */thumbnail_cleanup* から必要に
+  応じてキャッシュを消去できます。
+
+真偽値の同義語として`true` / `on` / `1`を`proxy`の別名、
+`false` / `off` / `0`を`off`の別名として受け付けます。ディスク
+キャッシュは`cache`で明示的に有効化する必要があります。
+
+`proxy`と`cache`モードでは、HolloはHTTP(S)以外のスキームをプロキシ
+しません。アップストリームのURLとすべてのリダイレクト先に対してSSRF
+チェックを行い、本文は32 MiBに制限し、image/svg+xmlはHolloのオリジン
+上で実行され得るインラインスクリプトを含むため一切プロキシしません。
+
+デフォルトは`off`です。
+
+### `REMOTE_MEDIA_THUMBNAILS` <Badge text="オプション" />
+
+投稿を取り込む際に、Holloがリモートの添付メディアをダウンロードして
+ローカルのWebPサムネイルを生成するかを制御します。値は`on` / `true` /
+`1`(これまでの動作)または`off` / `false` / `0`です。
+
+`off`に設定すると、Holloは受信した添付に対してアップストリームの取得
+とSharpパイプラインを完全にスキップし、リモートURLをそのまま
+サムネイルURLとして保存します。`MEDIA_PROXY=proxy`または`cache`と
+組み合わせれば、クライアントはレンダリング時に依然として同一オリジンの
+URLを受け取り、`MEDIA_PROXY=off`の場合はアップストリームのURLを直接
+受け取ります。多数のメディア投稿を取り込むインスタンスでは、大量の
+ディスク容量を節約できます。
+
+デフォルトは`on`です。
+
 ### `REMOTE_ACTOR_STALENESS_DAYS` <Badge text="オプション" />

 リモートアクターのキャッシュされたデータが古いと見なされるまでの日数。

docs/src/content/docs/ko/install/env.mdx

Lines changed: 45 additions & 0 deletions
@@ -139,6 +139,51 @@ Hollo가 L7 로드 밸런서 뒤에 위치할 경우 (일반적으로 그래야

 기본값은 `300`입니다.

+### `MEDIA_PROXY` <Badge text="선택" />
+
+다른 서버에 있는 미디어(아바타, 헤더, 첨부 미디어, 커스텀 이모지,
+미리보기 카드 이미지)를 Hollo가 어떻게 클라이언트에게 전달할지
+결정합니다. 지정 가능한 값:
+
+- `off` (기본값): Mastodon API와 웹 UI가 원격 URL을 그대로 전달합니다.
+  이전과 동일한 동작입니다.
+- `proxy`: 모든 원격 미디어 URL을 서명된 `/proxy/<sig>/<b64url>`
+  경로로 재작성합니다. Hollo가 매 요청마다 원본을 스트리밍해 응답하며
+  디스크에는 저장하지 않습니다. 클라이언트는 Hollo 도메인만 보게 되어
+  원격 서버의 CORS 설정에 영향을 받지 않고, 방문자 IP도 외부로
+  노출되지 않습니다.
+- `cache`: `proxy`와 동일한 URL 재작성에 더해, 스트리밍한 본문을 설정된
+  저장소 백엔드의 `proxy/<sha256>.bin`에 저장하고, 콘텐츠 타입 정보를
+  담은 사이드카 파일을 `proxy/<sha256>.json`에 함께 저장합니다.
+  이후 요청은 원본을 다시 가져오지 않습니다. */thumbnail_cleanup*
+  관리자 페이지에서 필요할 때 캐시를 비울 수 있습니다.
+
+불리언 동의어로 `true` / `on` / `1`을 `proxy`의 별칭으로,
+`false` / `off` / `0`을 `off`의 별칭으로 받아들입니다. 디스크 캐싱은
+반드시 `cache`로 명시적으로 요청해야 합니다.
+
+`proxy`와 `cache` 모드에서 Hollo는 HTTP(S)가 아닌 스킴은 프록시하지
+않고, 원본 URL과 모든 리다이렉트 대상에 대해 SSRF 검사를 수행하며,
+본문 크기를 32 MiB로 제한하고, image/svg+xml은 Hollo 도메인에서
+실행되는 인라인 스크립트를 포함할 수 있어 절대 프록시하지 않습니다.
+
+기본값은 `off`입니다.
+
+### `REMOTE_MEDIA_THUMBNAILS` <Badge text="선택" />
+
+게시물을 수신할 때 Hollo가 원격 첨부 미디어를 내려받아 로컬 WebP
+썸네일을 생성할지 결정합니다. 값은 `on` / `true` / `1`(이전과 동일한
+동작) 또는 `off` / `false` / `0`입니다.
+
+`off`로 설정하면 Hollo는 들어오는 첨부에 대해 원본 다운로드와 Sharp
+파이프라인을 건너뛰고, 원격 URL을 그대로 썸네일 URL로 저장합니다.
+`MEDIA_PROXY=proxy` 또는 `cache`와 함께 쓰면 클라이언트는 렌더 시
+여전히 같은 출처의 URL을 보게 되며, `MEDIA_PROXY=off`일 때는 원본
+URL을 그대로 받습니다. 미디어가 많은 게시물을 자주 받는 인스턴스에서
+디스크 공간을 크게 절약할 수 있습니다.
+
+기본값은 `on`입니다.
+
 ### `REMOTE_ACTOR_STALENESS_DAYS` <Badge text="선택" />

 원격 액터의 캐시된 데이터가 오래된 것으로 간주되기까지의 일수.

docs/src/content/docs/zh-cn/install/env.mdx

Lines changed: 40 additions & 0 deletions
@@ -129,6 +129,46 @@ openssl rand -hex 32

 默认为`300`。

+### `MEDIA_PROXY` <Badge text="可选" />
+
+控制 Hollo 如何向客户端提供位于远程服务器的媒体(头像、横幅图、附件、
+自定义表情、链接预览卡片图片)。可用值:
+
+- `off`(默认):Mastodon API 和 Web UI 将远程 URL 原样交给客户端,
+  与历史行为一致。
+- `proxy`:将所有远程媒体 URL 改写为带签名的
+  `/proxy/<sig>/<b64url>` 路径,由 Hollo 自己提供。Hollo 在每次请求时
+  将上游字节流式转发,不写入磁盘。客户端只看到 Hollo 自身的源,
+  避免远程 CORS 配置的影响,也避免访问者的 IP 被泄露。
+- `cache`:URL 改写与 `proxy` 相同,但会把流式获取的响应主体保存到
+  所配置存储后端的 `proxy/<sha256>.bin`,并把记录内容类型的旁路文件
+  保存到 `proxy/<sha256>.json`。后续请求会跳过上游请求。管理面板的
+  */thumbnail_cleanup* 页面可以按需清空缓存。
+
+布尔同义值:`true` / `on` / `1` 作为 `proxy` 的别名,
+`false` / `off` / `0` 作为 `off` 的别名。磁盘缓存必须用 `cache` 显式
+开启。
+
+`proxy` 和 `cache` 模式下,Hollo 会拒绝代理非 HTTP(S) 协议,对上游
+URL 和每一次重定向目标执行 SSRF 检查,将响应主体限制为 32 MiB,并且
+绝不代理 image/svg+xml——SVG 可能包含会在 Hollo 自身源上执行的内联
+脚本。
+
+默认为`off`。
+
+### `REMOTE_MEDIA_THUMBNAILS` <Badge text="可选" />
+
+控制 Hollo 接收帖子时是否下载远程附件媒体以生成本地 WebP 缩略图。
+可用值为 `on` / `true` / `1`(历史行为)或 `off` / `false` / `0`。
+
+设置为 `off` 时,Hollo 会对收到的附件完全跳过上游下载和 Sharp 流程,
+直接将远程 URL 作为缩略图 URL 存储。配合 `MEDIA_PROXY=proxy` 或
+`cache` 使用时,客户端在渲染时仍然得到同源 URL;若 `MEDIA_PROXY=off`,
+客户端会直接收到上游 URL。在接收大量媒体型帖子的实例上可以显著
+节省磁盘空间。
+
+默认为`on`。
+
 ### `REMOTE_ACTOR_STALENESS_DAYS` <Badge text="可选" />

 远程用户的缓存数据被视为过期的天数。

src/api/v1/index.ts

Lines changed: 15 additions & 7 deletions
@@ -8,6 +8,7 @@ import { db } from "../../db";
 import { serializeAccount } from "../../entities/account";
 import { getPostRelations, serializePost } from "../../entities/status";
 import { serializeTag } from "../../entities/tag";
+import { proxyUrl } from "../../media-proxy";
 import {
   scopeRequired,
   tokenRequired,
@@ -73,14 +74,21 @@ app.get(

 app.get("/custom_emojis", async (c) => {
   const emojis = await db.query.customEmojis.findMany();
+  const baseUrl = c.req.url;
   return c.json(
-    emojis.map((emoji) => ({
-      shortcode: emoji.shortcode,
-      url: emoji.url,
-      static_url: emoji.url,
-      visible_in_picker: true,
-      category: emoji.category,
-    })),
+    emojis.flatMap((emoji) => {
+      const url = proxyUrl(emoji.url, baseUrl);
+      if (url == null) return [];
+      return [
+        {
+          shortcode: emoji.shortcode,
+          url,
+          static_url: url,
+          visible_in_picker: true,
+          category: emoji.category,
+        },
+      ];
+    }),
   );
 });
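
The hunk above switches from `map` to `flatMap` so that an emoji whose URL cannot be proxied is dropped from the response rather than serialized with a null URL. The following self-contained sketch shows that pattern in isolation. `proxyUrlStub` is a hypothetical stand-in for the real `proxyUrl` from `src/media-proxy`, whose body is not shown on this page; its rule (reject non-HTTP(S) URLs, otherwise build an unsigned demo path) and the `CustomEmoji` shape are assumptions.

```typescript
interface CustomEmoji {
  shortcode: string;
  url: string;
  category: string | null;
}

// Hypothetical stand-in: returns null for URLs it refuses to proxy.
function proxyUrlStub(url: string, baseUrl: string): string | null {
  try {
    const parsed = new URL(url);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      return null;
    }
    // Demo path only; the real proxy path carries a signature.
    return new URL(`/proxy/demo/${encodeURIComponent(url)}`, baseUrl).href;
  } catch {
    return null; // not parseable as an absolute URL
  }
}

function serializeEmojis(emojis: CustomEmoji[], baseUrl: string) {
  // Returning [] from flatMap removes the element; [x] keeps it.
  return emojis.flatMap((emoji) => {
    const url = proxyUrlStub(emoji.url, baseUrl);
    if (url == null) return [];
    return [
      {
        shortcode: emoji.shortcode,
        url,
        static_url: url,
        visible_in_picker: true,
        category: emoji.category,
      },
    ];
  });
}
```

Compared with `map` followed by `filter`, the single `flatMap` pass lets TypeScript infer a result type whose `url` is a plain `string`, with no type guard needed.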

src/api/v1/media.ts

Lines changed: 3 additions & 3 deletions
@@ -75,7 +75,7 @@ export async function postMedia(
   if (result.length < 1) {
     return c.json({ error: "Failed to insert media" }, 500);
   }
-  return c.json(serializeMedium(result[0]));
+  return c.json(serializeMedium(result[0], c.req.url));
 }

 app.post(
@@ -93,7 +93,7 @@ app.get("/:id", async (c) => {
     where: eq(media.id, mediumId),
   });
   if (medium == null) return c.json({ error: "Not found" }, 404);
-  return c.json(serializeMedium(medium));
+  return c.json(serializeMedium(medium, c.req.url));
 });

 app.put("/:id", tokenRequired, scopeRequired(["write:media"]), async (c) => {
@@ -116,7 +116,7 @@ app.put("/:id", tokenRequired, scopeRequired(["write:media"]), async (c) => {
     .where(eq(media.id, mediumId))
     .returning();
   if (result.length < 1) return c.json({ error: "Not found" }, 404);
-  return c.json(serializeMedium(result[0]));
+  return c.json(serializeMedium(result[0], c.req.url));
 });

 export default app;
