+Want a full app example? See [`sample/`](./sample), which includes:
+
+- A polished Next.js UI
+- Local API testing with your machine's network egress
+- A Dockerized Hono API deployed through Cloudflare Containers
+- A server-side token-protected proxy so the container API is not publicly open
+
 ## Installation
 
 ```sh
 npm install youtube-caption-extractor
 ```
 
-Requires **Node.js ≥ 18** (uses the global `fetch` API). Works in Node.js,
-Bun, Deno, Cloudflare Workers, and any other modern JavaScript runtime that
-provides `fetch`. See [Deployment notes](#deployment-notes) for tips on
-keeping calls reliable from your runtime of choice.
+Requires **Node.js ≥ 18** when running on Node.js because the library uses the
+global `fetch` API. It also works in Bun, Deno, Cloudflare Workers, and other
+modern JavaScript runtimes that provide `fetch`. See
+[Deployment notes](#deployment-notes) for tips on keeping calls reliable from
+your runtime of choice.
 
 ## API
 
@@ -132,7 +168,35 @@ try {
 
 ## Deployment notes
 
-The library calls YouTube directly, so real-world reliability can depend on the network egress of your deployment. Local development and self-hosted setups tend to work out of the box. Some serverless and edge environments share IP ranges that see broader traffic patterns and may occasionally rate-limit, so for production workloads it's worth combining the patterns below.
+The library calls YouTube directly, so reliability depends partly on the network
+egress of the process making the request.
+
+Local development and self-hosted servers tend to work out of the box. Shared
+serverless, container, and edge IP ranges can sometimes be rate-limited or gated
+by YouTube's bot checks. That is not a library API issue; it is an egress
+reputation issue. For production, use the patterns below.
+
+### Recommended app architecture
+
+Keep YouTube extraction server-side. Do not call YouTube directly from browser
+code.
+
+```txt
+Browser → your app API route → youtube-caption-extractor → YouTube
+```
+
+If you use a separate API service, protect it with a server-side token:
+
+```txt
+Browser → your app API route → token-protected caption API → YouTube
+```
+
+The included [`sample/`](./sample) demonstrates this pattern with:
+
+- Next.js API routes as the public browser-facing API
+- A Cloudflare Worker that rejects requests without `Authorization: Bearer <token>`
+- A Cloudflare Container running a Hono/Node API
+- `CAPTION_API_TOKEN` kept server-side only, never in `NEXT_PUBLIC_*`
 
 ### Building resilient calls
 
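The token check that the added "Recommended app architecture" section describes (a Worker that rejects requests without `Authorization: Bearer <token>`) could be sketched as follows. This is a minimal illustration, not the sample's actual Worker code: `isAuthorized`, `constantTimeEqual`, and `rejectUnauthorized` are hypothetical helper names, the 401 response body is an assumption, and the code assumes a runtime that provides the standard `Request` and `Response` globals (Cloudflare Workers or Node.js ≥ 18).

```typescript
// Hypothetical helper: true only when the request carries
// `Authorization: Bearer <expectedToken>`.
function isAuthorized(request: Request, expectedToken: string): boolean {
  // An empty expected token is treated as a misconfiguration, never as "open".
  if (expectedToken.length === 0) return false;

  const header = request.headers.get("authorization") ?? "";
  const prefix = "Bearer ";
  if (!header.startsWith(prefix)) return false;

  return constantTimeEqual(header.slice(prefix.length), expectedToken);
}

// Compares two strings without stopping at the first mismatching character.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Assumed response shape for rejected requests.
function rejectUnauthorized(): Response {
  return new Response(JSON.stringify({ error: "unauthorized" }), {
    status: 401,
    headers: { "content-type": "application/json" },
  });
}
```

A Worker handler would call `isAuthorized` first and return `rejectUnauthorized()` before proxying anything to the container, so unauthenticated traffic never reaches the extraction API.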
@@ -191,6 +255,17 @@ Common uses for a custom `fetch`:
 - **Authenticated proxies** — add `Authorization` headers via a wrapper
 - **Regional routing** — direct outbound traffic through a specific region or provider
 
+### Local vs hosted behavior
+
+If extraction works locally but fails in a hosted environment with a message like
+`LOGIN_REQUIRED` or "Sign in to confirm you're not a bot", the hosted provider's
+egress IP is likely being challenged by YouTube. Your options are:
+
+1. Run the extraction API somewhere with reliable egress for your workload.
+2. Use the `fetch` option to route outbound YouTube requests through a trusted proxy.
+3. Cache successful results aggressively so fewer requests reach YouTube.
+4. Treat these failures as transient and retry with backoff where appropriate.
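Option 4 in the list above (treat these failures as transient and retry with backoff) could look roughly like the sketch below. It is an illustration under stated assumptions, not part of the library: `isLikelyBotChallenge`, `backoffDelayMs`, and `retryWithBackoff` are hypothetical names, and the marker strings are taken from the messages quoted above (the second is shortened to `Sign in to confirm` so that apostrophe variants still match).

```typescript
// Substrings that the text above associates with a YouTube bot challenge.
const BOT_CHALLENGE_MARKERS = ["LOGIN_REQUIRED", "Sign in to confirm"];

function isLikelyBotChallenge(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return BOT_CHALLENGE_MARKERS.some((marker) => message.includes(marker));
}

// Delay before the retry that follows failed attempt `attempt` (0-based):
// baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseMs: number;
  maxMs: number;
}

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  options: RetryOptions,
): Promise<T> {
  if (options.maxAttempts < 1) {
    throw new Error("maxAttempts must be at least 1");
  }
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const isLastAttempt = attempt === options.maxAttempts - 1;
      // Give up on the final attempt or on errors that are not retryable.
      if (isLastAttempt || !shouldRetry(error)) throw error;
      const delay = backoffDelayMs(attempt, options.baseMs, options.maxMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // The loop always returns or throws; this satisfies the type checker.
  throw new Error("unreachable");
}
```

Pass the extraction call as `operation` and `isLikelyBotChallenge` as `shouldRetry`. Retrying only helps when the challenge is intermittent; if the hosting provider's egress IP is blocked consistently, options 1 and 2 are the ones that address the cause.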
sample/README.md
Lines changed: 59 additions & 16 deletions
@@ -8,28 +8,71 @@ This is a Next.js project demonstrating the use of the `youtube-caption-extracto
 - Retrieve video details including title and description
 - Support for multiple languages
 
-## Getting Started
+## Local development
 
-First, install the dependencies:
-
-````bash
+```bash
 npm install
-# or
-yarn install
-# or
-pnpm install
+npm run dev
+```
 
-Then, run the development server:
+Open [http://localhost:3000](http://localhost:3000). By default, the UI calls the local Next.js API routes.
+
+The API app lives in `server/` and can be run separately from Next.js.
+
+For the fastest local loop, run the Hono API directly and point the Next.js API routes at it:
 
 ```bash
-npm run dev
-# or
-yarn dev
-# or
-pnpm dev
-````
+npm --prefix server install
+npm run api:dev
+CAPTION_API_BASE_URL=http://localhost:8080 npm run dev
+```
+
+To test the full local Worker → Cloudflare Container path, run Wrangler with Docker and point the Next.js API routes at Wrangler:
+
+```bash
+npm --prefix server install
+npm run cf:dev
+CAPTION_API_BASE_URL=http://localhost:8787 CAPTION_API_TOKEN=<token> npm run dev
+```
+
+Cloudflare dashboard "live instances" only reflects deployed Cloudflare traffic. It does not change when testing the local Hono API or local Wrangler container.
+
+## Cloudflare container API
+
+This sample includes a self-contained Cloudflare Containers app in `server/`. It runs the API as a Dockerized Hono Node server and proxies requests through a Hono Worker.
+
+```bash
+npm --prefix server install
+npm run cf:deploy
+```
+
+After deploy, set Vercel/Next.js server-side environment variables to the Worker URL and shared API token:
+Do not put `CAPTION_API_TOKEN` in a `NEXT_PUBLIC_*` variable. The browser calls the Next.js API routes, and the Next.js server attaches the token when it calls the Worker.
+
+The container endpoint supports:
+
+- `GET /health`
+- `GET /api/subtitles?videoID=<id>&lang=en`
+- `GET /api/videoDetails?videoID=<id>&lang=en`
+
+Optional runtime environment variables:
+
+- `CAPTION_API_TOKEN` — shared bearer token required by the Worker before it proxies to the container.
+- `OUTBOUND_PROXY_URL` — routes YouTube requests through an HTTP(S) proxy via `undici`.
+- `CACHE_TTL_SECONDS` — controls the warm in-memory response cache, default `21600`.
+- `ALLOWED_ORIGINS` — comma-separated browser origins for CORS, default `*`.
+- `CONTAINER_VERSION` — version prefix for container instance names; bump it to force fresh instances during rollouts.
+
+From `server/`, use `npx wrangler secret put CAPTION_API_TOKEN` to set the shared token on Cloudflare.
+From `server/`, use `npx wrangler secret put OUTBOUND_PROXY_URL` if the proxy URL contains credentials.
 
-Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
+If the API returns `youtube_blocked_datacenter_ip`, the request reached the Cloudflare container but YouTube blocked the container's outbound datacenter IP. Use direct local API testing (`http://localhost:8080`) for local machine egress, or configure `OUTBOUND_PROXY_URL` with a trusted proxy for deployed Cloudflare container testing.
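The token flow this README describes (the browser calls the Next.js API routes, and the Next.js server attaches the token when it calls the Worker) might be sketched as below. This is an illustration only, not the sample's route code: `buildCaptionApiRequest`, `CaptionApiConfig`, and `CaptionEndpoint` are hypothetical names, while the endpoint paths, query parameters, and the `CAPTION_API_BASE_URL` / `CAPTION_API_TOKEN` variable names are taken from the README text.

```typescript
type CaptionEndpoint = "subtitles" | "videoDetails";

interface CaptionApiConfig {
  baseUrl: string; // value of CAPTION_API_BASE_URL, e.g. http://localhost:8787
  token?: string; // value of CAPTION_API_TOKEN; read on the server only
}

// Builds the server-to-Worker request. Runs only in server code, so the
// token never appears in anything shipped to the browser.
function buildCaptionApiRequest(
  config: CaptionApiConfig,
  endpoint: CaptionEndpoint,
  videoID: string,
  lang = "en",
): Request {
  // An absolute path replaces any path already present on the base URL.
  const url = new URL(`/api/${endpoint}`, config.baseUrl);
  url.searchParams.set("videoID", videoID);
  url.searchParams.set("lang", lang);

  const headers = new Headers({ accept: "application/json" });
  if (config.token) {
    headers.set("authorization", `Bearer ${config.token}`);
  }
  return new Request(url.toString(), { method: "GET", headers });
}
```

A Next.js route handler could pass the result to `fetch`, reading `baseUrl` and `token` from `process.env`. Omitting the token matches the direct local Hono setup on `http://localhost:8080`, where the README's command sets only `CAPTION_API_BASE_URL`.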
sample/app/api/_lib/handleError.ts
Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ export function handleApiError(error: unknown): NextResponse {
       {
         code: 'youtube_blocked_datacenter_ip',
         message:
-          'YouTube is blocking this server. Most cloud hosts (Vercel, AWS Lambda, Cloudflare Workers) share IP ranges that YouTube gates with a bot challenge — no client-side fix can bypass it. The library works on residential IPs: run the demo locally to see it in action, or wire up a residential proxy via the `fetch` option.',
+          'YouTube is blocking this server egress. Cloud/container hosts often use shared datacenter IP ranges that YouTube gates with a bot challenge. If this persists on Cloudflare Containers, route outbound YouTube requests through the library `fetch` option using a trusted proxy.',
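The hunk above shows only the payload, not how an error ends up classified as `youtube_blocked_datacenter_ip`. One way such a mapping could work is sketched below. It is not the body of `handleApiError`, which lies outside this hunk: `toApiErrorPayload` is a hypothetical name, the 503 and 500 status codes and the `internal_error` code are assumptions, and the marker strings come from the README's "Local vs hosted behavior" section.

```typescript
interface ApiErrorPayload {
  status: number;
  body: { code: string; message: string };
}

// Maps a thrown error to a JSON-serializable payload.
function toApiErrorPayload(error: unknown): ApiErrorPayload {
  const message = error instanceof Error ? error.message : String(error);

  // Markers the README associates with YouTube challenging the egress IP.
  const blocked =
    message.includes("LOGIN_REQUIRED") || message.includes("Sign in to confirm");

  if (blocked) {
    return {
      status: 503, // assumed: the upstream is refusing this server's egress
      body: {
        code: "youtube_blocked_datacenter_ip",
        message: "YouTube is blocking this server egress.",
      },
    };
  }
  return { status: 500, body: { code: "internal_error", message } };
}
```

Returning a stable `code` alongside the human-readable message lets a client branch on the code and leaves the wording free to change, as it does in this commit.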