Commit a99da80

Add Cloudflare container API sample
1 parent a9a38a4 commit a99da80

18 files changed

Lines changed: 2213 additions & 29 deletions

.gitignore

Lines changed: 5 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -3,11 +3,15 @@ dist
33

44
*.log
55
.DS_Store
6+
.env*.local
67

78

89
/sample/.next/
910
/sample/out/
1011
/sample/build/
1112
/sample/node_modules/
13+
/sample/server/node_modules/
14+
/sample/server/.wrangler/
15+
/sample/server/.dev.vars*
1216

13-
/docs
17+
/docs

README.md

Lines changed: 83 additions & 8 deletions
@@ -1,8 +1,15 @@
11
# youtube-caption-extractor
22

3-
A small, dependency-light library that extracts the transcript (and basic
4-
metadata) from any public YouTube video. Works with both manual captions and
5-
YouTube's auto-generated subtitles, in any language the video has tracks for.
3+
Extract readable transcripts and basic metadata from public YouTube videos with
4+
one small TypeScript library. It works with manual captions and YouTube's
5+
auto-generated captions, and it can run in Node.js, serverless functions, edge
6+
runtimes, or a containerized API.
7+
8+
- **Simple API**: `getSubtitles()` and `getVideoDetails()`
9+
- **Typed output** — ships first-party TypeScript definitions
10+
- **Runtime-friendly** — uses global `fetch`, with an optional custom transport
11+
- **Production-aware** — supports retries, caching, and proxy/custom egress
12+
- **Demo included** — a Next.js sample app plus a Cloudflare Container API
613

714
```ts
815
import { getSubtitles } from 'youtube-caption-extractor';
@@ -16,16 +23,45 @@ const subtitles = await getSubtitles({ videoID: '7GeFt8suV8E', lang: 'en' });
1623
// ]
1724
```
1825

26+
## Try it quickly
27+
28+
```sh
29+
npm install youtube-caption-extractor
30+
```
31+
32+
```ts
33+
import { getVideoDetails } from 'youtube-caption-extractor';
34+
35+
const video = await getVideoDetails({
36+
videoID: '7GeFt8suV8E',
37+
lang: 'en',
38+
});
39+
40+
console.log(video.title);
41+
console.log(video.subtitles.map((s) => s.text).join('\n'));
42+
```
43+
44+
Want to click around first? Try the hosted demo:
45+
[youtube-caption-extractor.vercel.app](https://youtube-caption-extractor.vercel.app/).
46+
47+
Want a full app example? See [`sample/`](./sample), which includes:
48+
49+
- A polished Next.js UI
50+
- Local API testing with your machine's network egress
51+
- A Dockerized Hono API deployed through Cloudflare Containers
52+
- A server-side token-protected proxy so the container API is not publicly open
53+
1954
## Installation
2055

2156
```sh
2257
npm install youtube-caption-extractor
2358
```
2459

25-
Requires **Node.js ≥ 18** (uses the global `fetch` API). Works in Node.js,
26-
Bun, Deno, Cloudflare Workers, and any other modern JavaScript runtime that
27-
provides `fetch`. See [Deployment notes](#deployment-notes) for tips on
28-
keeping calls reliable from your runtime of choice.
60+
Requires **Node.js ≥ 18** when running on Node.js because the library uses the
61+
global `fetch` API. It also works in Bun, Deno, Cloudflare Workers, and other
62+
modern JavaScript runtimes that provide `fetch`. See
63+
[Deployment notes](#deployment-notes) for tips on keeping calls reliable from
64+
your runtime of choice.
2965

3066
## API
3167

@@ -132,7 +168,35 @@ try {
132168

133169
## Deployment notes
134170

135-
The library calls YouTube directly, so real-world reliability can depend on the network egress of your deployment. Local development and self-hosted setups tend to work out of the box. Some serverless and edge environments share IP ranges that see broader traffic patterns and may occasionally rate-limit, so for production workloads it's worth combining the patterns below.
171+
The library calls YouTube directly, so reliability depends partly on the network
172+
egress of the process making the request.
173+
174+
Local development and self-hosted servers tend to work out of the box. Shared
175+
serverless, container, and edge IP ranges can sometimes be rate-limited or gated
176+
by YouTube's bot checks. That is not a library API issue; it is an egress
177+
reputation issue. For production, use the patterns below.
178+
179+
### Recommended app architecture
180+
181+
Keep YouTube extraction server-side. Do not call YouTube directly from browser
182+
code.
183+
184+
```txt
185+
Browser → your app API route → youtube-caption-extractor → YouTube
186+
```
187+
188+
If you use a separate API service, protect it with a server-side token:
189+
190+
```txt
191+
Browser → your app API route → token-protected caption API → YouTube
192+
```
193+
194+
The included [`sample/`](./sample) demonstrates this pattern with:
195+
196+
- Next.js API routes as the public browser-facing API
197+
- A Cloudflare Worker that rejects requests without `Authorization: Bearer <token>`
198+
- A Cloudflare Container running a Hono/Node API
199+
- `CAPTION_API_TOKEN` kept server-side only, never in `NEXT_PUBLIC_*`
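
The bearer-token check listed above could look roughly like the sketch below. This is not the sample Worker's actual code: `isAuthorized` is a hypothetical helper, and failing closed when no token is configured is an assumption made for illustration.

```ts
// Hypothetical helper, not taken from the sample Worker's source.
// Returns true only when the header carries the expected bearer token.
function isAuthorized(
  authorizationHeader: string | null | undefined,
  expectedToken: string
): boolean {
  // Assumption: fail closed when no token is configured or no header is sent.
  if (!expectedToken || !authorizationHeader) return false;

  // Accept "Bearer <token>" with a case-insensitive scheme name.
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader.trim());
  return match !== null && match[1] === expectedToken;
}

// Example: decide whether to proxy the request or answer with 401.
const responseStatus = isAuthorized('Bearer s3cret', 's3cret') ? 200 : 401;
console.log(responseStatus); // 200
```

A production check would probably also compare the tokens in constant time; the plain `===` here keeps the sketch short.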
136200

137201
### Building resilient calls
138202

@@ -191,6 +255,17 @@ Common uses for a custom `fetch`:
191255
- **Authenticated proxies** — add `Authorization` headers via a wrapper
192256
- **Regional routing** — direct outbound traffic through a specific region or provider
193257

258+
### Local vs hosted behavior
259+
260+
If extraction works locally but fails in a hosted environment with a message like
261+
`LOGIN_REQUIRED` or "Sign in to confirm you're not a bot", the hosted provider's
262+
egress IP is likely being challenged by YouTube. Your options are:
263+
264+
1. Run the extraction API somewhere with reliable egress for your workload.
265+
2. Use the `fetch` option to route outbound YouTube requests through a trusted proxy.
266+
3. Cache successful results aggressively so fewer requests reach YouTube.
267+
4. Treat these failures as transient and retry with backoff where appropriate.
268+
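
Option 4 above can be sketched as a small wrapper that retries only when the error message looks like a bot challenge. The helper names (`isBotChallenge`, `backoffDelayMs`, `retryOnBotChallenge`) and the default retry budget are hypothetical; the two marker strings are the ones quoted above, and the assumption is that they appear in the thrown error's message.

```ts
// Hypothetical helpers; names and defaults are illustrative only.

// Messages quoted above that suggest YouTube challenged the egress IP.
const BOT_CHALLENGE_MARKERS = [
  'LOGIN_REQUIRED',
  "Sign in to confirm you're not a bot",
];

function isBotChallenge(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return BOT_CHALLENGE_MARKERS.some((marker) => message.includes(marker));
}

// Exponential backoff: base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

async function retryOnBotChallenge<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      // Give up on unrelated errors or once the retry budget is spent.
      if (!isBotChallenge(error) || attempt >= maxRetries) throw error;
      await new Promise<void>((resolve) =>
        setTimeout(resolve, backoffDelayMs(attempt, baseDelayMs))
      );
    }
  }
}
```

A call would then wrap the library function, for example `retryOnBotChallenge(() => getSubtitles({ videoID: '7GeFt8suV8E', lang: 'en' }))`. Retrying does not change the egress IP, so this is most useful combined with caching or a proxy.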
194269
## Usage examples
195270

196271
### Next.js (App Router)

sample/.env.example

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
# Browser requests go to the Next.js API routes by default. The Next.js
2+
# server can then proxy to a local API, local Wrangler, or deployed Worker.
3+
CAPTION_API_BASE_URL=http://localhost:8080
4+
CAPTION_API_TOKEN=
5+
6+
# Local Wrangler Worker/container:
7+
# CAPTION_API_BASE_URL=http://localhost:8787
8+
9+
# Deployed Cloudflare Worker/container:
10+
# CAPTION_API_BASE_URL=https://youtube-caption-extractor-api.thinktank-himanshu.workers.dev

sample/README.md

Lines changed: 59 additions & 16 deletions
@@ -8,28 +8,71 @@ This is a Next.js project demonstrating the use of the `youtube-caption-extracto
88
- Retrieve video details including title and description
99
- Support for multiple languages
1010

11-
## Getting Started
11+
## Local development
1212

13-
First, install the dependencies:
14-
15-
````bash
13+
```bash
1614
npm install
17-
# or
18-
yarn install
19-
# or
20-
pnpm install
15+
npm run dev
16+
```
2117

22-
Then, run the development server:
18+
Open [http://localhost:3000](http://localhost:3000). By default, the UI calls the local Next.js API routes.
19+
20+
The API app lives in `server/` and can be run separately from Next.js.
21+
22+
For the fastest local loop, run the Hono API directly and point the Next.js API routes at it:
2323

2424
```bash
25-
npm run dev
26-
# or
27-
yarn dev
28-
# or
29-
pnpm dev
30-
````
25+
npm --prefix server install
26+
npm run api:dev
27+
CAPTION_API_BASE_URL=http://localhost:8080 npm run dev
28+
```
29+
30+
To test the full local Worker → Cloudflare Container path, run Wrangler with Docker and point the Next.js API routes at Wrangler:
31+
32+
```bash
33+
npm --prefix server install
34+
npm run cf:dev
35+
CAPTION_API_BASE_URL=http://localhost:8787 CAPTION_API_TOKEN=<token> npm run dev
36+
```
37+
38+
The Cloudflare dashboard's "live instances" count reflects only deployed Cloudflare traffic. It does not change when you test the local Hono API or the local Wrangler container.
39+
40+
## Cloudflare container API
41+
42+
This sample includes a self-contained Cloudflare Containers app in `server/`. It runs the API as a Dockerized Hono Node server and proxies requests through a Hono Worker.
43+
44+
```bash
45+
npm --prefix server install
46+
npm run cf:deploy
47+
```
48+
49+
After deploy, set Vercel/Next.js server-side environment variables to the Worker URL and shared API token:
50+
51+
```bash
52+
CAPTION_API_BASE_URL=https://<your-worker>.<your-subdomain>.workers.dev
53+
CAPTION_API_TOKEN=<same-token-configured-on-the-worker>
54+
```
55+
56+
Do not put `CAPTION_API_TOKEN` in a `NEXT_PUBLIC_*` variable. The browser calls the Next.js API routes, and the Next.js server attaches the token when it calls the Worker.
57+
58+
The container endpoint supports:
59+
60+
- `GET /health`
61+
- `GET /api/subtitles?videoID=<id>&lang=en`
62+
- `GET /api/videoDetails?videoID=<id>&lang=en`
63+
64+
Optional runtime environment variables:
65+
66+
- `CAPTION_API_TOKEN` — shared bearer token required by the Worker before it proxies to the container.
67+
- `OUTBOUND_PROXY_URL` — routes YouTube requests through an HTTP(S) proxy via `undici`.
68+
- `CACHE_TTL_SECONDS` — controls the warm in-memory response cache, default `21600`.
69+
- `ALLOWED_ORIGINS` — comma-separated browser origins for CORS, default `*`.
70+
- `CONTAINER_VERSION` — version prefix for container instance names; bump it to force fresh instances during rollouts.
71+
72+
From `server/`, use `npx wrangler secret put CAPTION_API_TOKEN` to set the shared token on Cloudflare.
73+
From `server/`, use `npx wrangler secret put OUTBOUND_PROXY_URL` if the proxy URL contains credentials.
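
To illustrate what a TTL-bounded in-memory cache like the one behind `CACHE_TTL_SECONDS` might do, here is a minimal sketch. The `TtlCache` class is hypothetical and the sample server's real cache may be implemented differently; only the `21600` second default comes from the list above.

```ts
// Hypothetical sketch; the sample server's real cache may differ.
const DEFAULT_TTL_SECONDS = 21600; // default quoted above (6 hours)

class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(
    ttlSeconds: number = DEFAULT_TTL_SECONDS,
    now: () => number = Date.now
  ) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Because the cache lives in process memory, it is only "warm" for as long as a given container instance stays up; a fresh instance starts empty.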
3174

32-
Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
75+
If the API returns `youtube_blocked_datacenter_ip`, the request reached the Cloudflare container but YouTube blocked the container's outbound datacenter IP. Use direct local API testing (`http://localhost:8080`) for local machine egress, or configure `OUTBOUND_PROXY_URL` with a trusted proxy for deployed Cloudflare container testing.
3376

3477
## Usage
3578

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
import { NextResponse } from 'next/server';
2+
3+
const PROXY_HEADERS = ['content-type', 'cache-control', 'x-cache'];
4+
5+
function captionApiBaseUrl(): string {
6+
return (process.env.CAPTION_API_BASE_URL || '').replace(/\/+$/, '');
7+
}
8+
9+
function captionApiToken(): string {
10+
return process.env.CAPTION_API_TOKEN || '';
11+
}
12+
13+
export async function proxyCaptionApi(
14+
path: string,
15+
searchParams: URLSearchParams
16+
): Promise<NextResponse | null> {
17+
const baseUrl = captionApiBaseUrl();
18+
if (!baseUrl) return null;
19+
20+
const upstreamUrl = `${baseUrl}${path}?${searchParams.toString()}`;
21+
const token = captionApiToken();
22+
const headers = new Headers();
23+
24+
if (token) {
25+
headers.set('Authorization', `Bearer ${token}`);
26+
}
27+
28+
try {
29+
const upstream = await fetch(upstreamUrl, {
30+
cache: 'no-store',
31+
headers,
32+
});
33+
const responseHeaders = new Headers();
34+
35+
for (const header of PROXY_HEADERS) {
36+
const value = upstream.headers.get(header);
37+
if (value) responseHeaders.set(header, value);
38+
}
39+
40+
return new NextResponse(await upstream.text(), {
41+
headers: responseHeaders,
42+
status: upstream.status,
43+
});
44+
} catch (error) {
45+
return NextResponse.json(
46+
{
47+
code: 'caption_api_unreachable',
48+
message:
49+
error instanceof Error ? error.message : 'Caption API unreachable',
50+
},
51+
{ status: 502 }
52+
);
53+
}
54+
}
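
The URL handling in the proxy helper above can be exercised in isolation. The sketch below uses a hypothetical `buildUpstreamUrl` function that applies the same trailing-slash trimming; unlike the diff, it omits the `?` when there are no query parameters, which is an illustrative variation rather than the committed behavior.

```ts
// Hypothetical helper mirroring the base-URL handling shown above.
function buildUpstreamUrl(
  baseUrl: string,
  path: string,
  searchParams: URLSearchParams
): string {
  // Strip any trailing slashes so base + path never yields "//".
  const trimmedBase = baseUrl.replace(/\/+$/, '');
  const query = searchParams.toString();
  // Variation from the diff: only append "?" when a query exists.
  return `${trimmedBase}${path}${query ? `?${query}` : ''}`;
}

const upstreamUrl = buildUpstreamUrl(
  'http://localhost:8080/',
  '/api/subtitles',
  new URLSearchParams({ videoID: '7GeFt8suV8E', lang: 'en' })
);
console.log(upstreamUrl);
// http://localhost:8080/api/subtitles?videoID=7GeFt8suV8E&lang=en
```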

sample/app/api/_lib/handleError.ts

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ export function handleApiError(error: unknown): NextResponse {
2121
{
2222
code: 'youtube_blocked_datacenter_ip',
2323
message:
24-
'YouTube is blocking this server. Most cloud hosts (Vercel, AWS Lambda, Cloudflare Workers) share IP ranges that YouTube gates with a bot challenge — no client-side fix can bypass it. The library works on residential IPs: run the demo locally to see it in action, or wire up a residential proxy via the `fetch` option.',
24+
'YouTube is blocking this server egress. Cloud/container hosts often use shared datacenter IP ranges that YouTube gates with a bot challenge. If this persists on Cloudflare Containers, route outbound YouTube requests through the library `fetch` option using a trusted proxy.',
2525
debug: message,
2626
},
2727
{ status: 503 }

sample/app/api/subtitles/route.ts

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,6 @@
11
import { getSubtitles } from 'youtube-caption-extractor';
22
import { NextResponse, type NextRequest } from 'next/server';
3+
import { proxyCaptionApi } from '../_lib/captionApiProxy';
34
import { handleApiError } from '../_lib/handleError';
45

56
export async function GET(request: NextRequest) {
@@ -14,6 +15,9 @@ export async function GET(request: NextRequest) {
1415
);
1516
}
1617

18+
const proxied = await proxyCaptionApi('/api/subtitles', searchParams);
19+
if (proxied) return proxied;
20+
1721
try {
1822
const subtitles = await getSubtitles({ videoID, lang });
1923
return NextResponse.json({ subtitles }, { status: 200 });

sample/app/api/videoDetails/route.ts

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,6 @@
11
import { getVideoDetails } from 'youtube-caption-extractor';
22
import { NextResponse, type NextRequest } from 'next/server';
3+
import { proxyCaptionApi } from '../_lib/captionApiProxy';
34
import { handleApiError } from '../_lib/handleError';
45

56
export async function GET(request: NextRequest) {
@@ -14,6 +15,9 @@ export async function GET(request: NextRequest) {
1415
);
1516
}
1617

18+
const proxied = await proxyCaptionApi('/api/videoDetails', searchParams);
19+
if (proxied) return proxied;
20+
1721
try {
1822
const videoDetails = await getVideoDetails({ videoID, lang });
1923
return NextResponse.json({ videoDetails }, { status: 200 });

sample/app/page.tsx

Lines changed: 14 additions & 3 deletions
@@ -86,7 +86,17 @@ const LANGUAGES = [
8686
['zh', 'Chinese'],
8787
] as const;
8888

89-
const SAMPLE_IDS = ['7GeFt8suV8E', 'jNQXAC9IVRw'];
89+
const SAMPLE_IDS = [
90+
'7GeFt8suV8E',
91+
'jNQXAC9IVRw',
92+
'D37Ijn2o5U0',
93+
'g9JIUM0MHgQ',
94+
'6BB6exR8Zd8',
95+
];
96+
function apiUrl(path: string, params: URLSearchParams): string {
97+
const query = params.toString();
98+
return `${path}${query ? `?${query}` : ''}`;
99+
}
90100

91101
export default function HomePage() {
92102
const [input, setInput] = useState('');
@@ -150,9 +160,10 @@ export default function HomePage() {
150160
setError(null);
151161
setQuery('');
152162
try {
163+
const params = new URLSearchParams({ videoID: id, lang });
153164
const [subsRes, detailsRes] = await Promise.all([
154-
fetch(`/api/subtitles?videoID=${id}&lang=${lang}`),
155-
fetch(`/api/videoDetails?videoID=${id}&lang=${lang}`),
165+
fetch(apiUrl('/api/subtitles', params)),
166+
fetch(apiUrl('/api/videoDetails', params)),
156167
]);
157168
if (!subsRes.ok) {
158169
setError(await readApiError(subsRes));
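
One likely benefit of the switch from string interpolation to `URLSearchParams` in this hunk is that reserved characters in the input get percent-encoded. The sketch below reuses the `apiUrl` helper from the diff; the `rawVideoId` value is a made-up input chosen to show the difference.

```ts
// Same helper as in the diff above.
function apiUrl(path: string, params: URLSearchParams): string {
  const query = params.toString();
  return `${path}${query ? `?${query}` : ''}`;
}

// Hypothetical input containing reserved characters.
const rawVideoId = 'abc&lang=fr';

// Old approach: the raw value leaks into the query and adds a second `lang`.
const interpolated = `/api/subtitles?videoID=${rawVideoId}&lang=en`;

// New approach: the value is encoded and stays a single parameter.
const encoded = apiUrl(
  '/api/subtitles',
  new URLSearchParams({ videoID: rawVideoId, lang: 'en' })
);

console.log(interpolated); // /api/subtitles?videoID=abc&lang=fr&lang=en
console.log(encoded); // /api/subtitles?videoID=abc%26lang%3Dfr&lang=en
```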
