Commit ab44d9f

fix(xiaoyuzhou): migrate from broken SSR scraping to authenticated API (fixes #1023) (#1059)
* fix(xiaoyuzhou): migrate from broken SSR scraping to authenticated API (fixes #1023)

  Xiaoyuzhou removed SSR rendering: /podcast/<id> and /episode/<id> pages now return 404, breaking fetchPageProps(), which scraped __NEXT_DATA__.

  Migrate podcast, podcast-episodes, episode, and download commands to use the existing authenticated API client (requestXiaoyuzhouJson) that transcript.js already uses successfully.

  Changes:
  - podcast.js: use /v1/podcast/get API endpoint
  - podcast-episodes.js: use /v1/podcast/listEpisode API endpoint
  - episode.js: use /v1/episode/get API endpoint
  - download.js: use /v1/episode/get API endpoint
  - utils.js: remove unused fetchPageProps, keep format helpers
  - Update all affected tests (download.test.js, utils.test.js)
  - Change strategy from PUBLIC to LOCAL (requires credentials)

* fix(xiaoyuzhou): align local strategy contract

* fix(xiaoyuzhou): align local api metadata

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>
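As a rough illustration of the migrated call pattern (a sketch, not the repository's code), the snippet below maps each command to the endpoint and parameters named in the commit message and visible in the diff. `ENDPOINTS` and `buildRequest` are hypothetical names introduced here; in the real commands the resulting path and options are passed to `requestXiaoyuzhouJson` from `auth.js`, whose internals this commit does not show.

```javascript
// Hypothetical helper: command name -> [API path, request options].
// Paths and parameter names (pid, eid, limit) are taken from the diff;
// the table itself is an illustration and does not exist in the repository.
const ENDPOINTS = {
    podcast: (id) => ['/v1/podcast/get', { query: { pid: id } }],
    episode: (id) => ['/v1/episode/get', { query: { eid: id } }],
    download: (id) => ['/v1/episode/get', { query: { eid: id } }],
    'podcast-episodes': (id, limit = 20) => [
        '/v1/podcast/listEpisode',
        { method: 'POST', body: { pid: id, limit } },
    ],
};

// Build the argument pair a command would hand to the authenticated client.
function buildRequest(command, id, credentials, limit) {
    const make = ENDPOINTS[command];
    if (!make) {
        throw new Error(`Unknown command: ${command}`);
    }
    const [path, options] = make(id, limit);
    return [path, { ...options, credentials }];
}
```

For instance, `buildRequest('episode', 'ep123', {})` produces `'/v1/episode/get'` with `{ query: { eid: 'ep123' }, credentials: {} }`, which is the call shape the updated download test asserts on.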
1 parent 4ebada6 commit ab44d9f

17 files changed

Lines changed: 142 additions & 188 deletions

README.md

Lines changed: 4 additions & 4 deletions
@@ -206,11 +206,11 @@ To load the source Browser Bridge extension:
 | **quark** | `ls` `mkdir` `mv` `rename` `rm` `save` `share-tree` |
 | **uiverse** | `code` `preview` |
 | **nowcoder** | `hot` `trending` `topics` `recommend` `creators` `companies` `jobs` `search` `suggest` `experience` `referral` `salary` `papers` `practice` `notifications` `detail` |
-| **xiaoyuzhou** | `podcast` `podcast-episodes` `episode` `download` `transcript*` |
+| **xiaoyuzhou** | `podcast*` `podcast-episodes*` `episode*` `download*` `transcript*` |

 87+ adapters in total — **[→ see all supported sites & commands](./docs/adapters/index.md)**

-`*` `opencli xiaoyuzhou transcript` requires local Xiaoyuzhou credentials in `~/.opencli/xiaoyuzhou.json`.
+`*` `opencli xiaoyuzhou podcast`, `podcast-episodes`, `episode`, `download`, and `transcript` require local Xiaoyuzhou credentials in `~/.opencli/xiaoyuzhou.json`.

 ## CLI Hub

@@ -261,7 +261,7 @@ OpenCLI supports downloading images, videos, and articles from supported platfor
 | **douban** | Images | Poster / still image lists |
 | **pixiv** | Images | Original-quality illustrations, multi-page |
 | **1688** | Images, Videos | Downloads page-visible product media from item pages |
-| **xiaoyuzhou** | Audio, Transcript | Downloads episode audio from public pages and transcript JSON/text with local credentials |
+| **xiaoyuzhou** | Audio, Transcript | Downloads episode audio and transcript JSON/text with local credentials |
 | **zhihu** | Articles (Markdown) | Exports with optional image download |
 | **weixin** | Articles (Markdown) | WeChat Official Account articles |

@@ -277,7 +277,7 @@ opencli xiaoyuzhou download 69b3b675772ac2295bfc01d0 --output ./xiaoyuzhou
 opencli xiaoyuzhou transcript 69dd0c98e2c8be31551f6a33 --output ./xiaoyuzhou-transcripts
 ```

-`opencli xiaoyuzhou transcript` requires local Xiaoyuzhou credentials in `~/.opencli/xiaoyuzhou.json`.
+`opencli xiaoyuzhou download` and `transcript` require local Xiaoyuzhou credentials in `~/.opencli/xiaoyuzhou.json`.

 ## Output Formats

README.zh-CN.md

Lines changed: 4 additions & 5 deletions
@@ -209,8 +209,7 @@ npm link
 | **uiverse** | `code` `preview` | 浏览器 |
 | **apple-podcasts** | `search` `episodes` `top` | 公开 |
 | **nowcoder** | `hot` `trending` `topics` `recommend` `creators` `companies` `jobs` `search` `suggest` `experience` `referral` `salary` `papers` `practice` `notifications` `detail` | 公开 / 浏览器 |
-| **xiaoyuzhou** | `podcast` `podcast-episodes` `episode` | 公开 |
-| **xiaoyuzhou** | `podcast` `podcast-episodes` `episode` `download` `transcript*` | 公开 |
+| **xiaoyuzhou** | `podcast*` `podcast-episodes*` `episode*` `download*` `transcript*` | 本地凭证 |
 | **zhihu** | `hot` `search` `question` `download` `follow` `like` `favorite` `comment` `answer` | 浏览器 |
 | **weixin** | `download` | 浏览器 |
 | **youtube** | `search` `video` `transcript` `comments` `channel` `playlist` `feed` `history` `watch-later` `subscriptions` `like` `unlike` `subscribe` `unsubscribe` | 浏览器 |
@@ -270,7 +269,7 @@ npm link

 87+ 适配器 — **[→ 查看完整命令列表](./docs/adapters/index.md)**

-`*` `opencli xiaoyuzhou transcript` 需要本地小宇宙凭证:`~/.opencli/xiaoyuzhou.json`
+`*` `opencli xiaoyuzhou podcast``podcast-episodes``episode``download``transcript` 需要本地小宇宙凭证:`~/.opencli/xiaoyuzhou.json`

 ### 外部 CLI 枢纽

@@ -324,7 +323,7 @@ OpenCLI 支持从各平台下载图片、视频和文章。
 | **Twitter/X** | 图片、视频 | 从用户媒体页或单条推文下载 |
 | **Pixiv** | 图片 | 下载原始画质插画,支持多页作品 |
 | **1688** | 图片、视频 | 下载商品页中可见的商品素材 |
-| **小宇宙** | 音频、转录 | 从公开单集数据下载音频,并使用本地凭证下载转录 JSON / 文本 |
+| **小宇宙** | 音频、转录 | 使用本地凭证下载单集音频和转录 JSON / 文本 |
 | **知乎** | 文章(Markdown) | 导出文章,可选下载图片到本地 |
 | **微信公众号** | 文章(Markdown) | 导出微信公众号文章为 Markdown |
 | **豆瓣** | 图片 | 下载电影条目的海报 / 剧照图片 |
@@ -379,7 +378,7 @@ opencli zhihu download "https://zhuanlan.zhihu.com/p/xxx" --download-images
 opencli weixin download --url "https://mp.weixin.qq.com/s/xxx" --output ./weixin
 ```

-`opencli xiaoyuzhou transcript` 需要本地小宇宙凭证:`~/.opencli/xiaoyuzhou.json`
+`opencli xiaoyuzhou download``transcript` 需要本地小宇宙凭证:`~/.opencli/xiaoyuzhou.json`

cli-manifest.json

Lines changed: 8 additions & 8 deletions
@@ -17341,7 +17341,7 @@
     "name": "download",
     "description": "Download Xiaoyuzhou episode audio",
     "domain": "www.xiaoyuzhoufm.com",
-    "strategy": "public",
+    "strategy": "local",
     "browser": false,
     "args": [
       {
@@ -17424,7 +17424,7 @@
     "name": "episode",
     "description": "View details of a Xiaoyuzhou podcast episode",
     "domain": "www.xiaoyuzhoufm.com",
-    "strategy": "public",
+    "strategy": "local",
     "browser": false,
     "args": [
       {
@@ -17453,7 +17453,7 @@
     "name": "podcast",
     "description": "View a Xiaoyuzhou podcast profile",
     "domain": "www.xiaoyuzhoufm.com",
-    "strategy": "public",
+    "strategy": "local",
     "browser": false,
     "args": [
       {
@@ -17479,9 +17479,9 @@
   {
     "site": "xiaoyuzhou",
     "name": "podcast-episodes",
-    "description": "List recent episodes of a Xiaoyuzhou podcast (up to 15, SSR limit)",
+    "description": "List episodes of a Xiaoyuzhou podcast",
     "domain": "www.xiaoyuzhoufm.com",
-    "strategy": "public",
+    "strategy": "local",
     "browser": false,
     "args": [
       {
@@ -17494,9 +17494,9 @@
       {
         "name": "limit",
         "type": "int",
-        "default": 15,
+        "default": 20,
         "required": false,
-        "help": "Max episodes to show (up to 15, SSR limit)"
+        "help": "Max episodes to show"
       }
     ],
     "columns": [
@@ -19556,4 +19556,4 @@
     "sourceFile": "zsxq/topics.js",
     "navigateBefore": "https://wx.zsxq.com"
   }
-]
+]

clis/xiaoyuzhou/download.js

Lines changed: 8 additions & 4 deletions
@@ -4,23 +4,27 @@ import { cli, Strategy } from '@jackwener/opencli/registry';
 import { CliError } from '@jackwener/opencli/errors';
 import { httpDownload, sanitizeFilename } from '@jackwener/opencli/download';
 import { formatBytes } from '@jackwener/opencli/download/progress';
-import { fetchPageProps } from './utils.js';
+import { loadXiaoyuzhouCredentials, requestXiaoyuzhouJson } from './auth.js';

 cli({
     site: 'xiaoyuzhou',
     name: 'download',
     description: 'Download Xiaoyuzhou episode audio',
     domain: 'www.xiaoyuzhoufm.com',
-    strategy: Strategy.PUBLIC,
+    strategy: Strategy.LOCAL,
     browser: false,
     args: [
         { name: 'id', positional: true, required: true, help: 'Episode ID (eid from podcast-episodes output)' },
         { name: 'output', default: './xiaoyuzhou-downloads', help: 'Output directory' },
     ],
     columns: ['title', 'podcast', 'status', 'size', 'file'],
     func: async (_page, args) => {
-        const pageProps = await fetchPageProps(`/episode/${args.id}`);
-        const ep = pageProps.episode;
+        const credentials = loadXiaoyuzhouCredentials();
+        const response = await requestXiaoyuzhouJson('/v1/episode/get', {
+            query: { eid: args.id },
+            credentials,
+        });
+        const ep = response.data;
         if (!ep) {
             throw new CliError('NOT_FOUND', 'Episode not found', 'Please check the ID');
         }

clis/xiaoyuzhou/download.test.js

Lines changed: 23 additions & 13 deletions
@@ -2,17 +2,19 @@ import path from 'node:path';
 import { beforeAll, beforeEach, describe, expect, it, vi } from 'vitest';
 import { getRegistry } from '@jackwener/opencli/registry';

-const { mockFetchPageProps, mockHttpDownload, mockMkdirSync } = vi.hoisted(() => ({
-    mockFetchPageProps: vi.fn(),
+const { mockRequestJson, mockLoadCredentials, mockHttpDownload, mockMkdirSync } = vi.hoisted(() => ({
+    mockRequestJson: vi.fn(),
+    mockLoadCredentials: vi.fn(),
     mockHttpDownload: vi.fn(),
     mockMkdirSync: vi.fn(),
 }));

-vi.mock('./utils.js', async () => {
-    const actual = await vi.importActual('./utils.js');
+vi.mock('./auth.js', async () => {
+    const actual = await vi.importActual('./auth.js');
     return {
         ...actual,
-        fetchPageProps: mockFetchPageProps,
+        requestXiaoyuzhouJson: mockRequestJson,
+        loadXiaoyuzhouCredentials: mockLoadCredentials,
     };
 });

@@ -44,14 +46,17 @@ beforeAll(() => {

 describe('xiaoyuzhou download', () => {
     beforeEach(() => {
-        mockFetchPageProps.mockReset();
+        mockRequestJson.mockReset();
+        mockLoadCredentials.mockReset();
         mockHttpDownload.mockReset();
         mockMkdirSync.mockReset();
+        mockLoadCredentials.mockReturnValue({});
     });

     it('downloads audio from media.source.url into an episode subdirectory', async () => {
-        mockFetchPageProps.mockResolvedValue({
-            episode: {
+        mockRequestJson.mockResolvedValue({
+            credentials: {},
+            data: {
                 title: 'Hello World',
                 podcast: { title: 'OpenCLI FM' },
                 media: {
@@ -68,7 +73,10 @@ describe('xiaoyuzhou download', () => {
             output: '/tmp/xiaoyuzhou-test',
         });

-        expect(mockFetchPageProps).toHaveBeenCalledWith('/episode/ep123');
+        expect(mockRequestJson).toHaveBeenCalledWith('/v1/episode/get', {
+            query: { eid: 'ep123' },
+            credentials: {},
+        });
         expect(toPosixPath(mockMkdirSync.mock.calls[0][0])).toBe('/tmp/xiaoyuzhou-test/ep123');
         expect(mockMkdirSync.mock.calls[0][1]).toEqual({ recursive: true });
         expect(mockHttpDownload).toHaveBeenCalledWith('https://media.xyzcdn.net/audio/hello-world.mp3?sign=abc', expect.stringContaining('/tmp/xiaoyuzhou-test/ep123/ep123_Hello_World.mp3'), {
@@ -84,8 +92,9 @@ describe('xiaoyuzhou download', () => {
     });

     it('preserves non-mp3 extensions from media.source.url', async () => {
-        mockFetchPageProps.mockResolvedValue({
-            episode: {
+        mockRequestJson.mockResolvedValue({
+            credentials: {},
+            data: {
                 title: 'Lossless Episode',
                 podcast: { title: 'OpenCLI FM' },
                 media: {
@@ -107,8 +116,9 @@ describe('xiaoyuzhou download', () => {
     });

     it('throws when media.source.url is missing', async () => {
-        mockFetchPageProps.mockResolvedValue({
-            episode: {
+        mockRequestJson.mockResolvedValue({
+            credentials: {},
+            data: {
                 title: 'No Audio',
                 podcast: { title: 'OpenCLI FM' },
                 media: {},

clis/xiaoyuzhou/episode.js

Lines changed: 9 additions & 4 deletions
@@ -1,18 +1,23 @@
 import { cli, Strategy } from '@jackwener/opencli/registry';
 import { CliError } from '@jackwener/opencli/errors';
-import { fetchPageProps, formatDuration, formatDate } from './utils.js';
+import { loadXiaoyuzhouCredentials, requestXiaoyuzhouJson } from './auth.js';
+import { formatDuration, formatDate } from './utils.js';
 cli({
     site: 'xiaoyuzhou',
     name: 'episode',
     description: 'View details of a Xiaoyuzhou podcast episode',
     domain: 'www.xiaoyuzhoufm.com',
-    strategy: Strategy.PUBLIC,
+    strategy: Strategy.LOCAL,
     browser: false,
     args: [{ name: 'id', positional: true, required: true, help: 'Episode ID (eid from podcast-episodes output)' }],
     columns: ['title', 'podcast', 'duration', 'plays', 'comments', 'likes', 'date'],
     func: async (_page, args) => {
-        const pageProps = await fetchPageProps(`/episode/${args.id}`);
-        const ep = pageProps.episode;
+        const credentials = loadXiaoyuzhouCredentials();
+        const response = await requestXiaoyuzhouJson('/v1/episode/get', {
+            query: { eid: args.id },
+            credentials,
+        });
+        const ep = response.data;
         if (!ep)
             throw new CliError('NOT_FOUND', 'Episode not found', 'Please check the ID');
         return [{

clis/xiaoyuzhou/podcast-episodes.js

Lines changed: 15 additions & 11 deletions
@@ -1,30 +1,34 @@
 import { cli, Strategy } from '@jackwener/opencli/registry';
 import { CliError } from '@jackwener/opencli/errors';
-import { fetchPageProps, formatDuration, formatDate } from './utils.js';
+import { loadXiaoyuzhouCredentials, requestXiaoyuzhouJson } from './auth.js';
+import { formatDuration, formatDate } from './utils.js';
 cli({
     site: 'xiaoyuzhou',
     name: 'podcast-episodes',
-    description: 'List recent episodes of a Xiaoyuzhou podcast (up to 15, SSR limit)',
+    description: 'List episodes of a Xiaoyuzhou podcast',
     domain: 'www.xiaoyuzhoufm.com',
-    strategy: Strategy.PUBLIC,
+    strategy: Strategy.LOCAL,
     browser: false,
     args: [
         { name: 'id', positional: true, required: true, help: 'Podcast ID (from xiaoyuzhoufm.com URL)' },
-        { name: 'limit', type: 'int', default: 15, help: 'Max episodes to show (up to 15, SSR limit)' },
+        { name: 'limit', type: 'int', default: 20, help: 'Max episodes to show' },
     ],
     columns: ['eid', 'title', 'duration', 'plays', 'date'],
     func: async (_page, args) => {
-        const pageProps = await fetchPageProps(`/podcast/${args.id}`);
-        const podcast = pageProps.podcast;
-        if (!podcast)
-            throw new CliError('NOT_FOUND', 'Podcast not found', 'Please check the ID');
-        const allEpisodes = podcast.episodes ?? [];
         const requestedLimit = Number(args.limit);
         if (!Number.isInteger(requestedLimit) || requestedLimit < 1) {
             throw new CliError('INVALID_ARGUMENT', 'limit must be a positive integer', 'Example: --limit 5');
         }
-        const limit = Math.min(requestedLimit, allEpisodes.length);
-        const episodes = allEpisodes.slice(0, limit);
+        const credentials = loadXiaoyuzhouCredentials();
+        const response = await requestXiaoyuzhouJson('/v1/podcast/listEpisode', {
+            method: 'POST',
+            body: { pid: args.id, limit: requestedLimit },
+            credentials,
+        });
+        const episodes = response.data ?? [];
+        if (!Array.isArray(episodes)) {
+            throw new CliError('PARSE_ERROR', 'Unexpected API response format', 'Expected an array of episodes');
+        }
         return episodes.map((ep) => ({
             eid: ep.eid,
             title: ep.title,

clis/xiaoyuzhou/podcast.js

Lines changed: 9 additions & 4 deletions
@@ -1,18 +1,23 @@
 import { cli, Strategy } from '@jackwener/opencli/registry';
 import { CliError } from '@jackwener/opencli/errors';
-import { fetchPageProps, formatDate } from './utils.js';
+import { loadXiaoyuzhouCredentials, requestXiaoyuzhouJson } from './auth.js';
+import { formatDate } from './utils.js';
 cli({
     site: 'xiaoyuzhou',
     name: 'podcast',
     description: 'View a Xiaoyuzhou podcast profile',
     domain: 'www.xiaoyuzhoufm.com',
-    strategy: Strategy.PUBLIC,
+    strategy: Strategy.LOCAL,
     browser: false,
     args: [{ name: 'id', positional: true, required: true, help: 'Podcast ID (from xiaoyuzhoufm.com URL)' }],
     columns: ['title', 'author', 'description', 'subscribers', 'episodes', 'updated'],
     func: async (_page, args) => {
-        const pageProps = await fetchPageProps(`/podcast/${args.id}`);
-        const p = pageProps.podcast;
+        const credentials = loadXiaoyuzhouCredentials();
+        const response = await requestXiaoyuzhouJson('/v1/podcast/get', {
+            query: { pid: args.id },
+            credentials,
+        });
+        const p = response.data;
         if (!p)
             throw new CliError('NOT_FOUND', 'Podcast not found', 'Please check the ID');
         return [{

clis/xiaoyuzhou/utils.js

Lines changed: 0 additions & 40 deletions
@@ -1,43 +1,3 @@
-/**
- * Shared Xiaoyuzhou utilities — page data extraction and formatting.
- *
- * Xiaoyuzhou (小宇宙) is a Next.js app that embeds full page data in
- * <script id="__NEXT_DATA__">. We fetch the HTML and extract that JSON
- * instead of using their authenticated API.
- */
-import { CliError } from '@jackwener/opencli/errors';
-/**
- * Fetch a Xiaoyuzhou page and extract __NEXT_DATA__.props.pageProps.
- * @param path - URL path, e.g. '/podcast/xxx' or '/episode/xxx'
- */
-export async function fetchPageProps(path) {
-    const url = `https://www.xiaoyuzhoufm.com${path}`;
-    // Node.js fetch sends UA "node" which gets blocked; use a browser-like UA
-    const resp = await fetch(url, {
-        headers: { 'User-Agent': 'Mozilla/5.0 (compatible; opencli)' },
-    });
-    if (!resp.ok) {
-        throw new CliError('FETCH_ERROR', `HTTP ${resp.status} for ${path}`, 'Please check the ID — you can find it in xiaoyuzhoufm.com URLs');
-    }
-    const html = await resp.text();
-    // [\s\S]*? for multiline safety (JSON may span lines)
-    const match = html.match(/<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/);
-    if (!match) {
-        throw new CliError('PARSE_ERROR', 'Failed to extract __NEXT_DATA__', 'Page structure may have changed');
-    }
-    let parsed;
-    try {
-        parsed = JSON.parse(match[1]);
-    }
-    catch {
-        throw new CliError('PARSE_ERROR', 'Malformed __NEXT_DATA__ JSON', 'Page structure may have changed');
-    }
-    const pageProps = parsed.props?.pageProps;
-    if (!pageProps || Object.keys(pageProps).length === 0) {
-        throw new CliError('NOT_FOUND', 'Resource not found', 'Please check the ID — you can find it in xiaoyuzhoufm.com URLs');
-    }
-    return pageProps;
-}
 /** Format seconds to mm:ss (e.g. 3890 → "64:50"). Returns '-' for invalid input. */
 export function formatDuration(seconds) {
     if (!Number.isFinite(seconds) || seconds < 0)
