
Commit 0773616

Astro-Han and jackwener authored
feat(imdb): add IMDb adapter with 6 commands (#472)
* feat(imdb): add IMDb adapter with 6 commands

  Add a public IMDb adapter using browser-based JSON-LD and __NEXT_DATA__ extraction. All commands use Strategy.PUBLIC with browser: true.

  Commands:
  - imdb search <query>: search movies, TV shows, and people
  - imdb title <id>: get movie/show details (Movie, TVSeries, TVEpisode, TVMiniseries, TVMovie, etc.)
  - imdb top: IMDb Top 250 chart
  - imdb trending: Most Popular Movies
  - imdb person <id>: actor/director info with filmography
  - imdb reviews <id>: user reviews (first page, max 25)

  Shared utils: ID normalization, ISO 8601 duration formatting, locale forcing, JSON-LD extraction (supports type array filtering), and anti-bot challenge detection.

* review: harden imdb adapter loading and tests

* test: unblock PR CI on merge head

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>
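The commit message lists ID normalization and ISO 8601 duration formatting among the shared utils, but their bodies are not part of this excerpt. The following is a minimal sketch of what such helpers could look like. The names `normalizeImdbIdSketch` and `formatIsoDurationSketch`, the error message, and the output format are assumptions for illustration; the real helpers imported from `./utils.js` may behave differently.

```typescript
// Hypothetical sketch only: not the adapter's actual utils.
type ImdbPrefix = 'tt' | 'nm';

// Accept a bare ID ("nm0634240") or any URL/path that contains one.
function normalizeImdbIdSketch(input: string, prefix: ImdbPrefix): string {
  const match = input.trim().match(new RegExp(`(${prefix}\\d{7,8})`));
  if (!match) {
    throw new Error(`Not a valid IMDb ${prefix} ID: ${input}`);
  }
  return match[1];
}

// Format the time part of an ISO 8601 duration such as "PT2H28M".
// Unrecognized input is returned unchanged rather than rejected.
function formatIsoDurationSketch(iso: string): string {
  const match = iso.match(/^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$/);
  if (!match) {
    return iso;
  }
  const [, h, m, s] = match;
  const parts: string[] = [];
  if (h) parts.push(`${Number(h)}h`);
  if (m) parts.push(`${Number(m)}m`);
  // Only show seconds when there is no larger unit.
  if (s && parts.length === 0) parts.push(`${Number(s)}s`);
  return parts.join(' ') || iso;
}

console.log(normalizeImdbIdSketch('https://www.imdb.com/name/nm0634240/', 'nm'));
console.log(formatIsoDurationSketch('PT2H28M'));
```

Returning unrecognized durations unchanged is one possible design choice: it keeps a command usable when the page emits a format the sketch does not cover.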
1 parent 5731881 commit 0773616

16 files changed

Lines changed: 1339 additions & 31 deletions


README.md

Lines changed: 1 addition & 0 deletions
@@ -191,6 +191,7 @@ Run `opencli list` for the live registry.
 | **facebook** | `feed` `profile` `search` `friends` `groups` `events` `notifications` `memories` `add-friend` `join-group` | Browser |
 | **google** | `news` `search` `suggest` `trends` | Public |
 | **36kr** | `news` `hot` `search` `article` | Public / Browser |
+| **imdb** | `search` `title` `top` `trending` `person` `reviews` | Public |
 | **producthunt** | `posts` `today` `hot` `browse` | Public / Browser |
 | **instagram** | `explore` `profile` `search` `user` `followers` `following` `follow` `unfollow` `like` `unlike` `comment` `save` `unsave` `saved` | Browser |
 | **lobsters** | `hot` `newest` `active` `tag` | Public |

README.zh-CN.md

Lines changed: 1 addition & 0 deletions
@@ -173,6 +173,7 @@ npm install -g @jackwener/opencli@latest
 | **facebook** | `feed` `profile` `search` `friends` `groups` `events` `notifications` `memories` `add-friend` `join-group` | 浏览器 |
 | **google** | `news` `search` `suggest` `trends` | 公开 |
 | **36kr** | `news` `hot` `search` `article` | 公开 / 浏览器 |
+| **imdb** | `search` `title` `top` `trending` `person` `reviews` | 公开 |
 | **producthunt** | `posts` `today` `hot` `browse` | 公开 / 浏览器 |
 | **instagram** | `explore` `profile` `search` `user` `followers` `following` `follow` `unfollow` `like` `unlike` `comment` `save` `unsave` `saved` | 浏览器 |
 | **lobsters** | `hot` `newest` `active` `tag` | 公开 |

docs/.vitepress/config.mts

Lines changed: 1 addition & 0 deletions
@@ -79,6 +79,7 @@ export default defineConfig({
 { text: 'Doubao', link: '/adapters/browser/doubao' },
 { text: 'Facebook', link: '/adapters/browser/facebook' },
 { text: 'Google', link: '/adapters/browser/google' },
+{ text: 'IMDb', link: '/adapters/browser/imdb' },
 { text: 'Instagram', link: '/adapters/browser/instagram' },
 { text: 'JD.com', link: '/adapters/browser/jd' },
 { text: 'Medium', link: '/adapters/browser/medium' },

docs/adapters/browser/imdb.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# IMDb
+
+**Mode**: 🌐 Public (Browser) · **Domain**: `www.imdb.com`
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `opencli imdb search` | Search movies, TV shows, and people |
+| `opencli imdb title` | Get movie or TV show details |
+| `opencli imdb top` | IMDb Top 250 Movies |
+| `opencli imdb trending` | IMDb Most Popular Movies |
+| `opencli imdb person` | Get actor or director info |
+| `opencli imdb reviews` | Get user reviews for a title |
+
+## Usage Examples
+
+```bash
+# Search for a movie
+opencli imdb search "inception" --limit 10
+
+# Get movie details
+opencli imdb title tt1375666
+
+# Get TV series details (also accepts full URL)
+opencli imdb title "https://www.imdb.com/title/tt0903747/"
+
+# Top 250 movies
+opencli imdb top --limit 20
+
+# Currently trending movies
+opencli imdb trending --limit 10
+
+# Actor/director info with filmography
+opencli imdb person nm0634240 --limit 5
+
+# User reviews
+opencli imdb reviews tt1375666 --limit 5
+
+# JSON output
+opencli imdb top --limit 5 -f json
+```
+
+## Prerequisites
+
+- Chrome with Browser Bridge extension installed
+- No login required (all data is public)

docs/adapters/index.md

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ Run `opencli list` for the live registry.
 | **[weread](/adapters/browser/weread)** | `shelf` `search` `book` `ranking` `notebooks` `highlights` `notes` | 🔐 Browser |
 | **[douban](/adapters/browser/douban)** | `search` `top250` `subject` `photos` `download` `marks` `reviews` `movie-hot` `book-hot` | 🔐 Browser |
 | **[facebook](/adapters/browser/facebook)** | `feed` `profile` `search` `friends` `groups` `events` `notifications` `memories` `add-friend` `join-group` | 🔐 Browser |
+| **[imdb](/adapters/browser/imdb)** | `search` `title` `top` `trending` `person` `reviews` | 🌐 / 🔐 |
 | **[instagram](/adapters/browser/instagram)** | `explore` `profile` `search` `user` `followers` `following` `follow` `unfollow` `like` `unlike` `comment` `save` `unsave` `saved` | 🔐 Browser |
 | **[medium](/adapters/browser/medium)** | `feed` `search` `user` | 🔐 Browser |
 | **[sinablog](/adapters/browser/sinablog)** | `hot` `search` `article` `user` | 🔐 Browser |

src/clis/douban/download.test.ts

Lines changed: 27 additions & 27 deletions
@@ -1,4 +1,5 @@
 import { beforeAll, beforeEach, describe, expect, it, vi } from 'vitest';
+import path from 'node:path';
 import type { CliCommand } from '../../registry.js';
 import { getRegistry } from '../../registry.js';
 import type { IPage } from '../../types.js';
@@ -39,6 +40,10 @@ beforeAll(() => {
   expect(cmd?.func).toBeTypeOf('function');
 });
 
+function toPosixPath(value: string): string {
+  return value.replaceAll(path.sep, '/');
+}
+
 describe('douban download', () => {
   beforeEach(() => {
     mockHttpDownload.mockReset();
@@ -89,26 +94,22 @@ describe('douban download', () => {
       type: 'Rb',
       limit: 20,
     });
-    expect(mockMkdirSync).toHaveBeenCalledWith('/tmp/douban-test/30382501', { recursive: true });
+    expect(mockMkdirSync).toHaveBeenCalledTimes(1);
+    expect(toPosixPath(mockMkdirSync.mock.calls[0][0])).toBe('/tmp/douban-test/30382501');
+    expect(mockMkdirSync.mock.calls[0][1]).toEqual({ recursive: true });
     expect(mockHttpDownload).toHaveBeenCalledTimes(2);
-    expect(mockHttpDownload).toHaveBeenNthCalledWith(
-      1,
-      'https://img1.doubanio.com/view/photo/l/public/p2913450214.webp',
-      '/tmp/douban-test/30382501/30382501_001_2913450214_Main_poster.webp',
-      expect.objectContaining({
-        headers: { Referer: 'https://movie.douban.com/photos/photo/2913450214/' },
-        timeout: 60000,
-      }),
-    );
-    expect(mockHttpDownload).toHaveBeenNthCalledWith(
-      2,
-      'https://img1.doubanio.com/view/photo/l/public/p2913450215.jpg',
-      '/tmp/douban-test/30382501/30382501_002_2913450215_Character_poster.jpg',
-      expect.objectContaining({
-        headers: { Referer: 'https://movie.douban.com/photos/photo/2913450215/' },
-        timeout: 60000,
-      }),
-    );
+    expect(mockHttpDownload.mock.calls[0]?.[0]).toBe('https://img1.doubanio.com/view/photo/l/public/p2913450214.webp');
+    expect(toPosixPath(mockHttpDownload.mock.calls[0]?.[1])).toBe('/tmp/douban-test/30382501/30382501_001_2913450214_Main_poster.webp');
+    expect(mockHttpDownload.mock.calls[0]?.[2]).toEqual(expect.objectContaining({
+      headers: { Referer: 'https://movie.douban.com/photos/photo/2913450214/' },
+      timeout: 60000,
+    }));
+    expect(mockHttpDownload.mock.calls[1]?.[0]).toBe('https://img1.doubanio.com/view/photo/l/public/p2913450215.jpg');
+    expect(toPosixPath(mockHttpDownload.mock.calls[1]?.[1])).toBe('/tmp/douban-test/30382501/30382501_002_2913450215_Character_poster.jpg');
+    expect(mockHttpDownload.mock.calls[1]?.[2]).toEqual(expect.objectContaining({
+      headers: { Referer: 'https://movie.douban.com/photos/photo/2913450215/' },
+      timeout: 60000,
+    }));
 
     expect(result).toEqual([
       {
@@ -164,14 +165,13 @@ describe('douban download', () => {
       type: 'Rb',
       targetPhotoId: '2913450215',
     });
-    expect(mockHttpDownload).toHaveBeenCalledWith(
-      'https://img1.doubanio.com/view/photo/l/public/p2913450215.jpg',
-      '/tmp/douban-test/30382501/30382501_002_2913450215_Character_poster.jpg',
-      expect.objectContaining({
-        headers: { Referer: 'https://movie.douban.com/photos/photo/2913450215/' },
-        timeout: 60000,
-      }),
-    );
+    expect(mockHttpDownload).toHaveBeenCalledTimes(1);
+    expect(mockHttpDownload.mock.calls[0]?.[0]).toBe('https://img1.doubanio.com/view/photo/l/public/p2913450215.jpg');
+    expect(toPosixPath(mockHttpDownload.mock.calls[0]?.[1])).toBe('/tmp/douban-test/30382501/30382501_002_2913450215_Character_poster.jpg');
+    expect(mockHttpDownload.mock.calls[0]?.[2]).toEqual(expect.objectContaining({
+      headers: { Referer: 'https://movie.douban.com/photos/photo/2913450215/' },
+      timeout: 60000,
+    }));
 
     expect(result).toEqual([
       {
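The douban test changes replace exact-string path assertions with assertions on separator-normalized paths, presumably so the same expectations pass on platforms whose `path.sep` is not `/`. A small standalone sketch of that idea follows. The optional `sep` parameter is an addition for illustration (the helper in the diff always uses `path.sep`), and `split`/`join` is used here in place of `replaceAll`.

```typescript
import * as path from 'node:path';

// Normalize platform separators to '/' so assertions can use POSIX-style literals.
// `sep` defaults to the host separator; passing it explicitly lets this sketch
// simulate Windows behavior on any platform.
function toPosixPath(value: string, sep: string = path.sep): string {
  return value.split(sep).join('/');
}

// Simulate a Windows-style join, then normalize it.
const windowsStyle = path.win32.join('tmp', 'douban-test', '30382501');
console.log(toPosixPath(windowsStyle, path.win32.sep));

// POSIX paths pass through unchanged.
console.log(toPosixPath('/tmp/douban-test/30382501', path.posix.sep));
```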

src/clis/imdb/person.ts

Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
+import { CommandExecutionError } from '../../errors.js';
+import { cli, Strategy } from '../../registry.js';
+import {
+  forceEnglishUrl,
+  getCurrentImdbId,
+  isChallengePage,
+  normalizeImdbId,
+  waitForImdbPath,
+} from './utils.js';
+
+/**
+ * Read IMDb person details from public profile pages.
+ */
+cli({
+  site: 'imdb',
+  name: 'person',
+  description: 'Get actor or director info',
+  domain: 'www.imdb.com',
+  strategy: Strategy.PUBLIC,
+  browser: true,
+  args: [
+    { name: 'id', positional: true, required: true, help: 'IMDb person ID (nm0634240) or URL' },
+    { name: 'limit', type: 'int', default: 10, help: 'Max filmography entries' },
+  ],
+  columns: ['field', 'value'],
+  func: async (page, args) => {
+    const id = normalizeImdbId(String(args.id), 'nm');
+    // Clamp to 30 to match the internal evaluate cap
+    const limit = Math.max(1, Math.min(Number(args.limit) || 10, 30));
+    const url = forceEnglishUrl(`https://www.imdb.com/name/${id}/`);
+
+    await page.goto(url);
+    const onPersonPage = await waitForImdbPath(page, `^/name/${id}/`);
+
+    if (await isChallengePage(page)) {
+      throw new CommandExecutionError(
+        'IMDb blocked this request',
+        'Try again with a normal browser session or extension mode',
+      );
+    }
+    if (!onPersonPage) {
+      throw new CommandExecutionError(
+        `Person page did not finish loading: ${id}`,
+        'Retry the command; if it persists, IMDb may have changed their navigation flow',
+      );
+    }
+
+    const currentId = await getCurrentImdbId(page, 'nm');
+    if (currentId && currentId !== id) {
+      throw new CommandExecutionError(
+        `IMDb redirected to a different person: ${currentId}`,
+        'Retry the command; if it persists, the person page may have changed',
+      );
+    }
+
+    const data = await page.evaluate(`
+      (function() {
+        var result = {
+          nameId: '',
+          name: '',
+          description: '',
+          birthDate: '',
+          filmography: []
+        };
+
+        var scripts = document.querySelectorAll('script[type="application/ld+json"]');
+        for (var i = 0; i < scripts.length; i++) {
+          try {
+            var ld = JSON.parse(scripts[i].textContent || 'null');
+            if (ld && ld['@type'] === 'Person') {
+              if (typeof ld.url === 'string') {
+                var ldMatch = ld.url.match(/(nm\\d{7,8})/);
+                if (ldMatch) {
+                  result.nameId = ldMatch[1];
+                }
+              }
+              result.name = result.name || ld.name || '';
+              result.description = result.description || ld.description || '';
+              break;
+            }
+          } catch (error) {
+            void error;
+          }
+        }
+
+        var nextDataEl = document.getElementById('__NEXT_DATA__');
+        if (!nextDataEl) {
+          return result;
+        }
+
+        try {
+          var nextData = JSON.parse(nextDataEl.textContent || 'null');
+          var pageProps = nextData && nextData.props && nextData.props.pageProps;
+          var above = pageProps && (pageProps.aboveTheFold || pageProps.aboveTheFoldData);
+          var main = pageProps && (pageProps.mainColumnData || pageProps.belowTheFold);
+
+          if (above) {
+            if (!result.nameId && above.id) {
+              result.nameId = String(above.id);
+            }
+            if (!result.name && above.nameText && above.nameText.text) {
+              result.name = above.nameText.text;
+            }
+
+            if (above.birthDate) {
+              if (above.birthDate.displayableProperty && above.birthDate.displayableProperty.value) {
+                result.birthDate = above.birthDate.displayableProperty.value.plainText || '';
+              }
+              if (!result.birthDate && above.birthDate.dateComponents) {
+                var dc = above.birthDate.dateComponents;
+                result.birthDate = [dc.year, dc.month, dc.day].filter(Boolean).join('-');
+              }
+            }
+
+            if (above.bio && above.bio.text && above.bio.text.plainText) {
+              result.description = above.bio.text.plainText.substring(0, 300);
+            }
+          }
+
+          var pushFilmography = function(title, year, role) {
+            if (!title) {
+              return;
+            }
+            result.filmography.push({
+              title: title,
+              year: year || '',
+              role: role || ''
+            });
+          };
+
+          var knownFor = main && main.knownForFeatureV2;
+          if (knownFor && Array.isArray(knownFor.credits)) {
+            for (var j = 0; j < knownFor.credits.length; j++) {
+              var knownNode = knownFor.credits[j];
+              if (!knownNode || !knownNode.title) {
+                continue;
+              }
+              var knownRole = '';
+              var knownRoleEdge = knownNode.creditedRoles && Array.isArray(knownNode.creditedRoles.edges)
+                ? knownNode.creditedRoles.edges[0]
+                : null;
+              if (knownRoleEdge && knownRoleEdge.node) {
+                knownRole = knownRoleEdge.node.text
+                  || (knownRoleEdge.node.category ? knownRoleEdge.node.category.text || '' : '');
+              }
+              pushFilmography(
+                knownNode.title.titleText ? knownNode.title.titleText.text : '',
+                knownNode.title.releaseYear ? String(knownNode.title.releaseYear.year || '') : '',
+                knownRole
+              );
+            }
+          }
+
+          if (result.filmography.length === 0) {
+            var creditSources = [];
+            if (main && main.released && Array.isArray(main.released.edges)) {
+              creditSources.push(main.released.edges);
+            }
+            if (main && main.groupings && Array.isArray(main.groupings.edges)) {
+              creditSources.push(main.groupings.edges);
+            }
+
+            for (var k = 0; k < creditSources.length && result.filmography.length < 30; k++) {
+              var groups = creditSources[k];
+              for (var m = 0; m < groups.length && result.filmography.length < 30; m++) {
+                var groupNode = groups[m] && groups[m].node;
+                if (!groupNode) {
+                  continue;
+                }
+
+                var roleName = groupNode.grouping ? groupNode.grouping.text || '' : '';
+                var credits = groupNode.credits && Array.isArray(groupNode.credits.edges)
+                  ? groupNode.credits.edges
+                  : [];
+                for (var n = 0; n < credits.length && result.filmography.length < 30; n++) {
+                  var creditNode = credits[n] && credits[n].node;
+                  if (!creditNode || !creditNode.title) {
+                    continue;
+                  }
+                  pushFilmography(
+                    creditNode.title.titleText ? creditNode.title.titleText.text : (creditNode.title.originalTitleText ? creditNode.title.originalTitleText.text : ''),
+                    creditNode.title.releaseYear ? String(creditNode.title.releaseYear.year || '') : '',
+                    roleName
+                  );
+                }
+              }
+            }
+          }
+        } catch (error) {
+          void error;
+        }
+
+        return result;
+      })()
+    `);
+
+    if (!data || typeof data !== 'object' || !('name' in data) || !(data as Record<string, unknown>).name) {
+      throw new CommandExecutionError(`Person not found: ${id}`, 'Check the person ID and try again');
+    }
+
+    const result = data as Record<string, any>;
+    if (result.nameId && result.nameId !== id) {
+      throw new CommandExecutionError(
+        `IMDb returned a different person payload: ${result.nameId}`,
+        'Retry the command; if it persists, the person parser may need updating',
+      );
+    }
+    const filmography = Array.isArray(result.filmography) ? result.filmography : [];
+
+    // Override url with a clean canonical URL (no query params like ?language=en-US)
+    result.url = `https://www.imdb.com/name/${id}/`;
+
+    const rows = Object.entries(result)
+      .filter(([field, value]) => field !== 'filmography' && field !== 'nameId' && value !== '' && value != null)
+      .map(([field, value]) => ({ field, value: String(value) }));
+
+    if (filmography.length > 0) {
+      rows.push({ field: 'filmography', value: '' });
+      for (const entry of filmography.slice(0, limit)) {
+        const suffix = [entry.year ? `(${entry.year})` : '', entry.role ? `[${entry.role}]` : '']
+          .filter(Boolean)
+          .join(' ');
+        rows.push({
+          field: String(entry.title || ''),
+          value: suffix,
+        });
+      }
+    }
+
+    return rows;
+  },
+});
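The tail of `person.ts` flattens the extracted filmography into `field`/`value` rows after clamping `--limit` to the range 1 to 30. The sketch below isolates that step as a pure function so it can be exercised without a browser page. The function name `filmographyRows`, the interfaces, and the sample entries are illustrative assumptions and do not appear in the commit.

```typescript
interface FilmographyEntry {
  title: string;
  year: string;
  role: string;
}

interface Row {
  field: string;
  value: string;
}

// Mirrors the clamping and row formatting from the command body:
// a header row, then one row per entry with "(year) [role]" as the value.
function filmographyRows(entries: FilmographyEntry[], requestedLimit: number): Row[] {
  // 0, NaN, and other falsy limits fall back to 10; the result is clamped to [1, 30].
  const limit = Math.max(1, Math.min(requestedLimit || 10, 30));
  if (entries.length === 0) {
    return [];
  }
  const rows: Row[] = [{ field: 'filmography', value: '' }];
  for (const entry of entries.slice(0, limit)) {
    const suffix = [entry.year ? `(${entry.year})` : '', entry.role ? `[${entry.role}]` : '']
      .filter(Boolean)
      .join(' ');
    rows.push({ field: entry.title, value: suffix });
  }
  return rows;
}

// Illustrative sample data, not real command output.
const sample: FilmographyEntry[] = [
  { title: 'Inception', year: '2010', role: 'Director' },
  { title: 'Example Title', year: '', role: '' },
];

console.log(filmographyRows(sample, 1));
```

Because a limit of `0` is falsy, it falls back to the default of 10 instead of clamping to 1, which matches the `Number(args.limit) || 10` expression in the command.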
