diff --git a/CHANGELOG.md b/CHANGELOG.md index c5af090ea..9e3c37dd2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ # Changelog +## Unreleased + +### ⚠ BREAKING CHANGES + +* **browser** — replace the `--session ` flag with a `` positional argument that immediately follows `browser`. `opencli browser work click 12` instead of `opencli browser --session work click 12`; `opencli browser work bind` instead of `opencli browser bind --session work`. Required-flag semantics are now encoded structurally as a positional, matching the Docker/git convention for required operation-target identifiers. The internal `--session` flag is preserved for the daemon protocol and for direct `program.parseAsync` callers but is no longer part of the user-facing surface. + ## [1.7.18](https://github.com/jackwener/opencli/compare/v1.7.17...v1.7.18) (2026-05-12) Hotfix release for the 1.7.17 doctor regression: `opencli doctor` failed connectivity probe with `Browser session is required` because the doctor probe didn't pass a session to the new strict-session browser bridge. Also adds new adapters and adapter fixes that were ready immediately after 1.7.17. diff --git a/README.md b/README.md index a793e8eae..070fae6ac 100644 --- a/README.md +++ b/README.md @@ -152,7 +152,7 @@ The agent handles all the `opencli browser` commands internally — you just des Available browser commands include `open`, `state`, `click`, `type`, `fill`, `select`, `keys`, `wait`, `get`, `find`, `extract`, `frames`, `screenshot`, `scroll`, `back`, `eval`, `network`, `tab list`, `tab new`, `tab select`, `tab close`, `init`, `verify`, and `close`. -`opencli browser` commands require `--session `. `opencli browser --session work open ` and `opencli browser --session work tab new [url]` both return a target ID. Use `opencli browser --session work tab list` to inspect target IDs, then pass `--tab ` to route a command to a specific tab. `tab new` creates a new tab without changing the default browser target; only `tab select ` promotes that tab to the default target for later untargeted commands in the same session. +`opencli browser` commands require a `` positional immediately after `browser`. `opencli browser work open ` and `opencli browser work tab new [url]` both return a target ID. Use `opencli browser work tab list` to inspect target IDs, then pass `--tab ` to route a command to a specific tab. `tab new` creates a new tab without changing the default browser target; only `tab select ` promotes that tab to the default target for later untargeted commands in the same session. ## Core Concepts @@ -160,7 +160,7 @@ Available browser commands include `open`, `state`, `click`, `type`, `fill`, `se `opencli browser` commands are the low-level primitives that AI Agents use to operate websites. You don't run these manually — instead, install the `opencli-adapter-author` skill into your AI agent, describe what you want in natural language, and the agent handles the browser operations. -For example, tell your agent: *"Help me check my Xiaohongshu notifications"* — the agent will use `opencli browser --session open`, `state`, `click`, etc. under the hood. +For example, tell your agent: *"Help me check my Xiaohongshu notifications"* — the agent will use `opencli browser open`, `state`, `click`, etc. under the hood. ### Built-in adapters: stable commands @@ -174,7 +174,7 @@ When the site you need is not yet covered, use the `opencli-adapter-author` skil 2. Discover the right endpoint — network inspection, initial state, bundle search, token trace, or interceptor fallback. 3. Decide the auth strategy — `PUBLIC` / `COOKIE` / `INTERCEPT` / `UI` / `LOCAL`. 4. Decode response fields and design output columns. -5. `opencli browser --session recon analyze ` for one-shot recon, then `opencli browser --session recon init /` → write adapter → `opencli browser --session recon verify /`. +5. `opencli browser recon analyze ` for one-shot recon, then `opencli browser recon init /` → write adapter → `opencli browser recon verify /`. 6. Persist site knowledge to `~/.opencli/sites//` so the next adapter for the same site is faster. ### CLI Hub and desktop adapters @@ -207,7 +207,7 @@ OpenCLI is not only for websites. It can also: | `OPENCLI_VERBOSE` | `false` | Enable verbose logging (`-v` flag also works) | | `DEBUG_SNAPSHOT` | — | Set to `1` for DOM snapshot debug output | -`opencli browser *` requires an explicit `--session `, uses a foreground browser window by default, and keeps that session's tab lease until `browser --session close` or idle cleanup. Browser-backed adapters use a background adapter window and release one-shot tab leases by default. Interactive adapters can declare `siteSession: 'persistent'` to keep a stable site tab for continuity; pass `--site-session ephemeral` for a one-shot tab. +`opencli browser *` requires an explicit `` positional, uses a foreground browser window by default, and keeps that session's tab lease until `opencli browser close` or idle cleanup. Browser-backed adapters use a background adapter window and release one-shot tab leases by default. Interactive adapters can declare `siteSession: 'persistent'` to keep a stable site tab for continuity; pass `--site-session ephemeral` for a one-shot tab. ## Update @@ -411,10 +411,10 @@ See [Plugins Guide](./docs/guide/plugins.md) for creating your own plugin. Before writing any adapter code, read the [`opencli-adapter-author` skill](./skills/opencli-adapter-author/SKILL.md). It takes you end-to-end: - Recon the site and pick a pattern (SPA / SSR / JSONP / Token / Streaming). -- Discover the right endpoint via `opencli browser --session network`, `eval`, or the interceptor fallback. +- Discover the right endpoint via `opencli browser network`, `eval`, or the interceptor fallback. - Decide auth strategy (`PUBLIC` / `COOKIE` / `INTERCEPT` / `UI` / `LOCAL`). -- Run `opencli browser --session recon analyze ` for one-shot recon, decode response fields, design columns, scaffold with `opencli browser --session recon init`. -- Verify with `opencli browser --session recon verify /` before shipping. +- Run `opencli browser recon analyze ` for one-shot recon, decode response fields, design columns, scaffold with `opencli browser recon init`. +- Verify with `opencli browser recon verify /` before shipping. For long-lived personal commands that should live in your own Git repo, use a local plugin instead; see [Extending OpenCLI](./docs/guide/extending-opencli.md). Quick private adapters can still live at `~/.opencli/clis//.js`. Site knowledge (endpoints, field maps, fixtures) accumulates in `~/.opencli/sites//` so the next adapter for the same site starts from context instead of zero. diff --git a/README.zh-CN.md b/README.zh-CN.md index 4356a3d1d..19d9c5650 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -136,7 +136,7 @@ Agent 在内部自动处理所有 `opencli browser` 命令——你只需用自 `browser` 可用命令包括:`open`、`state`、`click`、`type`、`fill`、`select`、`keys`、`wait`、`get`、`find`、`extract`、`frames`、`screenshot`、`scroll`、`back`、`eval`、`network`、`tab list`、`tab new`、`tab select`、`tab close`、`init`、`verify`、`close`。 -`opencli browser` 命令必须显式传 `--session `。`opencli browser --session work open ` 和 `opencli browser --session work tab new [url]` 都会返回 target ID。`opencli browser --session work tab list` 用来查看当前已存在 tab 的 target ID,再通过 `--tab ` 把命令明确路由到某个 tab。`tab new` 只会新建 tab,不会改变默认浏览器目标;只有显式执行 `tab select `,才会把该 tab 设为同一 session 后续未指定 target 的默认目标。 +`opencli browser` 命令必须紧跟一个 `` 位置参数。`opencli browser work open ` 和 `opencli browser work tab new [url]` 都会返回 target ID。`opencli browser work tab list` 用来查看当前已存在 tab 的 target ID,再通过 `--tab ` 把命令明确路由到某个 tab。`tab new` 只会新建 tab,不会改变默认浏览器目标;只有显式执行 `tab select `,才会把该 tab 设为同一 session 后续未指定 target 的默认目标。 ## 核心概念 @@ -144,7 +144,7 @@ Agent 在内部自动处理所有 `opencli browser` 命令——你只需用自 `opencli browser` 命令是 AI Agent 操作网站的底层原语。你不需要手动运行这些命令——把 `opencli-adapter-author` skill 安装到你的 AI Agent 中,用自然语言描述你想做的事,Agent 会自动处理浏览器操作。 -比如你告诉 Agent:*"帮我看看小红书的通知"*——Agent 会在底层调用 `opencli browser --session open`、`state`、`click` 等命令。 +比如你告诉 Agent:*"帮我看看小红书的通知"*——Agent 会在底层调用 `opencli browser open`、`state`、`click` 等命令。 ### 内置适配器:稳定命令 @@ -158,7 +158,7 @@ Agent 在内部自动处理所有 `opencli browser` 命令——你只需用自 2. 发现目标 endpoint——network 精读、initial state、bundle 搜索、token 溯源,或 interceptor 兜底 3. 定认证策略——`PUBLIC` / `COOKIE` / `INTERCEPT` / `UI` / `LOCAL` 4. 字段解码 + 设计输出列 -5. `opencli browser --session recon analyze ` 一步侦察,再 `opencli browser --session recon init /` → 写适配器 → `opencli browser --session recon verify /` +5. `opencli browser recon analyze ` 一步侦察,再 `opencli browser recon init /` → 写适配器 → `opencli browser recon verify /` 6. 把站点知识沉到 `~/.opencli/sites//`,下次写同站点的其他命令直接吃缓存 ### CLI 枢纽与桌面端适配器 @@ -190,7 +190,7 @@ OpenCLI 不只是网站 CLI,还可以: | `OPENCLI_VERBOSE` | `false` | 启用详细日志(`-v` 也可以) | | `DEBUG_SNAPSHOT` | — | 设为 `1` 输出 DOM 快照调试信息 | -`opencli browser *` 必须显式传 `--session `,默认使用前台窗口,并保留该 session 的 tab lease,直到你手动执行 `opencli browser --session close` 或等空闲超时。浏览器型 adapter 默认使用后台 adapter 窗口并在命令结束后释放一次性 tab lease;如果需要调试最终页面,可以传 `--window foreground --keep-tab true`。 +`opencli browser *` 必须紧跟一个 `` 位置参数,默认使用前台窗口,并保留该 session 的 tab lease,直到你手动执行 `opencli browser close` 或等空闲超时。浏览器型 adapter 默认使用后台 adapter 窗口并在命令结束后释放一次性 tab lease;如果需要调试最终页面,可以传 `--window foreground --keep-tab true`。 ## 更新 @@ -510,10 +510,10 @@ opencli plugin uninstall my-tool # 卸载 在动代码前,先读 [`opencli-adapter-author` skill](./skills/opencli-adapter-author/SKILL.md)。它把整个流程串起来: - 侦察站点,选定 pattern(SPA / SSR / JSONP / Token / Streaming) -- 用 `opencli browser --session network`、`eval`、interceptor 等找到目标 endpoint +- 用 `opencli browser network`、`eval`、interceptor 等找到目标 endpoint - 定认证策略(`PUBLIC` / `COOKIE` / `INTERCEPT` / `UI` / `LOCAL`) -- 先用 `opencli browser --session recon analyze ` 一步侦察,再字段解码、设计 columns、`opencli browser --session recon init` 生成骨架 -- 交付前用 `opencli browser --session recon verify /` 验证 +- 先用 `opencli browser recon analyze ` 一步侦察,再字段解码、设计 columns、`opencli browser recon init` 生成骨架 +- 交付前用 `opencli browser recon verify /` 验证 在仓库外写的私有适配器放到 `~/.opencli/clis//.js`;每个站点的 endpoint、字段映射、抓包样本会累积在 `~/.opencli/sites//`,下次写同站点的其他命令可以直接复用。 diff --git a/docs/guide/browser-bridge.md b/docs/guide/browser-bridge.md index 7d701d10a..36c09420f 100644 --- a/docs/guide/browser-bridge.md +++ b/docs/guide/browser-bridge.md @@ -27,22 +27,22 @@ opencli doctor # Check extension + daemon connectivity ## Tab Targeting -Browser commands require an explicit `--session `. Use the same session name for a multi-step flow, and use different names to isolate parallel work. +Browser commands require an explicit `` positional immediately after `browser`. Use the same session name for a multi-step flow, and use different names to isolate parallel work. ```bash -opencli browser --session baidu open https://www.baidu.com/ -opencli browser --session baidu tab list -opencli browser --session baidu tab new https://www.baidu.com/ -opencli browser --session baidu eval --tab 'document.title' -opencli browser --session baidu tab select -opencli browser --session baidu get title -opencli browser --session baidu tab close +opencli browser baidu open https://www.baidu.com/ +opencli browser baidu tab list +opencli browser baidu tab new https://www.baidu.com/ +opencli browser baidu eval --tab 'document.title' +opencli browser baidu tab select +opencli browser baidu get title +opencli browser baidu tab close ``` Key rules: -- `opencli browser --session open ` and `opencli browser --session tab new [url]` return a `targetId`. -- `opencli browser --session tab list` prints the `targetId` values of tabs that already exist. +- `opencli browser open ` and `opencli browser tab new [url]` return a `targetId`. +- `opencli browser tab list` prints the `targetId` values of tabs that already exist. - `--tab ` routes a single browser command to that specific tab. - `tab new` creates a new tab but does not change the default browser target. - `tab select ` makes that tab the default target for later untargeted `opencli browser ...` commands. diff --git a/docs/zh/guide/browser-bridge.md b/docs/zh/guide/browser-bridge.md index 3d7180a5a..36e5e006a 100644 --- a/docs/zh/guide/browser-bridge.md +++ b/docs/zh/guide/browser-bridge.md @@ -25,22 +25,22 @@ opencli doctor # 检查扩展 + 守护进程连接 ## 多 Tab 定位 -浏览器命令必须显式传 `--session `。同一个多步骤流程使用同一个 session;并行任务使用不同 session 隔离。 +浏览器命令必须紧跟一个 `` 位置参数。同一个多步骤流程使用同一个 session;并行任务使用不同 session 隔离。 ```bash -opencli browser --session baidu open https://www.baidu.com/ -opencli browser --session baidu tab list -opencli browser --session baidu tab new https://www.baidu.com/ -opencli browser --session baidu eval --tab 'document.title' -opencli browser --session baidu tab select -opencli browser --session baidu get title -opencli browser --session baidu tab close +opencli browser baidu open https://www.baidu.com/ +opencli browser baidu tab list +opencli browser baidu tab new https://www.baidu.com/ +opencli browser baidu eval --tab 'document.title' +opencli browser baidu tab select +opencli browser baidu get title +opencli browser baidu tab close ``` 规则如下: -- `opencli browser --session open ` 和 `opencli browser --session tab new [url]` 都会返回 `targetId`。 -- `opencli browser --session tab list` 会打印当前已存在 tab 的 `targetId`。 +- `opencli browser open ` 和 `opencli browser tab new [url]` 都会返回 `targetId`。 +- `opencli browser tab list` 会打印当前已存在 tab 的 `targetId`。 - `--tab ` 会把单条 browser 命令路由到对应 tab。 - `tab new` 只会新建 tab,不会改变默认浏览器目标。 - `tab select ` 会把该 tab 设为后续未显式指定 target 的 `opencli browser ...` 命令默认目标。 diff --git a/skills/opencli-browser/SKILL.md b/skills/opencli-browser/SKILL.md index 3f25ecccd..188724be9 100644 --- a/skills/opencli-browser/SKILL.md +++ b/skills/opencli-browser/SKILL.md @@ -24,26 +24,26 @@ Until `doctor` is green, nothing else will work. Typical failures: Chrome not ru ## Session lifecycle -- `opencli browser *` commands require `--session `. Use the same session name for a multi-step flow; use a different name to isolate parallel browser work. -- Owned browser sessions keep a tab lease alive between calls. Release it with `opencli browser --session close` or let the idle timeout expire. -- `opencli browser bind --session ` binds the Chrome tab you already have open to that session. Use this for logged-in pages, SSO flows, or pages you manually positioned before handing control to the agent. +- `opencli browser *` commands require a `` positional immediately after `browser`. Use the same session name for a multi-step flow; use a different name to isolate parallel browser work. +- Owned browser sessions keep a tab lease alive between calls. Release it with `opencli browser close` or let the idle timeout expire. +- `opencli browser bind` binds the Chrome tab you already have open to that session. Use this for logged-in pages, SSO flows, or pages you manually positioned before handing control to the agent. - `--window foreground|background` (or `OPENCLI_WINDOW=foreground|background`) chooses whether OpenCLI creates/focuses a foreground browser window or uses a background browser window for owned sessions. ### Bind Tab ```bash -opencli browser bind --session gmail -opencli browser --session gmail state -opencli browser --session gmail click "Search" -opencli browser --session gmail network -opencli browser unbind --session gmail +opencli browser gmail bind +opencli browser gmail state +opencli browser gmail click "Search" +opencli browser gmail network +opencli browser gmail unbind ``` -Binding never owns the user window and never closes the user tab. It fails closed if the tab is closed or becomes non-debuggable. Re-run `bind --session ` when you switch to a different real tab. +Binding never owns the user window and never closes the user tab. It fails closed if the tab is closed or becomes non-debuggable. Re-run `opencli browser bind` when you switch to a different real tab. Navigation is allowed on bound sessions because the session now represents explicit agent ownership of that tab. Tab mutation (`tab new`, `tab select`, `tab close`) is still blocked for bound sessions. Use an owned session when you want OpenCLI to manage tab lifecycle. -`opencli browser sessions` returns `idleMsRemaining: null` for bound sessions. That means there is no OpenCLI idle-close timer; the binding lasts until `unbind`, tab close, window close, or daemon restart. +Bound sessions have no OpenCLI idle-close timer; the binding lasts until `unbind`, tab close, window close, or daemon restart. --- @@ -210,8 +210,8 @@ Default output keeps JSON/XML/plain-text and JS-like API responses, then drops o | `browser tab close [targetId]` | Close by `page`. | | `browser back` | History back on the active tab. | | `browser close` | Release the current owned browser session when done. | -| `browser bind --session ` | Bind the current Chrome tab to a browser session. | -| `browser unbind --session ` | Detach a bound session without closing the user tab/window. | +| `browser bind` | Bind the current Chrome tab to the named browser session. | +| `browser unbind` | Detach the named bound session without closing the user tab/window. | --- @@ -299,9 +299,9 @@ Rule of thumb: **one `state` per page transition, one `find` per follow-up query **Good — one shell, live session:** ```bash -opencli browser --session hn open "https://news.ycombinator.com" \ - && opencli browser --session hn state \ - && opencli browser --session hn click 3 +opencli browser hn open "https://news.ycombinator.com" \ + && opencli browser hn state \ + && opencli browser hn click 3 ``` **Bad — each line is a fresh shell, refs from call 1 are already forgotten when call 2 runs.** (Only a problem if you rely on shell-scoped state; browser refs themselves persist in-page, but interleaving unrelated shells invites races.) Prefer `&&` when the steps are meant to be atomic. @@ -315,24 +315,24 @@ opencli browser --session hn open "https://news.ycombinator.com" \ ### Fill a login form ```bash -opencli browser --session login open "https://example.com/login" -opencli browser --session login state # find [N] for email, password, submit -opencli browser --session login type 4 "me@example.com" -opencli browser --session login type 5 "hunter2" -opencli browser --session login get value 4 # verify (autocomplete can eat chars) -opencli browser --session login click 6 # submit -opencli browser --session login wait selector "[data-testid=account-menu]" --timeout 15000 -opencli browser --session login state # fresh refs on the logged-in page +opencli browser login open "https://example.com/login" +opencli browser login state # find [N] for email, password, submit +opencli browser login type 4 "me@example.com" +opencli browser login type 5 "hunter2" +opencli browser login get value 4 # verify (autocomplete can eat chars) +opencli browser login click 6 # submit +opencli browser login wait selector "[data-testid=account-menu]" --timeout 15000 +opencli browser login state # fresh refs on the logged-in page ``` ### Pick from a long dropdown ```bash -opencli browser --session form state # sidebar shows [12] +opencli browser form find --css "select[name=country]" # the compound.options_total is 137, but compound.current is "" — unselected. -opencli browser --session form select 12 "Uruguay" -opencli browser --session form get value 12 # { value: "uy", match_level: "exact" } +opencli browser form select 12 "Uruguay" +opencli browser form get value 12 # { value: "uy", match_level: "exact" } ``` ### Pick from a custom React dropdown @@ -341,13 +341,13 @@ Use this for Radix, shadcn, Material UI, Mercury-style category fields, and other controls that are not native `