
feat: v0.10.1-alpha — OpenCLI wechat fetch + CAPTCHA detection + host-aware fetch routing#25

Merged: Daily-AC merged 1 commit into main from feat/v0.10.1-alpha-wechat-fetch on May 27, 2026.


@Daily-AC (Owner)

Summary

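On the day v0.10.0 shipped, a silent-fail bug turned up: omnireach fetch <mp.weixin.qq.com/s/...> was blocked by WeChat's "环境异常" ("abnormal environment") CAPTCHA on both the crwl and jina backends, returning garbage markdown without reporting an error. Root cause: neither backend carries a login session, so both receive a verification page in the style of weixin.sogou.com/antispider/... instead of the article.

v0.10.1 adds the last missing piece of the fetch path: mp.weixin.qq.com is routed through OpenCLI's logged-in Chrome channel (opencli weixin download --stdout), and every backend gains a CAPTCHA heuristic as a safety net, so a silent failure always becomes a noisy failure with a warning.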

Host-aware fetch routing

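commands/fetch.py adds _resolve_backends() for host routing and _fetch_via_opencli_weixin():

| Host | --backend auto | Notes |
|---|---|---|
| mp.weixin.qq.com | opencli only | Logged-in session, cookie strategy; goes through the OpenCLI fork (Daily-AC/OpenCLI fe28823+) |
| Other hosts | crwl → jina (preserved from v0.10) | Unchanged |

An explicit --backend X always wins: if the user passes --backend crwl on a WeChat URL, the fetch still goes through crwl (respecting user intent), and the CAPTCHA heuristic surfaces a verification-page warning.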

omnireach fetch https://mp.weixin.qq.com/s/abc            # auto → opencli (logged-in session)
omnireach fetch https://mp.weixin.qq.com/s/abc --backend crwl  # respects the user: crwl + verification-page warning
omnireach fetch https://example.com/article               # auto → crwl → jina (v0.10 behavior)
omnireach fetch <url> --backend opencli                   # explicit opencli (new click choice)

Click choices are extended to: ["auto", "crwl", "jina", "opencli"]
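The routing rules above can be sketched roughly as follows. This is a minimal illustration, not the actual `_resolve_backends()` from commands/fetch.py: the function name `resolve_backends`, the `OPENCLI_HOSTS` set, and the `DEFAULT_CHAIN` list are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical constants; the real module may organize this differently.
OPENCLI_HOSTS = {"mp.weixin.qq.com"}
DEFAULT_CHAIN = ["crwl", "jina"]


def resolve_backends(url: str, backend: str = "auto") -> list:
    """Return the ordered list of backends to try for a URL."""
    if backend != "auto":
        # An explicit --backend always wins, even on a WeChat URL.
        return [backend]
    host = (urlparse(url).hostname or "").lower()
    if host in OPENCLI_HOSTS:
        # WeChat articles need the logged-in OpenCLI channel; no fallback.
        return ["opencli"]
    # Every other host keeps the v0.10 chain.
    return list(DEFAULT_CHAIN)
```

Returning a list keeps the explicit and auto cases uniform: the caller tries each backend in order regardless of how the list was chosen.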

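OpenCLI weixin download --stdout (dependency)

This depends on the --stdout flag added in OpenCLI fork commit fe28823. The flag mirrors web/read --stdout and reuses OpenCLI's existing ArticleDownloadOptions.stdout (already present in article-download.ts and already used by web/read). The fork has been pushed; upstream PR jackwener/OpenCLI#1770 is awaiting review. This does not block shipping v0.10.1, since users can install the fork.

_fetch_via_opencli_weixin() uses a rigorous three-branch parser that does not rely on a single-character startswith check: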

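| retcode | stdout shape | Path |
|---|---|---|
| ≠0 | (stderr) | raise opencli_failed |
| =0 | JSON row with a status field | parse status: contains verification → captcha_suspected; otherwise opencli_failed |
| =0 | not JSON, or no status field | real markdown body (legitimate content that starts with a bracket, such as [作者按] ..., also takes this path) |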
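A rough sketch of the three branches in the table, under stated assumptions: the helper name `parse_opencli_result` is hypothetical, the JSON row is assumed to be either a single object or a list of objects, and the case-insensitive match on "verification" is a guess at how the status is checked.

```python
import json


def parse_opencli_result(returncode: int, stdout: str, stderr: str) -> str:
    """Return the markdown body, or raise RuntimeError with a prefixed reason."""
    # Branch 1: non-zero exit code, report stderr.
    if returncode != 0:
        raise RuntimeError(f"opencli_failed: {stderr.strip()}")

    # Try a full JSON parse instead of checking the first character, so that
    # markdown starting with "[" or "{" is not mistaken for an error row.
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return stdout  # Branch 3: not JSON, treat as real markdown.

    rows = payload if isinstance(payload, list) else [payload]
    for row in rows:
        if isinstance(row, dict) and "status" in row:
            # Branch 2: a JSON row with a status field signals a failure.
            status = str(row["status"])
            if "verification" in status.lower():
                raise RuntimeError(f"captcha_suspected: {status}")
            raise RuntimeError(f"opencli_failed: {status}")

    # Valid JSON without a status field still counts as body (branch 3).
    return stdout
```

Parsing the whole payload is what lets a body such as `[作者按] ...` fall through to the markdown branch: it begins with a bracket but is not valid JSON.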

CAPTCHA heuristic (crwl/jina safety net)

_looks_like_captcha() is a post-hoc check: after crwl or jina returns a response, it scans the content for keywords:

CAPTCHA_KEYWORDS = (
    "环境异常", "完成验证后即可继续访问", "请输入验证码", "请完成安全验证",
    "Cloudflare", "Just a moment", "Checking your browser",
)

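On a hit, errors[] gains an entry of the form captcha_suspected: <backend> returned content containing verification-page keyword '<kw>'; ... The markdown field is kept rather than cleared (graceful degradation: the agent reads errors and decides for itself whether to trust the content; there is no forced SystemExit).

The OpenCLI path does not go through this check. It directly raises a RuntimeError prefixed with captcha_suspected: (via the errorHint early-return path), which already lands in errors.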
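The keyword scan and the "annotate, do not abort" behavior could look roughly like this. It is a minimal sketch: `find_captcha_keyword` and `annotate_captcha` are hypothetical names, and the result is assumed to be a dict with the `content_markdown` and `errors` keys seen in the --json output. The PR's tests mention short-payload and real-article cases, which suggests the real `_looks_like_captcha()` has extra guards that are not reproduced here.

```python
from typing import Optional

CAPTCHA_KEYWORDS = (
    "环境异常", "完成验证后即可继续访问", "请输入验证码", "请完成安全验证",
    "Cloudflare", "Just a moment", "Checking your browser",
)


def find_captcha_keyword(markdown: str) -> Optional[str]:
    """Return the first verification-page keyword found, or None."""
    for kw in CAPTCHA_KEYWORDS:
        if kw in markdown:
            return kw
    return None


def annotate_captcha(result: dict, backend: str) -> dict:
    """Append a captcha_suspected error but leave the markdown untouched."""
    kw = find_captcha_keyword(result.get("content_markdown", ""))
    if kw is not None:
        result.setdefault("errors", []).append(
            f"captcha_suspected: {backend} returned content containing "
            f"verification-page keyword '{kw}'"
        )
    return result
```

Keeping the markdown in place means a false positive costs only a warning, while a true positive is no longer silent.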

doctor wechat_backends section

omnireach/doctor.py adds WechatBackendStatus + run_wechat_backend_doctor(), with three states:

  • opencli is not on PATH → ❌ + npm i -g github:Daily-AC/OpenCLI
  • opencli is on PATH but weixin download --help does not list --stdout (old build, < fe28823) → ❌ + upgrade hint
  • opencli and --stdout are both OK → ✅

The omnireach doctor --json payload gains a wechat_backends key, and TTY mode gains an additional Rich Table.
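The three states can be sketched as a small probe. This is an illustration under assumptions, not the code in omnireach/doctor.py: the dataclass fields, the function name `check_wechat_backend`, and the injectable `which` and `run` parameters are invented for the example, and treating a non-zero --help exit as a failure is a guess based on the test names.

```python
import shutil
import subprocess
from dataclasses import dataclass

INSTALL_HINT = "npm i -g github:Daily-AC/OpenCLI"


@dataclass
class WechatBackendStatus:
    # Hypothetical shape; the real dataclass may carry different fields.
    ok: bool
    detail: str
    hint: str = ""


def check_wechat_backend(which=shutil.which, run=subprocess.run) -> WechatBackendStatus:
    """Probe for opencli and for the --stdout flag on `weixin download`."""
    # State 1: the binary is missing entirely.
    if which("opencli") is None:
        return WechatBackendStatus(False, "opencli not on PATH", INSTALL_HINT)

    proc = run(
        ["opencli", "weixin", "download", "--help"],
        capture_output=True,
        text=True,
    )
    # State 2: the binary exists but is an old build without --stdout.
    if proc.returncode != 0 or "--stdout" not in proc.stdout:
        return WechatBackendStatus(
            False, "opencli weixin download lacks --stdout", INSTALL_HINT
        )

    # State 3: everything needed for WeChat fetches is present.
    return WechatBackendStatus(True, "opencli with --stdout available")
```

Passing `which` and `run` as parameters is only a testing convenience here; it lets the probe be exercised without a real opencli install.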

Tests

22 new tests:

  • tests/test_cmd_fetch.py +18: host routing, branch 1/2/3 parser, [作者按] markdown not misparsed, captcha heuristic detect/short-payload/real-article, e2e CLI wechat-url-auto-routes, e2e explicit-crwl-wins, --help shows opencli choice
  • tests/test_doctor.py +4: wechat_backend present-with-stdout / missing-stdout-flag / weixin-download-help-nonzero / opencli-missing

Full suite: 278 passing (256 → 278, +22).

Real E2E (mp.weixin.qq.com)

$ omnireach fetch https://mp.weixin.qq.com/s/<real-token> --json | jq '.backend, (.content_markdown | length), .errors'
"opencli"
25618
[]

First few lines of the returned markdown:

# 一文读懂Harness Engineering:从14篇工程文章中,寻找那个让AI不再离经叛道的壳|Hao好聊趋势
> 公众号: 腾讯科技
> 发布时间: 2026年4月2日 15:22

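(That morning the same URL returned the "环境异常" verification page on v0.10.0; v0.10.1 cleanly returns 25618 characters of real markdown, which confirms the fix works.)

Upgrade notes

After upgrading to v0.10.1, users must run npm i -g github:Daily-AC/OpenCLI to pull the latest OpenCLI fork (which includes the --stdout flag). Otherwise omnireach fetch <wechat-url> fails because an old OpenCLI build does not recognize --stdout. omnireach doctor surfaces this state.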

Related

…-aware fetch routing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Daily-AC Daily-AC merged commit 1c37612 into main May 27, 2026
@Daily-AC Daily-AC deleted the feat/v0.10.1-alpha-wechat-fetch branch May 27, 2026 15:07