feat: v0.10.1-alpha — OpenCLI wechat fetch + CAPTCHA detection + host-aware fetch routing#25
Merged
…-aware fetch routing Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
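A silent-fail bug surfaced on the day v0.10.0 shipped: omnireach fetch <mp.weixin.qq.com/s/...> was blocked by WeChat's "环境异常" ("abnormal environment") CAPTCHA on both the crwl and jina backends, and returned junk markdown without reporting an error. Root cause: neither backend carries a logged-in session, so both received a verification page in the style of weixin.sogou.com/antispider/....

v0.10.1 fills in the last missing piece of the fetch path: mp.weixin.qq.com now goes through OpenCLI's logged-in Chrome channel (opencli weixin download --stdout), and every backend gets a CAPTCHA heuristic as a safety net, so a silent failure always becomes a noisy one with a warning.

Host-aware fetch routing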
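As a rough illustration of the routing rule in this section (auto sends mp.weixin.qq.com to OpenCLI, while an explicit --backend always wins), here is a minimal sketch. It is not the code in commands/fetch.py: the function name resolve_backends, the DEFAULT_CHAIN order for non-WeChat hosts, and the choice to return only opencli for WeChat URLs are all assumptions made for illustration.

```python
from typing import List
from urllib.parse import urlparse

WECHAT_HOST = "mp.weixin.qq.com"
# Assumption: for other hosts, auto tries crwl first and then jina.
DEFAULT_CHAIN = ["crwl", "jina"]


def resolve_backends(url: str, backend: str = "auto") -> List[str]:
    """Return the ordered list of backends to try for a URL (hypothetical helper)."""
    # An explicit backend always wins, even on a WeChat URL.
    if backend != "auto":
        return [backend]
    # Tolerate scheme-less input such as "mp.weixin.qq.com/s/...".
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = (parsed.hostname or "").lower()
    if host == WECHAT_HOST:
        return ["opencli"]
    return list(DEFAULT_CHAIN)
```

Matching on the parsed hostname rather than on a substring of the URL keeps a link that merely mentions mp.weixin.qq.com in its path or query from being routed to OpenCLI.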
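commands/fetch.py adds _resolve_backends() for host routing, plus _fetch_via_opencli_weixin(): with --backend auto, mp.weixin.qq.com URLs are routed through OpenCLI.

An explicit --backend X always wins: if the user passes --backend crwl on a WeChat URL, the fetch still goes through crwl (respecting the user's intent), and the CAPTCHA heuristic acts as a fallback that surfaces a verification-page warning.

The Click choices are extended to: ["auto", "crwl", "jina", "opencli"]

OpenCLI weixin download --stdout (dependency)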
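The _fetch_via_opencli_weixin() parser in this section classifies OpenCLI's output into three branches without relying on a single leading character. A minimal sketch of that idea follows. The branch conditions are assumptions, since only the outcomes (opencli_failed, captcha_suspected, markdown) are stated here: this sketch assumes error output is a JSON object with a status field, that a non-zero exit or empty output is a failure, and that everything else is article markdown. The function name and the sample status values are hypothetical.

```python
import json
from typing import Tuple


def classify_opencli_output(returncode: int, stdout: str) -> Tuple[str, str]:
    """Classify OpenCLI output as (kind, text); a sketch, not the real parser."""
    text = stdout.strip()

    # Try a full JSON parse instead of checking whether the text starts
    # with "{" or "[": markdown such as "[作者按] ..." is not valid JSON.
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        payload = None

    # Assumed shape: structured errors carry a "status" field.
    if isinstance(payload, dict) and "status" in payload:
        if "verification" in str(payload["status"]).lower():
            return ("captcha_suspected", text)
        return ("opencli_failed", text)

    # Assumed: a non-zero exit code or empty output means the run failed.
    if returncode != 0 or not text:
        return ("opencli_failed", text)

    # Everything else is treated as article markdown.
    return ("markdown", text)
```

A caller could turn the captcha_suspected and opencli_failed kinds into a RuntimeError with the matching prefix and pass markdown through unchanged.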
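This depends on the --stdout flag added in OpenCLI fork commit fe28823. The flag mirrors web/read --stdout and reuses OpenCLI's existing ArticleDownloadOptions.stdout (long present in article-download.ts and already used by web/read). Our fork has been pushed; the upstream PR jackwener/OpenCLI#1770 is awaiting review. That does not block shipping v0.10.1, since users can install the fork.

_fetch_via_opencli_weixin() uses a three-branch parser (rigorous: it does not decide based on a single leading character via startswith):
- opencli_failed
- status field: verification → captcha_suspected; otherwise opencli_failed
- markdown (output that opens with a bracket, such as [作者按] ..., also takes this branch)

CAPTCHA heuristic (crwl/jina fallback)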
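To make the post-hoc check in this section concrete, here is a minimal sketch of a keyword scan that appends a captcha_suspected entry to errors while leaving the markdown untouched. It is not the actual _looks_like_captcha(): the two keywords are taken from the symptoms named in the summary rather than from the real keyword list, the length cutoff MAX_SUSPECT_LENGTH is a guess at how a genuine article that merely mentions a keyword might be spared, and the helper annotate is hypothetical. The error message stops where this description elides the rest.

```python
from typing import Optional

# Assumption: illustrative keywords only; the real list is not shown here.
CAPTCHA_KEYWORDS = ("环境异常", "antispider")
# Hypothetical cutoff: verification pages are short, real articles are long.
MAX_SUSPECT_LENGTH = 2000


def looks_like_captcha(markdown: str) -> Optional[str]:
    """Return the matched keyword if the payload resembles a verification page."""
    if len(markdown) > MAX_SUSPECT_LENGTH:
        return None
    lowered = markdown.lower()
    for keyword in CAPTCHA_KEYWORDS:
        if keyword.lower() in lowered:
            return keyword
    return None


def annotate(result: dict, backend: str) -> dict:
    """Append a warning to result["errors"]; the markdown field is kept as-is."""
    keyword = looks_like_captcha(result.get("markdown", ""))
    if keyword is not None:
        result.setdefault("errors", []).append(
            f"captcha_suspected: {backend} returned content containing "
            f"verification-page keyword '{keyword}'"
        )
    return result
```

Returning the matched keyword instead of a bare boolean lets the warning say what triggered it, which helps an agent judge whether the hit is a false positive.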
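_looks_like_captcha() is a post-hoc check: once crwl or jina returns a response, it scans the content for verification-page keywords. On a hit, errors[] gains the entry captcha_suspected: <backend> returned content containing verification-page keyword '<kw>'; .... The markdown field is kept rather than cleared (graceful degradation: the agent reads errors and decides for itself whether to trust the content; no SystemExit is forced).

The OpenCLI path skips this check: it directly raises a RuntimeError prefixed with captcha_suspected: (via the errorHint early-return path), which already lands in errors.

doctor wechat_backends section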
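The doctor check in this section boils down to three questions: is opencli installed, does weixin download --help succeed, and does its help text mention --stdout. The sketch below illustrates that flow under stated assumptions. It is not the real run_wechat_backend_doctor(): the BackendCheck dataclass, the function names, and the message wording are hypothetical, and the lookup and subprocess call are injectable so the logic can be exercised without OpenCLI installed.

```python
import shutil
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

INSTALL_HINT = "npm i -g github:Daily-AC/OpenCLI"


@dataclass
class BackendCheck:
    """Hypothetical result type; the real WechatBackendStatus may differ."""
    ok: bool
    detail: str


def _run_help(binary: str) -> Tuple[int, str]:
    # Capture both streams, since CLIs differ on where they print help.
    proc = subprocess.run(
        [binary, "weixin", "download", "--help"],
        capture_output=True, text=True, timeout=15,
    )
    return proc.returncode, proc.stdout + proc.stderr


def check_wechat_backend(
    which: Callable[[str], Optional[str]] = shutil.which,
    run_help: Callable[[str], Tuple[int, str]] = _run_help,
) -> BackendCheck:
    binary = which("opencli")
    if binary is None:
        return BackendCheck(False, f"opencli not found; install with: {INSTALL_HINT}")
    code, text = run_help(binary)
    if code != 0:
        return BackendCheck(False, f"opencli weixin download --help exited with {code}")
    if "--stdout" not in text:
        return BackendCheck(False, f"old build without --stdout; upgrade with: {INSTALL_HINT}")
    return BackendCheck(True, "opencli present and --stdout supported")
```

Probing the help text instead of comparing version strings checks for the capability itself, which fits a situation where the flag exists in a fork but not yet upstream.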
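omnireach/doctor.py adds WechatBackendStatus and run_wechat_backend_doctor(), with three states:
- OpenCLI missing → install with npm i -g github:Daily-AC/OpenCLI
- weixin download --help does not list --stdout (old build, < fe28823) → ❌ + upgrade hint
- OpenCLI and --stdout both OK → ✅

The omnireach doctor --json payload gains a wechat_backends key, and TTY mode gains an extra Rich Table.

Tests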
22 new tests:
- tests/test_cmd_fetch.py +18: host routing, branch 1/2/3 parser, [作者按] markdown not misparsed, captcha heuristic detect/short-payload/real-article, e2e CLI wechat-url-auto-routes, e2e explicit-crwl-wins, --help shows opencli choice
- tests/test_doctor.py +4: wechat_backend present-with-stdout / missing-stdout-flag / weixin-download-help-nonzero / opencli-missing

Full suite: 278 passing (256 → 278, +22).
Real E2E (mp.weixin.qq.com)
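First few lines of the returned markdown:
(This morning the same URL returned the "环境异常" verification page on v0.10.0; on v0.10.1 it cleanly returns 25618 characters of real markdown, which confirms the fix works.)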
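Upgrade note
After upgrading to v0.10.1, users must run npm i -g github:Daily-AC/OpenCLI to pull the latest OpenCLI fork (which includes the --stdout flag). Otherwise omnireach fetch <wechat-url> fails, because the old OpenCLI build does not recognize --stdout. omnireach doctor surfaces this state.

Related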
docs/superpowers/specs/2026-05-27-opencli-wechat-design.md