Skip to content

Commit 512b506

Browse files
committed
feat: add jsonl output and refactor doctor command
1 parent 1e74774 commit 512b506

8 files changed

Lines changed: 638 additions & 85 deletions

File tree

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,20 @@ All notable changes to this project are documented in this file.
44

55
## [Unreleased]
66

7+
### Breaking
8+
- Removed root-level `--doctor`; diagnostics must now be invoked via `agent-fetch doctor`.
9+
10+
### Added
11+
- Added `--format` for fetch output selection: `markdown` (default) or `jsonl`.
12+
- Added token-efficient JSONL task payloads with `seq`, `url`, optional `resolved_url`, `resolved_mode`, `content`, optional `meta`, and per-task error rows.
13+
- Added tests for JSONL batch writing, metadata extraction behavior, and safeguards for unknown front matter fields.
14+
15+
### Changed
16+
- Changed metadata behavior in JSONL mode: `--meta` now emits structured `meta` fields instead of front matter injection in `content`.
17+
- Changed CLI command routing to use an internal `web` command for fetch flags while preserving shorthand (`agent-fetch <url>`), so `doctor --help` no longer shows fetch options as global flags.
18+
- Changed root help output to include a dedicated "DEFAULT WEB OPTIONS" section for shorthand discoverability from `agent-fetch -h`.
19+
- Updated README (EN/ZH) and `skills/agent-fetch/SKILL.md` for `--format jsonl` and the new doctor command shape.
20+
721
## [0.4.0] - 2026-02-21
822

923
### Added

README.md

Lines changed: 27 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -100,14 +100,17 @@ Ensure `$(go env GOPATH)/bin` (usually `~/go/bin`) is in your `PATH`.
100100

101101
```bash
102102
agent-fetch [options] <url> [url ...]
103+
agent-fetch web [options] <url> [url ...]
104+
agent-fetch doctor [options]
103105
```
104106

105107
### Flags
106108

107109
| Flag | Default | Description |
108110
|------|---------|-------------|
109111
| `--mode` | `auto` | Fetch mode: `auto` \| `static` \| `browser` \| `raw` |
110-
| `--meta` | `true` | Prepend `title`/`description` front matter (use `--meta=false` to disable) |
112+
| `--format` | `markdown` | Output format: `markdown` \| `jsonl` |
113+
| `--meta` | `true` | Include `title`/`description` metadata (`markdown`: front matter, `jsonl`: `meta` field; use `--meta=false` to disable) |
111114
| `--timeout` | `20s` | HTTP request timeout (applies to static/auto modes) |
112115
| `--browser-timeout` | `30s` | Page-load timeout (applies to browser/auto modes) |
113116
| `--network-idle` | `1200ms` | Wait time after last network activity before capturing content |
@@ -116,7 +119,6 @@ agent-fetch [options] <url> [url ...]
116119
| `--user-agent` | `agent-fetch/0.1` | User-Agent header |
117120
| `--max-body-bytes` | `8388608` | Max response bytes to read |
118121
| `--concurrency` | `4` | Max concurrent fetches for multi-URL requests |
119-
| `--doctor` | `false` | Run environment checks (runtime + headless browser readiness) and print remediation guidance |
120122
| `--browser-path` | | Browser executable path/name override for `browser` and `auto` modes |
121123

122124
### Examples
@@ -143,14 +145,17 @@ agent-fetch --header "Authorization: Bearer $TOKEN" https://example.com
143145
# Batch fetch with concurrency control
144146
agent-fetch --concurrency 4 https://example.com https://example.org
145147

148+
# Structured JSONL output
149+
agent-fetch --format jsonl https://example.com
150+
146151
# Check environment readiness
147-
agent-fetch --doctor
152+
agent-fetch doctor
148153

149154
# Check environment readiness with explicit browser path
150-
agent-fetch --doctor --browser-path /usr/bin/chromium
155+
agent-fetch doctor --browser-path /usr/bin/chromium
151156
```
152157

153-
## Multi-URL Batch
158+
## Multi-URL Batch (Markdown)
154159

155160
When multiple URLs are provided, requests run concurrently (controlled by `--concurrency`) and output is emitted in input order using task markers:
156161

@@ -165,11 +170,26 @@ When multiple URLs are provided, requests run concurrently (controlled by `--con
165170

166171
Exit codes: `0` all succeeded, `1` any task failed, `2` argument/usage error.
167172

173+
## JSONL Output Contract
174+
175+
When `--format jsonl` is used, each task emits one JSON line (no summary line):
176+
177+
```json
178+
{"seq":1,"url":"https://example.com","resolved_mode":"static","content":"...","meta":{"title":"...","description":"..."}}
179+
{"seq":2,"url":"https://bad.example","error":"http request failed: timeout"}
180+
```
181+
182+
Field notes:
183+
- `url`: input URL
184+
- `resolved_url`: emitted only when different from `url`
185+
- `resolved_mode`: one of `markdown`, `static`, `browser`, `raw`
186+
- `meta`: emitted only when `--meta=true` and metadata exists
187+
168188
## Agent Integration
169189

170190
This project ships a [SKILL.md](./skills/agent-fetch/SKILL.md) that can be used with coding agents that support skill files. Point your skill directory to `skills/agent-fetch` and the agent will be able to invoke `agent-fetch` when its built-in fetch capability is insufficient.
171191

172-
`agent-fetch` reads from the command line and writes Markdown to stdout, making it easy to integrate into any agent pipeline or shell-based tool call:
192+
`agent-fetch` reads from the command line and writes results to stdout (`markdown` or `jsonl`), making it easy to integrate into any agent pipeline or shell-based tool call:
173193

174194
```bash
175195
result=$(agent-fetch --mode static https://example.com)
@@ -200,7 +220,7 @@ The table below compares agent-fetch with the built-in web-fetch capabilities fo
200220

201221
Use `--mode static` or `--mode raw` to avoid the browser dependency entirely.
202222

203-
- Run `agent-fetch --doctor` to validate runtime/browser readiness and get guided fixes.
223+
- Run `agent-fetch doctor` to validate runtime/browser readiness and get guided fixes.
204224
- Use `--browser-path` when the browser is installed in a non-default location (common in container images).
205225

206226
## Build

README.zh.md

Lines changed: 27 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -100,14 +100,17 @@ go install github.com/firede/agent-fetch/cmd/agent-fetch@v0.3.0
100100

101101
```bash
102102
agent-fetch [options] <url> [url ...]
103+
agent-fetch web [options] <url> [url ...]
104+
agent-fetch doctor [options]
103105
```
104106

105107
### 参数
106108

107109
| 参数 | 默认值 | 说明 |
108110
|------|--------|------|
109111
| `--mode` | `auto` | 抓取模式:`auto` \| `static` \| `browser` \| `raw` |
110-
| `--meta` | `true` | 输出前附加 `title`/`description` front matter(`--meta=false` 可禁用) |
112+
| `--format` | `markdown` | 输出格式:`markdown` \| `jsonl` |
113+
| `--meta` | `true` | 附加 `title`/`description` 元数据(`markdown` 写入 front matter,`jsonl` 写入 `meta` 字段;`--meta=false` 可禁用) |
111114
| `--timeout` | `20s` | HTTP 请求超时(适用于 static/auto 模式) |
112115
| `--browser-timeout` | `30s` | 页面加载超时(适用于 browser/auto 模式) |
113116
| `--network-idle` | `1200ms` | 最后一次网络活动后等待多久再抓取页面内容 |
@@ -116,7 +119,6 @@ agent-fetch [options] <url> [url ...]
116119
| `--user-agent` | `agent-fetch/0.1` | User-Agent 请求头 |
117120
| `--max-body-bytes` | `8388608` | 最大响应读取字节数 |
118121
| `--concurrency` | `4` | 多 URL 请求时的最大并发数 |
119-
| `--doctor` | `false` | 运行环境检查(运行时 + 无头浏览器可用性),并输出修复建议 |
120122
| `--browser-path` | |`browser` / `auto` 模式指定浏览器可执行文件路径或名称 |
121123

122124
### 示例
@@ -143,14 +145,17 @@ agent-fetch --header "Authorization: Bearer $TOKEN" https://example.com
143145
# 批量抓取,控制并发
144146
agent-fetch --concurrency 4 https://example.com https://example.org
145147

148+
# 结构化 JSONL 输出
149+
agent-fetch --format jsonl https://example.com
150+
146151
# 检查环境可用性
147-
agent-fetch --doctor
152+
agent-fetch doctor
148153

149154
# 使用指定浏览器路径进行环境检查
150-
agent-fetch --doctor --browser-path /usr/bin/chromium
155+
agent-fetch doctor --browser-path /usr/bin/chromium
151156
```
152157

153-
## 多 URL 批量抓取
158+
## 多 URL 批量抓取(Markdown)
154159

155160
传入多个 URL 时,请求会并发执行(通过 `--concurrency` 控制),按输入顺序输出,使用任务标记区分各结果:
156161

@@ -165,11 +170,26 @@ agent-fetch --doctor --browser-path /usr/bin/chromium
165170

166171
退出码:全部成功为 `0`,部分或全部失败为 `1`,参数/用法错误为 `2`
167172

173+
## JSONL 输出约定
174+
175+
当使用 `--format jsonl` 时,每个任务输出一行 JSON(不输出汇总行):
176+
177+
```json
178+
{"seq":1,"url":"https://example.com","resolved_mode":"static","content":"...","meta":{"title":"...","description":"..."}}
179+
{"seq":2,"url":"https://bad.example","error":"http request failed: timeout"}
180+
```
181+
182+
字段说明:
183+
- `url`:输入 URL
184+
- `resolved_url`:仅在与 `url` 不同时输出
185+
- `resolved_mode``markdown``static``browser``raw` 之一
186+
- `meta`:仅在 `--meta=true` 且存在元数据时输出
187+
168188
## Agent 集成
169189

170190
项目附带一份 [SKILL.md](./skills/agent-fetch/SKILL.md),可供支持 skill 文件的编程 Agent 使用。将 skill 目录指向 `skills/agent-fetch`,Agent 即可在内置抓取能力不足时调用 `agent-fetch`
171191

172-
`agent-fetch` 从命令行读取参数、向 stdout 输出 Markdown,可以轻松集成到任意 Agent 管线或基于 shell 的工具调用:
192+
`agent-fetch` 从命令行读取参数、向 stdout 输出结果(`markdown``jsonl`,可以轻松集成到任意 Agent 管线或基于 shell 的工具调用:
173193

174194
```bash
175195
result=$(agent-fetch --mode static https://example.com)
@@ -200,7 +220,7 @@ result=$(agent-fetch --mode static https://example.com)
200220

201221
使用 `--mode static``--mode raw` 可完全避免浏览器依赖。
202222

203-
- 可以运行 `agent-fetch --doctor` 检查运行时/浏览器可用性,并在浏览器模式不可用时获得修复建议。
223+
- 可以运行 `agent-fetch doctor` 检查运行时/浏览器可用性,并在浏览器模式不可用时获得修复建议。
204224
- 当浏览器安装在非默认位置(例如容器镜像内自定义路径)时,使用 `--browser-path` 指定可执行文件。
205225

206226
## 构建

0 commit comments

Comments
 (0)