Skip to content

Commit 60b4973

Browse files
committed
docs: clarify use cases, install steps, and agent integration
1 parent ccb348d commit 60b4973

4 files changed

Lines changed: 318 additions & 154 deletions

File tree

README.md

Lines changed: 131 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -1,101 +1,147 @@
11
# agent-fetch
22

3-
A Go CLI that always tries to return Markdown for web pages in AI-agent workflows.
3+
Fetch web pages as clean Markdown for AI-agent workflows.
44

55
[中文](./README.zh.md) | **English**
66

7-
## Why
7+
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE)
8+
[![Go](https://github.com/firede/agent-fetch/actions/workflows/ci.yml/badge.svg)](https://github.com/firede/agent-fetch/actions/workflows/ci.yml)
9+
[![Release](https://img.shields.io/github/v/release/firede/agent-fetch)](https://github.com/firede/agent-fetch/releases)
810

9-
Web fetch results are often raw HTML/JS/CSS, which is noisy for LLMs. This tool wraps a fallback pipeline so agents can expect Markdown output.
11+
## Highlights
1012

11-
If you use tools like Codex or Claude Code, note that they may already include built-in HTML simplification/fetching. Whether you still need `agent-fetch` depends on your workflow.
13+
- **Markdown-first output pipeline** -- readability extraction + HTML-to-Markdown conversion, so agents receive clean text instead of noisy HTML/JS/CSS
14+
- **Headless browser fallback** -- renders JavaScript-heavy pages (SPAs, dynamic dashboards) when static extraction falls short
15+
- **Custom request headers** -- send `Authorization`, `Cookie`, or any header to access authenticated endpoints
16+
- **Multi-URL batch fetching** -- fetch multiple pages concurrently with structured, per-URL output
1217

13-
## Behavior
18+
## Quick Start
1419

15-
`agent-fetch` uses four modes:
16-
17-
- `auto` (default):
18-
- Request with `Accept: text/markdown`
19-
- If response is `text/markdown` or already looks like Markdown, return it
20-
- Else do static HTML extraction + convert to Markdown (inject `title`/`description` front matter by default)
21-
- If static result quality is too low, fallback to headless browser render and convert (also inject front matter by default)
22-
- `static`: never uses browser fallback
23-
- `browser`: always uses headless browser
24-
- `raw`: send `Accept: text/markdown`, then print that single HTTP response body as-is (no fallback/conversion)
25-
- `--meta` (default `true`): control whether non-`raw` outputs include front matter (`title`/`description`). For `auto`/`static` direct markdown responses, it may do one extra HTML request to collect metadata.
26-
- One or more URL arguments are supported. For multiple URLs, requests run concurrently (configurable via `--concurrency`) and output is emitted in input order.
27-
28-
## Runtime dependency
29-
30-
`browser` mode requires a Chrome/Chromium browser available on the host.
20+
```bash
21+
# Install
22+
go install github.com/firede/agent-fetch/cmd/agent-fetch@latest
3123

32-
`auto` mode may fall back to browser rendering, so it can also require Chrome/Chromium on some pages.
24+
# Fetch a page
25+
agent-fetch https://example.com
26+
```
3327

34-
Use `--mode static` or `--mode raw` to avoid browser dependency.
28+
Or download a prebuilt binary from [Releases](https://github.com/firede/agent-fetch/releases).
3529

36-
## Install (with Go)
30+
## How It Works
3731

38-
If Go is already installed locally:
32+
In the default `auto` mode, agent-fetch runs a three-stage fallback pipeline:
3933

40-
```bash
41-
go install github.com/firede/agent-fetch/cmd/agent-fetch@latest
34+
```
35+
Request with Accept: text/markdown
36+
|
37+
v
38+
Markdown response? --yes--> Return as-is
39+
| no
40+
v
41+
Static HTML extraction
42+
+ Markdown conversion --quality OK?--> Return
43+
| no
44+
v
45+
Headless browser render
46+
+ extraction + conversion --> Return
4247
```
4348

44-
Install a specific version:
49+
This means most pages are handled without a browser, keeping things fast, while JS-heavy pages still get rendered correctly.
4550

46-
```bash
47-
go install github.com/firede/agent-fetch/cmd/agent-fetch@v0.2.0
48-
```
51+
## Modes
4952

50-
Make sure `$(go env GOPATH)/bin` (usually `~/go/bin`) is in your `PATH`.
53+
| Mode | Behavior | Browser needed |
54+
|------|----------|----------------|
55+
| `auto` (default) | Three-stage fallback: native Markdown -> static extraction -> browser render | Only when static quality is low |
56+
| `static` | Static HTML extraction only, no browser | No |
57+
| `browser` | Always use headless Chrome/Chromium | Yes |
58+
| `raw` | Send `Accept: text/markdown`, return HTTP body verbatim | No |
5159

52-
## Install (from Releases)
60+
## Installation
5361

54-
1. Download the archive for your platform from the [GitHub Releases](https://github.com/firede/agent-fetch/releases) page.
55-
2. Extract it and make the binary executable:
62+
### From Releases
63+
64+
1. Download the archive for your platform from [GitHub Releases](https://github.com/firede/agent-fetch/releases).
65+
2. Extract and make the binary executable:
5666

5767
```bash
5868
chmod +x ./agent-fetch
5969
```
6070

61-
### macOS note
71+
3. Move the binary to a directory on your `PATH`, or run it directly:
6272

63-
Current release binaries are not notarized by Apple (no Apple Developer notarization yet), so Gatekeeper may show:
73+
```bash
74+
./agent-fetch https://example.com
75+
```
6476

65-
`“agent-fetch” cannot be opened because Apple cannot check it for malicious software.`
77+
#### macOS note
6678

67-
For local validation, remove the quarantine attribute and run:
79+
Release binaries are not yet notarized by Apple, so Gatekeeper may block execution. Remove the quarantine attribute to proceed:
6880

6981
```bash
7082
xattr -dr com.apple.quarantine ./agent-fetch
71-
./agent-fetch https://example.com
7283
```
7384

74-
## Agent Skills
85+
### With Go
86+
87+
```bash
88+
go install github.com/firede/agent-fetch/cmd/agent-fetch@latest
89+
```
7590

76-
Skill location: [`skills/agent-fetch`](./skills/agent-fetch/SKILL.md).
91+
Install a specific version:
7792

78-
It includes installation guidance and usage instructions for `agent-fetch`.
93+
```bash
94+
go install github.com/firede/agent-fetch/cmd/agent-fetch@v0.3.0
95+
```
96+
97+
Ensure `$(go env GOPATH)/bin` (usually `~/go/bin`) is in your `PATH`.
7998

8099
## Usage
81100

82101
```bash
83-
agent-fetch <url> [url ...]
102+
agent-fetch [options] <url> [url ...]
84103
```
85104

86-
Common flags:
105+
### Flags
106+
107+
| Flag | Default | Description |
108+
|------|---------|-------------|
109+
| `--mode` | `auto` | Fetch mode: `auto` \| `static` \| `browser` \| `raw` |
110+
| `--meta` | `true` | Prepend `title`/`description` front matter (use `--meta=false` to disable) |
111+
| `--timeout` | `20s` | HTTP request timeout (applies to static/auto modes) |
112+
| `--browser-timeout` | `30s` | Page-load timeout (applies to browser/auto modes) |
113+
| `--network-idle` | `1200ms` | Wait time after last network activity before capturing content |
114+
| `--wait-selector` | | CSS selector to wait for before capturing, e.g. `article` |
115+
| `--header` | | Custom request header, repeatable. e.g. `--header 'Authorization: Bearer token'` |
116+
| `--user-agent` | `agent-fetch/0.1` | User-Agent header |
117+
| `--max-body-bytes` | `8388608` | Max response bytes to read |
118+
| `--concurrency` | `4` | Max concurrent fetches for multi-URL requests |
119+
120+
### Examples
87121

88122
```bash
89-
agent-fetch --mode auto --timeout 20s --browser-timeout 30s https://example.com
123+
# Default auto mode
124+
agent-fetch https://example.com
125+
126+
# Force browser rendering for a JS-heavy page
90127
agent-fetch --mode browser --wait-selector 'article' https://example.com
128+
129+
# Static extraction without front matter
91130
agent-fetch --mode static --meta=false https://example.com
131+
132+
# Get raw HTTP response body
92133
agent-fetch --mode raw https://example.com
93-
agent-fetch --mode static --concurrency 4 https://example.com https://example.org
134+
135+
# Authenticated request
94136
agent-fetch --header 'Authorization: Bearer <token>' https://example.com
137+
138+
# Batch fetch with concurrency control
139+
agent-fetch --concurrency 4 https://example.com https://example.org
95140
```
96141

97-
Fetched content is printed to `stdout` (`raw` mode prints the single HTTP response body unprocessed).
98-
For multiple URLs, output uses task markers so each result maps back to its input URL:
142+
## Multi-URL Batch
143+
144+
When multiple URLs are provided, requests run concurrently (controlled by `--concurrency`) and output is emitted in input order using task markers:
99145

100146
```text
101147
<!-- count: 3, succeeded: 2, failed: 1 -->
@@ -106,7 +152,42 @@ For multiple URLs, output uses task markers so each result maps back to its inpu
106152
<!-- error[2]: ... -->
107153
```
108154

109-
Exit code is `0` when all tasks succeed, `1` when any task fails, and `2` for argument/usage errors.
155+
Exit codes: `0` all succeeded, `1` any task failed, `2` argument/usage error.
156+
157+
## Agent Integration
158+
159+
This project ships a [SKILL.md](./skills/agent-fetch/SKILL.md) that can be used with coding agents that support skill files. Point your skill directory to `skills/agent-fetch` and the agent will be able to invoke `agent-fetch` when its built-in fetch capability is insufficient.
160+
161+
`agent-fetch` reads from the command line and writes Markdown to stdout, making it easy to integrate into any agent pipeline or shell-based tool call:
162+
163+
```bash
164+
result=$(agent-fetch --mode static https://example.com)
165+
```
166+
167+
## When Do You Need This?
168+
169+
The table below compares agent-fetch with the built-in web-fetch capabilities found in some coding agents. Actual built-in capabilities vary by product and version.
170+
171+
| Scenario | Built-in web fetch | agent-fetch |
172+
|----------|:------------------:|:-----------:|
173+
| Basic page fetch with HTML simplification | Yes | Yes |
174+
| JavaScript-rendered pages (SPAs) | Varies | Yes (headless browser) |
175+
| Custom headers (auth, cookies) | Varies | Yes (`--header`) |
176+
| No AI summarization (outputs extracted body as-is) | Varies | Yes (subject to `--max-body-bytes`) |
177+
| Batch fetch multiple URLs concurrently | Varies | Yes (`--concurrency`) |
178+
| CSS selector-based wait/extraction | Varies | Yes (`--wait-selector`) |
179+
| Works outside coding agents (CLI, CI/CD) | N/A | Yes (standalone CLI) |
180+
181+
**How built-in web fetch typically works:** Tools like Claude Code's WebFetch and Codex's built-in fetch retrieve a page over HTTP, convert the HTML to Markdown, and then pass the content through an AI model that may summarize or truncate it to fit the context window. This pipeline is fast and sufficient for most pages, but it usually does not execute JavaScript (so SPA or JS-rendered pages may return incomplete content), does not support custom request headers, and processes one URL at a time.
182+
183+
- **No built-in web fetch available** (other agent frameworks, CLI pipelines, CI/CD) -- use agent-fetch as your primary fetch tool.
184+
- **Built-in web fetch available** -- use agent-fetch as a complement for JS-heavy pages, authenticated endpoints, batch fetching, or when you need the extracted content without summarization.
185+
186+
## Runtime Dependencies
187+
188+
`browser` and `auto` modes may require Chrome or Chromium on the host.
189+
190+
Use `--mode static` or `--mode raw` to avoid the browser dependency entirely.
110191

111192
## Build
112193

0 commit comments

Comments
 (0)