You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Web fetch results are often raw HTML/JS/CSS, which is noisy for LLMs. This tool wraps a fallback pipeline so agents can expect Markdown output.
11
+
## Highlights
10
12
11
-
If you use tools like Codex or Claude Code, note that they may already include built-in HTML simplification/fetching. Whether you still need `agent-fetch` depends on your workflow.
13
+
-**Markdown-first output pipeline** -- readability extraction + HTML-to-Markdown conversion, so agents receive clean text instead of noisy HTML/JS/CSS
14
+
-**Headless browser fallback** -- renders JavaScript-heavy pages (SPAs, dynamic dashboards) when static extraction falls short
15
+
-**Custom request headers** -- send `Authorization`, `Cookie`, or any header to access authenticated endpoints
- If response is `text/markdown` or already looks like Markdown, return it
20
-
- Else do static HTML extraction + convert to Markdown (inject `title`/`description` front matter by default)
21
-
- If static result quality is too low, fallback to headless browser render and convert (also inject front matter by default)
22
-
-`static`: never uses browser fallback
23
-
-`browser`: always uses headless browser
24
-
-`raw`: send `Accept: text/markdown`, then print that single HTTP response body as-is (no fallback/conversion)
25
-
-`--meta` (default `true`): control whether non-`raw` outputs include front matter (`title`/`description`). For `auto`/`static` direct markdown responses, it may do one extra HTML request to collect metadata.
26
-
- One or more URL arguments are supported. For multiple URLs, requests run concurrently (configurable via `--concurrency`) and output is emitted in input order.
27
-
28
-
## Runtime dependency
29
-
30
-
`browser` mode requires a Chrome/Chromium browser available on the host.
20
+
```bash
21
+
# Install
22
+
go install github.com/firede/agent-fetch/cmd/agent-fetch@latest
31
23
32
-
`auto` mode may fall back to browser rendering, so it can also require Chrome/Chromium on some pages.
24
+
# Fetch a page
25
+
agent-fetch https://example.com
26
+
```
33
27
34
-
Use `--mode static` or `--mode raw` to avoid browser dependency.
28
+
Or download a prebuilt binary from [Releases](https://github.com/firede/agent-fetch/releases).
35
29
36
-
## Install (with Go)
30
+
## How It Works
37
31
38
-
If Go is already installed locally:
32
+
In the default `auto` mode, agent-fetch runs a three-stage fallback pipeline:
39
33
40
-
```bash
41
-
go install github.com/firede/agent-fetch/cmd/agent-fetch@latest
34
+
```
35
+
Request with Accept: text/markdown
36
+
|
37
+
v
38
+
Markdown response? --yes--> Return as-is
39
+
| no
40
+
v
41
+
Static HTML extraction
42
+
+ Markdown conversion --quality OK?--> Return
43
+
| no
44
+
v
45
+
Headless browser render
46
+
+ extraction + conversion --> Return
42
47
```
43
48
44
-
Install a specific version:
49
+
This means most pages are handled without a browser, keeping things fast, while JS-heavy pages still get rendered correctly.
45
50
46
-
```bash
47
-
go install github.com/firede/agent-fetch/cmd/agent-fetch@v0.2.0
48
-
```
51
+
## Modes
49
52
50
-
Make sure `$(go env GOPATH)/bin` (usually `~/go/bin`) is in your `PATH`.
53
+
| Mode | Behavior | Browser needed |
54
+
|------|----------|----------------|
55
+
|`auto` (default) | Three-stage fallback: native Markdown -> static extraction -> browser render | Only when static quality is low |
56
+
|`static`| Static HTML extraction only, no browser | No |
57
+
|`browser`| Always use headless Chrome/Chromium | Yes |
58
+
|`raw`| Send `Accept: text/markdown`, return HTTP body verbatim | No |
51
59
52
-
## Install (from Releases)
60
+
## Installation
53
61
54
-
1. Download the archive for your platform from the [GitHub Releases](https://github.com/firede/agent-fetch/releases) page.
55
-
2. Extract it and make the binary executable:
62
+
### From Releases
63
+
64
+
1. Download the archive for your platform from [GitHub Releases](https://github.com/firede/agent-fetch/releases).
65
+
2. Extract and make the binary executable:
56
66
57
67
```bash
58
68
chmod +x ./agent-fetch
59
69
```
60
70
61
-
### macOS note
71
+
3. Move the binary to a directory on your `PATH`, or run it directly:
62
72
63
-
Current release binaries are not notarized by Apple (no Apple Developer notarization yet), so Gatekeeper may show:
73
+
```bash
74
+
./agent-fetch https://example.com
75
+
```
64
76
65
-
`“agent-fetch” cannot be opened because Apple cannot check it for malicious software.`
77
+
#### macOS note
66
78
67
-
For local validation, remove the quarantine attribute and run:
79
+
Release binaries are not yet notarized by Apple, so Gatekeeper may block execution. Remove the quarantine attribute to proceed:
68
80
69
81
```bash
70
82
xattr -dr com.apple.quarantine ./agent-fetch
71
-
./agent-fetch https://example.com
72
83
```
73
84
74
-
## Agent Skills
85
+
### With Go
86
+
87
+
```bash
88
+
go install github.com/firede/agent-fetch/cmd/agent-fetch@latest
Fetched content is printed to `stdout` (`raw` mode prints the single HTTP response body unprocessed).
98
-
For multiple URLs, output uses task markers so each result maps back to its input URL:
142
+
## Multi-URL Batch
143
+
144
+
When multiple URLs are provided, requests run concurrently (controlled by `--concurrency`) and output is emitted in input order using task markers:
99
145
100
146
```text
101
147
<!-- count: 3, succeeded: 2, failed: 1 -->
@@ -106,7 +152,42 @@ For multiple URLs, output uses task markers so each result maps back to its inpu
106
152
<!-- error[2]: ... -->
107
153
```
108
154
109
-
Exit code is `0` when all tasks succeed, `1` when any task fails, and `2` for argument/usage errors.
155
+
Exit codes: `0` all succeeded, `1` any task failed, `2` argument/usage error.
156
+
157
+
## Agent Integration
158
+
159
+
This project ships a [SKILL.md](./skills/agent-fetch/SKILL.md) that can be used with coding agents that support skill files. Point your skill directory to `skills/agent-fetch` and the agent will be able to invoke `agent-fetch` when its built-in fetch capability is insufficient.
160
+
161
+
`agent-fetch` reads from the command line and writes Markdown to stdout, making it easy to integrate into any agent pipeline or shell-based tool call:
The table below compares agent-fetch with the built-in web-fetch capabilities found in some coding agents. Actual built-in capabilities vary by product and version.
170
+
171
+
| Scenario | Built-in web fetch | agent-fetch |
172
+
|----------|:------------------:|:-----------:|
173
+
| Basic page fetch with HTML simplification | Yes | Yes |
**How built-in web fetch typically works:** Tools like Claude Code's WebFetch and Codex's built-in fetch retrieve a page over HTTP, convert the HTML to Markdown, and then pass the content through an AI model that may summarize or truncate it to fit the context window. This pipeline is fast and sufficient for most pages, but it usually does not execute JavaScript (so SPA or JS-rendered pages may return incomplete content), does not support custom request headers, and processes one URL at a time.
182
+
183
+
-**No built-in web fetch available** (other agent frameworks, CLI pipelines, CI/CD) -- use agent-fetch as your primary fetch tool.
184
+
-**Built-in web fetch available** -- use agent-fetch as a complement for JS-heavy pages, authenticated endpoints, batch fetching, or when you need the extracted content without summarization.
185
+
186
+
## Runtime Dependencies
187
+
188
+
`browser` and `auto` modes may require Chrome or Chromium on the host.
189
+
190
+
Use `--mode static` or `--mode raw` to avoid the browser dependency entirely.
0 commit comments