url2md

url2md is an agent Skill which converts web pages to clean, readable Markdown using a small Python script. Handy for documentation, archiving articles, batch exports, or any workflow where you want HTML turned into .md without pulling in third-party packages. Git repository: url2md. Contributions are welcome—open an issue or pull request for bug reports, ideas, or improvements.

Requirements

Python 3 (uses only the standard library: urllib, html.parser)

Quick start

Single URL (prints Markdown to stdout):

python3 scripts/url2md.py https://example.com/article

Save to a file:

python3 scripts/url2md.py https://example.com/article -o article.md

Batch conversion — put one URL per line in a text file, then write each page into a directory:

python3 scripts/url2md.py -f urls.txt -d ./markdown_files/

Features

No dependencies beyond the Python standard library
Reader-style scope — removes script/style/noscript/template, then prefers <article> or <main> (otherwise the full <body>) so Markdown resembles “main article” extraction
Title extraction — prefers Open Graph / Twitter card title when present, else <title>; optional leading # heading
YAML Frontmatter — extracts structured metadata (title, author, published, description, category, source URL) from <meta> tags and Schema.org JSON-LD for knowledge-base workflows
Template system — customize output format with variables like {{title}}, {{content}}, {{author}}, {{published}}, {{date}}, etc.
Link resolution — relative URLs are turned into absolute ones
Basic formatting — headings, paragraphs, lists, links, images, fenced code with optional language, GFM-style tables, bold/italic
Noise removal — skips nav, aside, footer, forms, and similar chrome within the chosen fragment

CLI reference

Option	Description
`url`	Single URL to convert
`-o`, `--output`	Output file (default: stdout)
`-f`, `--file`	File containing URLs (one per line)
`-d`, `--dir`	Output directory for batch mode
`--no-title`	Do not add the page title as H1
`--full-page`	Use full `<body>` instead of preferring `<article>` / `<main>`
`--timeout`	Request timeout in seconds (default: 30)
`--frontmatter`	Add YAML frontmatter with extracted metadata
`-t`, `--template`	Path to a template file for customizing output
`--filename-template`	Batch mode filename pattern (e.g. `{{date}}-{{title}}.md`)
`--download-images`	Download remote images to a local folder (relative to the output file, e.g. `../assets`)
`-v`, `--version`	Show version

More examples:

python3 scripts/url2md.py https://docs.python.org/3
python3 scripts/url2md.py https://docs.python.org/3 -o python-docs.md
python3 scripts/url2md.py -f urls.txt -d ./output/ --timeout 60
python3 scripts/url2md.py https://example.com --no-title
python3 scripts/url2md.py https://example.com/deep-page --full-page -o full.md

# YAML frontmatter output (great for Obsidian / PKM workflows)
python3 scripts/url2md.py https://example.com/article --frontmatter -o article.md

# Custom template
python3 scripts/url2md.py https://example.com/article -t article.tpl -o article.md

# Batch with smart filenames
python3 scripts/url2md.py -f urls.txt -d ./output/ --filename-template "{{date}}-{{title}}.md"
python3 scripts/url2md.py -f urls.txt -d ./output/ --filename-template "{{index}}-{{title}}.md"

# Download images locally (relative to the output Markdown file)
python3 scripts/url2md.py https://example.com/article -o article.md --download-images assets
python3 scripts/url2md.py -f urls.txt -d ./output/ --download-images assets

Template example

Create article.tpl:

---
title: "{{title}}"
author: {{author}}
published: {{published}}
source: "{{source}}"
clipped: {{date}}
---

# {{title}}

> {{description}}

{{content}}

---
Original: [{{source}}]({{url}})

Available variables: {{title}}, {{content}}, {{url}}, {{source}}, {{author}}, {{published}}, {{description}}, {{category}}, {{site_name}}, {{date}}, {{datetime}}.

Filename template variables (batch mode only): {{title}}, {{date}}, {{datetime}}, {{author}}, {{published}}, {{site_name}}, {{url}}, {{index}}.

When to use it

Turn documentation pages into Markdown for local reference
Archive articles as plain text files
Batch a list of URLs into separate files
Build a knowledge base with structured metadata (frontmatter / templates)
Prefer a script when interactive browser or fetch tools are not the right fit

Limitations

Only static HTML is converted; JavaScript is not executed
Complex layouts (multi-column, heavy CSS) may not map cleanly to Markdown
Login or paywalled pages need your own auth or cookies; the script does not log you in
Rate limits or blocking by the remote site still apply to repeated requests

License

MIT-0 (MIT No Attribution).

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

url2md

Requirements

Quick start

Features

CLI reference

Template example

When to use it

Limitations

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

url2md

Requirements

Quick start

Features

CLI reference

Template example

When to use it

Limitations

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages