Commit 160dcff

move to sanitization.md and document for CLI
1 parent cddb247 commit 160dcff

6 files changed

Lines changed: 109 additions & 72 deletions

.spell-dict

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ munge
 namespace
 NanoDOM
 Neale
+nh3
 nosetests
 OrderedDict
 OrderedDicts

docs/cli.md

Lines changed: 8 additions & 1 deletion
@@ -35,12 +35,19 @@ For example:
 echo "Some **Markdown** text." | python -m markdown > output.html
 ```

-Use the `--help` option for a list all available options and arguments:
+Use the `--help` option for a list of all available options and arguments:

 ```bash
 python -m markdown --help
 ```

+!!! warning
+
+The Python-Markdown library does ***not*** sanitize its HTML output. If
+you are processing Markdown input from an untrusted source, it is your
+responsibility to ensure that it is properly sanitized. For more
+information see [Sanitizing HTML Output](sanitization.md).
+
 If you don't want to call the python executable directly (using the `-m` flag),
 follow the instructions below to use a wrapper script:

docs/reference.md

Lines changed: 17 additions & 69 deletions
@@ -31,36 +31,10 @@ method appropriately ([see below](#convert)).

 The Python-Markdown library does ***not*** sanitize its HTML output. If
 you are processing Markdown input from an untrusted source, it is your
-responsibility to ensure that it is properly sanitized. See [Markdown and
-XSS] for an overview of some of the dangers and [Improper markup
-sanitization in popular software] for notes on best practices to ensure
-HTML is properly sanitized.
-
-The developers of Python-Markdown recommend using [JustHTML] as a
-sanitizer on the output of `markdown.markdown`. JustHTML includes a
-built-in HTML sanitizer. When you pass the HTML output through JustHTML
-(`JustHTML(markdown.markdown(text), fragment=True).to_html())`), it
-is sanitized by default according to a strict [allow list policy]. The
-policy can be [customized] if necessary.
-
-If you cannot use JustHTML for some reason, some alternatives include
-[`nh3`][nh3] or [`bleach`][bleach][^1]. However, be aware that those
-libraries will not be sufficient in themselves and will require
-customization. Some useful lists of allowed tags and attributes can be
-found in the [`bleach-allowlist`][bleach-allowlist] library, which should
-work with either sanitizer.
-
-
-[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/
-[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md
-[JustHTML]: https://emilstenstrom.github.io/justhtml/
-[allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy
-[customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy
-[nh3]: https://nh3.readthedocs.io/en/latest/
-[bleach]: http://bleach.readthedocs.org/en/latest/
-[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist
-[^1]: Note that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
-However, it may be the only option for some users.
+responsibility to ensure that it is properly sanitized. For more
+information see [Sanitizing HTML Output].
+
+[Sanitizing HTML Output]: sanitization.md

 The following options are available on the `markdown.markdown` function:

@@ -216,17 +190,12 @@ __tab_length__{: #tab_length }:

 !!! warning

-The Python-Markdown library does ***not*** sanitize its HTML output. If
-you are processing Markdown input from an untrusted source, it is your
-responsibility to ensure that it is properly sanitized. See [Markdown and
-XSS] for an overview of some of the dangers and [Improper markup
-sanitization in popular software] for notes on best practices to ensure
-HTML is properly sanitized.
-
-As `markdown.markdownFromFile` writes directly to the file system, there
-is no easy way to sanitize the output from Python code. Therefore, it is
+The Python-Markdown library does ***not*** sanitize its HTML output. As
+`markdown.markdownFromFile` writes directly to the file system, there is
+no easy way to sanitize the output from Python code. Therefore, it is
 recommended that the `markdown.markdownFromFile` function not be used on
-input from an untrusted source.
+input from an untrusted source. For more information see [Sanitizing HTML
+Output].

 With a few exceptions, `markdown.markdownFromFile` accepts the same options as
 `markdown.markdown`. It does **not** accept a `text` (or Unicode) string.
@@ -284,24 +253,8 @@ string must be passed to one of two instance methods.

 The Python-Markdown library does ***not*** sanitize its HTML output. If
 you are processing Markdown input from an untrusted source, it is your
-responsibility to ensure that it is properly sanitized. See [Markdown and
-XSS] for an overview of some of the dangers and [Improper markup
-sanitization in popular software] for notes on best practices to ensure
-HTML is properly sanitized.
-
-The developers of Python-Markdown recommend using [JustHTML] as a
-sanitizer on the output of `Markdown.convert`. JustHTML includes a
-built-in HTML sanitizer. When you pass the HTML output through JustHTML
-(`JustHTML(md.convert(text), fragment=True).to_html())`), it
-is sanitized by default according to a strict [allow list policy]. The
-policy can be [customized] if necessary.
-
-If you cannot use JustHTML for some reason, some alternatives include
-[`nh3`][nh3] or [`bleach`][bleach][^1]. However, be aware that those
-libraries will not be sufficient in themselves and will require
-customization. Some useful lists of allowed tags and attributes can be
-found in the [`bleach-allowlist`][bleach-allowlist] library, which should
-work with either sanitizer.
+responsibility to ensure that it is properly sanitized. For more
+information see [Sanitizing HTML Output].

 The `source` text must meet the same requirements as the [`text`](#text)
 argument of the [`markdown.markdown`](#markdown) function.
@@ -334,17 +287,12 @@ html3 = md.reset().convert(text3)

 !!! warning

-The Python-Markdown library does ***not*** sanitize its HTML output. If
-you are processing Markdown input from an untrusted source, it is your
-responsibility to ensure that it is properly sanitized. See [Markdown and
-XSS] for an overview of some of the dangers and [Improper markup
-sanitization in popular software] for notes on best practices to ensure
-HTML is properly sanitized.
-
-As `Markdown.convertFile` writes directly to the file system, there
-is no easy way to sanitize the output from Python code. Therefore, it is
-recommended that the `Markdown.convertFile` method not be used on
-input from an untrusted source.
+The Python-Markdown library does ***not*** sanitize its HTML output. As
+`Markdown.convertFile` writes directly to the file system, there is no
+easy way to sanitize the output from Python code. Therefore, it is
+recommended that the `Markdown.convertFile` method not be used on input
+from an untrusted source. For more information see [Sanitizing HTML
+Output].

 The arguments of this method are identical to the arguments of the same
 name on the `markdown.markdownFromFile` function ([`input`](#input),
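
The `markdownFromFile` and `convertFile` warnings in the hunks above advise against using those functions on untrusted input because they write straight to the file system. One possible workaround, sketched here under assumptions, is to read the file yourself, convert the text in memory, sanitize the result, and only then write it out. The helper name `convert_file_sanitized` and its `convert` and `sanitize` parameters are hypothetical; they are not part of Python-Markdown.

```python
from pathlib import Path
from typing import Callable


def convert_file_sanitized(
    src: Path,
    dest: Path,
    convert: Callable[[str], str],
    sanitize: Callable[[str], str],
) -> None:
    """Convert the Markdown in `src` and write sanitized HTML to `dest`.

    `convert` and `sanitize` are supplied by the caller (hypothetical
    parameters), so the unsanitized HTML never reaches the file system.
    """
    text = src.read_text(encoding="utf-8")
    html = convert(text)        # unsanitized HTML, kept in memory only
    safe_html = sanitize(html)  # sanitizer chosen by the caller
    dest.write_text(safe_html, encoding="utf-8")
```

With Python-Markdown and JustHTML installed, `convert` could be `markdown.markdown` and `sanitize` could be `lambda html: JustHTML(html, fragment=True).to_html()`, which is the call shown in `docs/sanitization.md` in this commit.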

docs/sanitization.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+title: Sanitization and Security
+
+# Sanitizing HTML Output
+
+The Python-Markdown library does ***not*** sanitize its HTML output. If you
+are processing Markdown input from an untrusted source, it is your
+responsibility to ensure that it is properly sanitized. See _[Markdown and
+XSS]_ for an overview of some of the dangers and _[Improper markup sanitization
+in popular software]_ for notes on best practices to ensure HTML is properly
+sanitized. With those concerns in mind, some recommendations are provided
+below to ensure that any input from an untrusted source is properly
+sanitized.
+
+That said, if you fully trust the source of your input, you may choose to do
+nothing. Conversely, you may find solutions other than those suggested here.
+However, you do so at your own risk.
+
+## Using JustHTML
+
+[JustHTML] is recommended as a sanitizer on the output of `markdown.markdown`
+or `Markdown.convert`. When you pass HTML output through JustHTML, it is
+sanitized by default according to a strict [allow list policy]. The policy
+can be [customized] if necessary.
+
+``` python
+import markdown
+from justhtml import JustHTML
+
+html = markdown.markdown(text)
+safe_html = JustHTML(html, fragment=True).to_html()
+```
+
+## Using nh3 or bleach
+
+If you cannot use JustHTML for some reason, some alternatives include [nh3] or
+[bleach][^1]. However, be aware that these libraries will not be sufficient
+in themselves and will require customization. Some useful lists of allowed
+tags and attributes can be found in the [`bleach-allowlist`]
+[bleach-allowlist] library, which should work with both nh3 and bleach as nh3
+mirrors bleach's API.
+
+``` python
+import markdown
+import bleach
+from bleach_allowlist import markdown_tags, markdown_attrs
+
+html = markdown.markdown(text)
+safe_html = bleach.clean(html, markdown_tags, markdown_attrs)
+```
+
+[^1]: The [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
+However, it may be the only option for some users as [nh3] is a set of Python bindings to a Rust library.
+
+## Sanitizing on the Command Line
+
+Both Python-Markdown and JustHTML provide command line interfaces which read
+from STDIN and write to STDOUT. Therefore, they can be used together to
+ensure that the output from untrusted input is properly sanitized.
+
+```sh
+echo "Some **Markdown** text." | python -m markdown | justhtml - --fragment > safe_output.html
+```
+
+For more information on JustHTML's Command Line Interface, see the
+[documentation][JustHTML_CLI]. Use the `--help` option for a list of all available
+options and arguments to the `markdown` command.
+
+[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/
+[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md
+[JustHTML]: https://emilstenstrom.github.io/justhtml/
+[allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy
+[customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy
+[nh3]: https://nh3.readthedocs.io/en/latest/
+[bleach]: http://bleach.readthedocs.org/en/latest/
+[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist
+[JustHTML_CLI]: https://emilstenstrom.github.io/justhtml/cli.html
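
The new page refers to an "allow list policy" without showing what one does. The standard-library sketch below illustrates the idea only: tags and attributes that are not explicitly allowed are dropped, and text is re-escaped. The names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `AllowListFilter`, and `filter_html` are hypothetical, the policy is deliberately tiny, and this is not how JustHTML, nh3, or bleach are implemented. It should not be used in place of one of those libraries.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately tiny policy for illustration only.
ALLOWED_TAGS = {"p", "em", "strong", "a", "code"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


class AllowListFilter(HTMLParser):
    """Re-emit only allowed tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside is still escaped and kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # e.g. onclick, style
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def filter_html(html):
    """Return `html` with everything outside the allow list removed."""
    parser = AllowListFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The point of the design is that anything unrecognized is removed by default, rather than trying to enumerate dangerous constructs. A production sanitizer also has to handle malformed markup, URL schemes, and element-specific rules that this sketch ignores.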

markdown/__main__.py

Lines changed: 6 additions & 2 deletions
@@ -49,10 +49,14 @@ def parse_options(args=None, values=None):
 usage = """%prog [options] [INPUTFILE]
 (STDIN is assumed if no INPUTFILE is given)"""
 desc = "A Python implementation of John Gruber's Markdown. " \
-"https://Python-Markdown.github.io/"
+"https://python-markdown.github.io/"
 ver = "%%prog %s" % markdown.__version__
+epilog = "WARNING: The Python-Markdown library does NOT sanitize its HTML output. If " \
+"you are processing Markdown input from an untrusted source, it is your " \
+"responsibility to ensure that it is properly sanitized. For more " \
+"information see <https://python-markdown.github.io/sanitization/>."

-parser = optparse.OptionParser(usage=usage, description=desc, version=ver)
+parser = optparse.OptionParser(usage=usage, description=desc, version=ver, epilog=epilog)
 parser.add_option("-f", "--file", dest="filename", default=None,
 help="Write output to OUTPUT_FILE. Defaults to STDOUT.",
 metavar="OUTPUT_FILE")
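
The hunk above passes the new warning to `optparse.OptionParser` through its `epilog` keyword. A minimal sketch of the effect follows; the `prog`, `usage`, `description`, and shortened `epilog` strings are stand-in values, not the ones in `markdown/__main__.py`. `optparse` prints the epilog after the option list in `--help` output and re-wraps it to the terminal width.

```python
import optparse

# Stand-in values; only the use of the `epilog` keyword mirrors the hunk above.
parser = optparse.OptionParser(
    prog="demo",
    usage="%prog [options] [INPUTFILE]",
    description="Demo parser.",
    epilog="WARNING: output is NOT sanitized.",
)
parser.add_option("-f", "--file", dest="filename", default=None,
                  help="Write output to OUTPUT_FILE.", metavar="OUTPUT_FILE")

# format_help() returns the same text that --help would print.
help_text = parser.format_help()
print(help_text)
```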

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ nav:
 - Installation: install.md
 - Library Reference: reference.md
 - Command Line: cli.md
+- Sanitization and Security: sanitization.md
 - Extensions: extensions/index.md
 - Officially Supported Extensions:
 - Abbreviations: extensions/abbreviations.md
