@@ -2,97 +2,112 @@ PEP: 784
 Title: Adding Zstandard to the standard library
 Author: Emma Harper Smith <emma@python.org>
 Sponsor: Gregory P. Smith <greg@krypto.org>
+Discussions-To: https://discuss.python.org/t/87377
 Status: Draft
 Type: Standards Track
 Created: 06-Apr-2025
 Python-Version: 3.14
+Post-History:
+  `07-Apr-2025 <https://discuss.python.org/t/87377>`__,
+

 Abstract
 ========

-`Zstandard <https://facebook.github.io/zstd/>`_ is a widely adopted, mature,
-and highly efficient compression standard. This PEP proposes adding a new
-module to the Python standard library containing a Python wrapper around Meta's
-``zstd`` library, the default implementation. Additionally, to avoid name
-collisions with packages on PyPI and to present a unified interface to Python
-users, compression modules in the standard library will be moved under a
-``compression.*`` namespace package.
+`Zstandard`_ is a widely adopted, mature, and highly efficient compression
+standard. This PEP proposes adding a new module to the Python standard library
+containing a Python wrapper around Meta's |zstd| library, the default
+implementation. Additionally, to avoid name collisions with packages on PyPI
+and to present a unified interface to Python users, compression modules in the
+standard library will be moved under a ``compression.*`` package.
+
+.. |zstd| replace:: ``zstd``
+.. _zstd: https://facebook.github.io/zstd/
+.. _Zstandard: https://facebook.github.io/zstd/
+

 Motivation
 ==========

-CPython has modules for several different compression formats, such as `zlib
-(DEFLATE) <https://docs.python.org/3/library/zlib.html>`_,
-`bzip2 <https://docs.python.org/3/library/bz2.html>`_,
-and `lzma <https://docs.python.org/3/library/lzma.html>`_, each widely used.
-Including popular compression algorithms matches Python's "batteries included"
-philosophy of incorporating widely useful standards and utilities. The last
-compression module added to the language was ``lzma``, added in Python 3.3.
+CPython has modules for several different compression formats, such as
+:mod:`zlib (DEFLATE) <zlib>`, :mod:`bzip2 <bz2>`, and :mod:`lzma <lzma>`,
+each widely used. Including popular compression algorithms matches Python's
+"batteries included" philosophy of incorporating widely useful standards and
+utilities. :mod:`!lzma` is the most recent such module, added in Python 3.3.

-Since then, Zstandard has become the modern de facto preferred compression
+Since then, Zstandard has become the modern *de facto* preferred compression
 library for both high performance compression and decompression attaining high
 compression ratios at reasonable CPU and memory cost. Zstandard achieves a much
 higher compression ratio than bzip2 or zlib (DEFLATE) while decompressing
 significantly faster than LZMA.

-Zstandard has seen `widespread adoption in many different areas of computing
-<https://facebook.github.io/zstd/#references>`_. The numerous hardware
-implementations demonstrate long-term commitment to Zstandard and an
-expectation that Zstandard will stay the de facto choice for compression for
-years to come. Zstandard compression is also implemented in both the ZFS and
-Btrfs filesystems.
+Zstandard has seen `widespread adoption`_ in many different areas of computing.
+The numerous hardware implementations demonstrate long-term commitment to
+Zstandard and an expectation that Zstandard will stay the *de facto* choice for
+compression for years to come. Zstandard compression is also implemented in
+both the ZFS_ and Btrfs_ filesystems.

 Zstandard's highly efficient compression has supplanted other modern
-compression formats, such as brotli, lzo, and ucl due to its highly efficient
-compression. While `LZ4 <https://lz4.org/>`_ is still used in very high
-throughput scenarios, Zstandard can also be used in some of these contexts.
+compression formats, such as brotli_, lzo_, and ucl_ due to its highly
+efficient compression. While `LZ4`_ is still used in very high throughput
+scenarios, Zstandard can also be used in some of these contexts.
 While inclusion of LZ4 is out of scope, it would be a compelling future
 addition to the ``compression`` namespace introduced by this PEP.

 There are several bindings to Zstandard for Python available on PyPI, each with
 different APIs and choices of how to bind the ``zstd`` library. One goal with
 introducing an official module in the standard library is to reduce confusion
 for Python users who want simple compression/decompression APIs for Zstandard.
-The existing packages can continue providing extended APIs and bindings for
-other Python implementations such as PyPy or integrate features from newer
-Zstandard versions.
+The existing packages can continue providing extended APIs or integrate
+features from newer Zstandard versions.

 Another reason to add Zstandard support to the standard library is to resolve
-a long standing `open issue <https://github.com/python/cpython/issues/81276>`_
-requesting Zstandard support in the ``tarfile`` module. This issue has the 5th
-most "thumbs up" of open issues on the CPython tracker, and has garnered a
-significant amount of discussion and interest. Additionally, the `ZIP format
-standardizes a Zstandard compression format ID
-<https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.8.TXT>`_,
-and integration with ``zipfile`` would allow opening ZIP archives using
-Zstandard compression. The reference implementation for this PEP contains
-integration with the ``zipfile``, ``tarfile``, and ``shutil`` modules.
+a long standing open issue (`python/cpython#81276`_) requesting Zstandard
+support in the :mod:`tarfile` module. This issue has the 5th most "thumbs up"
+of open issues on the CPython tracker, and has garnered a significant amount of
+discussion and interest. Additionally, the ZIP format standardizes a
+`Zstandard compression format ID`_, and integration with the :mod:`zipfile`
+module would allow opening ZIP archives using Zstandard compression. The
+reference implementation for this PEP contains integration with the
+:mod:`!zipfile`, :mod:`!tarfile`, and :mod:`shutil` modules.

 Zstandard compression could also be used to make Python wheel packages smaller
 and significantly faster to install. Anaconda found a sizeable speedup when
-adopting Zstandard for the conda package format
+adopting Zstandard for the conda package format:

 .. epigraph::

    Conda's download sizes are reduced ~30-40%, and extraction is dramatically faster.
    [...]
    We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format.

-   -- `Anaconda blog on Zstandard adoption <https://www.anaconda.com/blog/how-we-made-conda-faster-4-7>`_
+   -- `Anaconda blog on Zstandard adoption`_

-`According to lzbench <https://github.com/inikep/lzbench?tab=readme-ov-file#benchmarks>`_,
-a comprehensive benchmark of many different compression libraries and formats,
 Zstandard has a significantly higher compression ratio compared to wheel's
-existing zlib-based compression. While this PEP does *not* prescribe any
-changes to the wheel format or other packaging standards, having Zstandard
-bindings in the standard library would enable a future PEP to improve the user
-experience for Python wheel packages.
+existing zlib-based compression, `according to lzbench`_, a comprehensive
+benchmark of many different compression libraries and formats.
+While this PEP does *not* prescribe any changes to the wheel format or other
+packaging standards, having Zstandard bindings in the standard library would
+enable a future PEP to improve the user experience for Python wheel packages.
+
+.. _widespread adoption: https://facebook.github.io/zstd/#references
+.. _ZFS: https://en.wikipedia.org/wiki/ZFS
+.. _Btrfs: https://btrfs.readthedocs.io/
+.. _brotli: https://brotli.org/
+.. _lzo: https://www.oberhumer.com/opensource/lzo/
+.. _ucl: https://www.oberhumer.com/opensource/ucl/
+.. _LZ4: https://lz4.org/
+.. _python/cpython#81276: https://github.com/python/cpython/issues/81276
+.. _Zstandard compression format ID: https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.8.TXT
+.. _according to lzbench: https://github.com/inikep/lzbench#benchmarks
+.. _Anaconda blog on Zstandard adoption: https://www.anaconda.com/blog/how-we-made-conda-faster-4-7
+

 Rationale
 =========

-Introduction of a ``compression`` namespace
---------------------------------------------
+Introduction of a ``compression`` package
+-----------------------------------------

 Both the ``zstd`` and ``zstandard`` import names are claimed by projects on
 PyPI. To avoid breaking users of one of the existing bindings, this PEP
@@ -130,13 +145,17 @@ name otherwise.
 Implementation based on ``pyzstd``
 ----------------------------------

-The implementation for this PEP is based on the `pyzstd project <https://github.com/Rogdham/pyzstd>`_.
-This project was chosen as the code was `originally written to be upstreamed <https://github.com/python/cpython/issues/81276#issuecomment-1093824963>`_
-to CPython by Ma Lin, who also wrote the `output buffer implementation used in
-the standard library today <https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231>`_.
+The implementation for this PEP is based on the `pyzstd project`_.
+This project was chosen as the code was `originally written to be upstreamed`_
+to CPython by Ma Lin, who also wrote the `output buffer implementation`_ used in
+the standard library today.
 The project has since been taken over by Rogdham and is published to PyPI. The
 APIs in ``pyzstd`` are similar to the APIs for other compression modules in the
-standard library such as ``bz2`` and ``lzma``.
+standard library such as :mod:`!bz2` and :mod:`!lzma`.
+
+.. _pyzstd project: https://github.com/Rogdham/pyzstd
+.. _originally written to be upstreamed: https://github.com/python/cpython/issues/81276#issuecomment-1093824963
+.. _output buffer implementation: https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231

 Minimum supported Zstandard version
 -----------------------------------
@@ -149,13 +168,14 @@ compatibility with existing LTS Linux distributions, but a newer Zstandard
 version could likely be chosen given that newer Python releases are generally
 packaged as part of newer distribution releases.

+
 Specification
 =============

 The ``compression`` namespace
 -----------------------------

-A new namespace package for compression modules will be introduced named
+A new namespace for compression modules will be introduced named
 ``compression``. The top-level module for this package will be empty to begin
 with, but a standard API for interacting with compression routines may be
 added in the future to the toplevel.
@@ -167,17 +187,18 @@ A new module, ``compression.zstd`` will be introduced with Zstandard
 compression APIs that match other compression modules in the standard library,
 namely

-* ``compress`` / ``decompress`` - APIs for one-shot compression/decompression
-* ``ZstdFile`` / ``open`` - APIs for interacting with streams and file-like
-  objects
-* ``ZstdCompressor`` / ``ZstdDecompressor`` - APIs for incremental compression/
-  decompression
+* :func:`!compress` / :func:`!decompress` - APIs for one-shot compression
+  or decompression
+* :class:`!ZstdFile` / :func:`!open` - APIs for interacting with streams
+  and file-like objects
+* :class:`!ZstdCompressor` / :class:`!ZstdDecompressor` - APIs for incremental
+  compression or decompression

-It will also contain some Zstandard-specific functionality
+It will also contain some Zstandard-specific functionality:

-* ``ZstdDict`` / ``train_dict`` / ``finalize_dict`` - APIs for interacting with
-  Zstandard dictionaries, which are useful for compressing many small chunks of
-  similar data
+* :class:`!ZstdDict` / :func:`!train_dict` / :func:`!finalize_dict` - APIs for
+  interacting with Zstandard dictionaries, which are useful for compressing
+  many small chunks of similar data

 ``libzstd`` optional dependency
 -------------------------------
@@ -222,11 +243,12 @@ Backwards Compatibility

 The main compatibility concern is usage of existing standard library
 compression APIs with the existing import names. These names will be
-deprecated in 3.19 and will be removed in 3.24. Given the long coexistance of
+deprecated in 3.19 and will be removed in 3.24. Given the long coexistence of
 the modules and a 5 year deprecation period, most users will likely migrate to
 the new import names well before then. Additionally, a libCST codemod can be
 provided to automatically rewrite imports, easing the migration.

+
 Security Implications
 =====================

@@ -241,13 +263,15 @@ Taking on a new dependency also always has security risks, but the ``zstd``
 library is mature, fuzzed on each commit, and `participates in Meta's bug bounty
 program <https://github.com/facebook/zstd/blob/dev/SECURITY.md>`_.

+
 How to Teach This
 =================

 Documentation for the new module is in the reference implementation branch. The
 documentation for other modules will be updated to discuss the deprecation of
 their existing import names, and how to migrate.

+
 Reference Implementation
 ========================

@@ -258,6 +282,7 @@ integration added. It also contains the re-exports of other compression
 modules. Deprecations for the existing import names will be added once a
 decision is reached regarding the open issues.

+
 Rejected Ideas
 ==============

@@ -273,6 +298,7 @@ import name ``lz4``. Instead of solving this issue for each compression format,
 it is better to solve it once and for all by using the already-claimed
 ``compression`` namespace.

+
 Copyright
 =========

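The Specification hunk lists the API shape that ``compression.zstd`` is meant to share with the existing compression modules. That shape has three parts: one-shot ``compress``/``decompress``, file-like ``open``, and incremental compressor/decompressor classes. The sketch below illustrates that shared shape under stated assumptions. It is not the reference implementation.

The sketch runs the same round trip through :mod:`bz2` and :mod:`lzma`, which exist today. It adds ``compression.zstd`` only when the import succeeds, which is assumed to mean Python 3.14 or later built with ``libzstd``. The ``COMPRESSOR``/``DECOMPRESSOR`` name tables and the ``roundtrip_all`` helper are hypothetical, introduced only for illustration.

```python
import bz2
import io
import lzma

# Existing stdlib modules that already share the one-shot / incremental /
# file-like API shape described in the PEP.
MODULES = {"bz2": bz2, "lzma": lzma}

# Assumption: compression.zstd (Python 3.14+, built with libzstd) follows the
# same shape. It is only exercised when the import succeeds.
try:
    from compression import zstd
except ImportError:
    zstd = None
if zstd is not None:
    MODULES["zstd"] = zstd

# Hypothetical lookup tables: the incremental classes are the only
# per-module names needed to drive all three API styles.
COMPRESSOR = {"bz2": "BZ2Compressor", "lzma": "LZMACompressor", "zstd": "ZstdCompressor"}
DECOMPRESSOR = {"bz2": "BZ2Decompressor", "lzma": "LZMADecompressor", "zstd": "ZstdDecompressor"}


def roundtrip_all(data: bytes, chunk_size: int = 1024) -> dict:
    """Hypothetical helper: round-trip data through each module's three APIs."""
    results = {}
    for name, mod in MODULES.items():
        # One-shot: compress() / decompress()
        one_shot = mod.decompress(mod.compress(data)) == data

        # Incremental: feed chunks to a compressor object, then flush
        comp = getattr(mod, COMPRESSOR[name])()
        parts = [comp.compress(data[i:i + chunk_size])
                 for i in range(0, len(data), chunk_size)]
        parts.append(comp.flush())
        decomp = getattr(mod, DECOMPRESSOR[name])()
        incremental = decomp.decompress(b"".join(parts)) == data

        # File-like: open() wrapping an in-memory binary buffer; closing the
        # compressed file object leaves the caller-supplied buffer open
        buf = io.BytesIO()
        with mod.open(buf, "wb") as f:
            f.write(data)
        buf.seek(0)
        with mod.open(buf, "rb") as f:
            file_like = f.read() == data

        results[name] = one_shot and incremental and file_like
    return results
```

For example, ``roundtrip_all(b"Zstandard " * 5000)`` reports ``True`` for every module it finds. The only module-specific part is the pair of class names. That is the practical benefit of keeping the new module's API aligned with :mod:`bz2` and :mod:`lzma`.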