Commit 7339b8a

fix(docs): correct details of padding and <~ ~> as implemented
1 parent 2e2dffe commit 7339b8a

1 file changed: +40 -25 lines changed

Doc/library/base64.rst

Lines changed: 40 additions & 25 deletions
@@ -191,19 +191,27 @@ POST request.
 Base85 Encodings
 -----------------

-Base85 encoding is not formally specified but rather a de facto standard,
-thus different systems perform the encoding differently.
+Base85 encoding is a family of algorithms which represent four input
+bytes using five ASCII characters. Originally implemented in the Unix
+``btoa(1)`` utility, a version of it was later adopted by Adobe in the
+PostScript language and is standardized as ISO 32000-2:2020 (PDF 2.0),
+section 7.4.3. This version, in both its ``btoa`` and PDF variants,
+is implemented by :func:`a85encode`.

-The :func:`a85encode` and :func:`b85encode` functions in this module are two implementations of
-the de facto standard. You should call the function with the Base85
-implementation used by the software you intend to work with.
+A separate version, using a different output character set, was
+defined as an April Fool's joke in :rfc:`1924` but is now used by Git
+and other software. This version is implemented by :func:`b85encode`.

-The two functions present in this module differ in how they handle the following:
+Finally, a third version, using yet another output character set
+designed for safe inclusion in programming language strings, is
+defined by ZeroMQ and implemented here by :func:`z85encode`.

-* Whether to include enclosing ``<~`` and ``~>`` markers
-* Whether to include newline characters
+The functions present in this module differ in how they handle the following:
+
+* Whether to include and expect enclosing ``<~`` and ``~>`` markers
+* Whether to fold the input into multiple lines
 * The set of ASCII characters used for encoding
-* Handling of null bytes
+* The encoding of zero-padding bytes applied to the input

 Refer to the documentation of the individual functions for more information.

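As a rough illustration of the overview in this hunk, the sketch below runs the same input through the encoders the new text names. It is not part of the commit; the variable names are chosen here for illustration, and ``z85encode`` is guarded because it only exists on Python 3.13 and later.

```python
import base64

data = b"abcd"  # exactly four input bytes

# Each variant maps four input bytes to five ASCII characters,
# drawing those characters from its own alphabet.
a85 = base64.a85encode(data)
b85 = base64.b85encode(data)

# A group of four zero bytes shows one visible difference: Ascii85
# abbreviates it to a single "z", while the RFC 1924 alphabet spells
# it out as five "0" characters.
zeros_a85 = base64.a85encode(b"\0\0\0\0")  # b"z"
zeros_b85 = base64.b85encode(b"\0\0\0\0")  # b"00000"

# z85encode() is only available on Python 3.13 and later.
z85 = base64.z85encode(data) if hasattr(base64, "z85encode") else None
```

Whichever function is used to encode, the matching decoder has to be used on the other side, since the alphabets are not interchangeable.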
@@ -214,17 +222,22 @@ Refer to the documentation of the individual functions for more information.

 *foldspaces* is an optional flag that uses the special short sequence 'y'
 instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
-feature is not supported by the "standard" Ascii85 encoding.
+feature is not supported by the standard encoding used in PDF.

 *wrapcol* controls whether the output should have newline (``b'\n'``)
 characters added to it. If this is non-zero, each output line will be
 at most this many characters long, excluding the trailing newline.

-*pad* controls whether the input is padded to a multiple of 4
-before encoding. Note that the ``btoa`` implementation always pads.
+*pad* controls whether zero-padding applied to the end of the input
+is fully retained in the output encoding, as done by ``btoa``,
+producing an exact multiple of 5 bytes of output. This is not part
+of the standard encoding used in PDF, as it does not preserve the
+length of the data.

-*adobe* controls whether the encoded byte sequence is framed with ``<~``
-and ``~>``, which is used by the Adobe implementation.
+*adobe* controls whether the encoded byte sequence is framed with
+``<~`` and ``~>``, as in a PostScript base-85 string literal. Note
+that PDF streams *must not* use a leading ``<~``, but they *must* be
+terminated with ``~>``.

 .. versionadded:: 3.4

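The four ``a85encode`` options described in this hunk can be exercised with a short sketch such as the one below. It is an illustration only, not text from the commit; the sample inputs and variable names are arbitrary choices.

```python
import base64

# foldspaces: four consecutive spaces collapse to the single character "y".
folded = base64.a85encode(b"    ", foldspaces=True)  # b"y"

# wrapcol: newlines are inserted so that no output line is longer
# than the requested width.
wrapped = base64.a85encode(bytes(range(64)), wrapcol=20)
longest = max(len(line) for line in wrapped.split(b"\n"))

# pad: two input bytes normally yield three characters. With pad=True
# the zero-padded group is kept whole, so the output is five characters
# long and the decoder gives back the padding bytes as data.
short = base64.a85encode(b"ab")
padded = base64.a85encode(b"ab", pad=True)
padded_roundtrip = base64.a85decode(padded)  # b"ab\x00\x00"

# adobe: the output is framed with <~ and ~>.
framed = base64.a85encode(b"abcd", adobe=True)
```

The ``pad`` case shows why the new wording says the length of the data is not preserved: the decoded result is two bytes longer than the original input.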
@@ -236,10 +249,12 @@ Refer to the documentation of the individual functions for more information.

 *foldspaces* is a flag that specifies whether the 'y' short sequence
 should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
-This feature is not supported by the "standard" Ascii85 encoding.
+This feature is not supported by the standard Ascii85 encoding used in
+PDF and PostScript.

-*adobe* controls whether the input sequence is in Adobe Ascii85 format
-(i.e. is framed with <~ and ~>).
+*adobe* controls whether the ``<~`` and ``~>`` markers are
+present. While the leading ``<~`` is not required, the input must
+end with ``~>``, or a :exc:`ValueError` is raised.

 *ignorechars* should be a :term:`bytes-like object` or ASCII string
 containing characters to ignore
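
The marker handling that this hunk documents for ``a85decode`` with ``adobe=True`` can be checked with a small sketch like the following. It is an illustration rather than part of the commit, and the flag name ``missing_terminator_rejected`` is made up for this example.

```python
import base64

framed = base64.a85encode(b"abcd", adobe=True)  # framed as <~ ... ~>

# Both markers present: they are stripped before decoding.
with_both = base64.a85decode(framed, adobe=True)

# Leading <~ omitted, as in a PDF stream: still accepted.
without_leading = base64.a85decode(framed[2:], adobe=True)

# Trailing ~> missing: rejected with ValueError.
try:
    base64.a85decode(framed[:-2], adobe=True)
except ValueError:
    missing_terminator_rejected = True
else:
    missing_terminator_rejected = False
```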
@@ -254,35 +269,35 @@ Refer to the documentation of the individual functions for more information.
 Encode the :term:`bytes-like object` *b* using base85 (as used in e.g.
 git-style binary diffs) and return the encoded :class:`bytes`.

-If *pad* is true, the input is padded with ``b'\0'`` so its length is a
-multiple of 4 bytes before encoding.
+The input is padded with ``b'\0'`` so its length is a multiple of 4
+bytes before encoding. If *pad* is true, all the resulting
+characters are retained in the output, which will be a multiple of
+5 bytes, and thus the length of the data may not be preserved on
+decoding.

 .. versionadded:: 3.4


 .. function:: b85decode(b)

 Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
-return the decoded :class:`bytes`. Padding is implicitly removed, if
-necessary.
+return the decoded :class:`bytes`.

 .. versionadded:: 3.4


 .. function:: z85encode(s)

 Encode the :term:`bytes-like object` *s* using Z85 (as used in ZeroMQ)
-and return the encoded :class:`bytes`. See `Z85 specification
-<https://rfc.zeromq.org/spec/32/>`_ for more information.
+and return the encoded :class:`bytes`.

 .. versionadded:: 3.13


 .. function:: z85decode(s)

 Decode the Z85-encoded :term:`bytes-like object` or ASCII string *s* and
-return the decoded :class:`bytes`. See `Z85 specification
-<https://rfc.zeromq.org/spec/32/>`_ for more information.
+return the decoded :class:`bytes`.

 .. versionadded:: 3.13

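A minimal sketch of the ``b85encode`` padding behaviour described in this hunk, followed by a Z85 round trip. It is not part of the commit. The eight-byte Z85 input is assumed to be the test vector from the ZeroMQ Z85 specification, and the Z85 calls are guarded because they require Python 3.13 or later.

```python
import base64

# Without pad, the length of the data survives a round trip.
plain = base64.b85encode(b"ab")
plain_roundtrip = base64.b85decode(plain)  # b"ab"

# With pad=True the output is a whole five-character group, and the
# decoder returns the two zero bytes that were added as padding.
padded = base64.b85encode(b"ab", pad=True)
padded_roundtrip = base64.b85decode(padded)  # b"ab\x00\x00"

# Z85 round trip (Python 3.13+ only). The input bytes are assumed to be
# the test vector given in the ZeroMQ Z85 specification.
z85_input = bytes.fromhex("864fd26fb559f75b")
if hasattr(base64, "z85encode"):
    z85_text = base64.z85encode(z85_input)
    z85_roundtrip = base64.z85decode(z85_text)
else:
    z85_text = z85_roundtrip = None
```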