Bug: BodyPartReader.filename and read() leak bytearray instead of str/bytes, violating API contract and breaking JSON serialization #12404

@lxc902

Describe the bug

The multipart reader surfaces bytearray values in places where the documentation
(https://docs.aiohttp.org/en/stable/multipart_reference.html#aiohttp.BodyPartReader.filename) and the type hints promise str or
bytes. In aiohttp 3.13.x this leads to TypeError: Object of type bytearray is not JSON serializable whenever downstream code
passes the filename (or a decoded text field named filename) through a JSON-based store or API response.

Per the docs:

BodyPartReader.filename — A str with the file name specified in the Content-Disposition header, or None if not specified.

And:

BodyPartReader.read(*, decode=False) — Reads body part data. … Returns: bytes.

However, in 3.13.x bytearray values can surface as follows:

  1. BodyPartReader.read(decode=True) returns the internal accumulation buffer unchanged when no
    Content-Transfer-Encoding/Content-Encoding header is present. That buffer is a bytearray, not bytes, so the annotated return
    type is violated.
  2. Reading a form field named filename via await part.read(decode=True) then chaining .strip() keeps it as bytearray, which
    silently flows into user dictionaries.
  3. When such a value is fed into json.dump, the encoder partially writes the opening of the object ({"filename": ) to the
    output stream before raising TypeError, which leaves applications with truncated, invalid JSON sidecar files on disk — a
    second-order corruption that is hard to diagnose after the fact.
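The partial-write behaviour in item 3 can be checked without aiohttp at all. The following is a minimal sketch using only the standard library; the helper name dump_and_capture is hypothetical, and an in-memory io.StringIO stands in for the sidecar file on disk.

```python
import io
import json


def dump_and_capture(obj):
    """Run json.dump into an in-memory stream.

    Returns (error, text): the TypeError raised (or None) and whatever
    json.dump had already written to the stream at that point.
    """
    stream = io.StringIO()
    error = None
    try:
        json.dump(obj, stream)
    except TypeError as exc:
        error = exc
    return error, stream.getvalue()


# A bytearray value stands in for what the multipart read handed back.
error, written = dump_and_capture({"filename": bytearray(b"P1_submission.diff")})
print(repr(error))
print(repr(written))
```

If the stream is non-empty after the exception, a real file opened for writing would be left with the same truncated prefix, which matches the 13-byte fragments described in this report.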

This is not a None-or-missing case and it is not a charset/encoding edge case — the type itself is wrong.
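To make the type distinction concrete, here is a small standard-library sketch (no aiohttp involved; the sample value is made up for illustration). It shows that bytearray is not a subclass of bytes and that chaining .strip() keeps the value a bytearray until it is converted explicitly.

```python
# Sample value standing in for a multipart field body (illustrative only).
raw = bytearray(b"  P1_submission.diff \n")

stripped = raw.strip()        # bytearray methods return a new bytearray
as_bytes = bytes(stripped)    # explicit copy into an immutable bytes object
as_text = as_bytes.decode("utf-8")

print(type(stripped).__name__, isinstance(stripped, bytes))
print(type(as_bytes).__name__, type(as_text).__name__)
```

Note that json only accepts str here, so even a correctly typed bytes value still needs the final decode step before serialization.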

To Reproduce


Environment

  • aiohttp 3.13.4 (observed); CPython 3.12.3 on Ubuntu 22.04 / x86_64
  • Default server configuration (aiohttp web.Application, default middleware)
  • No custom multipart parser, no subclassing of BodyPartReader
  • Issue was traced through standard request.multipart() → reader.next() → part.read(...)

Steps to Reproduce

Minimal self-contained server:

# repro_server.py
import json

from aiohttp import web


async def handle(request):
    reader = await request.multipart()
    while True:
        part = await reader.next()
        if part is None:
            break
        if part.name == "filename":
            data = await part.read(decode=True)
            # Documented return type: bytes. Observed: bytearray.
            print("type =", type(data).__name__, "value =", data)
            try:
                json.dumps({"filename": data})
            except TypeError as e:
                return web.json_response(
                    {"error": f"TypeError: {e}"}, status=500)
            return web.json_response({"ok": True})
    # No part named "filename" was found in the request body.
    return web.json_response({"error": "no filename part"}, status=400)


app = web.Application()
app.router.add_post("/", handle)
web.run_app(app, port=8080)

Client:

curl -X POST http://localhost:8080/ \
  -F 'filename=P1_submission.diff'

Observed stdout:

type = bytearray value = bytearray(b'P1_submission.diff')

Observed HTTP response:

500
{"error": "TypeError: Object of type bytearray is not JSON serializable"}

Expected behavior


Actual vs Expected

┌──────────────────────────────────┬──────────────────────────────────┬─────────────────────────────────────────────────────────────────┐
│ Item                             │ Expected (per docs & type hints) │ Observed (3.13.4)                                               │
├──────────────────────────────────┼──────────────────────────────────┼─────────────────────────────────────────────────────────────────┤
│ BodyPartReader.filename          │ Optional[str]                    │ str in most cases, but values originating from upstream parsing │
│                                  │                                  │ paths can surface as bytearray when fed through certain         │
│                                  │                                  │ header-parsing branches                                         │
├──────────────────────────────────┼──────────────────────────────────┼─────────────────────────────────────────────────────────────────┤
│ BodyPartReader.read(decode=True) │ bytes                            │ bytearray whenever no Content-Transfer-Encoding /               │
│                                  │                                  │ Content-Encoding header is present (i.e. the common case)       │
├──────────────────────────────────┼──────────────────────────────────┼─────────────────────────────────────────────────────────────────┤
│ Downstream json.dump on the      │ succeeds                         │ raises TypeError, after already emitting a partial object to    │
│ value                            │                                  │ the output stream                                               │
└──────────────────────────────────┴──────────────────────────────────┴─────────────────────────────────────────────────────────────────┘

Even if a decode step legitimately cannot produce a clean str (e.g. a
pathological filename), the API should preserve its declared return type
— for example by applying errors="replace" when converting to str,
or by explicitly returning bytes via bytes(buffer) rather than
leaking the internal bytearray. Users should not have to defensively
wrap str(...) / bytes(...) around every multipart read call just to
get the type the documentation promises.
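For reference, the kind of defensive wrapper described above might look like the sketch below. It is a downstream workaround, not aiohttp code: the name coerce_str, its signature, and the UTF-8 default are assumptions made for illustration. It uses bytes(buffer) to copy out of a mutable buffer and errors="replace" so that undecodable input degrades instead of raising.

```python
import json
from typing import Optional, Union


def coerce_str(
    value: Union[str, bytes, bytearray, None],
    encoding: str = "utf-8",  # assumed default; real fields may declare a charset
) -> Optional[str]:
    """Normalize a multipart value to str (or None) for strict serializers."""
    if value is None or isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # bytes(...) copies a bytearray into an immutable object first;
        # errors="replace" substitutes U+FFFD for undecodable bytes.
        return bytes(value).decode(encoding, errors="replace")
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Usage: the coerced value is safe to hand to json.dumps.
print(json.dumps({"filename": coerce_str(bytearray(b"P1_submission.diff"))}))
```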

Logs/tracebacks

When the same bug was triggered against a real production handler (our own
handle_upload_file), the exact traceback was:

  File "backend/server.py", line 1705, in handle_upload_file
    json.dump(meta, f)
  File ".../json/__init__.py", line 179, in dump
    for chunk in iterable:
  File ".../json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File ".../json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File ".../json/encoder.py", line 439, in _iterencode
    o = _default(o)
  File ".../json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytearray is not JSON serializable

Over 225 identical occurrences were observed in a single production log
file, which translated to a 39% HTTP 500 rate on the affected
endpoint (measured by tallying status codes in the access log).

Python Version

$ python --version
Python 3.12.13

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.13.4

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.7.1

propcache Version

$ python -m pip show propcache
Name: propcache
Version: 0.4.1

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.23.0

OS

OS: Ubuntu 24.04.2 LTS (Noble Numbat), x86_64 — kernel 5.15.0-79-generic

Runtime: CPython 3.12.13 built from the Anaconda distribution, statically-linked installer at bin/python3

Related component

Server

Additional context


Impact / Final Note

This is a type-contract regression that affects every user who pipes
multipart field data into any serializer with strict type requirements —
most prominently json.dump, but the same class of failure occurs in:

  • msgpack.packb (TypeError: can not serialize 'bytearray' object)
  • SQLAlchemy / ORM column binding for String columns
  • HTTP libraries that type-check body arguments (httpx, urllib3)
  • Pydantic / dataclass models that validate str-typed fields

Because json.dump partially writes before raising, the failure mode is
particularly damaging: sidecar files / persistent metadata can end up
truncated mid-key on disk (we observed 47 leftover files containing exactly
{"filename": and nothing else, 13 bytes each), which does not surface as a
parse error until some future consumer reads them. Fixing the type contract
upstream in aiohttp is strictly safer than asking every downstream user to add
defensive _coerce_str(...) helpers after every multipart read.
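Independently of the type fix, the truncated-file symptom can be avoided downstream by serializing fully in memory before touching the target path. The sketch below is one possible mitigation under stated assumptions: write_json_atomic is a hypothetical helper name, and it relies on os.replace to swap a temporary file into place.

```python
import contextlib
import json
import os
import tempfile


def write_json_atomic(path, obj):
    """Write obj as JSON to path without ever leaving a partial file there."""
    # json.dumps builds the whole document first, so a TypeError is raised
    # here, before any file is created or modified.
    text = json.dumps(obj)

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        # Swap the finished temp file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if writing or replacing failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

With this pattern, calling write_json_atomic(path, meta) with a bytearray inside meta still raises TypeError, but an existing file at path keeps its previous, valid contents.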

Suggested fix directions (any one resolves the observed symptom):

  1. In BodyPartReader.read, return bytes(data) unconditionally rather
    than handing back the internal bytearray buffer.
  2. In BodyPartReader.decode, wrap the passthrough branch in bytes().
  3. Tighten the return annotation to bytes and add a CI test that asserts
    isinstance(result, bytes) on decode=True results.
  4. For filename, ensure the return is always Optional[str] even when
    header parsing falls back to a raw buffer representation — preferably
    with errors="replace" so that exotic filenames degrade gracefully
    instead of breaking the type contract.
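Directions (1) and (3) can be illustrated with a stand-in reader. This is only a sketch: FakePart and its methods are hypothetical and are not aiohttp's implementation. They mimic the pattern of accumulating chunks into a bytearray and then returning an immutable copy, with a test asserting the declared type.

```python
import asyncio


class FakePart:
    """Hypothetical stand-in for a body part; NOT aiohttp's real class."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def read_chunk(self) -> bytes:
        # Return the next chunk, or b"" once the part is exhausted.
        return self._chunks.pop(0) if self._chunks else b""

    async def read(self) -> bytes:
        data = bytearray()  # mutable accumulation buffer
        while True:
            chunk = await self.read_chunk()
            if not chunk:
                break
            data.extend(chunk)
        # Direction (1): copy out so the buffer type never reaches callers.
        return bytes(data)


def test_read_returns_bytes():
    # Direction (3): assert the declared return type, not just the value.
    result = asyncio.run(FakePart([b"P1_", b"submission.diff"]).read())
    assert isinstance(result, bytes)
    assert result == b"P1_submission.diff"
```

The bytes(data) call costs one extra copy of the payload, which is the trade-off for keeping the annotated return type honest.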

Happy to submit a PR for (1) + (3) if that direction is agreeable.

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
