Bug: BodyPartReader.filename and read() leak bytearray instead of str/bytes, violating API contract and breaking JSON serialization #12404

### Describe the bug

The multipart reader surfaces bytearray values in places where the documentation
(https://docs.aiohttp.org/en/stable/multipart_reference.html#aiohttp.BodyPartReader.filename) and the type hints promise str or
bytes. In aiohttp 3.13.x this leads to TypeError: Object of type bytearray is not JSON serializable whenever downstream code
passes the filename (or a decoded text field named filename) through a JSON-based store or API response.

  Per the docs:

  BodyPartReader.filename — A str with the file name specified in the Content-Disposition header, or None if not specified.

  And:

  BodyPartReader.read(*, decode=False) — Reads body part data. … Returns: bytes.

  However, in 3.13.x a bytearray can reach user code in the following ways:

  1. BodyPartReader.read(decode=True) returns the internal accumulation buffer unchanged when no
  Content-Transfer-Encoding/Content-Encoding header is present. That buffer is a bytearray, not bytes, so the annotated return
  type is violated.
  2. Reading a form field named filename via await part.read(decode=True) and then chaining .strip() keeps the value a
  bytearray, which silently flows into user dictionaries.
  3. When such a value is passed to json.dump, the encoder writes the opening of the object ({"filename": ) to the
  output stream before raising TypeError. This leaves applications with truncated, invalid JSON sidecar files on disk, a
  second-order corruption that is hard to diagnose after the fact.

  This is not a None-or-missing case and it is not a charset/encoding edge case — the type itself is wrong.
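
  The second and third points can be checked without aiohttp. The snippet below is a minimal sketch that uses a plain bytearray as a stand-in for the value returned by part.read(decode=True) (that stand-in is an assumption, not output captured from aiohttp) and an in-memory stream in place of a sidecar file:

```python
import io
import json

# Stand-in for a value returned by part.read(decode=True); no aiohttp involved.
field = bytearray(b"  P1_submission.diff\n")

# bytearray methods return bytearray, so the type survives cleanup calls.
stripped = field.strip()
print(type(stripped).__name__)

# json.dump writes chunks to the stream as they are produced, so whatever was
# emitted before the failing value stays in the stream.
out = io.StringIO()
try:
    json.dump({"filename": stripped}, out)
except TypeError as exc:
    print("TypeError:", exc)

partial = out.getvalue()
print(repr(partial), len(partial))
```

  The leftover text in the stream is the opening brace, the key, and the key separator, which matches the 13-byte fragments described in this report.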


### To Reproduce


  ---
  Environment

  - aiohttp 3.13.4 (observed); CPython 3.12.3 on Ubuntu 22.04 / x86_64
  - Default server configuration (aiohttp web.Application, default middleware)
  - No custom multipart parser, no subclassing of BodyPartReader
  - Issue was traced through standard request.multipart() → reader.next() → part.read(...)

  ---
  Steps to Reproduce

  Minimal self-contained server:

```python
# repro_server.py
import json
from aiohttp import web

async def handle(request):
    reader = await request.multipart()
    while True:
        part = await reader.next()
        if part is None:
            break
        if part.name == "filename":
            data = await part.read(decode=True)
            # Documented return type: bytes. Observed: bytearray.
            print("type =", type(data).__name__, "value =", data)
            try:
                json.dumps({"filename": data})
            except TypeError as e:
                return web.json_response(
                    {"error": f"TypeError: {e}"}, status=500)
            return web.json_response({"ok": True})
    return web.json_response({"error": "no filename part"}, status=400)

app = web.Application()
app.router.add_post("/", handle)
web.run_app(app, port=8080)
```

  Client:

```console
curl -X POST http://localhost:8080/ \
     -F 'filename=P1_submission.diff'
```

  Observed stdout:

```console
type = bytearray value = bytearray(b'P1_submission.diff')
```

  Observed HTTP response:

```console
500
{"error": "TypeError: Object of type bytearray is not JSON serializable"}
```



### Expected behavior


  ---
  Actual vs Expected

|  | Expected (per docs & type hints) | Observed (3.13.4) |
| --- | --- | --- |
| BodyPartReader.filename | Optional[str] | str in most cases, but values originating from upstream parsing paths can surface as bytearray when fed through certain header-parsing branches |
| BodyPartReader.read(decode=True) | bytes | bytearray whenever no Content-Transfer-Encoding / Content-Encoding header is present (i.e. the common case) |
| Downstream json.dump on the value | succeeds | raises TypeError, after already emitting a partial object to the output stream |

  Even if a decode step legitimately cannot produce a clean str (e.g. a
  pathological filename), the API should preserve its declared return type
  — for example by applying errors="replace" when converting to str,
  or by explicitly returning bytes via bytes(buffer) rather than
  leaking the internal bytearray. Users should not have to defensively
  wrap str(...) / bytes(...) around every multipart read call just to
  get the type the documentation promises.
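
  For reference, the two conversions mentioned above look roughly like this on the caller's side. This is a sketch only: coerce_bytes and coerce_str are hypothetical helper names, not part of aiohttp, and UTF-8 is an assumed default encoding.

```python
from typing import Optional, Union

BytesLike = Union[bytes, bytearray, memoryview]


def coerce_bytes(value: BytesLike) -> bytes:
    """Return an immutable bytes copy of any bytes-like value (hypothetical helper)."""
    return bytes(value)


def coerce_str(
    value: Union[str, BytesLike, None], encoding: str = "utf-8"
) -> Optional[str]:
    """Return str or None; undecodable bytes become U+FFFD instead of raising."""
    if value is None or isinstance(value, str):
        return value
    return bytes(value).decode(encoding, errors="replace")


print(coerce_str(bytearray(b"P1_submission.diff")))
print(type(coerce_bytes(bytearray(b"abc"))).__name__)
```

  Note that str(value) is not a substitute for the decode step: calling str() on a bytearray yields its repr (for example "bytearray(b'abc')"), not the decoded text.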


### Logs/tracebacks

When the same bug was triggered in a real production handler (our own
handle_upload_file), the traceback was:

```python-traceback
  File "backend/server.py", line 1705, in handle_upload_file
    json.dump(meta, f)
  File ".../json/__init__.py", line 179, in dump
    for chunk in iterable:
  File ".../json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File ".../json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File ".../json/encoder.py", line 439, in _iterencode
    o = _default(o)
  File ".../json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytearray is not JSON serializable
```

Over 225 identical occurrences were observed in a single production log
file, which translated to a 39% HTTP 500 rate on the affected
endpoint (observed via access-log status code tally).

### Python Version

```console
$ python --version
Python 3.12.13
```

### aiohttp Version

```console
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.13.4
```

### multidict Version

```console
$ python -m pip show multidict
Name: multidict
Version: 6.7.1
```

### propcache Version

```console
$ python -m pip show propcache
Name: propcache
Version: 0.4.1
```

### yarl Version

```console
$ python -m pip show yarl
Name: yarl
Version: 1.23.0
```

### OS


  OS: Ubuntu 24.04.2 LTS (Noble Numbat), x86_64, kernel 5.15.0-79-generic

  Runtime: CPython 3.12.13 built from the Anaconda distribution, statically-linked installer at bin/python3

### Related component

Server

### Additional context


  ---
  Impact / Final Note

  This is a type-contract regression that affects every user who pipes
  multipart field data into any serializer with strict type requirements —
  most prominently json.dump, but the same class of failure occurs in:

  - msgpack.packb (TypeError: can not serialize 'bytearray' object)
  - SQLAlchemy / ORM column binding for String columns
  - HTTP libraries that type-check body arguments (httpx, urllib3)
  - Pydantic / dataclass validation decorated with str fields

  Because json.dump partially writes before raising, the failure mode is
  particularly damaging: sidecar files / persistent metadata can end up
  truncated mid-object on disk (we observed 47 such files, each containing
  exactly {"filename":  and 13 bytes long), which does not surface as a parse
  error until some future consumer reads them. Fixing the type contract upstream
  in aiohttp is strictly safer than asking every downstream user to add
  defensive _coerce_str(...) helpers after every multipart read.
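
  Independently of the type issue, the truncated-sidecar symptom can be avoided on the application side by serializing fully in memory and only then replacing the target file. The sketch below assumes a hypothetical helper name (write_json_atomic) and a local filesystem where os.replace is atomic; it is not taken from our production code.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, payload: dict) -> None:
    """Write payload as JSON so that a failed encode never touches path."""
    # json.dumps builds the whole document first; a TypeError for an
    # unsupported value is raised here, before any file is created.
    text = json.dumps(payload)

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        # Rename over the target only after the temp file is complete.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

  With this pattern a bytearray in the payload still raises TypeError, but no partial file is left behind for a later consumer to trip over.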

  Suggested fix directions (any one resolves the observed symptom):

  1. In BodyPartReader.read, return bytes(data) unconditionally rather
  than handing back the internal bytearray buffer.
  2. In BodyPartReader.decode, wrap the passthrough branch in bytes().
  3. Keep the return annotation as bytes and add a CI test that asserts
  isinstance(result, bytes) on decode=True results.
  4. For filename, ensure the return is always Optional[str] even when
  header parsing falls back to a raw buffer representation — preferably
  with errors="replace" so that exotic filenames degrade gracefully
  instead of breaking the type contract.
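
  To illustrate directions (1) and (3) without depending on aiohttp internals, here is a simplified sketch. SketchPartReader is a hypothetical stand-in class, not aiohttp's BodyPartReader, and its chunk handling is reduced to a list of byte strings; only the accumulate-then-convert pattern is the point.

```python
import asyncio
from typing import List


class SketchPartReader:
    """Hypothetical stand-in for a part reader; not aiohttp's implementation."""

    def __init__(self, chunks: List[bytes]) -> None:
        self._chunks = list(chunks)

    async def read_chunk(self) -> bytes:
        # Hand out the next chunk, or an empty value once exhausted.
        return self._chunks.pop(0) if self._chunks else b""

    async def read(self) -> bytes:
        data = bytearray()
        while self._chunks:
            data.extend(await self.read_chunk())
        # Convert once at the API boundary so the mutable buffer never escapes.
        return bytes(data)


result = asyncio.run(SketchPartReader([b"P1_", b"submission.diff"]).read())
# The kind of assertion a CI test for direction (3) could make.
assert isinstance(result, bytes)
print(result)
```

  The conversion costs one extra copy of the body, which is the trade-off for returning an immutable value that matches the annotation.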

  Happy to submit a PR for (1) + (3) if that direction is agreeable.


### Code of Conduct

- [x] I agree to follow the aio-libs Code of Conduct
