CMYK source images crash PIL engine when transcoding to PNG #10

@jensens

## Problem

CMYK-mode source images (typically JPEGs from print-industry workflows) crash Thumbor's PIL engine when it tries to save as PNG. Observed on aaf-6 prod (2026-04-20):

```
KeyError: 'CMYK'
File ".../PIL/PngImagePlugin.py", line 1307, in _save
rawmode, bit_depth, color_type = _OUTMODES[outmode]

OSError: cannot write mode CMYK as PNG
File ".../PIL/Image.py", line 2568, in save
save_handler(self, fp, filename)
File ".../PIL/PngImagePlugin.py", line 1310, in _save
raise OSError(msg) from e

thumbor:ERROR [BaseHander.finish_request] cannot write mode CMYK as PNG
thumbor:WARNING Error while trying to fetch the image: cannot write mode CMYK as PNG
```

Result: the image produces a 400/500 for every request. With the error microcache from #5 (shipped in [0.4.2](https://github.com/bluedynamics/zodb-pgjsonb-thumborblobloader/releases/tag/v0.4.2)) the failure no longer pins in Varnish for a year, but individual CMYK images still consistently fail.

## Root cause

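PIL's PNG encoder accepts only the modes listed in its `_OUTMODES` table (1, L, LA, I, I;16, P, RGB, RGBA and their bit-depth variants), and CMYK is not among them. CMYK JPEGs must therefore be converted to RGB before any PNG save attempt. Thumbor's default PIL engine does not do this conversion.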
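The failure can be reproduced outside Thumbor. The following is a minimal sketch, assuming only that Pillow is installed; the 10×10 in-memory image is a stand-in for a real print-workflow JPEG, and the variable names are illustrative.

```python
import io

from PIL import Image

# A tiny in-memory CMYK image stands in for a print-workflow source.
cmyk = Image.new("CMYK", (10, 10))

# Saving CMYK directly as PNG is rejected by Pillow's PNG plugin.
try:
    cmyk.save(io.BytesIO(), "PNG")
    png_error = None
except OSError as exc:
    png_error = str(exc)

# After converting to RGB, the same save succeeds.
buffer = io.BytesIO()
cmyk.convert("RGB").save(buffer, "PNG")
png_bytes = buffer.getvalue()

print("direct save error:", png_error)
print("converted save produced a PNG:", png_bytes[:8] == b"\x89PNG\r\n\x1a\n")
```

The captured message should correspond to the `OSError` in the traceback above, though the exact wording may vary between Pillow versions.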

The code path (from the traceback):

`thumbor/engines/pil.py:428` → `self.image.save(img_buffer, self.image.format, **options)`: `self.image.format` is still "JPEG" here, so the first save attempt works. The failure is at the **second** save attempt at `pil.py:433`, where the extension is forced (auto-PNG for transparency support, WebP negotiation, etc.). At that point `self.image.mode == "CMYK"` and PIL's PNG encoder refuses.

## Fix options

### Option A — Thumbor engine extension (preferred)

Ship a thin `PilEngine` subclass that normalises CMYK to RGB on load, before any downstream save touches it. The hook is a single check: `if image.mode == "CMYK": image = image.convert("RGB")`. This lives in this loader package since the Docker image ships a known Thumbor config.
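One possible shape for this hook is a mixin placed ahead of Thumbor's PIL engine. This is a sketch under assumptions, not the final module: it assumes the parent engine exposes a `create_image(buffer)` method that may return a single image, a list of frames, or `None`. The names `normalise_cmyk` and `CmykToRgbMixin` are hypothetical.

```python
def normalise_cmyk(image):
    """Return an RGB copy for CMYK images; pass anything else through."""
    if getattr(image, "mode", None) == "CMYK":
        return image.convert("RGB")
    return image


class CmykToRgbMixin:
    """Normalise whatever the parent engine's create_image() returns.

    Assumption: the parent class exposes create_image(buffer) and may
    return a single image, a list of frames, or None.
    """

    def create_image(self, buffer):
        loaded = super().create_image(buffer)
        if isinstance(loaded, list):
            # Animated sources: normalise each frame individually.
            return [normalise_cmyk(frame) for frame in loaded]
        return normalise_cmyk(loaded)


# Wiring sketch, left as a comment because it needs Thumbor installed:
#
#     from thumbor.engines.pil import Engine as PilEngine
#
#     class Engine(CmykToRgbMixin, PilEngine):
#         pass
```

Keeping the conversion in a plain function makes it testable without Thumbor. Whether the engine also needs to drop a stored CMYK ICC profile after conversion should be checked against the Thumbor version in the Docker image.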

### Option B — Loader-side preprocessing

In `loader.py`, after reading bytes from PG/S3, parse the JPEG header, detect CMYK (`APP14` marker / `Adobe` block with transform = 0), then decode to RGB and re-encode. Heavier: the image is decoded twice (once in the loader, once in the engine).
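The header check itself needs no image library. The sketch below walks the JPEG segments up to the start of scan and reports the component count from the frame header together with the Adobe `APP14` transform byte. The function names are hypothetical. As an assumption of this sketch, it treats any four-component JPEG as CMYK-like, since YCCK files (Adobe transform = 2) also carry four components.

```python
import struct

# Start-of-frame markers: 0xC0-0xCF except DHT (0xC4), JPG (0xC8), DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def scan_jpeg_header(data: bytes):
    """Return (component_count, adobe_transform); either may be None."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    components = None
    adobe_transform = None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: header is over
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # no length field
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xEE and payload[:5] == b"Adobe" and len(payload) >= 12:
            adobe_transform = payload[11]  # 0 unknown, 1 YCbCr, 2 YCCK
        elif marker in SOF_MARKERS and len(payload) >= 6:
            components = payload[5]
        pos += 2 + length
    return components, adobe_transform


def looks_cmyk(data: bytes) -> bool:
    """Four components in the frame header indicate CMYK or YCCK."""
    components, _ = scan_jpeg_header(data)
    return components == 4
```

Only the bytes before the first scan are read, so the check stays cheap even for large blobs. The second decode that makes this option heavier comes from the re-encode step, not from the detection.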

### Option C — Document the limitation

Add an upstream Thumbor config flag to filter out CMYK sources at the loader stage (return `upstream` error). Not really a fix, just avoids the crash loop.

### Option D — Pillow's ImageCms profile-aware CMYK→RGB

If the source has an embedded ICC profile, use `ImageCms` to do a colorimetrically correct conversion instead of PIL's naive channel math. Slower but more accurate for print-derived files. Reasonable as a follow-up once Option A is in place.
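A profile-aware conversion could look roughly like the sketch below. It assumes Pillow was built with LittleCMS support and that the embedded profile is available under `image.info["icc_profile"]`. The function name `cmyk_to_rgb` and the choice of sRGB as the target are assumptions made for illustration. When no profile is present or it cannot be used, the sketch falls back to the plain `convert("RGB")` from Option A.

```python
import io

from PIL import Image, ImageCms


def cmyk_to_rgb(image):
    """Convert CMYK to RGB, using the embedded ICC profile when possible."""
    if image.mode != "CMYK":
        return image

    icc_bytes = image.info.get("icc_profile")
    if not icc_bytes:
        # No embedded profile: fall back to Pillow's built-in conversion.
        return image.convert("RGB")

    try:
        source_profile = ImageCms.ImageCmsProfile(io.BytesIO(icc_bytes))
        target_profile = ImageCms.ImageCmsProfile(ImageCms.createProfile("sRGB"))
        return ImageCms.profileToProfile(
            image, source_profile, target_profile, outputMode="RGB"
        )
    except (ImageCms.PyCMSError, OSError):
        # Unreadable or mismatched profile: keep serving the image anyway.
        return image.convert("RGB")
```

Falling back rather than raising keeps a broken profile from turning into another per-image failure.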

## Recommendation

**Option A** for this issue. The fix is ~10 lines in a small engine-extension module, configurable via `ENGINE = '<pkg>.engine'` in the Docker image's `thumbor.conf`. The microcache from #5 already contains the blast radius; Option A eliminates the underlying error entirely.

## Test plan

- Fixture: a small CMYK JPEG (can be generated with PIL: `Image.new("CMYK", (10, 10)).save("cmyk.jpg")`).
- Unit: feed the CMYK image through the engine, assert resulting image mode is RGB.
- Integration: run the Docker image against a PG blob with CMYK source, assert PNG transcoding succeeds.
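
The fixture and unit steps could be sketched as a single pytest-style test. This assumes Pillow is installed; `transcode_to_png` is a hypothetical stand-in for the engine path (load, normalise, save as PNG) and would be replaced by the real engine in the actual test suite.

```python
import io

from PIL import Image


def make_cmyk_jpeg() -> bytes:
    """Build the fixture in memory instead of writing cmyk.jpg to disk."""
    buffer = io.BytesIO()
    Image.new("CMYK", (10, 10)).save(buffer, "JPEG")
    return buffer.getvalue()


def transcode_to_png(source: bytes) -> bytes:
    """Hypothetical stand-in for the engine: load, normalise, save as PNG."""
    image = Image.open(io.BytesIO(source))
    if image.mode == "CMYK":
        image = image.convert("RGB")
    out = io.BytesIO()
    image.save(out, "PNG")
    return out.getvalue()


def test_cmyk_jpeg_transcodes_to_rgb_png():
    source = make_cmyk_jpeg()
    # Sanity check: the fixture really is CMYK once decoded from JPEG.
    assert Image.open(io.BytesIO(source)).mode == "CMYK"

    result = Image.open(io.BytesIO(transcode_to_png(source)))
    assert result.format == "PNG"
    assert result.mode == "RGB"
```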

## Context

Surfaced alongside #5 (long-TTL error cache poisoning) and #6 (S3 pool saturation) during the aaf-6 cloud-vinyl rollout. Prevalence in the corpus is low (handful of print-origin images out of ~250k blobs) but each one consistently fails, so affected URLs always return errors until the source is re-uploaded or converted upstream.
