Fix PIL image cache key collisions across dimensions by 3em0 · Pull Request #9359 · modelscope/ms-swift

3em0 · 2026-05-16T07:56:36Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Template._save_pil_image() used only image.tobytes() to build the image cache hash. Pillow's flattened byte stream does not include mode, width, or height, so images with identical raw bytes but different dimensions could reuse the same cached PNG path.
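The collision described above can be reproduced outside the template code. The following is a minimal sketch, assuming Pillow is installed; the variable names and the 8-bit grayscale mode are illustrative choices, not taken from the PR. It builds two images from one byte buffer and shows that hashing only `tobytes()` yields the same key for both.

```python
import hashlib

from PIL import Image

# 120 * 80 == 80 * 120, so one buffer can back both shapes in mode "L"
# (one byte per pixel). The buffer contents are arbitrary.
raw = bytes(i % 256 for i in range(120 * 80))

wide = Image.frombytes("L", (120, 80), raw)
tall = Image.frombytes("L", (80, 120), raw)

# The images differ in shape, but their flattened pixel bytes are identical.
same_bytes = wide.tobytes() == tall.tobytes()

# Hashing only the pixel bytes (the pre-fix behaviour described above)
# therefore produces the same key for both images.
old_key_wide = hashlib.sha256(wide.tobytes()).hexdigest()
old_key_tall = hashlib.sha256(tall.tobytes()).hexdigest()
```

With a key like this, whichever image is cached first determines the PNG that both images resolve to.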

This PR includes the image mode and dimensions in the hash input before the raw pixel bytes. It also adds a regression test that creates two images from the same bytes with dimensions 120x80 and 80x120, verifies they get different cache paths, and checks that both cached files keep their correct dimensions.
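The fix and the regression check can be sketched in isolation as follows. `cache_key` and `save_cached` are hypothetical helpers written for this illustration; they are not the actual `Template._save_pil_image()` implementation, and the `<hash>.png` file naming is an assumption. Pillow is assumed to be installed.

```python
import hashlib
import os
import tempfile

from PIL import Image


def cache_key(image):
    """Hypothetical helper: hash mode and dimensions ahead of the pixel bytes."""
    meta = f"{image.mode}:{image.width}:{image.height}:".encode()
    return hashlib.sha256(meta + image.tobytes()).hexdigest()


def save_cached(image, cache_dir):
    """Hypothetical stand-in for the caching step: write one PNG per key."""
    path = os.path.join(cache_dir, f"{cache_key(image)}.png")
    if not os.path.exists(path):
        image.save(path)
    return path


# Same raw bytes, two shapes, as in the regression test described above.
raw = bytes(i % 256 for i in range(120 * 80))
wide = Image.frombytes("L", (120, 80), raw)
tall = Image.frombytes("L", (80, 120), raw)

with tempfile.TemporaryDirectory() as cache_dir:
    wide_path = save_cached(wide, cache_dir)
    tall_path = save_cached(tall, cache_dir)
    # Reopen the cached files to confirm each kept its own dimensions.
    with Image.open(wide_path) as reopened_wide:
        wide_size = reopened_wide.size
    with Image.open(tall_path) as reopened_tall:
        tall_size = reopened_tall.size
```

Because the dimensions are part of the hash input, the two images no longer share a cache path even though their pixel bytes match.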

Experiment results

/home/dem0/miniconda3/envs/LLM-HASH/bin/python -m pytest tests/llm/test_template.py::TestTemplate::test_save_pil_image_uses_dimensions_in_cache_key -q
.                                                                        [100%]
1 passed in 3.22s

gemini-code-assist

Code Review

This pull request updates the _save_pil_image method to include image metadata (mode, width, and height) in the SHA256 hash calculation, ensuring unique cache keys for images with identical raw bytes but different dimensions. A unit test has been added to verify that different image dimensions result in distinct paths. Review feedback recommends using incremental hash updates to improve memory efficiency when handling large images.

gemini-code-assist · 2026-05-16T07:58:56Z

+        img_meta = f'{image.mode}:{image.width}:{image.height}:'.encode()
+        img_hash = hashlib.sha256(img_meta + image.tobytes()).hexdigest()


For better memory efficiency, especially with large images, you can update the hash object incrementally instead of creating a new concatenated bytes object. This avoids allocating extra memory for the combined metadata and image bytes.

Suggested change

-        img_meta = f'{image.mode}:{image.width}:{image.height}:'.encode()
-        img_hash = hashlib.sha256(img_meta + image.tobytes()).hexdigest()
+        hasher = hashlib.sha256()
+        hasher.update(f'{image.mode}:{image.width}:{image.height}:'.encode())
+        hasher.update(image.tobytes())
+        img_hash = hasher.hexdigest()
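A quick check that the incremental form produces the same digest as the concatenated form, so switching to it would not change existing cache keys. This is a standard-library sketch; the `meta` and `pixels` values are placeholders standing in for the metadata prefix and `image.tobytes()`.

```python
import hashlib

# Placeholder inputs: a metadata prefix and a zero-filled RGB pixel buffer.
meta = b"RGB:120:80:"
pixels = bytes(120 * 80 * 3)

# Concatenated form: allocates a temporary bytes object of the combined size.
concatenated = hashlib.sha256(meta + pixels).hexdigest()

# Incremental form: feeds the two parts to the hasher one after the other.
hasher = hashlib.sha256()
hasher.update(meta)
hasher.update(pixels)
incremental = hasher.hexdigest()
```

Successive `update()` calls are equivalent to a single call on the concatenated input, so both digests match. The saving is the temporary combined buffer only; `image.tobytes()` still materializes the full pixel data either way.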

fix: include image metadata in cache hash

0a921cc

gemini-code-assist Bot reviewed May 16, 2026

3em0 wants to merge 1 commit into modelscope:main from 3em0:fix-image-cache-dimension-hash


