UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. #818

@tobemo

🐛 Bug

When optimizing a dataset of NumPy arrays, I get the warning quoted in the title: the array is not writable, which can cause issues down the line. The warning is printed as soon as we start iterating over a StreamingDataLoader.
Triggered by torch\utils\data\_utils\collate.py:288
Note that the warning is not raised when the NumPy arrays are cast to torch tensors before optimizing, so this may be a non-issue for this library.
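
A possible explanation, offered as an assumption rather than a confirmed diagnosis of this library: if stored arrays are rebuilt from an immutable bytes buffer when a chunk is read (for example with np.frombuffer), NumPy marks the result read-only, and that is the condition the PyTorch warning describes. The sketch below uses only NumPy to show that behavior and how a copy restores writability; the variable names are illustrative.

```python
import numpy as np

# Same shape and dtype as the fake image in the reproduction script.
original = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)

# Assumption: the array is serialized to immutable bytes, as a chunk file might store it.
raw = original.tobytes()

# Rebuilding from bytes gives a view onto memory NumPy does not own,
# so the array (and any reshaped view of it) is flagged read-only.
restored = np.frombuffer(raw, dtype=np.uint8).reshape(32, 32, 3)
print(restored.flags.writeable)  # False

# An explicit copy owns its memory and is writable again.
writable = restored.copy()
print(writable.flags.writeable)  # True
```

The copy costs one extra allocation per array, which is the trade-off for getting a buffer that is safe to write to.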

To Reproduce

I simplified the example given in this repo by having random_images return the underlying NumPy array instead of a PIL Image.

import numpy as np
import litdata as ld
import torch


def random_images(index):
    fake_images = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
    # ! not a PIL image, but numpy array
    fake_labels = np.random.randint(10)

    # fake_images = torch.from_numpy(fake_images)
    # no warning when working with torch
    data = {"index": index, "image": fake_images, "class": fake_labels}

    return data


if __name__ == "__main__":
    ld.optimize(
        fn=random_images,
        inputs=list(range(1000)),
        output_dir="tmp/fast_data",
        num_workers=4,
        chunk_bytes="64MB",
    )

    # ---

    dataset = ld.StreamingDataset('tmp/fast_data', shuffle=True, drop_last=True)

    # ! no collate function
    dataloader = ld.StreamingDataLoader(dataset)
    for sample in dataloader:
        img, cls = sample["image"], sample["class"]
    # here the warning prints

Expected behavior

Arrays being writable, I suppose.
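
Until that is the case, one user-side workaround could be to copy read-only arrays before they reach the default collate step. The helper below is a minimal sketch: make_writable is a hypothetical name, not part of litdata or PyTorch, and it assumes samples are dicts, lists, or tuples of arrays and plain Python values, like the dict returned by random_images.

```python
import numpy as np


def make_writable(obj):
    """Return obj with every read-only NumPy array replaced by a writable copy."""
    if isinstance(obj, np.ndarray):
        # Copy only when needed, so writable arrays are passed through untouched.
        return obj if obj.flags.writeable else obj.copy()
    if isinstance(obj, dict):
        return {key: make_writable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(make_writable(item) for item in obj)
    return obj


# A read-only array, standing in for what the dataset yields in the report above.
read_only = np.frombuffer(bytes(12), dtype=np.uint8)
sample = {"index": 0, "image": read_only, "class": 3}

fixed = make_writable(sample)
print(fixed["image"].flags.writeable)
```

Such a helper could be applied to each sample inside a custom collate function that then calls torch.utils.data.default_collate. This assumes StreamingDataLoader forwards a collate_fn argument the way torch.utils.data.DataLoader does, which is not verified here.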

Additional context

Environment detail
  • PyTorch Version : 2.11.0
  • OS (e.g., Linux): Windows 10
  • How you installed PyTorch: pip
  • Build command you used (if compiling from source): NA
  • Python version: 3.13.13

Labels: bug, help wanted
