🐛 Bug
When optimizing a dataset of NumPy arrays, I get a warning that the array is not writable and that this can cause issues down the line. The warning is printed once iteration over a StreamingDataLoader starts.
Triggered by torch\utils\data\_utils\collate.py:288
Note that this warning is not raised when the NumPy arrays are converted to torch tensors before optimizing, so maybe this is a non-issue for this library.
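For context, here is a minimal sketch of how a read-only array can come about. It assumes the stored sample is rebuilt from raw bytes with np.frombuffer, which is my guess and not a confirmed description of what litdata does internally. An array built on top of an immutable bytes object is not writable, while a copy of it is.

```python
import numpy as np

# Assumption: the sample is stored as raw bytes and rebuilt with np.frombuffer.
# This only illustrates the mechanism; it is not litdata's confirmed code path.
original = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
raw = original.tobytes()

# The array shares memory with an immutable bytes object, so it is read-only.
restored = np.frombuffer(raw, dtype=np.uint8).reshape(32, 32, 3)
print(restored.flags.writeable)  # False

# Copying allocates fresh memory owned by the array, which is writable.
writable = restored.copy()
print(writable.flags.writeable)  # True
```

If this is indeed the mechanism, converting such a read-only array to a tensor would be the step that produces the warning.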
To Reproduce
I simplified the example given in this repo by having random_images return the underlying NumPy array instead of a PIL Image.
import numpy as np
import litdata as ld
import torch

def random_images(index):
    fake_images = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
    # ! not a PIL image, but numpy array
    fake_labels = np.random.randint(10)
    # fake_images = torch.from_numpy(fake_images)
    # no warning when working with torch
    data = {"index": index, "image": fake_images, "class": fake_labels}
    return data


if __name__ == "__main__":
    ld.optimize(
        fn=random_images,
        inputs=list(range(1000)),
        output_dir="tmp/fast_data",
        num_workers=4,
        chunk_bytes="64MB",
    )
    # ---
    dataset = ld.StreamingDataset('tmp/fast_data', shuffle=True, drop_last=True)
    # ! no collate function
    dataloader = ld.StreamingDataLoader(dataset)
    for sample in dataloader:
        img, cls = sample["image"], sample["class"]
        # here the warning prints
Expected behavior
Arrays being writable, I suppose.
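As a possible workaround on the user side, read-only arrays could be copied before they reach the collate step. The sketch below is only an illustration: make_writable is a hypothetical helper I made up, not part of litdata or PyTorch, and it uses NumPy alone.

```python
import numpy as np


def make_writable(sample):
    """Return the sample with every read-only NumPy array replaced by a writable copy.

    Hypothetical helper for illustration; not part of litdata or PyTorch.
    """
    if isinstance(sample, np.ndarray):
        # Copy only when needed so already-writable arrays are passed through.
        return sample if sample.flags.writeable else sample.copy()
    if isinstance(sample, dict):
        return {key: make_writable(value) for key, value in sample.items()}
    if isinstance(sample, list):
        return [make_writable(value) for value in sample]
    if isinstance(sample, tuple):
        return tuple(make_writable(value) for value in sample)
    # Ints, strings, and other objects are returned unchanged.
    return sample


# Build a read-only array (backed by an immutable bytes object) to try it out.
read_only = np.frombuffer(bytes(12), dtype=np.uint8)
sample = {"index": 0, "image": read_only, "class": 3}
fixed = make_writable(sample)
print(fixed["image"].flags.writeable)
```

I would expect this to be callable on each sample from a custom collate function before handing the batch to the default collation, at the cost of one extra copy per array, but I have not verified that against StreamingDataLoader.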
Additional context
Environment detail
- PyTorch Version : 2.11.0
- OS (e.g., Linux): Windows 10
- How you installed PyTorch: pip
- Build command you used (if compiling from source): NA
- Python version: 3.13.13