`.batch()` error on formatted datasets

The `.batch()` method currently assumes that each input batch is a dictionary, and it raises an error when that is not the case. This happens with formatted datasets, because formats such as `"pyarrow"`, `"pandas"` (affects `IterableDataset` only), and `"polars"` return tables/dataframes instead of dictionaries.

For example:
```python
from datasets import IterableDataset

# Applying the format before batching makes .batch() receive a pyarrow.Table
list(IterableDataset.from_dict({"a": [1, 2, 3, 4]}).with_format("pyarrow").batch(2))
# AttributeError: 'pyarrow.lib.Table' object has no attribute 'items'
```
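To make the failure mode concrete, here is a minimal, library-free sketch. It assumes the batching step wraps each column's values into a single list-valued cell by iterating over `.items()`, which is what the traceback suggests. `FakeTable`, `batch_to_dict`, and `wrap_as_single_row` are hypothetical names used for illustration only; this is not the library's implementation, and a real fix would probably keep the requested output format rather than fall back to plain dicts.

```python
from collections.abc import Mapping


class FakeTable:
    """Stand-in for a table-like batch (e.g. a pyarrow.Table): it has no .items()."""

    def __init__(self, columns):
        self._columns = columns

    def to_pydict(self):
        # pyarrow.Table exposes a method with this name; here it is simulated.
        return {name: list(values) for name, values in self._columns.items()}


def batch_to_dict(batch):
    """Hypothetical helper: return the batch as a dict of column name -> list."""
    if isinstance(batch, Mapping):
        return dict(batch)
    if hasattr(batch, "to_pydict"):
        return batch.to_pydict()
    raise TypeError(f"Unsupported batch type: {type(batch).__name__}")


def wrap_as_single_row(batch):
    """Assumed batching step: each column becomes one list-valued cell."""
    return {name: [values] for name, values in batch_to_dict(batch).items()}


print(wrap_as_single_row({"a": [1, 2]}))
print(wrap_as_single_row(FakeTable({"a": [3, 4]})))
```

Normalizing the input first means the dict-based step never sees a table object, which is the condition that triggers the `AttributeError` above.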

Ideally, the result should be the same whether the format is applied before or after batching, i.e., the following should hold for all the format types:
```python
from datasets import IterableDataset, Dataset
format_type = "pyarrow"  # or any other supported format type
assert list(IterableDataset.from_dict({"a": [1, 2, 3, 4]}).with_format(format_type).batch(2)) == list(IterableDataset.from_dict({"a": [1, 2, 3, 4]}).batch(2).with_format(format_type))
assert list(Dataset.from_dict({"a": [1, 2, 3, 4]}).with_format(format_type).batch(2)) == list(Dataset.from_dict({"a": [1, 2, 3, 4]}).batch(2).with_format(format_type))
```

`.batch()` error on formatted datasets #8075
