Upgrade to datasets 4 (for torchcodec + List)#3207

Merged: lhoestq merged 3 commits into main from datasets-4 on Jul 7, 2025

lhoestq (Member) commented on Jul 3, 2025

cc @severo for viz :)

It introduces the List(...) type (replacing Sequence(...) and [...]), which translates to a new type of Feature in the OpenAPI spec.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

 ) -> Optional[SupportedColumns]:
     if isinstance(dataset_feature, list) or (
-        isinstance(dataset_feature, dict) and dataset_feature.get("_type") in ("LargeList", "Sequence")
+        isinstance(dataset_feature, dict) and dataset_feature.get("_type") in ("LargeList", "List", "Sequence")

Collaborator

why do we keep testing "Sequence" here?

Member Author

because of existing datasets: their features are still stored as "Sequence" in the dataset-viewer cache
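
To illustrate the point about cached entries, here is a minimal sketch of a check that accepts both the new "List" type and the legacy "Sequence" type. The helper name `is_list_like`, the constant `LIST_LIKE_TYPES`, and the sample `cached_features` dict are hypothetical and not taken from this PR; only the `_type` values and the `isinstance` logic mirror the diff.

```python
from typing import Any

# "_type" values treated as list-like. "Sequence" is kept so that
# feature dicts serialized before the upgrade are still recognized.
LIST_LIKE_TYPES = ("LargeList", "List", "Sequence")


def is_list_like(feature: Any) -> bool:
    """Return True if a serialized feature describes a list column."""
    # Old-style plain Python list notation, e.g. [{"_type": "Value", ...}]
    if isinstance(feature, list):
        return True
    # Dict notation with an explicit "_type" marker
    return isinstance(feature, dict) and feature.get("_type") in LIST_LIKE_TYPES


# Hypothetical serialized features, mixing old and new representations
cached_features = {
    "tokens": {"_type": "Sequence", "feature": {"dtype": "string", "_type": "Value"}},
    "labels": {"_type": "List", "feature": {"dtype": "int64", "_type": "Value"}},
    "legacy": [{"dtype": "int32", "_type": "Value"}],
    "text": {"dtype": "string", "_type": "Value"},
}

# Collect the names of all list-like columns, whatever their vintage
list_columns = [name for name, feature in cached_features.items() if is_list_like(feature)]
```

Dropping "Sequence" from the tuple would make the check skip the `tokens` column in this sample, which is the situation the reply above describes for datasets already in the cache.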

 for feature_name, feature in features.items():
     if isinstance(feature, list) or (
-        isinstance(feature, dict) and feature.get("_type") in ("LargeList", "Sequence")
+        isinstance(feature, dict) and feature.get("_type") in ("LargeList", "List", "Sequence")

Collaborator

why do we keep testing "Sequence" here?

Member Author

(same)

lhoestq merged commit 057b8af into main on Jul 7, 2025
29 of 30 checks passed
lhoestq deleted the datasets-4 branch on July 7, 2025 12:54

3 participants