Update datasets requirement from <5,>=2.18 to >=2.18,<6 #447

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/datasets-gte-2.18-and-lt-6

Conversation

dependabot Bot (Contributor) commented on behalf of github Jun 8, 2026

Updates the requirements on datasets to permit the latest version.
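
To make the effect of the widened range concrete, here is a minimal sketch that checks whether a release falls inside the old range (`>=2.18,<5`) and the new one (`>=2.18,<6`). The helpers `parse_release` and `in_range` are hypothetical names introduced for illustration. They only handle plain numeric releases such as `5.0.0`, not the full PEP 440 grammar that pip applies.

```python
def parse_release(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn a plain release string such as "2.18" into a padded tuple (2, 18, 0)."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))  # pad so "5" and "5.0.0" compare equal
    return tuple(parts)


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper (inclusive lower, exclusive upper)."""
    return parse_release(lower) <= parse_release(version) < parse_release(upper)


# Old requirement: >=2.18,<5    New requirement: >=2.18,<6
old_allows = in_range("5.0.0", lower="2.18", upper="5")
new_allows = in_range("5.0.0", lower="2.18", upper="6")
print(old_allows, new_allows)
```

Under these simplified rules the old range rejects 5.0.0 and the new one accepts it, which is the whole change this PR proposes.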

Release notes

Sourced from datasets's releases.

5.0.0

Datasets Features

Agent traces

  • Parse Agent traces messages for SFT using teich by @lhoestq in huggingface/datasets#8232

    • Agent traces from claude_code/pi/codex and others can now be loaded with load_dataset
    • Using the teich library (new optional dependency), traces are parsed to messages to enable training on traces using e.g. trl
    • Load the data:
    >>> from datasets import load_dataset
    >>> ds = load_dataset("lhoestq/agent-traces-example", split="train")
    >>> ds[0]["messages"]
    [{'role': 'user', 'content': 'Download a random dataset from Hugging Face, use DuckDB to inspect it, and come back with a short report about it. Be concise and include: dataset name, what files/format you found, row count or rough size if you can determine it,...'
     ...]
    • Train on agent traces:
    trl sft --dataset-name lhoestq/agent-traces-example ...

Next-level shuffling in streaming mode

  • Use multiple input shards for shuffle buffer by @lhoestq in huggingface/datasets#8194

    ds = load_dataset(..., streaming=True)
    ds = ds.shuffle(seed=42)
    # or configure local buffer shuffling manually, default is:
    ds = ds.shuffle(seed=42, buffer_size=1000, max_buffer_input_shards=10)

    (Before 👎 / after ✨ comparison images not preserved.)

    Toy example comparison:

    from datasets import IterableDataset
    ds = IterableDataset.from_dict({"i": range(123_456_789)}, num_shards=1024)
    ds = ds.shuffle(seed=42)
    print("Cold start ids:")

... (truncated)
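
For readers unfamiliar with buffer-based shuffling of a stream, the general idea behind `buffer_size` can be sketched in plain Python. This is an illustrative sketch only, not the implementation in datasets. The function name `buffer_shuffle` is hypothetical, and the sketch reads from a single ordered input rather than several shards.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def buffer_shuffle(items: Iterable[T], buffer_size: int, seed: int) -> Iterator[T]:
    """Approximately shuffle a stream while holding at most buffer_size items."""
    rng = random.Random(seed)
    buffer: list[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before yielding anything.
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffer_shuffle(range(20), buffer_size=5, seed=42))
print(shuffled)
```

With one ordered input, the first item emitted can only come from the first `buffer_size` elements, so early output is poorly mixed. Filling the buffer from several input shards at once, as the release note describes, is presumably meant to address that cold-start behaviour.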

Commits
  • 68ac1a9 Release: 5.0.0 (#8239)
  • cfe4492 Support composed splits in streaming datasets (#8220)
  • fd67320 Keep None as a real null in Json() columns instead of the string "null" (#8231)
  • 10cdc81 Fix iterable skip over full Arrow blocks (#8236)
  • b7c064d Parse agent traces messages for SFT using teich (#8232)
  • 31e92f1 fix: embed_external_files=True for mesh support (#8224)
  • d168d5f feat: add TsFile (Apache IoTDB) packaged builder with per-device wide format ...
  • 992f3cf fix(map): fix progress bar exceeding total when load_from_cache_file=False (#...
  • 8474a91 Fix single lance file form pylance 7.0 (#8225)
  • d4284e9 feat: add 3D mesh support and MeshFolder builder (#8055)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [datasets](https://github.com/huggingface/datasets) to permit the latest version.
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@2.18.0...5.0.0)

---
updated-dependencies:
- dependency-name: datasets
  dependency-version: 5.0.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot Bot added the dependencies and python labels Jun 8, 2026
dependabot Bot requested a review from a team as a code owner June 8, 2026 06:28
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@casenave casenave closed this Jun 8, 2026
dependabot Bot (Contributor, Author) commented on behalf of github Jun 8, 2026

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

@dependabot dependabot Bot deleted the dependabot/pip/datasets-gte-2.18-and-lt-6 branch June 8, 2026 08:22
Labels

  • dependencies: Pull requests that update a dependency file
  • python: Pull requests that update python code
