datasets.load_from_disk progress bar optional manual control #7939

@Tommigun1980

Feature request

This is tangentially related to #7918.

When loading a dataset with > 16 files a progress bar is shown (unless stdout is redirected or #7919 is merged).

However, if you use multiple processes with data sharding, where each process loads the dataset, you get one copy of the progress bar per process, all fighting each other for the terminal.

It would be greatly appreciated if datasets.load_from_disk accepted an argument controlling whether the progress bar is shown. The default could be None, which would retain the current behavior (i.e. show the bar if the dataset has > 16 files), while True or False would force the bar on or off as needed. Essentially, just expose the currently hardcoded progress bar visibility as an argument of the method so that the user can control it.
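To make the requested semantics concrete, here is a minimal sketch of how such a tri-state argument could be resolved. The names `show_progress`, `resolve_show_progress`, and `FILE_THRESHOLD` are hypothetical and are not part of the datasets API; only the 16-file threshold comes from the behavior described above.

```python
from typing import Optional

# Threshold taken from the issue text: the bar appears for > 16 files.
FILE_THRESHOLD = 16


def resolve_show_progress(show_progress: Optional[bool], num_files: int) -> bool:
    """Decide whether to show the bar (hypothetical helper, not library code).

    None  -> keep the current automatic behavior (show if > 16 files)
    True  -> force the bar on
    False -> force the bar off
    """
    if show_progress is None:
        return num_files > FILE_THRESHOLD
    return show_progress
```

With None as the default, existing callers would see no change, and only callers that pass True or False would override the automatic rule.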

Motivation

The progress bar could be forced off in all processes but one, to avoid progress bars fighting each other and spamming the logs.
The progress bar could also be manually forced on or off for other use cases.
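As an illustration of the first use case, the sketch below picks a single process to show the bar. It assumes the launcher exports a RANK environment variable (as torchrun does); the helper name `progress_flag_for_rank` is hypothetical.

```python
import os
from typing import Mapping


def progress_flag_for_rank(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only on rank 0, so a single process draws the bar.

    Assumes the launcher sets RANK; a missing RANK is treated as a
    single-process run, where showing the bar is harmless.
    """
    return int(env.get("RANK", "0")) == 0
```

The returned value is what each process would pass to the proposed argument. Until such an argument exists, the same flag could presumably gate the library's global toggle instead (`datasets.disable_progress_bars()`, if the installed version provides it), at the cost of silencing every progress bar in that process rather than just this one.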

Your contribution

I could possibly open a PR if this is accepted.


Labels: enhancement (New feature or request)
