
Init shortcut job #3346

Merged
lhoestq merged 17 commits into main from init-shortcut-job on May 27, 2026

Conversation

@lhoestq
Member

@lhoestq lhoestq commented May 21, 2026

Add the first job that allows shortcuts. This will be useful to accelerate the viewer's preparation for Parquet datasets (run all jobs at once).

Details

  • Job runners can now yield multiple job results, including ShortcutJobResult objects that shortcut other runners
  • The AfterJobPlan takes this into account and creates new jobs based on the job result and the shortcut job results
  • There is a new "dataset-init" job runner that shortcuts "dataset-config-names" to start with
  • In a second step (future PR), "dataset-init" will also be able to shortcut Parquet-related job runners
  • Later, it can also be useful to group jobs with low difficulty (e.g. info aggregation jobs when there are many configs and splits) and shortcut them as well, greatly reducing the number of jobs in the queue
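
To make the mechanism above concrete, here is a minimal, self-contained sketch of the idea, not the code from this PR. Only the names `ShortcutJobResult`, "dataset-init" and "dataset-config-names" come from the description; `JobResult`, `CHILDREN`, `run_dataset_init`, `plan_next_jobs` and the "config-split-names" child step are hypothetical stand-ins chosen for illustration, and the real `AfterJobPlan` is likely more involved.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Union


@dataclass
class JobResult:
    """Result of the job that actually ran (hypothetical, simplified shape)."""
    kind: str
    content: dict


@dataclass
class ShortcutJobResult:
    """Result produced on behalf of another runner, so that runner can be skipped."""
    kind: str
    content: dict


AnyResult = Union[JobResult, ShortcutJobResult]

# Hypothetical, simplified processing graph: step -> steps that depend on it.
CHILDREN: Dict[str, List[str]] = {
    "dataset-init": ["dataset-config-names"],
    "dataset-config-names": ["config-split-names"],
    "config-split-names": [],
}


def run_dataset_init(dataset: str) -> Iterator[AnyResult]:
    """Toy 'dataset-init' runner that yields several results from one job."""
    config_names = ["default"]  # placeholder value, no real Hub lookup here
    # Shortcut: provide the result that "dataset-config-names" would compute.
    yield ShortcutJobResult(
        kind="dataset-config-names",
        content={"dataset": dataset, "config_names": config_names},
    )
    # The runner's own result.
    yield JobResult(kind="dataset-init", content={"dataset": dataset})


def plan_next_jobs(results: Iterable[AnyResult]) -> List[str]:
    """Toy after-job planning: schedule children of every result,
    skipping steps whose result already exists (ran or was shortcut)."""
    results = list(results)
    done = {result.kind for result in results}
    to_create: List[str] = []
    for result in results:
        for child in CHILDREN.get(result.kind, []):
            if child not in done and child not in to_create:
                to_create.append(child)
    return to_create


next_jobs = plan_next_jobs(run_dataset_init("user/dataset"))
```

In this toy graph, `next_jobs` is `["config-split-names"]`: the "dataset-config-names" job is never queued because its result was already yielded as a shortcut, which is how a single job could eventually stand in for many.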

Thanks to open models for the help coding this :p

Struggling a bit to merge this one: tests are failing because of a mix of network/github/hub-ci/app-tokens issues

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq lhoestq merged commit f3763e2 into main May 27, 2026
26 of 28 checks passed
@lhoestq lhoestq deleted the init-shortcut-job branch May 27, 2026 14:19