Commit 44c602a

Update README.md
1 parent 3099b46 commit 44c602a

1 file changed

Lines changed: 2 additions & 2 deletions

README.md

@@ -39,10 +39,10 @@ Since WebDatasets are just tar files, you can use many different tools to create
If your data is already laid out like that on the file system, you can use `tar --sorted`:

```Shell
-$ tar --sorted name -cf - dataset > dataset.tar
+$ tar --sort=name -cf - dataset > dataset.tar
```
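
The corrected `--sort=name` option makes GNU tar write directory entries in name order, which keeps files that share a basename next to each other in the archive. For cases where that option is not available, the same ordering can be sketched in Python with the standard `tarfile` module. The function name `write_sorted_tar` is hypothetical and not part of WebDataset or tarp; treat this as a rough equivalent under those assumptions, not a drop-in replacement for the command above.

```python
import os
import tarfile


def write_sorted_tar(src_dir, tar_path):
    """Archive every regular file under src_dir in sorted path order.

    Hypothetical helper: roughly mirrors `tar --sort=name -cf tar_path src_dir`,
    except that only files (no directory entries) are stored.
    """
    src = os.path.abspath(src_dir)
    base = os.path.dirname(src)  # keep the top-level directory name in the archive

    # Collect paths relative to the parent directory, then sort them by name.
    members = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            members.append(os.path.relpath(os.path.join(root, name), base))
    members.sort()

    with tarfile.open(tar_path, "w") as tar:
        for rel in members:
            tar.add(os.path.join(base, rel), arcname=rel, recursive=False)
    return members
```

Calling `write_sorted_tar("dataset", "dataset.tar")` should produce an archive whose member list, as shown by `tar -tf dataset.tar`, is in sorted order.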

-You can also use the `tarp create` command (at [github.com/tmbdev/tarp](http://github.com/tmbdev/tarp)) with a recipe file.
+You can also use the `tarp create` command (at [github.com/tmbdev/tarp](http://github.com/tmbdev/tarp)) with a recipe file, use `tarp split` to split large datasets into multiple shards, and `tarp shuffle` to shuffle datasets.

And you can use Python or Julia scripts to write such files directly. For example, [makeshards.py](https://github.com/tmbdev/webdataset-lightning/blob/main/makeshards.py) uses some existing PyTorch code to quickly convert Imagenet data into sharded tar files.
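
As a rough illustration of writing such files directly from Python, the sketch below uses only the standard `tarfile` module. It is not the code in `makeshards.py`. The function name `write_shards`, the shard naming pattern, and the sample layout (a key plus a mapping from file extension to bytes) are all assumptions made for this example.

```python
import io
import tarfile


def write_shards(samples, pattern="shard-{:06d}.tar", max_per_shard=1000):
    """Write (key, {extension: bytes}) samples into numbered tar shards.

    Hypothetical helper: every file of a sample is stored as "<key>.<extension>",
    so the files belonging to one sample sit next to each other in the archive.
    """
    shard_paths = []
    tar = None
    count = 0
    try:
        for key, fields in samples:
            # Start a new shard for the first sample or when the current one is full.
            if tar is None or count >= max_per_shard:
                if tar is not None:
                    tar.close()
                path = pattern.format(len(shard_paths))
                tar = tarfile.open(path, "w")
                shard_paths.append(path)
                count = 0
            for ext, data in sorted(fields.items()):
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
            count += 1
    finally:
        if tar is not None:
            tar.close()
    return shard_paths
```

For example, `write_shards([("sample0", {"txt": b"hello", "cls": b"3"})])` would write a single `shard-000000.tar` containing `sample0.cls` and `sample0.txt`. Capping shards by sample count is a simplification; a byte-size limit may suit large images better.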
