
Commit b6d1b82

Add notice for multiworker shuffling behaviors
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
1 parent b7bd4cb commit b6d1b82

1 file changed

Lines changed: 10 additions & 0 deletions


caikit_nlp/toolkit/data_stream_wrapper.py

@@ -34,6 +34,16 @@
 class SimpleIterableStreamWrapper(IterableDataset):
     """DataStream wrapper as an iterable PyTorch dataset; we use this to add
     compatibility with PyTorch data loaders.
+
+    NOTE: this wrapper supports shuffling iterable datasets across multiple
+    workers as a true partition (each worker gets a disjoint subset of the
+    data). For this to work, you must set persistent_workers=True on your
+    DataLoader; otherwise, workers are destroyed and recreated on every pass
+    over the DataLoader, so each pass reuses the same shuffle seed.
+
+    To verify that multiworker shuffling is working properly, turn on debug
+    logs and check that the logged shuffle seed changes as you iterate
+    through your dataset.
     """
 
     def __init__(
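The failure mode the NOTE in this commit describes can be reproduced without PyTorch. The sketch below is a stdlib-only simulation, not the caikit_nlp implementation: it assumes each worker's dataset copy derives its shuffle seed from an epoch counter stored on that copy, and the names `PartitionedShuffler` and `run_epochs` are hypothetical. Rebuilding the workers on every pass (the `persistent_workers=False` case) resets the counter, so every epoch yields the same order.

```python
import logging
import random
from typing import Iterator, List, Optional

log = logging.getLogger("shuffle_sketch")  # hypothetical logger name


class PartitionedShuffler:
    """Hypothetical stand-in for one worker's dataset copy (not caikit_nlp code).

    The worker keeps every num_workers-th item, so the workers' shards form a
    disjoint partition of the data, and shuffles its shard with a seed that
    depends on an epoch counter stored on this instance.
    """

    def __init__(self, data: List[int], worker_id: int, num_workers: int, base_seed: int = 0):
        self.data = data
        self.worker_id = worker_id
        self.num_workers = num_workers
        self.base_seed = base_seed
        # Assumption: this counter only survives as long as the worker does.
        self.epoch = 0

    def __iter__(self) -> Iterator[int]:
        # Unique per (epoch, worker) as long as worker_id < num_workers.
        seed = self.base_seed + self.epoch * self.num_workers + self.worker_id
        log.debug("worker %d shuffle seed: %d", self.worker_id, seed)
        self.epoch += 1
        shard = self.data[self.worker_id :: self.num_workers]
        random.Random(seed).shuffle(shard)
        return iter(shard)


def run_epochs(data: List[int], num_workers: int, epochs: int, persistent: bool) -> List[List[int]]:
    """Simulate worker lifetimes and return the item order produced each epoch."""
    workers: Optional[List[PartitionedShuffler]] = None
    orders = []
    for _ in range(epochs):
        if workers is None or not persistent:
            # Non-persistent workers are rebuilt from scratch, so epoch resets to 0.
            workers = [PartitionedShuffler(data, w, num_workers) for w in range(num_workers)]
        epoch_order: List[int] = []
        for worker in workers:
            # Simplification: concatenate shards; a real DataLoader interleaves them.
            epoch_order.extend(worker)
        orders.append(epoch_order)
    return orders


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    data = list(range(8))
    fresh = run_epochs(data, num_workers=2, epochs=3, persistent=False)
    print(fresh[0] == fresh[1] == fresh[2])  # True: same seeds every epoch
    kept = run_epochs(data, num_workers=2, epochs=3, persistent=True)
    print(kept)  # logged seeds now change from epoch to epoch
```

With `persistent=True` the counter survives between epochs, so the logged seed changes; that is the signal the docstring suggests checking for in the debug logs.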
