Commit ec6f0d9

justadogistaken and baojiangnan authored
docs: run build_eagle3_dataset.py with torchrun (#234)
Co-authored-by: baojiangnan <baojiangnan@kuaishou.com>
1 parent 3f0ae30 commit ec6f0d9

1 file changed

Lines changed: 1 addition & 3 deletions

scripts/build_eagle3_dataset.py

@@ -7,7 +7,6 @@
 import os
 from pathlib import Path
 
-import torch
 from datasets import load_dataset
 from transformers import AutoTokenizer
 
@@ -32,13 +31,12 @@ def main():
     Separated script to build eagle3 dataset from the training.
 
     Usage:
-    python ./scripts/build_eagle3_dataset.py \
+    python ./scripts/build_eagle3_dataset.py \
         --data-path "cache/dataset/sharegpt.jsonl" \
         --model-path /shared/public/models/meta-llama/Meta-Llama-3.1-8B-Instruct \
         --chat-template llama3
     """
     args = parse_args()
-    torch.distributed.init_process_group(backend="nccl")
     assert os.path.exists(
         args.data_path
     ), f"Dataset path {args.data_path} does not exist"
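
Note on the removed call: when `torch.distributed.init_process_group` is given neither an `init_method` nor a `store`, it falls back to the `env://` rendezvous. That rendezvous reads `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` from the environment, which a launcher such as `torchrun` sets and a plain `python` invocation does not. The sketch below shows one way a script could guard that call. `missing_dist_env` and `maybe_init_process_group` are hypothetical helpers for illustration and are not part of this commit or of `build_eagle3_dataset.py`.

```python
import os

# Variables read by the default "env://" rendezvous of
# torch.distributed.init_process_group.
REQUIRED_ENV_VARS = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def missing_dist_env(env=None):
    """Return the rendezvous variables that are absent from `env`.

    Hypothetical helper; `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if name not in env]


def maybe_init_process_group(backend="nccl"):
    """Initialize torch.distributed only if a launcher prepared the environment.

    Hypothetical helper. Returns True when the process group was initialized,
    False when the script was started with plain `python`.
    """
    if missing_dist_env():
        return False  # no launcher variables: stay single-process
    # Imported lazily so the environment check itself does not require torch.
    import torch.distributed as dist

    dist.init_process_group(backend=backend)
    return True
```

With a guard like this, the same script could run under both `python` and `torchrun`. The commit takes the simpler route of dropping the distributed initialization altogether.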
