Merged
3 changes: 3 additions & 0 deletions examples/speculative_decoding/eagle_utils.py
@@ -211,6 +211,9 @@ def on_log(self, args, state, control, **kwargs):

# log to wandb
if wandb and is_master():
logs = kwargs.get("logs") or {}
if logs:
wandb.log({k: v for k, v in logs.items() if v is not None}, step=state.global_step)
for i, draft_acc in enumerate(average_acc):
for j, step_acc in enumerate(draft_acc):
wandb.log(
49 changes: 29 additions & 20 deletions examples/speculative_decoding/launch_train.sh
@@ -62,14 +62,6 @@ while [ $# -gt 0 ]; do
if [[ "$1" != *=* ]]; then shift; fi
TRAIN_BS="${1#*=}"
;;
--medusa_num_heads*)
if [[ "$1" != *=* ]]; then shift; fi
MEDUSA_NUM_HEADS="${1#*=}"
;;
--medusa_num_layers*)
if [[ "$1" != *=* ]]; then shift; fi
MEDUSA_NUM_LAYERS="${1#*=}"
;;
--eagle_config*)
if [[ "$1" != *=* ]]; then shift; fi
EAGLE_CONFIG="${1#*=}"
@@ -110,6 +102,14 @@ while [ $# -gt 0 ]; do
if [[ "$1" != *=* ]]; then shift; fi
DRAFT_VOCAB_CACHE="${1#*=}"
;;
--num_nodes*)
if [[ "$1" != *=* ]]; then shift; fi
NUM_NODES="${1#*=}"
;;
--head_node_ip*)
if [[ "$1" != *=* ]]; then shift; fi
HEAD_NODE_IP="${1#*=}"
;;
*)
>&2 printf "Error: Invalid argument ${1#*=}\n"
exit 1
@@ -120,10 +120,12 @@ done

set -x

# Get the default value for save_steps based on the available number of GPUs
GPU_COUNT=$(python -c "import torch; print(torch.cuda.device_count())")
NUM_NODES=${NUM_NODES:-1}
GPU_PER_NODE=${GPU_PER_NODE:-$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)}
TOTAL_GPU=$((NUM_NODES * GPU_PER_NODE))
echo "Total GPUs: $TOTAL_GPU (NUM_NODES: $NUM_NODES, GPU_PER_NODE: $GPU_PER_NODE)"
# Calculate save_steps
DEFAULT_SAVE_STEPS=$((8192 / GPU_COUNT))
DEFAULT_SAVE_STEPS=$((8192 / TOTAL_GPU))

Comment on lines +123 to 129
⚠️ Potential issue | 🟠 Major

Guard against zero GPUs and zero save_steps.

If nvidia-smi returns 0 GPUs (or fails), TOTAL_GPU becomes 0 and the script will divide by zero. Also, large TOTAL_GPU values can drive DEFAULT_SAVE_STEPS to 0, which is invalid for trainers.

🛡️ Suggested fix
 NUM_NODES=${NUM_NODES:-1}
 GPU_PER_NODE=${GPU_PER_NODE:-$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)}
 TOTAL_GPU=$((NUM_NODES * GPU_PER_NODE))
+if (( TOTAL_GPU <= 0 )); then
+  echo "No GPUs detected. Set GPU_PER_NODE/NUM_NODES explicitly."
+  exit 1
+fi
 echo "Total GPUs: $TOTAL_GPU (NUM_NODES: $NUM_NODES, GPU_PER_NODE: $GPU_PER_NODE)"
 # Calculate save_steps
 DEFAULT_SAVE_STEPS=$((8192 / TOTAL_GPU))
+if (( DEFAULT_SAVE_STEPS < 1 )); then
+  DEFAULT_SAVE_STEPS=1
+fi
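The guarded arithmetic above can be sketched as a standalone, testable shell function. This is a hedged illustration, not the PR's exact code; the name `safe_save_steps` and the argument order are invented for the example.

```shell
#!/bin/sh
# Illustrative sketch of the guarded save_steps calculation.
# Usage: safe_save_steps NUM_NODES GPU_PER_NODE
safe_save_steps() {
  nodes=${1:-1}
  gpus_per_node=${2:-0}
  # Fall back to 1 GPU per node if detection reported 0 (e.g. nvidia-smi failed)
  [ "$gpus_per_node" -gt 0 ] || gpus_per_node=1
  total=$((nodes * gpus_per_node))
  steps=$((8192 / total))
  # Clamp to at least 1; trainers reject save_steps=0
  [ "$steps" -ge 1 ] || steps=1
  echo "$steps"
}
```

With 8 GPUs on one node this yields 1024; with zero detected GPUs it falls back to 1 GPU rather than dividing by zero; with very large clusters it clamps to 1.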

MODEL=${MODEL:-"TinyLlama/TinyLlama-1.1B-Chat-v1.0"}
MODE=${MODE:-"eagle3"}
@@ -135,8 +137,6 @@ NUM_EPOCHS=${NUM_EPOCHS:-1}
SAVE_STEPS=${SAVE_STEPS:-$DEFAULT_SAVE_STEPS}
LR=${LR:-"1e-4"}
TRAIN_BS=${TRAIN_BS:-1}
MEDUSA_NUM_HEADS=${MEDUSA_NUM_HEADS:-1}
MEDUSA_NUM_LAYERS=${MEDUSA_NUM_LAYERS:-1}
TRAINING_SEQ_LEN=${TRAINING_SEQ_LEN:-2048}
OFFLINE_DATA_PATH=${OFFLINE_DATA_PATH:-""}
DISABLE_TQDM=${DISABLE_TQDM:-False}
@@ -145,20 +145,19 @@ VLM_IMG_DIR=${VLM_IMG_DIR:-}
AR_VALIDATE_STEPS=${AR_VALIDATE_STEPS:-1000}
ESTIMATE_AR=${ESTIMATE_AR:-False}
CP_SIZE=${CP_SIZE:-1}
DP_SHARD_SIZE=${DP_SHARD_SIZE:-$((GPU_COUNT/CP_SIZE))}
DP_SHARD_SIZE=${DP_SHARD_SIZE:-$((TOTAL_GPU/CP_SIZE))}
LOG_STEPS=${LOG_STEPS:-100}
Comment on lines 147 to 149
⚠️ Potential issue | 🟠 Major

Validate CP_SIZE so DP_SHARD_SIZE is never 0 or truncated.

DP_SHARD_SIZE is computed via integer division. If CP_SIZE doesn’t evenly divide TOTAL_GPU (or exceeds it), DP_SHARD_SIZE becomes incorrect or 0, which will break training.

✅ Suggested fix
 CP_SIZE=${CP_SIZE:-1}
-DP_SHARD_SIZE=${DP_SHARD_SIZE:-$((TOTAL_GPU/CP_SIZE))}
+if (( TOTAL_GPU % CP_SIZE != 0 )); then
+  echo "CP_SIZE ($CP_SIZE) must evenly divide TOTAL_GPU ($TOTAL_GPU)."
+  exit 1
+fi
+DP_SHARD_SIZE=${DP_SHARD_SIZE:-$((TOTAL_GPU/CP_SIZE))}
+if (( DP_SHARD_SIZE < 1 )); then
+  echo "DP_SHARD_SIZE must be >= 1."
+  exit 1
+fi
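The divisibility check above can be expressed as a small, testable function. A hedged sketch, assuming the same semantics as the suggested fix; the function name `dp_shard_size` is illustrative.

```shell
#!/bin/sh
# Illustrative sketch: validate CP_SIZE before deriving DP_SHARD_SIZE.
# Usage: dp_shard_size TOTAL_GPU CP_SIZE
dp_shard_size() {
  total_gpu=$1
  cp_size=$2
  # Reject cp_size < 1 and sizes that do not evenly divide the GPU count,
  # so the caller never sees a zero or truncated shard size.
  if [ "$cp_size" -lt 1 ] || [ $((total_gpu % cp_size)) -ne 0 ]; then
    echo "CP_SIZE ($cp_size) must be >= 1 and evenly divide TOTAL_GPU ($total_gpu)" >&2
    return 1
  fi
  echo $((total_gpu / cp_size))
}
```

For example, 8 GPUs with CP_SIZE=2 gives a shard size of 4, while CP_SIZE=3 or CP_SIZE=16 is rejected instead of silently truncating to 2 or 0.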

DRAFT_VOCAB_CACHE=${DRAFT_VOCAB_CACHE:-""}

if [[ "$MODE" == "medusa" ]]; then
SPECULATIVE_ARGS="--medusa_num_heads $MEDUSA_NUM_HEADS --medusa_num_layers $MEDUSA_NUM_LAYERS"
elif [[ "$MODE" == "eagle1" || "$MODE" == "eagle3" ]]; then

if [[ "$MODE" == "eagle3" ]]; then
if [[ -n "$EAGLE_CONFIG" ]]; then
SPECULATIVE_ARGS="--eagle_config $EAGLE_CONFIG"
else
SPECULATIVE_ARGS=""
fi
else
echo "Only medusa, eagle1, eagle3 supported for now!"
echo "Only eagle3 supported for now!"
exit 1
fi

@@ -180,7 +179,7 @@ else
VLM_ARGS=""
fi

if [[ "$GPU_COUNT" -gt 1 ]]; then
if [[ "$TOTAL_GPU" -gt 1 ]]; then
#Use FSDP2 when multi GPU available
FSDP_ARGS="--fsdp 'full_shard' --fsdp_config fsdp_config.json"
else
@@ -195,10 +194,20 @@ else
DRAFT_VOCAB_CACHE_ARGS=""
fi

if [[ "$NUM_NODES" != 1 ]]; then
MULTI_NODE_ARGS="--num_processes $TOTAL_GPU \
--num_machines $NUM_NODES \
--machine_rank $SLURM_PROCID \
--rdzv_backend c10d \
--main_process_ip $HEAD_NODE_IP \
--main_process_port 29500"
Comment on lines +197 to +203
⚠️ Potential issue | 🟠 Major

Fail fast when required multi-node vars are missing.

When NUM_NODES != 1, the script relies on HEAD_NODE_IP and SLURM_PROCID. If either is empty, the accelerate rendezvous will fail with confusing errors.

🚦 Suggested fix
 if [[ "$NUM_NODES" != 1 ]]; then
+  if [[ -z "$HEAD_NODE_IP" ]]; then
+    echo "HEAD_NODE_IP is required when NUM_NODES > 1."
+    exit 1
+  fi
+  if [[ -z "$SLURM_PROCID" ]]; then
+    echo "SLURM_PROCID is required when NUM_NODES > 1."
+    exit 1
+  fi
   MULTI_NODE_ARGS="--num_processes $TOTAL_GPU \
                    --num_machines $NUM_NODES \
                    --machine_rank $SLURM_PROCID \
                    --rdzv_backend c10d \
                    --main_process_ip $HEAD_NODE_IP \
                    --main_process_port 29500"
 else
   MULTI_NODE_ARGS=""
 fi
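The fail-fast validation can be factored into a function that is easy to exercise outside SLURM. A hedged sketch under the same assumptions as the suggested fix; `require_multinode_env` and its positional arguments are invented for illustration.

```shell
#!/bin/sh
# Illustrative sketch: validate multi-node prerequisites before building
# the accelerate rendezvous arguments.
# Usage: require_multinode_env NUM_NODES HEAD_NODE_IP SLURM_PROCID
require_multinode_env() {
  num_nodes=$1
  head_ip=$2
  proc_id=$3
  if [ "$num_nodes" -gt 1 ]; then
    # Both values are required for the c10d rendezvous to succeed.
    [ -n "$head_ip" ] || { echo "HEAD_NODE_IP is required when NUM_NODES > 1." >&2; return 1; }
    [ -n "$proc_id" ] || { echo "SLURM_PROCID is required when NUM_NODES > 1." >&2; return 1; }
  fi
  return 0
}
```

Single-node runs pass trivially; multi-node runs fail with a clear message instead of a confusing rendezvous timeout.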

else
MULTI_NODE_ARGS=""
fi

# Disable tokenizers parallelism to avoid warning
export TOKENIZERS_PARALLELISM=False
CMD="accelerate launch --mixed_precision bf16 main.py \
CMD="accelerate launch $MULTI_NODE_ARGS --mixed_precision bf16 main.py \
--mode $MODE \
--eagle_decoder_type $EAGLE_DECODER_TYPE \
--model_name_or_path $MODEL \
57 changes: 57 additions & 0 deletions examples/speculative_decoding/slurm.sh
@@ -0,0 +1,57 @@
#!/bin/bash

# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#SBATCH -A {account}
#SBATCH --job-name={job_name}
#SBATCH --nodes={num_nodes} --ntasks-per-node=1 --gpus-per-node={num_gpus_per_node}
#SBATCH -p {partition}
#SBATCH -t {time_limit}

CONTAINER_IMAGE={container_image}
WORK_DIR={path_to_modelopt}

CONTAINER_MOUNT="${WORK_DIR}:/modelopt"

OUTPUT_DIR={path_to_output_dir}
MODEL={path_to_model_dir}
DATA={path_to_data_dir}
OFFLINE_DATA={path_to_offline_data_dir}

CMD="./launch_train.sh --model $MODEL \
--output_dir $OUTPUT_DIR \
--data $DATA \
--num_epochs 1 \
--train_bs 1 \
--lr 1e-4 \
--eagle_config eagle_config.json \
--training_seq_len 4096 \
--save_steps 1000 \
--estimate_ar True \
--disable_tqdm True \
--offline-data $OFFLINE_DATA \
--num_nodes $SLURM_NNODES \
--head_node_ip $head_node_ip \
Comment on lines +24 to +47
⚠️ Potential issue | 🟠 Major

Define HEAD_NODE_IP (and use consistent casing) before passing it.

--head_node_ip $head_node_ip references an undefined variable, so multi-node runs will pass an empty IP and fail to rendezvous. Please define it (placeholder or derive from SLURM) and use a consistent variable name.

🔧 Suggested fix
 CONTAINER_IMAGE={container_image}
 WORK_DIR={path_to_modelopt}
+HEAD_NODE_IP={head_node_ip}

 CONTAINER_MOUNT="${WORK_DIR}:/modelopt"

 OUTPUT_DIR={path_to_output_dir}
 MODEL={path_to_model_dir}
 DATA={path_to_data_dir}
 OFFLINE_DATA={path_to_offline_data_dir}
+
+if [[ "${SLURM_NNODES:-1}" -gt 1 && -z "$HEAD_NODE_IP" ]]; then
+  echo "HEAD_NODE_IP is required for multi-node runs."
+  exit 1
+fi

 CMD="./launch_train.sh --model $MODEL \
             --output_dir $OUTPUT_DIR \
             --data $DATA \
             --num_epochs 1 \
             --train_bs 1 \
             --lr 1e-4 \
             --eagle_config eagle_config.json \
             --training_seq_len 4096 \
             --save_steps 1000 \
             --estimate_ar True \
             --disable_tqdm True \
             --offline-data $OFFLINE_DATA \
             --num_nodes $SLURM_NNODES \
-            --head_node_ip $head_node_ip \
+            --head_node_ip $HEAD_NODE_IP \
 "
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] SC1083 (lines 9-10, 14-17): This { / } is literal. Check expression (missing ;/\n?) or quote it.

[warning] SC2154 (line 32): head_node_ip is referenced but not assigned.

"

srun -l \
--mpi=pmix \
--output=%x_%j_$DATETIME.log \
--container-workdir "/modelopt/examples/speculative_decoding" \
--container-image ${CONTAINER_IMAGE} --container-mounts ${CONTAINER_MOUNT} \
bash -lc "$CMD"

set +x
74 changes: 0 additions & 74 deletions tests/examples/speculative_decoding/test_medusa.py

This file was deleted.