[EAGLE] Configurable number of TTT steps (#1042)

benchislett · web-flow · commit c76633ac9def · 2026-03-18T09:32:17.000-07:00
### What does this PR do?

Type of change: new CLI option for an existing config option

- Added a `num_ttt_steps` CLI flag.
- Changed the default number of TTT steps from 4 to 3 for consistency with common `num_spec_tokens` settings. In practice, `num_spec_tokens` is most often 3 or 7, so the default is rounded down to 3 and users can raise it on demand. The smaller default also improves training efficiency for the out-of-the-box (OOTB) experience.

### Usage

Users can now pass `--num_ttt_steps 7` to `launch_train.sh` when training an EAGLE3 model for extended speculation lengths.

### Testing

N/A

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A

## Summary by CodeRabbit

* **New Features**
  * Added the ability to configure train-time-test steps for speculative decoding training via a command-line argument.
  * Updated the default train-time-test steps value from 4 to 3.

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
diff --git a/examples/speculative_decoding/launch_train.sh b/examples/speculative_decoding/launch_train.sh
@@ -86,6 +86,10 @@ while [ $# -gt 0 ]; do
       if [[ "$1" != *=* ]]; then shift; fi
       AR_VALIDATE_STEPS="${1#*=}"
       ;;
+    --num_ttt_steps*)
+      if [[ "$1" != *=* ]]; then shift; fi
+      NUM_TTT_STEPS="${1#*=}"
+      ;;
     --cp_size*)
       if [[ "$1" != *=* ]]; then shift; fi
       CP_SIZE="${1#*=}"
@@ -154,6 +158,7 @@ DP_SHARD_SIZE=${DP_SHARD_SIZE:-$((TOTAL_GPU/CP_SIZE))}
 LOG_STEPS=${LOG_STEPS:-100}
 DRAFT_VOCAB_CACHE=${DRAFT_VOCAB_CACHE:-""}
 MIX_HIDDEN_STATES=${MIX_HIDDEN_STATES:-"False"}
+NUM_TTT_STEPS=${NUM_TTT_STEPS:-3}
 
 
 if [[ "$MODE" == "eagle3" ]]; then
@@ -247,6 +252,7 @@ CMD="accelerate launch $MULTI_NODE_ARGS --mixed_precision bf16 ${SCRIPT_DIR}/mai
     $FSDP_ARGS \
     --cp_size $CP_SIZE \
     --dp_shard_size $DP_SHARD_SIZE \
+    --num_ttt_steps $NUM_TTT_STEPS \
 "
 
 start_time=$(date +%s)
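The new `--num_ttt_steps*)` case arm follows the same convention as its neighbors. It accepts both `--num_ttt_steps 7` and `--num_ttt_steps=7`, because `shift` runs only when the argument contains no `=`, and `${1#*=}` strips everything up to and including the first `=`. `NUM_TTT_STEPS=${NUM_TTT_STEPS:-3}` then falls back to 3 when the value is unset or empty; assuming the script does not reset the variable earlier, this presumably also means an exported `NUM_TTT_STEPS` environment variable is used when the flag is absent. The Python sketch below mirrors that logic for illustration only: `parse_num_ttt_steps` is a hypothetical helper, and it assumes the enclosing `while` loop shifts once per iteration, as the other cases imply.

```python
import os
from typing import Mapping, Optional, Sequence


def parse_num_ttt_steps(argv: Sequence[str],
                        env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical Python mirror of launch_train.sh's --num_ttt_steps handling."""
    env = os.environ if env is None else env
    value = None  # stays None until the flag is seen
    i = 0
    while i < len(argv):
        arg = argv[i]
        # Same glob as the shell case arm: --num_ttt_steps*)
        if arg.startswith("--num_ttt_steps"):
            if "=" not in arg:
                # `shift`: the value is in the next argument.
                i += 1
                if i >= len(argv):
                    # Stricter than the script, which would fall back to 3.
                    raise ValueError("--num_ttt_steps expects a value")
                arg = argv[i]
            # ${1#*=}: drop everything up to and including the first '='.
            value = arg.split("=", 1)[1] if "=" in arg else arg
        i += 1  # assumed trailing `shift` of the enclosing while loop
    if value is None:
        # Flag absent: whatever NUM_TTT_STEPS already holds is kept.
        value = env.get("NUM_TTT_STEPS", "")
    # ${NUM_TTT_STEPS:-3}: an empty or unset value falls back to 3.
    return int(value or "3")
```

As in the script, the last occurrence of the flag wins, and a flag value (even an empty one) takes precedence over the environment.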
diff --git a/examples/speculative_decoding/main.py b/examples/speculative_decoding/main.py
@@ -130,6 +130,10 @@ class EagleArguments:
         default=False,
         metadata={"help": "Whether to mix hidden states from previous TTT step."},
     )
+    num_ttt_steps: int = field(
+        default=3,
+        metadata={"help": "Number of train-time-test steps to use during training."},
+    )
 
 
 def train():
@@ -208,6 +212,7 @@ def train():
                 "eagle_decoder_type": eagle_args.eagle_decoder_type,
                 "eagle_offline": use_offline_training,
                 "eagle_mix_hidden_states": eagle_args.mix_hidden_states,
+                "eagle_ttt_steps": eagle_args.num_ttt_steps,
                 "eagle_architecture_config": custom_config,
             }
 
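In `main.py`, the new `num_ttt_steps` field on `EagleArguments` is forwarded into the EAGLE config under the key `eagle_ttt_steps`, so the CLI name and the config key differ. The `field(..., metadata={"help": ...})` style is the convention used by dataclass-driven parsers such as Hugging Face's `HfArgumentParser`; the diff does not show which parser `main.py` uses, so this sketch substitutes stdlib `argparse`. `build_parser` and `eagle_config_from` are hypothetical helpers, and the dataclass contains only the new field.

```python
import argparse
from dataclasses import dataclass, field, fields


@dataclass
class EagleArguments:
    """Hypothetical subset of main.py's EagleArguments (only the new field)."""

    num_ttt_steps: int = field(
        default=3,
        metadata={"help": "Number of train-time-test steps to use during training."},
    )


def build_parser(cls) -> argparse.ArgumentParser:
    """Turn each dataclass field into a --flag, reusing metadata["help"]."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # f.type is the real `int` here (no `from __future__ import annotations`).
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default,
                            help=f.metadata.get("help"))
    return parser


def eagle_config_from(eagle_args: EagleArguments) -> dict:
    """Map the CLI name (num_ttt_steps) to the config key (eagle_ttt_steps)."""
    return {"eagle_ttt_steps": eagle_args.num_ttt_steps}


if __name__ == "__main__":
    ns = build_parser(EagleArguments).parse_args(["--num_ttt_steps", "7"])
    print(eagle_config_from(EagleArguments(**vars(ns))))
```

Keeping the default (3) in the dataclass means `main.py` behaves the same whether or not it is launched through `launch_train.sh`.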
diff --git a/modelopt/torch/speculative/config.py b/modelopt/torch/speculative/config.py
@@ -101,7 +101,7 @@ class EagleConfig(ModeloptBaseConfig):
     )
 
     eagle_ttt_steps: int = ModeloptField(
-        default=4, description=("The number of train-time-test steps in training.")
+        default=3, description=("The number of train-time-test steps in training.")
     )
 
     eagle_mix_hidden_states: bool = ModeloptField(
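Because `launch_train.sh` always forwards `--num_ttt_steps` and `main.py` always sets `eagle_ttt_steps`, the `EagleConfig` default changed here only takes effect for callers that build the EAGLE config without that key. Such callers who want the previous behavior need to set `eagle_ttt_steps` to 4 explicitly. The sketch below uses a stdlib dataclass as a hypothetical stand-in for `EagleConfig` (the real class derives from `ModeloptBaseConfig`, whose API is not reproduced here) to show default-versus-override behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EagleConfigSketch:
    """Hypothetical stand-in for EagleConfig; only the changed field is modeled."""

    # The number of train-time-test steps in training (was 4 before this change).
    eagle_ttt_steps: int = 3


def make_config(overrides: dict) -> EagleConfigSketch:
    """Apply user-supplied keys on top of the defaults."""
    return replace(EagleConfigSketch(), **overrides)


default_cfg = make_config({})                     # uses the new default of 3
legacy_cfg = make_config({"eagle_ttt_steps": 4})  # restores the old default
```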
