
Commit ec61f24

yeyu-nvidia and claude committed
Free cached GPU memory before AR validation to avoid OOM
With LoRA co-training, the model carries extra parameters and optimizer states (LoRA A/B + Adam moments), reducing the headroom available for the validation forward passes. Call torch.cuda.empty_cache() before validate_ar() to release unused cached allocations without affecting any live tensors (parameters, optimizer states, gradients).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
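To see what the commit message means by releasing "unused cached allocations without affecting any live tensors", one can compare the memory reserved by PyTorch's caching allocator with the memory actually allocated to tensors, before and after the call. The following is a minimal sketch, assuming a CUDA-enabled PyTorch build. The helper name `release_cached_memory` is hypothetical and is not part of eagle_utils.py. It returns None when PyTorch or a GPU is unavailable.

```python
"""Hypothetical helper: measure what torch.cuda.empty_cache() releases."""

try:
    import torch
except ImportError:  # let the sketch import on machines without PyTorch
    torch = None


def release_cached_memory():
    """Free unoccupied cached blocks and report memory in bytes.

    Returns (reserved_before, reserved_after, allocated), or None when
    PyTorch or a CUDA device is not available.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated()       # bytes held by live tensors
    reserved_before = torch.cuda.memory_reserved()  # bytes held by the caching allocator
    torch.cuda.empty_cache()                        # return unoccupied cached blocks to the driver
    reserved_after = torch.cuda.memory_reserved()
    return reserved_before, reserved_after, allocated
```

Live tensors keep their storage, so `reserved_after` can shrink toward, but never below, `allocated`.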
1 parent 5498e9f commit ec61f24

1 file changed

Lines changed: 1 addition & 0 deletions


examples/speculative_decoding/eagle_utils.py

@@ -260,6 +260,7 @@ def on_step_end(self, args, state, control, **kwargs):
             return control
         if state.global_step % self.ar_validate_steps == 0 and state.global_step > 0:
             print_rank_0("Running AR validation...")
+            torch.cuda.empty_cache()
             try:
                 ars = validate_ar(
                     model=kwargs["model"],
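The change in on_step_end follows a simple pattern: gate on the step count, release cached GPU memory, then run validation. The sketch below isolates that pattern outside any trainer. `should_run_ar_validation` and `maybe_validate` are hypothetical names, and `validate_fn` is a placeholder standing in for validate_ar(), whose real signature is only partly visible in the diff. The diff's `try:` block is truncated before its `except` branch, so the sketch does not guess at that error handling.

```python
"""Hypothetical sketch of the step gating and cache release in on_step_end."""

try:
    import torch
except ImportError:  # allow the sketch to run without PyTorch installed
    torch = None


def should_run_ar_validation(global_step, ar_validate_steps):
    """Mirror the diff's condition: every ar_validate_steps steps, never at step 0."""
    return global_step % ar_validate_steps == 0 and global_step > 0


def maybe_validate(global_step, ar_validate_steps, validate_fn):
    """Free cached GPU memory, then call validate_fn; return None on skipped steps."""
    if not should_run_ar_validation(global_step, ar_validate_steps):
        return None
    if torch is not None and torch.cuda.is_available():
        # Drop unoccupied cached blocks before the extra validation forward passes.
        torch.cuda.empty_cache()
    return validate_fn()
```

Calling empty_cache() only on validation steps keeps the per-step cost at zero. Freeing the cache on every step would force the allocator to re-request memory from the driver repeatedly during training.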
