Commit 800a3dc

HydrogenSulfate authored and ChiahsinChu committed
fix(pt/pd): fix eta computation (deepmodeling#4886)
fix eta computation code

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved ETA accuracy in training/validation progress logs by adapting calculations to recent step intervals, reducing misleading estimates early in runs.
  * Consistent behavior across both backends, providing more reliable remaining-time estimates without changing any public interfaces.
1 parent 7e106d9 commit 800a3dc

2 files changed

Lines changed: 6 additions & 2 deletions

deepmd/pd/train/training.py

Lines changed: 3 additions & 1 deletion
@@ -918,7 +918,9 @@ def log_loss_valid(_task_key="Default"):
                 self.t0 = current_time
                 if self.rank == 0 and self.timing_in_training:
                     eta = int(
-                        (self.num_steps - display_step_id) / self.disp_freq * train_time
+                        (self.num_steps - display_step_id)
+                        / min(self.disp_freq, display_step_id - self.start_step)
+                        * train_time
                     )
                     log.info(
                         format_training_message(

deepmd/pt/train/training.py

Lines changed: 3 additions & 1 deletion
@@ -1046,7 +1046,9 @@ def log_loss_valid(_task_key="Default"):
                 self.t0 = current_time
                 if self.rank == 0 and self.timing_in_training:
                     eta = int(
-                        (self.num_steps - display_step_id) / self.disp_freq * train_time
+                        (self.num_steps - display_step_id)
+                        / min(self.disp_freq, display_step_id - self.start_step)
+                        * train_time
                     )
                     log.info(
                         format_training_message(
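
The effect of the patched expression can be illustrated with a small standalone sketch. The function names below are hypothetical and not part of the deepmd code base. The sketch assumes that `train_time` is the wall time elapsed since the previous log, and that the first log of a run may cover fewer than `disp_freq` steps. It also assumes `display_step_id > start_step`, so the divisor is never zero.

```python
def eta_old(num_steps, display_step_id, disp_freq, train_time):
    """Pre-patch estimate: always treats train_time as covering disp_freq steps."""
    return int((num_steps - display_step_id) / disp_freq * train_time)


def eta_new(num_steps, display_step_id, start_step, disp_freq, train_time):
    """Post-patch estimate: divides by the number of steps actually covered.

    Assumes display_step_id > start_step (hypothetical precondition).
    """
    steps_covered = min(disp_freq, display_step_id - start_step)
    return int((num_steps - display_step_id) / steps_covered * train_time)


# Hypothetical early log: one step took 0.5 s, 999 steps remain.
early_old = eta_old(1000, 1, 100, 0.5)
early_new = eta_new(1000, 1, 0, 100, 0.5)

# Hypothetical full interval after a restart at step 500:
# 100 steps took 50 s, 400 steps remain, so both formulas agree.
full_old = eta_old(1000, 600, 100, 50.0)
full_new = eta_new(1000, 600, 500, 100, 50.0)

print(early_old, early_new, full_old, full_new)
```

Under these assumptions the old formula underestimates the early ETA by roughly a factor of `disp_freq`, while both formulas give the same value once a full display interval has elapsed.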
