support eagle3 offline training with per_device_train_batch_size>1 #264

Merged: irisliu10 merged 8 commits into Tencent:main from dawnranger:main on Mar 19, 2026

dawnranger commented Mar 16, 2026

  1. Optimize hidden-state generation by avoiding redundant CPU/GPU data transfers.
  2. Fix #263.
  3. Improve the training log: merge the acc/ploss log into the base log and add a remaining_time log.
  4. Automatically deduce lm_head_key, embed_weight_key, and chat_template_type from the config.
  5. Remove the batch hidden-state generation scripts.
  6. Reduce training time by precomputing vocab_mapping in generate_hidden and loading it in offline training.
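
To illustrate item 6, here is a minimal sketch of precomputing a vocabulary mapping once and reloading it later, rather than rebuilding it at the start of every training run. It is not the code from this PR: the function names, the `d2t`/`t2d` layout, the frequency-based token selection, and the JSON file format are all assumptions made for illustration, and it uses only the Python standard library.

```python
import json
from collections import Counter
from pathlib import Path


def build_vocab_mapping(token_id_batches, target_vocab_size, draft_vocab_size):
    """Hypothetical helper: keep the most frequent target tokens as the draft vocab."""
    counts = Counter()
    for ids in token_id_batches:
        counts.update(ids)
    # Most frequent tokens first; ties are broken by token id for determinism.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    # Draft ids are assigned in increasing order of target id.
    kept = sorted(ranked[:draft_vocab_size])
    # t2d: mask over the target vocab marking tokens that exist in the draft vocab.
    t2d = [False] * target_vocab_size
    for t in kept:
        t2d[t] = True
    # d2t: offset to add to a draft id to recover the corresponding target id.
    d2t = [t - d for d, t in enumerate(kept)]
    return {"d2t": d2t, "t2d": t2d}


def save_vocab_mapping(mapping, path):
    """Write the mapping once, e.g. at the end of hidden-state generation."""
    Path(path).write_text(json.dumps(mapping))


def load_vocab_mapping(path):
    """Read the precomputed mapping, e.g. at the start of offline training."""
    return json.loads(Path(path).read_text())
```

If the data contains fewer distinct tokens than `draft_vocab_size`, this sketch simply returns a shorter `d2t`; a real implementation would need to decide how to pad or reject that case.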

dawnranger commented Mar 17, 2026

About the log optimization:

  • before: [image]
  • after: [image]
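
Since the screenshots are not available here, the following is only a rough sketch of what a merged log line with a remaining-time estimate could look like. The line format, the function names, and the metric keys are hypothetical and not taken from the PR; the estimate assumes every remaining step takes the average time of the steps completed so far.

```python
def format_duration(seconds):
    """Render a non-negative number of seconds as HH:MM:SS."""
    seconds = int(max(seconds, 0))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def estimate_remaining(elapsed_seconds, step, total_steps):
    """Average time per completed step multiplied by the steps still to run."""
    if step <= 0:
        return None  # nothing completed yet, so no estimate is possible
    return elapsed_seconds / step * (total_steps - step)


def format_log(step, total_steps, elapsed_seconds, metrics):
    """Merge base metrics, extra metrics, and remaining time into one line."""
    parts = [f"step {step}/{total_steps}"]
    parts += [f"{name}={value:.4f}" for name, value in metrics.items()]
    remaining = estimate_remaining(elapsed_seconds, step, total_steps)
    if remaining is not None:
        parts.append(f"remaining_time={format_duration(remaining)}")
    return " | ".join(parts)


if __name__ == "__main__":
    # 50 of 200 steps finished in 100 s, so about 300 s remain.
    print(format_log(50, 200, 100.0, {"loss": 1.2345, "acc_0": 0.5}))
```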

yghstill previously approved these changes Mar 18, 2026

irisliu10 left a comment

ok

irisliu10 merged commit 676ae7d into Tencent:main on Mar 19, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues.

RuntimeError: The size of tensor a (3) must match the size of tensor b (12) at non-singleton dimension 0

3 participants