<!-- markdownlint-disable -->
## Purpose
P-EAGLE uses the same lightweight decoder architecture as EAGLE-3 but
generates multiple token predictions in parallel through
Conditional-On-Distribution (COD) sampling, rather than through
sequential test-time training steps. This PR implements the P-EAGLE
algorithm on top of the existing EAGLE-3 infrastructure.
<!--- Why your changes are needed -->
## Description
- Model definition: `speculators/models/peagle/core.py` — P-EAGLE draft
model with parallel multi-token prediction via COD sampling, flex
attention for cross-depth masking, and a learnable `mask_hidden`
parameter for padding unsampled positions
- Metrics: Extracted loss and accuracy computation into
`peagle/metrics.py`. Includes count-based normalization for correct
distributed averaging when COD causes different ranks to sample
different depths.
- Trainer: Added `normalize_counted_metrics` to `train/utils.py` to
handle per-position accuracy averaging across ranks, with minimal
changes to the shared trainer.
- Training script: Added P-EAGLE config/args to `scripts/train.py` and
an example training script for Qwen3-8B
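
The count-based normalization mentioned under Metrics can be illustrated with a small sketch. This is not the code in `peagle/metrics.py` or `train/utils.py`; the function names (`aggregate_counted_accuracy`, `naive_mean_of_means`) and the `(num_correct, num_sampled)` data layout are hypothetical, and ranks are simulated as plain Python dicts rather than real processes. It shows why summing correct counts and sample counts per depth before dividing gives a different (and properly weighted) result than averaging per-rank accuracies when ranks sample different depths.

```python
from collections import defaultdict


def aggregate_counted_accuracy(per_rank):
    """Weighted per-depth accuracy across simulated ranks.

    per_rank: list of dicts mapping depth -> (num_correct, num_sampled).
    A depth may be missing on a rank that did not sample it.
    """
    totals = defaultdict(lambda: [0, 0])
    for rank_metrics in per_rank:
        for depth, (correct, count) in rank_metrics.items():
            totals[depth][0] += correct
            totals[depth][1] += count
    # Divide only after summing, so every sampled position has equal weight.
    return {d: c / n for d, (c, n) in sorted(totals.items()) if n > 0}


def naive_mean_of_means(per_rank):
    """Unweighted average of per-rank accuracies (shown for contrast)."""
    accs = defaultdict(list)
    for rank_metrics in per_rank:
        for depth, (correct, count) in rank_metrics.items():
            if count > 0:
                accs[depth].append(correct / count)
    return {d: sum(v) / len(v) for d, v in sorted(accs.items())}


# Hypothetical numbers: rank 1 sampled depth 1 far more often than rank 0,
# and only rank 1 sampled depth 2.
ranks = [
    {0: (8, 10), 1: (1, 2)},
    {0: (6, 10), 1: (6, 8), 2: (1, 4)},
]
counted = aggregate_counted_accuracy(ranks)
naive = naive_mean_of_means(ranks)
```

With these illustrative inputs, depth 1 comes out as 7/10 under count-based aggregation but 0.625 under the naive mean, because the naive version gives rank 0's two samples the same weight as rank 1's eight. In an actual multi-process run, the per-depth sums and counts would presumably be combined with a collective such as `torch.distributed.all_reduce` before the division.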
## Related Issue
#292
## Tests
- Added a training example for Qwen3-8B with ShareGPT 5k samples
- Will post more validation results.
I have filled in:
- [x] The purpose of the PR, such as "Fix some issue (link existing
issues this PR will resolve)".
- [x] The test plan/results, such as providing test command and pasting
the results.
- [ ] (Optional) The necessary documentation update.
- [x] I (a human) have written or reviewed the code in this PR to the
best of my ability.
---------
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Co-authored-by: Megan Flynn <mflynn@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>