Having `**kwargs` in `model.forward` causes an odd interaction with
accelerate: the loss is summed rather than averaged across parallel
processes.
This also includes some other fixes to the amplify model, since we'll
need to push a new version to the HF hub.
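
To show why summing instead of averaging matters, here is a minimal sketch in plain Python. It does not use accelerate or reproduce its internals; `reduce_losses` and the example loss values are hypothetical and only illustrate that a summed loss scales with the number of processes.

```python
def reduce_losses(per_process_losses, reduction="mean"):
    """Combine one scalar loss per process into a single reported value.

    Hypothetical helper for illustration only; not an accelerate API.
    """
    total = sum(per_process_losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(per_process_losses)
    raise ValueError(f"unknown reduction: {reduction!r}")


# Assumed example: four processes that each computed the same loss.
losses = [2.0, 2.0, 2.0, 2.0]
summed = reduce_losses(losses, "sum")
averaged = reduce_losses(losses, "mean")
```

With identical per-process losses, the summed value is the averaged value multiplied by the number of processes, so a reported loss that grows with the process count is a sign of this problem.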
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* New Features
  * Data collator now supports a seed option for deterministic masking.
* Refactor
  * Standardized dtype handling to a single dtype setting across
    embeddings, norms, and layers.
  * Ensured intermediate size is always defined when activation is not
    swiglu.
  * Simplified model forward APIs by removing unused keyword passthroughs.
* Tests
  * Added loss verification tests for pretrained and reinitialized models
    across implementations.
* Chores
  * Updated development container to use a prebuilt image, increased
    shared memory, and simplified dependency installation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
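
The seed option for deterministic masking can be pictured with a small standard-library sketch. This is not the collator changed in this PR; `mask_tokens`, `MASK_TOKEN_ID`, and the default `mask_prob` are assumptions made for illustration.

```python
import random

MASK_TOKEN_ID = 0  # hypothetical mask token id, chosen for this sketch


def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    """Replace each token with MASK_TOKEN_ID with probability mask_prob.

    A fixed seed gives the same mask on every call; seed=None does not.
    """
    rng = random.Random(seed)  # private generator, leaves global state alone
    return [MASK_TOKEN_ID if rng.random() < mask_prob else t for t in token_ids]


tokens = list(range(1, 21))
first = mask_tokens(tokens, seed=42)
second = mask_tokens(tokens, seed=42)
```

Because both calls use the same seed, `first` and `second` are identical. Reproducible masking of this kind is what lets a loss verification test compare against a fixed expected value.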
---------
Signed-off-by: Peter St. John <pstjohn@nvidia.com>