[do not merge] probe CP packed SFT larger model #4399
Closed
cuichenx wants to merge 5 commits into
…d of #3839)

Re-applies #3839 (reverted in #4363 after an accidental merge) with one fix.

#3839's last commit added `cfg.checkpoint.load = None` to the CP+packing functional test (`test_sft_example_runs_with_cp_and_packing`). With `pretrained_checkpoint` also None, `finetune()` then fails its precondition (finetune.py:50) with "Finetuning requires a loading from a pretrained checkpoint or resuming from a checkpoint". This drops that line, restoring the pre-#3839 behavior (inherit the recipe's default load) so the test runs.

The `use_distributed_optimizer=False` setting added to that test is kept: it works around an NCCL watchdog hang seen only under the distributed optimizer + context parallelism in this test (root cause tracked separately; the THD-FLOPS code itself is inert under CP>1, taking the BSHD fallback).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
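For readers unfamiliar with the failure, here is a minimal sketch of the kind of precondition described in the commit message. It is not the code at finetune.py:50: `CheckpointConfig` and `check_finetune_precondition` are hypothetical names, and the real check may use an assertion rather than `ValueError`. Only the two field names and the error text come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointConfig:
    # Hypothetical stand-in for the recipe's checkpoint settings.
    load: Optional[str] = None                   # path to resume from
    pretrained_checkpoint: Optional[str] = None  # path to pretrained weights


def check_finetune_precondition(ckpt: CheckpointConfig) -> None:
    """Raise if neither a resume path nor a pretrained checkpoint is set."""
    if ckpt.load is None and ckpt.pretrained_checkpoint is None:
        # Error text quoted from the commit message above.
        raise ValueError(
            "Finetuning requires a loading from a pretrained checkpoint "
            "or resuming from a checkpoint"
        )


# Setting load to None while pretrained_checkpoint is also None trips the check;
# leaving the recipe's default load in place (any non-None path) passes it.
check_finetune_precondition(CheckpointConfig(load="/path/to/default/ckpt"))
```

Under this sketch, dropping the `cfg.checkpoint.load = None` override is enough for the test to pass the check, because the inherited default satisfies one of the two conditions.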
/ok to test 91694820dc322b7569b9334198722976088d944a2

/ok to test 9169482
Draft CI probe for PR #4366.
This branch keeps the final #4366 runtime FLOPs code, but restores the CP packed SFT functional-test helper to the larger main-sized tiny model (`hidden_size=256`, `ffn_hidden_size=1024`, `kv_channels=64`) instead of the extra-small `128/512/32` shape.

The goal is to verify whether the final THD-only FLOPs all-reduce fix is enough for CI memory, without relying on the extra 128-hidden shrink. The tiny in-memory SQuAD fixture is kept to avoid dataset-cache behavior confounding the memory signal.
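As a rough illustration of the two shapes being compared, the sketch below applies either one to a model config object. `TinyModelShape`, `apply_shape`, and the `SimpleNamespace` config are hypothetical stand-ins, not the test helper's actual code, and mapping `128/512/32` onto the three fields in the same order as the larger shape is an assumption.

```python
from dataclasses import asdict, dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class TinyModelShape:
    # Hypothetical container for the three dimensions named in the description.
    hidden_size: int
    ffn_hidden_size: int
    kv_channels: int


# Larger "main-sized" tiny model restored by this branch.
MAIN_SIZED_TINY = TinyModelShape(hidden_size=256, ffn_hidden_size=1024, kv_channels=64)
# Extra-small shape (assumed field order: hidden / ffn / kv).
EXTRA_SMALL = TinyModelShape(hidden_size=128, ffn_hidden_size=512, kv_channels=32)


def apply_shape(model_cfg, shape: TinyModelShape):
    """Copy each shape field onto the config object and return it."""
    for name, value in asdict(shape).items():
        setattr(model_cfg, name, value)
    return model_cfg


# SimpleNamespace stands in for the recipe's real model config here.
model_cfg = apply_shape(SimpleNamespace(), MAIN_SIZED_TINY)
```

Keeping both shapes as named constants makes it a one-line change to switch the probe back to the extra-small shape if the larger one turns out not to fit CI memory.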
Validation before opening:
- `uv run --no-sync python -m py_compile tests/functional_tests/test_groups/training/test_seqpacking_cp_example.py`
- `uv run --no-sync --with pre-commit pre-commit run --files tests/functional_tests/test_groups/training/test_seqpacking_cp_example.py`