
[do not merge] probe CP packed SFT larger model#4399

Closed
cuichenx wants to merge 5 commits into main from chcui/thd-flops-ci-probe-cp-sft-256


@cuichenx

Contributor

Draft CI probe for PR #4366.

This branch keeps the final #4366 runtime FLOPs code, but restores the CP packed SFT functional-test helper to the larger tiny-model shape used on main (hidden_size=256, ffn_hidden_size=1024, kv_channels=64) instead of the extra-small shape (128/512/32 for the same three settings).
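
For a rough sense of how far apart the two shapes are, here is a minimal sketch. `TinyShape` and `approx_layer_weights` are hypothetical names, not part of the test helper, and the count assumes four hidden-by-hidden attention projections plus a two-matrix, non-gated MLP, which may not match the real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyShape:
    """Hypothetical container for the three sizes named in the description."""

    hidden_size: int
    ffn_hidden_size: int
    kv_channels: int

    def approx_layer_weights(self) -> int:
        # Assumption: Q, K, V and output projections are each hidden x hidden,
        # and the MLP is two non-gated matrices (hidden x ffn, ffn x hidden).
        attention = 4 * self.hidden_size * self.hidden_size
        mlp = 2 * self.hidden_size * self.ffn_hidden_size
        return attention + mlp


# The shape restored by this probe and the extra-small shape it replaces.
MAIN_SIZED = TinyShape(hidden_size=256, ffn_hidden_size=1024, kv_channels=64)
EXTRA_SMALL = TinyShape(hidden_size=128, ffn_hidden_size=512, kv_channels=32)

# Ratio of approximate per-layer weight counts between the two shapes.
ratio = MAIN_SIZED.approx_layer_weights() / EXTRA_SMALL.approx_layer_weights()
print(ratio)
```

Under those assumptions the larger shape carries about four times the per-layer weights, which is why it is the more demanding case for the CI memory check.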

The goal is to verify whether the final THD-only FLOPs all-reduce fix is enough for CI memory, without relying on the extra 128-hidden shrink. The tiny in-memory SQuAD fixture is kept to avoid dataset-cache behavior confounding the memory signal.

Validation before opening:

  • uv run --no-sync python -m py_compile tests/functional_tests/test_groups/training/test_seqpacking_cp_example.py
  • uv run --no-sync --with pre-commit pre-commit run --files tests/functional_tests/test_groups/training/test_seqpacking_cp_example.py

cuichenx and others added 5 commits June 16, 2026 11:31
…d of #3839)

Re-applies #3839 (reverted in #4363 after an accidental merge) with one fix.

#3839's last commit added `cfg.checkpoint.load = None` to the CP+packing
functional test (test_sft_example_runs_with_cp_and_packing). With
pretrained_checkpoint also None, finetune() then fails its precondition
(finetune.py:50) with "Finetuning requires a loading from a pretrained
checkpoint or resuming from a checkpoint". This drops that line, restoring
the pre-#3839 behavior (inherit the recipe's default load) so the test runs.
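
The failure mode described above can be sketched as follows. This is not the code at finetune.py:50; the function name, its arguments, and the exception type are assumptions, and only the error text is taken from the message quoted above.

```python
from typing import Optional

# Error text as quoted in the commit message above.
ERROR = (
    "Finetuning requires a loading from a pretrained checkpoint "
    "or resuming from a checkpoint"
)


def check_finetune_precondition(
    pretrained_checkpoint: Optional[str], load: Optional[str]
) -> None:
    """Hypothetical stand-in for the check that finetune() performs."""
    # With both sources unset there is nothing to finetune from.
    if pretrained_checkpoint is None and load is None:
        raise ValueError(ERROR)


# The state #3839 left the test in: both values are None, so the check fails.
try:
    check_finetune_precondition(pretrained_checkpoint=None, load=None)
except ValueError as exc:
    print(f"rejected: {exc}")

# Dropping the override lets the recipe default through (placeholder path).
check_finetune_precondition(pretrained_checkpoint=None, load="/path/to/default")
```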

The `use_distributed_optimizer=False` setting added to that test is kept:
it works around an NCCL watchdog hang seen only under the distributed
optimizer + context parallelism in this test (root-cause tracked separately;
the THD-FLOPS code itself is inert under CP>1, taking the BSHD fallback).
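
To illustrate why the THD and BSHD paths can report different numbers, here is a sketch of the sequence-length term only. It is not the #4366 formula, which does not appear in this thread: `attention_length_term` is a hypothetical helper, and treating the BSHD fallback as the square of the full packed length is an assumption.

```python
from typing import Sequence


def attention_length_term(seq_lens: Sequence[int], cp_size: int) -> int:
    """Length factor of attention cost for one packed sample (illustrative)."""
    if cp_size > 1:
        # Assumed BSHD fallback: treat the pack as one sequence of total length.
        total = sum(seq_lens)
        return total * total
    # THD accounting: each packed sequence attends only to itself.
    return sum(n * n for n in seq_lens)


packed = [3, 5, 8]
print(attention_length_term(packed, cp_size=1))  # THD path
print(attention_length_term(packed, cp_size=2))  # fallback path under CP > 1
```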

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 16, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cuichenx

Contributor Author

/ok to test 91694820dc322b7569b9334198722976088d944a2

@cuichenx

Contributor Author

/ok to test 9169482

@cuichenx cuichenx added the tracking (Tracking issue for an ongoing project with smaller steps) and dummy-pr (Filed this PR to run tests, not going to merge) labels and removed the tracking label on Jun 16, 2026
@cuichenx cuichenx changed the title from "test(training): probe CP packed SFT larger model" to "[do not merge] probe CP packed SFT larger model" on Jun 16, 2026
@cuichenx cuichenx closed this Jun 26, 2026

Labels

dummy-pr Filed this PR to run tests, not going to merge
