Add dev-feature preservation gate and change schedule #4773

Open
Phlip79 wants to merge 1 commit into NVIDIA:main from Phlip79:feat/nightly-sync-dev-preservation-gate

@Phlip79 Phlip79 commented May 13, 2026

Two changes to the main-to-dev nightly sync workflow:

  1. New workflow .github/workflows/nightly-sync-dev-preservation-gate.yml that runs deterministically on every push to a main2dev/* PR (no LLM in the loop). For each non-exempt file the sync touched, it computes the set difference:

    (lines on origin/dev) - (lines on origin/main) - (lines in tree)

    That is, every line that is present on origin/dev, absent from origin/main, and absent from the PR's tree. The result is posted as a sticky PR comment listing each such line. The workflow fails if any non-exempt file has a non-empty result, blocking the PR from being marked ready.
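A minimal sketch of the per-file computation, for illustration only. The function name `dropped_dev_lines` and the choice to compare lines after stripping whitespace are assumptions; the PR text does not say how the workflow normalizes lines.

```python
def dropped_dev_lines(dev_text: str, main_text: str, tree_text: str) -> list[str]:
    """Return lines present on dev but missing from both main and the tree.

    Assumption: lines are compared after stripping surrounding whitespace,
    and blank lines are ignored. The real workflow may normalize differently.
    """
    main_lines = {line.strip() for line in main_text.splitlines()}
    tree_lines = {line.strip() for line in tree_text.splitlines()}

    dropped = []
    seen = set()
    for line in dev_text.splitlines():
        key = line.strip()
        if not key or key in seen:
            continue  # skip blank lines and repeats
        if key not in main_lines and key not in tree_lines:
            dropped.append(key)
            seen.add(key)
    return dropped


# Hypothetical file contents: dev added a kwarg that the sync lost.
dev = "def forward(x):\n    y = route(x, input_ids=None)\n    return y\n"
main = "def forward(x):\n    return y\n"
tree = "def forward(x):\n    y = route(x)\n    return y\n"

print(dropped_dev_lines(dev, main, tree))
```

A non-empty list for a non-exempt file is what would make the gate fail. If the tree still carries dev's line, the list is empty.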

    This catches the most common sync regression: a feature lands on dev at T0 and on main at T1 > T0, the sync runs in between, and -X theirs drops dev's feature wherever main happened to touch nearby lines. Recent examples this would have caught:

    • _forward_mlp_router(input_ids=None) in transformer_layer.py
    • num_sms_preprocessing_api=... kwarg in token_dispatcher.py
    • self._maybe_record_overload_factor(...) call in moe_layer.py
    • parse_and_validate_args import in gpt_dynamic_inference_with_coordinator.py
    • args.dynamic_context_parallel references in data_samplers.py / utils.py / training.py
    • "Packing Scheduler" section in datasets/readme.md

    Files in the skill's "Files to Override from Main" list
    (training.py, utils.py, data_samplers.py, initialize.py,
    layer_wise_optimizer.py) report as warning rather than error,
    matching the skill's intent that main may legitimately win there.
    pyproject.toml / uv.lock / docker/Dockerfile.ci.dev and CODEOWNERS
    are skipped entirely (always dev's by skill rule).
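The three tiers described above could be sketched roughly as follows. Matching on basename, the `severity` and `gate_fails` names, and the `pkg/...` paths are all assumptions made for illustration; the PR text only lists the file names.

```python
from pathlib import PurePosixPath

# File names taken from the PR description; matching on basename is an assumption.
OVERRIDE_FROM_MAIN = {
    "training.py",
    "utils.py",
    "data_samplers.py",
    "initialize.py",
    "layer_wise_optimizer.py",
}
SKIPPED_NAMES = {"pyproject.toml", "uv.lock", "CODEOWNERS"}
SKIPPED_PATHS = {"docker/Dockerfile.ci.dev"}


def severity(path: str) -> str:
    """Classify a file touched by the sync as 'skip', 'warning', or 'error'."""
    name = PurePosixPath(path).name
    if path in SKIPPED_PATHS or name in SKIPPED_NAMES:
        return "skip"      # always dev's version, never checked
    if name in OVERRIDE_FROM_MAIN:
        return "warning"   # main may legitimately win here
    return "error"         # dropped dev lines block the PR


def gate_fails(dropped_by_file: dict[str, list[str]]) -> bool:
    """The gate fails only when an 'error' file has at least one dropped line."""
    return any(
        bool(lines) and severity(path) == "error"
        for path, lines in dropped_by_file.items()
    )


# Hypothetical paths: one warning-tier file and one skipped file.
print(gate_fails({"pkg/training.py": ["x = 1"], "uv.lock": ["y"]}))
```

Under these assumptions, dropped lines in a warning-tier file are still reported but do not fail the workflow.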

    The job also publishes a prompt-addendum (on workflow_dispatch only)
    that can be pasted into the sync-bot prompt so the agent fixes
    violations proactively and the deterministic gate stays green.

  2. Schedule change in .github/workflows/nightly-sync-main-to-dev.yml: from daily at 21:00 UTC to twice-weekly (Monday + Thursday) at 15:00 UTC, which is 8 AM PDT (7 AM PST in winter, since GitHub Actions cron is UTC-only and does not follow DST).
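The time-zone arithmetic can be checked with a short sketch. The fixed offsets (PDT = UTC-7, PST = UTC-8) are standard; the cron string in the comment is an assumed spelling of "Monday + Thursday at 15:00 UTC" and is not quoted from the workflow file.

```python
from datetime import datetime, timedelta, timezone

# Assumed cron spelling for the new schedule (not quoted from the PR):
#   "0 15 * * 1,4"  -> minute 0, hour 15 UTC, Monday (1) and Thursday (4)

PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time
PST = timezone(timedelta(hours=-8), "PST")  # Pacific Standard Time


def local_hour(utc_hour: int, tz: timezone) -> int:
    """Convert a UTC hour to the local hour under a fixed-offset zone."""
    # The date is arbitrary because the offsets above are fixed.
    run = datetime(2026, 1, 5, utc_hour, 0, tzinfo=timezone.utc)
    return run.astimezone(tz).hour


print(local_hour(15, PDT), local_hour(15, PST))  # new schedule
print(local_hour(21, PDT), local_hour(21, PST))  # old schedule
```

Because the cron trigger is fixed in UTC, the local start time shifts by one hour between summer and winter, as the description notes.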

copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Phlip79 Phlip79 force-pushed the feat/nightly-sync-dev-preservation-gate branch from d309b08 to ac2e39b Compare May 13, 2026 07:07
Phlip79 commented May 13, 2026

/ok to test ac2e39b
