Skip to content

[SPARK-57002][INFRA] Enforce Upstream-First policy in merge_spark_pr.py cherry-pick prompts#56058

Open
viirya wants to merge 3 commits into
apache:masterfrom
viirya:infra-merge-script-upstream-first-policy
Open

[SPARK-57002][INFRA] Enforce Upstream-First policy in merge_spark_pr.py cherry-pick prompts#56058
viirya wants to merge 3 commits into
apache:masterfrom
viirya:infra-merge-script-upstream-first-policy

Conversation

@viirya
Copy link
Copy Markdown
Member

@viirya viirya commented May 22, 2026

What changes were proposed in this pull request?

When a committer manually types branch-M.N at the cherry-pick prompt while branch-M.x exists and has not yet received the commit, the script now surfaces the Upstream-First policy and offers to pick into both branches in one step (the policy-compliant default). The committer can still pick only branch-M.N if the commit is genuinely a branch-M.N-only maintenance bugfix, or abort.

Implementation notes:

  • Split cherry_pick into _do_cherry_pick (fetch + cherry-pick + push) and cherry_pick (prompt + policy check). The policy wrapper returns a list of refs so the main loop can advance its remaining-branches list correctly when one prompt consumes two branches.
  • Replace the branch_iter iterator with a mutable remaining_branches list in the main cherry-pick loop, so picks consumed by the two-branch path are accounted for in the next prompt's default.
  • Add an already_picked parameter to cherry_pick so the policy check skips its prompt when branch-M.x is in the set of refs already touched this session (e.g. when the PR was merged into branch-M.x and the loop is now picking into branch-M.N).

Why are the changes needed?

The Upstream-First backporting policy (documented in the header comment of dev/merge_spark_pr.py) requires non-bugfix commits to flow through branch-M.x before reaching branch-M.N. The merge script already orders branch-M.x ahead of branch-M.N as the cherry-pick default. However, when a committer types branch-M.N at the prompt, the script silently proceeds and branch-M.x is never revisited.

This has led to commits landing on branch-4.2 but missing branch-4.x. Six such commits observed on the current branches (as of 2026-05-22):

All six landed on master and branch-4.2 but were not cherry-picked to branch-4.x, requiring follow-up backports.

Does this PR introduce any user-facing change?

Yes for committers using dev/merge_spark_pr.py. When the typed cherry-pick target is branch-M.N and branch-M.x exists and is not yet picked, an additional prompt asks whether to pick into both. Accepting the default ("both") preserves prior behavior plus an extra cherry-pick to branch-M.x.

No change when the committer accepts the default branch-M.x target, or when picking into branch-M.x first and branch-M.N second (the typical policy-compliant flow).

How was this patch tested?

  • python3 -m doctest dev/merge_spark_pr.py passes (34/34, all pre-existing tests — none cover the new policy logic).
  • New cherry_pick policy logic was reviewed for behavior but not exercised end-to-end: actually running merge_spark_pr.py requires committer privileges and a live open PR to merge. Edge cases were traced by reading the code (PR target = master with manual branch-M.N entry; PR target = branch-M.x with default branch-M.N pick; multiple iterations after a two-branch pick).
  • Reviewers familiar with the merge flow are encouraged to verify behavior on first real use, especially the abort path and the interaction with manual conflict resolution inside _do_cherry_pick.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

…py cherry-pick prompts

### What changes were proposed in this pull request?

When a committer manually types `branch-M.N` at the cherry-pick prompt while
`branch-M.x` exists and has not yet received the commit, the script now
surfaces the Upstream-First policy and offers to pick into both branches in
one step (the policy-compliant default). The committer can still pick only
`branch-M.N` if the commit is genuinely a `branch-M.N`-only maintenance
bugfix, or abort.

Implementation notes:

- Split `cherry_pick` into `_do_cherry_pick` (fetch + cherry-pick + push) and
  `cherry_pick` (prompt + policy check). The policy wrapper returns a list of
  refs so the main loop can advance its remaining-branches list correctly
  when one prompt consumes two branches.
- Replace the `branch_iter` iterator with a mutable `remaining_branches` list
  in the main cherry-pick loop, so picks consumed by the two-branch path are
  accounted for in the next prompt's default.
- Add an `already_picked` parameter to `cherry_pick` so the policy check
  skips its prompt when `branch-M.x` is in the set of refs already touched
  this session (e.g. when the PR was merged into `branch-M.x` and the loop
  is now picking into `branch-M.N`).

### Why are the changes needed?

The Upstream-First backporting policy (documented in the header comment)
requires non-bugfix commits to flow through `branch-M.x` before reaching
`branch-M.N`. The merge script already orders `branch-M.x` ahead of
`branch-M.N` as the cherry-pick default. However, when a committer types
`branch-M.N` at the prompt, the script silently proceeds and `branch-M.x` is
never revisited.

This has led to commits landing on `branch-4.2` but missing `branch-4.x`.
Six such commits observed on the current branches (as of 2026-05-22) are
linked from SPARK-57002.

### Does this PR introduce _any_ user-facing change?

Yes for committers using `dev/merge_spark_pr.py`. When the typed cherry-pick
target is `branch-M.N` and `branch-M.x` exists and is not yet picked, an
additional prompt asks whether to pick into both. Accepting the default
("both") preserves prior behavior plus an extra cherry-pick to `branch-M.x`.

No change when the committer accepts the default `branch-M.x` target, or
when picking into `branch-M.x` first and `branch-M.N` second (the typical
policy-compliant flow).

### How was this patch tested?

- `python3 -c "import py_compile; py_compile.compile('dev/merge_spark_pr.py', doraise=True)"` passes.
- `python3 -m doctest dev/merge_spark_pr.py` passes (34/34).

Co-authored-by: Claude Code
@viirya viirya requested a review from cloud-fan May 22, 2026 07:57
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR closes a real gap — six recent commits silently bypassed branch-4.x because the merge script accepts a typed branch-M.N target without checking the policy. The factoring (_do_cherry_pick for git ops, cherry_pick for prompt+policy, remaining_branches for loop state, list-typed return) is clean.

Main feedback is one design refinement: the policy is gated on the cherry-pick target (pick_ref matching branch-M.N + sibling unpicked), but the right gate is the PR base. When target_ref is branch-M.N (separated backport PR), the author has explicitly chosen per-branch scope and the policy shouldn't fire. When target_ref is branch-M.x, the merge itself lands on branch-M.x so the bypass can't happen. The only case the policy is meaningful is target_ref == "master". See inline.

Plus smaller nits: redundant clean_up() in the abort branch, dead candidate != pick_ref check, "=" * 76 vs the "=" * 80 used elsewhere in this file, and an optional refactor to make the policy decision doctest-able (the PR notes no tests cover the new logic).

Comment thread dev/merge_spark_pr.py Outdated
# offer to pick both in one step. Skipping when sibling_x is the merge target or already
# in already_picked avoids a redundant prompt / failing re-cherry-pick.
sibling_x = None
m = re.match(r"^branch-(\d+)\.(\d+)$", pick_ref)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The policy gates on the cherry-pick target, but conceptually it should gate on the PR base (target_ref). The committer can only skip branch-M.x when the PR was opened against master:

  • target_ref == "master" → policy is meaningful (committer might type branch-M.N and bypass branch-M.x).
  • target_ref == "branch-M.x" → moot; the merge itself lands on branch-M.x so it can't be skipped.
  • target_ref == "branch-M.N" → wrong to fire; the author already chose per-branch scope by opening a separated backport PR.

The current code happens to suppress the M.x case via already_picked (because merged_refs is pre-seeded with target_ref), but that's incidental — already_picked's real job is in-loop suppression within a master-based merge (committer picks branch-M.x first, then branch-M.N, don't re-prompt). The M.N case isn't covered at all and will produce false-positive prompts on cross-branch-M.N cherry-picks.

target_ref is already in scope at both call sites (set at line 946). Suggest plumbing it as a parameter and short-circuiting when it isn't "master":

def cherry_pick(pr_num, merge_hash, default_branch, branch_names, target_ref, already_picked=()):
    ...
    sibling_x = None
    if target_ref == "master":
        m = re.match(r"^branch-(\d+)\.(\d+)$", pick_ref)
        if m:
            candidate = "branch-%s.x" % m.group(1)
            if candidate in branch_names and candidate not in already_picked:
                sibling_x = candidate
    ...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — the previous gating was incidental (the already_picked set happened to cover the branch-M.x merge target only because we seeded it with target_ref). Switched to explicit target_ref == "master" gating as you suggested. Plumbed target_ref into cherry_pick and into the new _upstream_first_sibling helper so the check is self-contained.

Addressed in the follow-up commit.

Comment thread dev/merge_spark_pr.py Outdated
Comment on lines +528 to +532
if (
candidate in branch_names
and candidate != pick_ref
and candidate not in already_picked
):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

candidate != pick_ref is unreachable as false: the regex on line 525 constrains pick_ref to branch-M.N where N is a digit, while candidate is branch-M.x (literal x). They can never be equal.

Suggested change
if (
candidate in branch_names
and candidate != pick_ref
and candidate not in already_picked
):
if (
candidate in branch_names
and candidate not in already_picked
):

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — removed. The regex already pins pick_ref to branch-M.N with digit minor, so candidate = branch-M.x can never equal pick_ref. Dead check.

Comment thread dev/merge_spark_pr.py
Comment on lines +560 to +562
elif choice in ("a", "abort"):
clean_up()
fail("Aborted by user at Upstream-First policy prompt.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fail() already calls clean_up() (line 365), so this produces a double "Restoring head pointer to..." print. The else: fail(...) branch directly below doesn't pre-call cleanup; align them.

Suggested change
elif choice in ("a", "abort"):
clean_up()
fail("Aborted by user at Upstream-First policy prompt.")
elif choice in ("a", "abort"):
fail("Aborted by user at Upstream-First policy prompt.")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — dropped the explicit clean_up() so the abort branch matches the else: fail(...) branch below. Both now rely on fail()'s own clean_up() call, no more double "Restoring head pointer to ..." print.

Comment thread dev/merge_spark_pr.py Outdated

if sibling_x is not None:
print()
print("=" * 76)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other visual dividers in this file use "=" * 80 (lines 933, 935, 943, 945). Use 80 here (and at line 545) for consistency.

Suggested change
print("=" * 76)
print("=" * 80)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aligned both dividers to "=" * 80 for consistency with the rest of the file.

Comment thread dev/merge_spark_pr.py Outdated
Comment on lines +524 to +533
sibling_x = None
m = re.match(r"^branch-(\d+)\.(\d+)$", pick_ref)
if m:
candidate = "branch-%s.x" % m.group(1)
if (
candidate in branch_names
and candidate != pick_ref
and candidate not in already_picked
):
sibling_x = candidate
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR notes no tests cover the new policy logic. The check here is pure (no I/O), so it could move into a small helper that's doctest-able without mocking bold_input:

def _upstream_first_sibling(pick_ref, branch_names, already_picked):
    """Return the sibling branch-M.x if Upstream-First should prompt, else None.

    >>> _upstream_first_sibling("branch-4.2", ["branch-4.x", "branch-4.2"], ())
    'branch-4.x'
    >>> _upstream_first_sibling("branch-4.2", ["branch-4.x", "branch-4.2"], ("branch-4.x",))
    >>> _upstream_first_sibling("branch-4.x", ["branch-4.x"], ())
    >>> _upstream_first_sibling("branch-4.99", ["branch-4.2"], ())
    """
    m = re.match(r"^branch-(\d+)\.(\d+)$", pick_ref)
    if not m:
        return None
    candidate = "branch-%s.x" % m.group(1)
    if candidate in branch_names and candidate not in already_picked:
        return candidate
    return None

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — extracted _upstream_first_sibling(target_ref, pick_ref, branch_names, already_picked) exactly along these lines, with doctests covering: PR target master (positive case + already-picked suppression), PR target branch-M.x and branch-M.N (both gated out), and absent branch-M.x sibling. Combined with the target_ref gating from comment #1, the helper is self-contained and pure; python3 -m doctest dev/merge_spark_pr.py now reports 40/40 (was 34/34).

- Gate the Upstream-First policy check on `target_ref == "master"` (the only
  case where bypassing branch-M.x is possible); pass `target_ref` to
  `cherry_pick` and short-circuit otherwise.
- Extract `_upstream_first_sibling` as a pure helper with doctests covering
  the gating cases (PR target master / branch-M.x / branch-M.N).
- Drop the unreachable `candidate != pick_ref` check (regex constrains
  pick_ref to branch-M.N with digits; candidate is always branch-M.x).
- Drop the redundant `clean_up()` call before `fail()` in the abort branch;
  `fail()` already calls `clean_up()`.
- Align the divider width (`=` * 80) with the rest of the file.

Co-authored-by: Claude Code
@viirya
Copy link
Copy Markdown
Member Author

viirya commented May 22, 2026

Thanks for the review @cloud-fan! Pushed follow-up 188d6c56ab8 addressing all five comments:

  1. Gating on target_ref == "master" — the previous already_picked-based suppression was incidental and missed the cross-branch-M.N cherry-pick case you flagged. Now cherry_pick takes target_ref and short-circuits when it isn't master.
  2. Removed unreachable candidate != pick_ref check — regex constrains pick_ref to digit minor, so it can never equal branch-M.x.
  3. Removed redundant clean_up() before fail() in the abort branchfail() already cleans up; the explicit call caused a double "Restoring head pointer to ..." print.
  4. Aligned divider width to "=" * 80 to match the rest of the file.
  5. Extracted _upstream_first_sibling as a pure helper with doctests covering: PR target master (positive + already_picked suppression), PR target branch-M.x and branch-M.N (both gated out), absent branch-M.x sibling, non-matching pick_ref.

python3 -m doctest dev/merge_spark_pr.py now passes 40/40 (was 34/34). The new policy logic is now covered by doctests; the end-to-end interactive flow still requires committer access to a live PR to exercise (noted in the PR description).

PTAL.

Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followup looks good — all 5 comments addressed cleanly, and the 6 new doctests for _upstream_first_sibling give the policy logic the coverage it was missing. python3 -m doctest dev/merge_spark_pr.py passes 40/40.

One opportunistic cleanup I missed in the first round: _do_cherry_pick lines 495-496 still have the same clean_up(); fail("Exception while pushing: %s" % e) antipattern you just fixed in the new abort branch — pre-existing (unchanged context in the diff), but the same double "Restoring head pointer to..." print on push failure. Optional, since you're in the file.

Comment thread dev/merge_spark_pr.py
`_upstream_first_sibling`: when the PR was merged into master and the committer
types a branch-M.N target while branch-M.x is also a known release branch AND
has not already received this commit, prompt to confirm whether to pick into
BOTH (the policy-compliant default) or branch-M.N only (treated as a
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "has not already received this commit" reads as if it's checking git state, but the helper actually checks the already_picked set passed by the caller — i.e., refs picked in this script invocation. If a committer ran the script earlier (or it was reinvoked after a partial pick) and branch-4.x already has the commit on the remote, the policy will still prompt to pick both. Worth tightening to something like "has not already been picked in this script run" so the contract matches what the code checks.

…il() in _do_cherry_pick push-failure branch

Same double "Restoring head pointer to ..." print antipattern as the abort branch
fixed in the previous follow-up: `fail()` already calls `clean_up()`, so the
explicit call before it is redundant. Pre-existing in this function before the
refactor (was lines 495-496 of the original `cherry_pick`), but cleaning it up
opportunistically while in this file.

Co-authored-by: Claude Code
@viirya
Copy link
Copy Markdown
Member Author

viirya commented May 22, 2026

Thanks for the approval @cloud-fan! And good catch on the push-failure branch — same antipattern, same double-print, applied the same fix in 2c32cfd854a. Pre-existing in _do_cherry_pick (the function was just factored out of the original cherry_pick, so the clean_up(); fail(...) lines pre-date this PR), but it's natural to tidy here while we're in the file.

Final diff is now: new policy logic + 4 cleanups (3 from the previous follow-up + this one).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants