Skip to content

Commit a9d2468

Browse files
committed
[SPARK-57002][INFRA] Enforce Upstream-First policy in merge_spark_pr.py cherry-pick prompts
### What changes were proposed in this pull request? When a committer manually types `branch-M.N` at the cherry-pick prompt while `branch-M.x` exists and has not yet received the commit, the script now surfaces the Upstream-First policy and offers to pick into both branches in one step (the policy-compliant default). The committer can still pick only `branch-M.N` if the commit is genuinely a `branch-M.N`-only maintenance bugfix, or abort. Implementation notes: - Split `cherry_pick` into `_do_cherry_pick` (fetch + cherry-pick + push) and `cherry_pick` (prompt + policy check). The policy wrapper returns a list of refs so the main loop can advance its remaining-branches list correctly when one prompt consumes two branches. - Replace the `branch_iter` iterator with a mutable `remaining_branches` list in the main cherry-pick loop, so picks consumed by the two-branch path are accounted for in the next prompt's default. - Add an `already_picked` parameter to `cherry_pick` so the policy check skips its prompt when `branch-M.x` is in the set of refs already touched this session (e.g. when the PR was merged into `branch-M.x` and the loop is now picking into `branch-M.N`). ### Why are the changes needed? The Upstream-First backporting policy (documented in the header comment of `dev/merge_spark_pr.py`) requires non-bugfix commits to flow through `branch-M.x` before reaching `branch-M.N`. The merge script already orders `branch-M.x` ahead of `branch-M.N` as the cherry-pick default. However, when a committer types `branch-M.N` at the prompt, the script silently proceeds and `branch-M.x` is never revisited. This has led to commits landing on `branch-4.2` but missing `branch-4.x`. Six such commits observed on the current branches (as of 2026-05-22): - SPARK-56700 (#55651) - SPARK-56676 (#55623) - SPARK-56838 (#55836) - SPARK-56650 (#55589) - SPARK-56856 (#55969) - SPARK-56977 (#56023) All six landed on master and `branch-4.2` but were not cherry-picked to `branch-4.x`, requiring follow-up backports. ### Does this PR introduce _any_ user-facing change? Yes for committers using `dev/merge_spark_pr.py`. When the typed cherry-pick target is `branch-M.N` and `branch-M.x` exists and is not yet picked, an additional prompt asks whether to pick into both. Accepting the default ("both") preserves prior behavior plus an extra cherry-pick to `branch-M.x`. No change when the committer accepts the default `branch-M.x` target, or when picking into `branch-M.x` first and `branch-M.N` second (the typical policy-compliant flow). ### How was this patch tested? - `python3 -m doctest dev/merge_spark_pr.py` passes (34/34, all pre-existing tests — none cover the new policy logic). - New `cherry_pick` policy logic was reviewed for behavior but **not exercised end-to-end**: actually running `merge_spark_pr.py` requires committer privileges and a live open PR to merge. Edge cases were traced by reading the code (PR target = master with manual branch-M.N entry; PR target = branch-M.x with default branch-M.N pick; multiple iterations after a two-branch pick). - Reviewers familiar with the merge flow are encouraged to verify behavior on first real use, especially the abort path and the interaction with manual conflict resolution inside `_do_cherry_pick`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.7) Closes #56058 from viirya/infra-merge-script-upstream-first-policy. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
1 parent 9e13442 commit a9d2468

1 file changed

Lines changed: 102 additions & 11 deletions

File tree

dev/merge_spark_pr.py

Lines changed: 102 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -470,11 +470,8 @@ def merge_pr(pr_num, target_ref, title, body, pr_repo_desc, pr_author, co_author
470470
return merge_hash
471471

472472

473-
def cherry_pick(pr_num, merge_hash, default_branch):
474-
pick_ref = bold_input("Enter a branch name [%s]: " % default_branch)
475-
if pick_ref == "":
476-
pick_ref = default_branch
477-
473+
def _do_cherry_pick(pr_num, merge_hash, pick_ref):
474+
"""Cherry-pick `merge_hash` onto `pick_ref` and push. Returns the pushed ref."""
478475
pick_branch_name = "%s_PICK_PR_%s_%s" % (BRANCH_PREFIX, pr_num, pick_ref.upper())
479476

480477
run_cmd("git fetch %s %s:%s" % (PUSH_REMOTE_NAME, pick_ref, pick_branch_name))
@@ -495,7 +492,6 @@ def cherry_pick(pr_num, merge_hash, default_branch):
495492
try:
496493
run_cmd("git push %s %s:%s" % (PUSH_REMOTE_NAME, pick_branch_name, pick_ref))
497494
except Exception as e:
498-
clean_up()
499495
fail("Exception while pushing: %s" % e)
500496

501497
pick_hash = run_cmd("git rev-parse %s" % pick_branch_name)[:8]
@@ -506,6 +502,85 @@ def cherry_pick(pr_num, merge_hash, default_branch):
506502
return pick_ref
507503

508504

505+
def _upstream_first_sibling(target_ref, pick_ref, branch_names, already_picked):
506+
"""Return the sibling branch-M.x if Upstream-First should prompt, else None.
507+
508+
The policy only applies when the PR was merged into master: that's the only case
509+
where the committer can type branch-M.N at the cherry-pick prompt and bypass the
510+
rolling branch-M.x. When the PR was opened against branch-M.x the merge itself
511+
lands there (nothing to bypass), and when it was opened against branch-M.N the
512+
author already chose per-branch scope.
513+
514+
>>> _upstream_first_sibling("master", "branch-4.2", ["branch-4.x", "branch-4.2"], ())
515+
'branch-4.x'
516+
>>> _upstream_first_sibling("master", "branch-4.2", ["branch-4.x", "branch-4.2"],
517+
... ("branch-4.x",))
518+
>>> _upstream_first_sibling("master", "branch-4.x", ["branch-4.x"], ())
519+
>>> _upstream_first_sibling("master", "branch-4.99", ["branch-4.2"], ())
520+
>>> _upstream_first_sibling("branch-4.x", "branch-4.2", ["branch-4.x", "branch-4.2"], ())
521+
>>> _upstream_first_sibling("branch-4.2", "branch-3.5", ["branch-4.x", "branch-3.5"], ())
522+
"""
523+
if target_ref != "master":
524+
return None
525+
m = re.match(r"^branch-(\d+)\.(\d+)$", pick_ref)
526+
if not m:
527+
return None
528+
candidate = "branch-%s.x" % m.group(1)
529+
if candidate in branch_names and candidate not in already_picked:
530+
return candidate
531+
return None
532+
533+
534+
def cherry_pick(pr_num, merge_hash, default_branch, branch_names, target_ref, already_picked=()):
535+
"""Prompt for a target branch and cherry-pick `merge_hash` onto it.
536+
537+
Enforces the Upstream-First policy (see header comment) via
538+
`_upstream_first_sibling`: when the PR was merged into master and the committer
539+
types a branch-M.N target while branch-M.x is also a known release branch AND
540+
has not already received this commit, prompt to confirm whether to pick into
541+
BOTH (the policy-compliant default) or branch-M.N only (treated as a
542+
maintenance-only bugfix). Returns the list of refs actually picked into, so
543+
the main loop can advance its remaining-branches list correctly.
544+
"""
545+
pick_ref = bold_input("Enter a branch name [%s]: " % default_branch)
546+
if pick_ref == "":
547+
pick_ref = default_branch
548+
549+
sibling_x = _upstream_first_sibling(target_ref, pick_ref, branch_names, already_picked)
550+
if sibling_x is not None:
551+
print()
552+
print("=" * 80)
553+
print(
554+
"Upstream-First policy: non-bugfix commits on %s should also land on %s."
555+
% (pick_ref, sibling_x)
556+
)
557+
print(
558+
"If this is a %s-only maintenance bugfix, you may pick %s alone." % (pick_ref, pick_ref)
559+
)
560+
print("Otherwise, pick both (%s first, then %s)." % (sibling_x, pick_ref))
561+
print("=" * 80)
562+
choice = (
563+
bold_input(
564+
"Pick into [b]oth %s + %s / [o]nly %s / [a]bort (default: both): "
565+
% (sibling_x, pick_ref, pick_ref)
566+
)
567+
.strip()
568+
.lower()
569+
)
570+
if choice in ("", "b", "both"):
571+
picked_x = _do_cherry_pick(pr_num, merge_hash, sibling_x)
572+
picked_n = _do_cherry_pick(pr_num, merge_hash, pick_ref)
573+
return [picked_x, picked_n]
574+
elif choice in ("o", "only"):
575+
return [_do_cherry_pick(pr_num, merge_hash, pick_ref)]
576+
elif choice in ("a", "abort"):
577+
fail("Aborted by user at Upstream-First policy prompt.")
578+
else:
579+
fail("Unrecognized choice %r; aborting." % choice)
580+
581+
return [_do_cherry_pick(pr_num, merge_hash, pick_ref)]
582+
583+
509584
def print_jira_issue_summary(issue):
510585
summary = "Summary\t\t%s\n" % issue.fields.summary
511586
assignee = issue.fields.assignee
@@ -832,7 +907,6 @@ def main():
832907
branches = get_json("%s/branches" % GITHUB_API_BASE)
833908
branch_names = list(filter(lambda x: x.startswith("branch-"), [x["name"] for x in branches]))
834909
branch_names = sorted(branch_names, key=semver_branch_rank, reverse=True)
835-
branch_iter = iter(branch_names)
836910

837911
if len(sys.argv) == 1:
838912
pr_num = bold_input("Which pull request would you like to merge? (e.g. 34): ")
@@ -942,7 +1016,8 @@ def main():
9421016
fail("Couldn't find any merge commit for #%s, you may need to update HEAD." % pr_num)
9431017

9441018
print("Found commit %s:\n%s" % (merge_hash, message))
945-
cherry_pick(pr_num, merge_hash, next(branch_iter, branch_names[0]))
1019+
default = branch_names[0]
1020+
cherry_pick(pr_num, merge_hash, default, branch_names, target_ref, already_picked=())
9461021
sys.exit(0)
9471022

9481023
if not bool(pr["mergeable"]):
@@ -976,11 +1051,27 @@ def main():
9761051
print("PR #%s is still open after push; closing it explicitly." % pr_num)
9771052
close_pr(pr_num)
9781053

1054+
# Walk a mutable remaining-branches list so the next default correctly skips any
1055+
# branches already picked, including branches consumed by the Upstream-First two-branch
1056+
# path inside cherry_pick (e.g. picking branch-M.x + branch-M.N in a single prompt).
1057+
# merged_refs doubles as the already_picked set passed to cherry_pick: it starts with
1058+
# target_ref (the merge sink, never to be re-picked) and grows with every cherry-pick.
1059+
remaining_branches = [b for b in branch_names if b != target_ref]
9791060
pick_prompt = "Would you like to pick %s into another branch?" % merge_hash
9801061
while bold_input("\n%s (y/N): " % pick_prompt).lower() == "y":
981-
merged_refs = merged_refs + [
982-
cherry_pick(pr_num, merge_hash, next(branch_iter, branch_names[0]))
983-
]
1062+
default = remaining_branches[0] if remaining_branches else branch_names[0]
1063+
picked = cherry_pick(
1064+
pr_num,
1065+
merge_hash,
1066+
default,
1067+
branch_names,
1068+
target_ref,
1069+
already_picked=tuple(merged_refs),
1070+
)
1071+
merged_refs = merged_refs + picked
1072+
for b in picked:
1073+
if b in remaining_branches:
1074+
remaining_branches.remove(b)
9841075

9851076
if asf_jira is not None:
9861077
continue_maybe("Would you like to update an associated JIRA?")

0 commit comments

Comments
 (0)