[SPARK-57341][INFRA] Reconcile JIRA components with the PR title in merge script #56400
zhengruifeng wants to merge 3 commits into
…spark_pr.py

When merging a PR, merge_spark_pr.py now compares the primary component tags in the normalized PR title against the primary components on the linked JIRA ticket. On a mismatch it prompts the committer to overwrite the JIRA's primary components with the PR title's, append them, or keep JIRA unchanged (the default). Non-primary tags such as [TEST] and non-primary JIRA components such as "Optimizer" are ignored by the comparison and preserved by both updates, so a common title like [SQL][TEST] no longer prompts against a SQL-only ticket. The JIRA summary printed during a merge now also lists the ticket's components.

Generated-by: Claude Code (Opus 4.8)
… ones

Reconcile every PR-title tag that maps to a JIRA component, primary or not: e.g. [TEST] -> "Tests" and [SHUFFLE] -> "Shuffle" are now handled alongside primary tags like [SQL]. The full mapped set is compared against the ticket's components, and the overwrite/append/keep prompt acts on that set. This drops the earlier primary-only restriction, along with the now-unused Component.find_by_jira_name helper and the primary_only flag on jira_components_from_title_tags.

Generated-by: Claude Code (Opus 4.8)
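The second commit's rule (map every tag, compare the full mapped set, prompt only on a difference) can be sketched as a set comparison. `components_differ` is a hypothetical helper, not the script's code. It assumes a "mismatch" means the two sets are unequal, ignoring order and duplicates, and, as a further assumption, that a title with no mapped tags never prompts.

```python
def components_differ(title_components, jira_components):
    """Return True if the merge step should prompt about JIRA components.

    Hypothetical helper: compares as sets, so ordering and duplicates do not
    count as a mismatch. If no title tag mapped to a component, there is
    nothing to reconcile, so it returns False (an assumption of this sketch).
    """
    title_set = set(title_components)
    if not title_set:
        return False
    return title_set != set(jira_components)


# Under this sketch, with [TEST] mapped to "Tests", a [SQL][TEST] title
# differs from a SQL-only ticket, while reordering alone is not a mismatch.
print(components_differ(["SQL", "Tests"], ["SQL"]))           # True
print(components_differ(["Tests", "SQL"], ["SQL", "Tests"]))  # False
print(components_differ([], ["SQL"]))                         # False
```

Comparing sets rather than lists keeps the prompt quiet when the ticket already has the right components in a different order.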
580f6e6 to 902e830
I used this script to merge #56450.
cloud-fan left a comment
0 blocking, 0 non-blocking, 0 nits.
Clean, well-scoped committer-tooling change. The component-reconciliation step is consistent with existing merge-script conventions: it mirrors the choose_jira_assignee peer (prompt, update one JIRA field, non-fatal try/except) but only prompts on a mismatch and defaults to leaving JIRA untouched, so it is strictly less intrusive. title_components is threaded through every call site with a backward-compatible default, the new Components line in the summary printout has matching format placeholders, and the pure mapping helper is doctested (including the non-primary [TEST] -> "Tests" case). LGTM.
```python
    """
    names = []
    for tag in tags:
        c = Component.find(tag)
```
Hm... PySpark has to be "PYTHON", actually. And we have PS, which is pandas API on Spark.
Good catch -- [PYTHON] is the canonical tag. The registry already maps PYTHON -> "PySpark" (with PYSPARK as an alias) and keeps PS -> "Pandas API on Spark" as a separate component, so behavior was correct; only this doctest example used the non-preferred PYSPARK alias. Updated it to use PYTHON.
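The lookup described in this reply, where both PYTHON and its PYSPARK alias resolve to "PySpark" while PS stays a separate "Pandas API on Spark" component, can be sketched with a small alias-aware registry. This is a hypothetical stand-in for the script's `Component.find`: the dataclass layout, the `REGISTRY` list, the `find_component` name, and the case-insensitive matching are all assumptions, and only mappings named in this PR are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    tag: str             # canonical PR-title tag, e.g. "PYTHON"
    jira_name: str       # JIRA component name, e.g. "PySpark"
    aliases: tuple = ()  # alternative tags also accepted in titles


# Hypothetical, trimmed registry: only the mappings mentioned in this PR.
REGISTRY = [
    Component("PYTHON", "PySpark", aliases=("PYSPARK",)),
    Component("PS", "Pandas API on Spark"),
    Component("SQL", "SQL"),
    Component("TEST", "Tests"),
]


def find_component(tag):
    """Return the Component whose canonical tag or alias matches, else None."""
    key = tag.strip().upper()
    for c in REGISTRY:
        if key == c.tag or key in c.aliases:
            return c
    return None


print(find_component("PYSPARK").jira_name)  # PySpark (alias of PYTHON)
print(find_component("PS").jira_name)       # Pandas API on Spark
print(find_component("MINOR"))              # None: not a JIRA component
```

Keeping aliases on the canonical entry means an alias and its canonical tag can never map to different JIRA components.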
```python
        return
    if choice == "o":
        new_names = list(title_jira_components)
    else:  # "a": append the PR title's components, keeping the existing ones first.
```
Nit: if "a" is the only case handled here, it may be better to use an explicit `elif choice == "a"` to document the intent more precisely.
Done -- switched the catch-all `else` to an explicit `elif choice == "a"`. Safe since `get_input` only ever returns "o", "a", or "k", and "k" returns early.
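The three prompt outcomes discussed in this thread can be sketched as one function. `resolve_components` is hypothetical, not the script's code. It assumes the prompt returns only "o", "a", or "k" (as the reply states), that "append" skips names already on the ticket, and it raises on any other value so an unexpected input cannot silently land in a branch.

```python
def resolve_components(choice, jira_components, title_components):
    """Return the component list to write to JIRA, or None to leave it as is.

    choice is assumed to be "o" (overwrite), "a" (append) or "k" (keep).
    """
    if choice == "k":
        # Keep (the default): JIRA is left untouched.
        return None
    if choice == "o":
        # Overwrite: the PR title's components replace the JIRA's.
        return list(title_components)
    elif choice == "a":
        # Append: existing components stay first; title components that
        # are not already present are added after them (dedup is assumed).
        new_names = list(jira_components)
        for name in title_components:
            if name not in new_names:
                new_names.append(name)
        return new_names
    raise ValueError(f"unexpected choice: {choice!r}")


print(resolve_components("a", ["SQL", "Optimizer"], ["SQL", "Tests"]))
# ['SQL', 'Optimizer', 'Tests']
```

With an explicit `elif`, every accepted input has a named branch, and the final `raise` makes any other value fail loudly instead of being treated as "append".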
…licit append branch

- Use the canonical PYTHON tag (not the PYSPARK alias) in the jira_components_from_title_tags doctest. The registry already maps PYTHON -> "PySpark" and keeps PS -> "Pandas API on Spark" separate; only the example used the non-preferred alias.
- Make the append branch explicit (elif choice == "a") instead of a catch-all else, since get_input only ever returns "o", "a", or "k".

Generated-by: Claude Code (Opus 4.8)
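The doctest pattern this commit refers to can be sketched as follows. `title_tags_to_jira_components` and `TAG_TO_JIRA` are hypothetical stand-ins for the script's `jira_components_from_title_tags` and its component registry, trimmed to the mappings mentioned in this PR; exiting with a non-zero status on doctest failure is an assumption of the sketch. The examples use the canonical PYTHON tag, and also show the PYSPARK alias collapsing into the same "PySpark" component and non-component tags being dropped.

```python
import doctest
import sys

# Hypothetical tag -> JIRA component table (canonical tags plus the PYSPARK
# alias); only entries mentioned in this PR are included.
TAG_TO_JIRA = {
    "SQL": "SQL",
    "TEST": "Tests",
    "SHUFFLE": "Shuffle",
    "PYTHON": "PySpark",
    "PYSPARK": "PySpark",
    "PS": "Pandas API on Spark",
}


def title_tags_to_jira_components(tags):
    """Map PR-title tags to JIRA component names, dropping unmapped tags.

    >>> title_tags_to_jira_components(["PYTHON", "TEST"])
    ['PySpark', 'Tests']
    >>> title_tags_to_jira_components(["MINOR", "SQL", "4.X", "FOLLOWUP"])
    ['SQL']
    >>> title_tags_to_jira_components(["PYTHON", "PYSPARK", "PS"])
    ['PySpark', 'Pandas API on Spark']
    """
    names = []
    for tag in tags:
        name = TAG_TO_JIRA.get(tag.upper())
        # Skip tags with no JIRA component and names already collected.
        if name is not None and name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    # Run the doctests at startup and stop if any fail (sketch assumption).
    results = doctest.testmod()
    if results.failed:
        sys.exit(1)
```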
Thanks all, merged into master.
What changes were proposed in this pull request?
This PR adds a component-reconciliation step to `dev/merge_spark_pr.py`. When a PR is merged, the script already normalizes the component tags in the PR title (e.g. `[SQL]`, `[CORE]`, `[TEST]`). This change maps every title tag that corresponds to a JIRA component (primary or not, e.g. `[SQL]` -> "SQL" and `[TEST]` -> "Tests") and compares that set against the components on the linked JIRA ticket. On a mismatch it prompts the committer to:

- overwrite the JIRA's components with the PR title's,
- append the PR title's components to the existing ones, or
- keep JIRA unchanged (the default).

Tags that do not correspond to a JIRA component (`[MINOR]`, `[FOLLOWUP]`, version tags like `[4.X]`, unknown tags) are ignored. The JIRA issue summary printed during a merge now also lists the ticket's components.

Why are the changes needed?
The PR title and the JIRA ticket can drift out of sync on which components a change touches. Today the merge tool resolves the ticket without checking, so a committer has to notice and fix component mismatches by hand. Surfacing the difference at merge time, with a safe default of leaving JIRA untouched, makes it easy to keep the two consistent without forcing any change.
Does this PR introduce any user-facing change?
No.
`dev/merge_spark_pr.py` is a committer-only tool.

How was this patch tested?
The script runs its doctests on startup via `doctest.testmod()`. A doctest covering the tag-to-JIRA-component mapping (including non-primary tags such as `[TEST]` -> "Tests") was added; the full suite passes. Formatting was verified with `black` 26.3.1 (the repo's pinned version) against the root `pyproject.toml`.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)