workflow: check TiDB code PRs weekly to update docs #22801

Open
hfxsd wants to merge 426 commits into pingcap:master from hfxsd:Weekly-TiDB-PR-Doc-Check
Conversation

@hfxsd
Collaborator

@hfxsd hfxsd commented Apr 24, 2026

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

hfxsd added 30 commits October 12, 2023 10:49
@ti-chi-bot
ti-chi-bot Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tangenta for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the missing-translation-status This PR does not have translation status info. label Apr 24, 2026
@hfxsd hfxsd self-assigned this Apr 24, 2026
@ti-chi-bot ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 24, 2026
@hfxsd hfxsd added translation/no-need No need to translate this PR. and removed missing-translation-status This PR does not have translation status info. labels Apr 24, 2026
@hfxsd hfxsd changed the title workflow: weekly TiDB code check to update docs workflow: check TiDB code PRs weekly to update docs Apr 24, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a Python script to automate the weekly identification of merged TiDB PRs that may require documentation updates based on labels, keywords, and file paths. Feedback includes addressing a logic error in the time window calculation to avoid timezone mismatches and overlapping reports, using word boundaries for keyword matching to prevent false positives, removing redundant keywords, and implementing error handling for GitHub API requests.

Comment on lines +205 to +208
start_date = start_sh.date().isoformat()
end_date = end_sh.date().isoformat()

query = f"repo:{SOURCE_REPO} is:pr is:merged merged:{start_date}..{end_date}"
Contributor

high

The current search query uses date strings (e.g., 2023-10-23..2023-10-30), which has two significant issues:

  1. Timezone Mismatch: GitHub interprets these dates in UTC. Since your window is calculated in Shanghai time (UTC+8), PRs merged between 00:00 and 08:00 Shanghai time on the start date will be missed (as their UTC date is the previous day).
  2. Overlapping Reports: The .. syntax in GitHub search is inclusive. A PR merged on the end_date will be included in this week's report and also in next week's report (where it will be the start_date).

To fix this, use full ISO 8601 timestamps with timezone offsets and ensure the range is exclusive of the end time.

References
  1. Technical accuracy and terminology consistency are important for documentation workflows. (link)
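The fix suggested in this comment could be sketched as follows. This is a minimal illustration, not the script's actual code: `build_merged_query` and `SHANGHAI` are hypothetical names, `owner/repo` stands in for the real `SOURCE_REPO` value, and the inputs are assumed to be timezone-aware datetimes. Because GitHub's `..` range is inclusive, the sketch subtracts one second from the end of the window so that consecutive weekly windows do not share a boundary instant.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset; Shanghai does not observe daylight saving time.
SHANGHAI = timezone(timedelta(hours=8))


def build_merged_query(source_repo: str, start: datetime, end: datetime) -> str:
    """Build a search query for PRs merged in the half-open window [start, end)."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be later than start")

    # GitHub treats a..b as inclusive on both sides, so stop one second early.
    last_included = end - timedelta(seconds=1)

    def fmt(dt: datetime) -> str:
        # Produces e.g. 2023-10-23T00:00:00+08:00
        return dt.replace(microsecond=0).isoformat()

    return (
        f"repo:{source_repo} is:pr is:merged "
        f"merged:{fmt(start)}..{fmt(last_included)}"
    )


start = datetime(2023, 10, 23, tzinfo=SHANGHAI)
end = start + timedelta(days=7)
print(build_merged_query("owner/repo", start, end))
```

The `+` in the offset must be percent-encoded when the query is placed in a URL; `urllib.parse.urlencode` does this automatically.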

score -= 1
reasons.append(f"Only maintenance labels: {', '.join(hit_negative_labels)}")

kw_hits = sorted({kw for kw in POSITIVE_KEYWORDS if kw in text})
Contributor

medium

The substring matching kw in text can lead to false positives for short keywords. For example, "api" will match "capital" or "rapid", and "sql" will match "mysql". Consider using regular expressions with word boundaries (\b) to ensure keywords are matched as whole words only.
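One way to apply the word-boundary suggestion is sketched below. The helper names `compile_keywords` and `keyword_hits` are hypothetical and the keyword list is a shortened stand-in for `POSITIVE_KEYWORDS`; the sketch assumes case-insensitive matching is acceptable.

```python
import re


def compile_keywords(keywords):
    """Map each keyword to a pattern that only matches it as a whole word or phrase."""
    return {
        kw: re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE)
        for kw in keywords
    }


def keyword_hits(text, patterns):
    """Return the sorted keywords that occur in text as whole words."""
    return sorted(kw for kw, pattern in patterns.items() if pattern.search(text))


patterns = compile_keywords(["api", "sql", "syntax", "new feature"])
print(keyword_hits("Rapid capital growth in mysql", patterns))      # []
print(keyword_hits("Add SQL syntax for the new API", patterns))     # ['api', 'sql', 'syntax']
```

With whole-word matching, "deprecate" no longer matches "deprecated" and "config" no longer matches "configuration", so both forms of each pair would need to stay in the list.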

Comment on lines +45 to +60
POSITIVE_KEYWORDS = [
"compatibility",
"deprecate",
"deprecated",
"new feature",
"sql",
"syntax",
"default value",
"system variable",
"configuration",
"config",
"api",
"planner",
"optimizer",
"ddl",
]
Contributor

low

Several keywords in this list are redundant because they are substrings of others (e.g., "config" matches "configuration", "deprecate" matches "deprecated"). Since the script currently uses substring matching, the shorter versions will always catch the longer ones. You can remove the redundant entries to keep the list clean and focused.
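The redundancy described here can be checked mechanically. The sketch below uses a hypothetical helper, `redundant_keywords`, and assumes plain substring matching as in the current script: a keyword is redundant when another, shorter keyword in the list is contained in it.

```python
POSITIVE_KEYWORDS = [
    "compatibility", "deprecate", "deprecated", "new feature", "sql",
    "syntax", "default value", "system variable", "configuration",
    "config", "api", "planner", "optimizer", "ddl",
]


def redundant_keywords(keywords):
    """Return keywords already covered by a shorter keyword under substring matching."""
    return sorted(
        kw for kw in keywords
        if any(other != kw and other in kw for other in keywords)
    )


print(redundant_keywords(POSITIVE_KEYWORDS))  # ['configuration', 'deprecated']
```

This only holds while matching stays substring-based; if the script switches to whole-word matching, the longer forms are no longer covered by the shorter ones.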

Comment on lines +84 to +85
with urllib.request.urlopen(req, timeout=30) as resp:
return json.loads(resp.read().decode("utf-8"))
Contributor

low

The urllib.request.urlopen call lacks error handling. If the GitHub API returns an error status code (e.g., 403 due to rate limiting or 401 due to an invalid token), urlopen raises an HTTPError that the script does not handle. Consider adding a try-except block to provide a more informative error message or to retry on transient issues.
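A possible shape for that error handling is sketched below. The function name `fetch_json`, the retry count, the backoff values, and the injectable `opener` parameter are all assumptions made for illustration. The sketch treats 4xx responses as non-retryable and retries 5xx responses and network errors; the real script may want a different policy, for example waiting out a rate limit on 403.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json(req, retries=3, backoff=2.0, opener=urllib.request.urlopen):
    """Fetch and decode JSON, retrying transient failures with exponential backoff."""
    last_error = None
    for attempt in range(retries):
        try:
            with opener(req, timeout=30) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as exc:
            # HTTPError must be caught before URLError, its parent class.
            if exc.code < 500:
                # Client errors (bad token, rate limit, bad query) will not
                # succeed on retry, so fail fast with a readable message.
                raise RuntimeError(
                    f"GitHub API request failed with HTTP {exc.code}: {exc.reason}"
                ) from exc
            last_error = exc
        except urllib.error.URLError as exc:
            # DNS failures, refused connections, timeouts, and similar.
            last_error = exc
        if attempt < retries - 1:
            time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(
        f"GitHub API request failed after {retries} attempts: {last_error}"
    ) from last_error
```

Passing `opener` as a parameter keeps the function testable without network access; in normal use it can be left at its default.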

Labels

size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. translation/no-need No need to translate this PR.

1 participant