workflow: check TiDB code PRs weekly to update docs#22801
workflow: check TiDB code PRs weekly to update docs#22801hfxsd wants to merge 426 commits intopingcap:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request introduces a Python script to automate the weekly identification of merged TiDB PRs that may require documentation updates based on labels, keywords, and file paths. Feedback includes addressing a logic error in the time window calculation to avoid timezone mismatches and overlapping reports, using word boundaries for keyword matching to prevent false positives, removing redundant keywords, and implementing error handling for GitHub API requests.
| start_date = start_sh.date().isoformat() | ||
| end_date = end_sh.date().isoformat() | ||
|
|
||
| query = f"repo:{SOURCE_REPO} is:pr is:merged merged:{start_date}..{end_date}" |
There was a problem hiding this comment.
The current search query uses date strings (e.g., 2023-10-23..2023-10-30), which has two significant issues:
- Timezone Mismatch: GitHub interprets these dates in UTC. Since your window is calculated in Shanghai time (UTC+8), PRs merged between 00:00 and 08:00 Shanghai time on the start date will be missed (as their UTC date is the previous day).
- Overlapping Reports: The
..syntax in GitHub search is inclusive. A PR merged on theend_datewill be included in this week's report and also in next week's report (where it will be thestart_date).
To fix this, use full ISO 8601 timestamps with timezone offsets and ensure the range is exclusive of the end time.
References
- Technical accuracy and terminology consistency are important for documentation workflows. (link)
| score -= 1 | ||
| reasons.append(f"Only maintenance labels: {', '.join(hit_negative_labels)}") | ||
|
|
||
| kw_hits = sorted({kw for kw in POSITIVE_KEYWORDS if kw in text}) |
There was a problem hiding this comment.
| POSITIVE_KEYWORDS = [ | ||
| "compatibility", | ||
| "deprecate", | ||
| "deprecated", | ||
| "new feature", | ||
| "sql", | ||
| "syntax", | ||
| "default value", | ||
| "system variable", | ||
| "configuration", | ||
| "config", | ||
| "api", | ||
| "planner", | ||
| "optimizer", | ||
| "ddl", | ||
| ] |
There was a problem hiding this comment.
Several keywords in this list are redundant because they are substrings of others (e.g., "config" matches "configuration", "deprecate" matches "deprecated"). Since the script currently uses substring matching, the shorter versions will always catch the longer ones. You can remove the redundant entries to keep the list clean and focused.
| with urllib.request.urlopen(req, timeout=30) as resp: | ||
| return json.loads(resp.read().decode("utf-8")) |
There was a problem hiding this comment.
The urllib.request.urlopen call lacks error handling. If the GitHub API returns a non-200 status code (e.g., 403 due to rate limiting or 401 due to an invalid token), the script will raise an unhandled HTTPError. Consider adding a try-except block to provide a more informative error message or handle retries for transient issues.
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?