Onboard issue dedupe workflow (ml-commons) #4804

Open
peterzhuamazon wants to merge 1 commit into opensearch-project:main from peterzhuamazon:analyzer-reviewer-dedupe

@peterzhuamazon
Member

Description

Onboard issue dedupe workflow (ml-commons)

Related Issues

opensearch-project/opensearch-build#5912

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 690bac5.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| .github/workflows/issue-dedupe.yml | 22 | high | External reusable workflow referenced at mutable '@main' ref instead of a pinned commit SHA. If the upstream repository 'opensearch-project/opensearch-build' is compromised or the branch is force-pushed, malicious code executes in this repository's CI context with write permissions to issues and an OIDC id-token. |
| .github/workflows/issue-dedupe.yml | 36 | high | Second external reusable workflow ('issue-dedupe-autoclose.yml') also referenced at mutable '@main'. This job runs on a daily schedule with 'issues: write' permission, providing a persistent, recurring attack surface if the upstream workflow is tampered with. |
| .github/workflows/issue-dedupe.yml | 27 | high | 'id-token: write' permission and the secret 'BEDROCK_ACCESS_ROLE_ISSUE_DEDUPE' (an AWS IAM role ARN for Bedrock) are forwarded to the external workflow at the mutable '@main' ref. A supply-chain compromise of the upstream workflow could exfiltrate the OIDC token or assume the AWS role to access cloud resources. |

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 3 | Medium: 0 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@peterzhuamazon
Member Author

Expected, adding new issue dedupe workflows here.

@peterzhuamazon peterzhuamazon had a problem deploying to ml-commons-cicd-env-require-approval April 28, 2026 19:54 — with GitHub Actions Error
@peterzhuamazon peterzhuamazon had a problem deploying to ml-commons-cicd-env-require-approval April 28, 2026 19:54 — with GitHub Actions Failure
@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The BEDROCK_ACCESS_ROLE_ISSUE_DEDUPE secret is passed to a reusable workflow referenced at the mutable @main ref in an external repository. If the upstream workflow at opensearch-project/opensearch-build is ever modified, maliciously or accidentally, it could misuse the secret. Pinning to a specific commit SHA would mitigate this risk.

✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Pinned Version

The reusable workflows are referenced at the @main branch, so any breaking change to the upstream workflows in opensearch-project/opensearch-build takes effect in this workflow immediately, with no control on this side. Consider pinning to a specific tag or commit SHA for stability and security.

  uses: opensearch-project/opensearch-build/.github/workflows/issue-dedupe-detect.yml@main
  permissions:
    contents: read
    issues: write
    id-token: write
  secrets:
    BEDROCK_ACCESS_ROLE_ISSUE_DEDUPE: ${{ secrets.BEDROCK_ACCESS_ROLE_ISSUE_DEDUPE }}
  with:
    issue_number: ${{ inputs.issue_number || '' }}
    grace_days: ${{ vars.DUPLICATE_GRACE_DAYS || '7' }}

auto-close-issue:
  if: github.event_name == 'schedule' && github.repository == 'opensearch-project/ml-commons'
  uses: opensearch-project/opensearch-build/.github/workflows/issue-dedupe-autoclose.yml@main
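
A minimal sketch of the pinning recommended above, assuming the job key is detect-issue as named elsewhere in this review. `<FULL_COMMIT_SHA>` is a placeholder, not a real commit; it would have to be replaced with a reviewed 40-character commit SHA from opensearch-project/opensearch-build. The rest of the job is copied from the excerpt and is unchanged.

```yaml
detect-issue:
  # <FULL_COMMIT_SHA> is a placeholder (assumption), not an actual commit.
  # A full commit SHA is immutable, unlike a branch such as main.
  uses: opensearch-project/opensearch-build/.github/workflows/issue-dedupe-detect.yml@<FULL_COMMIT_SHA>
  permissions:
    contents: read
    issues: write
    id-token: write
  secrets:
    BEDROCK_ACCESS_ROLE_ISSUE_DEDUPE: ${{ secrets.BEDROCK_ACCESS_ROLE_ISSUE_DEDUPE }}
```

The same change would apply to the issue-dedupe-autoclose.yml reference. The trade-off is that upstream fixes no longer arrive automatically; the SHA has to be bumped deliberately.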

Missing Schedule Condition

The schedule trigger has no repository guard in the detect-issue job's if condition, but the auto-close-issue job does check github.repository. This is intentional since detect-issue skips schedule events, but it's worth verifying that the scheduled run only triggers auto-close-issue and not detect-issue unintentionally on forks.

  if: >-
    (github.event_name == 'workflow_dispatch' &&
     github.repository == 'opensearch-project/ml-commons') ||
    (github.event_name == 'issues' &&
     github.event.issue.user.type != 'Bot' &&
     github.repository == 'opensearch-project/ml-commons')
  uses: opensearch-project/opensearch-build/.github/workflows/issue-dedupe-detect.yml@main
  permissions:
    contents: read
    issues: write
    id-token: write
  secrets:
    BEDROCK_ACCESS_ROLE_ISSUE_DEDUPE: ${{ secrets.BEDROCK_ACCESS_ROLE_ISSUE_DEDUPE }}
  with:
    issue_number: ${{ inputs.issue_number || '' }}
    grace_days: ${{ vars.DUPLICATE_GRACE_DAYS || '7' }}

auto-close-issue:
  if: github.event_name == 'schedule' && github.repository == 'opensearch-project/ml-commons'

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Suggestion: Pass event issue number as fallback value

When the issues event triggers the workflow (not workflow_dispatch),
inputs.issue_number will be empty and the fallback '' will be passed. The called
reusable workflow should handle an empty issue_number by inferring it from the event
context, but if it does not, this could silently cause incorrect behavior. Ensure
the called workflow handles an empty string gracefully, or pass
github.event.issue.number as the fallback instead of an empty string.

.github/workflows/issue-dedupe.yml [30-32]

 with:
-  issue_number: ${{ inputs.issue_number || '' }}
+  issue_number: ${{ inputs.issue_number || github.event.issue.number || '' }}
   grace_days: ${{ vars.DUPLICATE_GRACE_DAYS || '7' }}

Suggestion importance [1-10]: 6

Why: When triggered by the issues event, inputs.issue_number will be empty, and passing github.event.issue.number as a fallback ensures the correct issue is processed rather than relying on the called workflow to infer it from context.

Impact: Low

Category: General
Suggestion: Clarify intentional condition asymmetry between triggers

The schedule event is not included in the detect-issue job's condition, so a
scheduled run skips detect-issue and only auto-close-issue matches, which is
correct. However, the workflow_dispatch branch of the condition does not filter
out Bot-created issues, while the issues branch does. Consider whether
workflow_dispatch should apply the same filter, or at minimum add a comment
explaining the intentional difference.

.github/workflows/issue-dedupe.yml [17-22]

 if: >-
   (github.event_name == 'workflow_dispatch' &&
    github.repository == 'opensearch-project/ml-commons') ||
   (github.event_name == 'issues' &&
    github.event.issue.user.type != 'Bot' &&
    github.repository == 'opensearch-project/ml-commons')
+# Note: workflow_dispatch intentionally allows manual triggering regardless of issue author type

Suggestion importance [1-10]: 2

Why: The suggestion asks to add a comment explaining intentional behavior, but the improved_code only adds a YAML comment which is not valid inside a multi-line if expression block. The suggestion has minimal impact and the improved_code is not accurately applicable.

Impact: Low
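
One possible way to document the asymmetry without touching the expression, sketched under the assumption that the job key is detect-issue: put the comment above the if: key. A # line indented inside the folded (>-) scalar would become part of the expression string rather than a YAML comment. The comment wording is taken from the suggestion above.

```yaml
detect-issue:
  # Note: workflow_dispatch intentionally allows manual triggering
  # regardless of issue author type.
  if: >-
    (github.event_name == 'workflow_dispatch' &&
     github.repository == 'opensearch-project/ml-commons') ||
    (github.event_name == 'issues' &&
     github.event.issue.user.type != 'Bot' &&
     github.repository == 'opensearch-project/ml-commons')
```

The condition itself is identical to the one in the PR; only the comment placement differs.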

@peterzhuamazon peterzhuamazon had a problem deploying to ml-commons-cicd-env-require-approval April 29, 2026 21:51 — with GitHub Actions Failure
@peterzhuamazon peterzhuamazon had a problem deploying to ml-commons-cicd-env-require-approval April 29, 2026 21:51 — with GitHub Actions Error
@peterzhuamazon peterzhuamazon temporarily deployed to ml-commons-cicd-env-require-approval April 29, 2026 21:51 — with GitHub Actions Inactive
@codecov

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.42%. Comparing base (12f884e) to head (690bac5).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #4804   +/-   ##
=========================================
  Coverage     77.42%   77.42%           
  Complexity    11907    11907           
=========================================
  Files           963      963           
  Lines         53326    53326           
  Branches       6503     6503           
=========================================
+ Hits          41285    41287    +2     
+ Misses         9289     9288    -1     
+ Partials       2752     2751    -1     
Flag Coverage Δ
ml-commons 77.42% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.


@peterzhuamazon peterzhuamazon had a problem deploying to ml-commons-cicd-env-require-approval April 29, 2026 22:45 — with GitHub Actions Error
@peterzhuamazon peterzhuamazon had a problem deploying to ml-commons-cicd-env-require-approval April 29, 2026 22:45 — with GitHub Actions Failure

Labels

  • enhancement: New feature or request
  • infra
  • skip-diff-analyzer: Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis.

Projects

Status: 👀 In Review

2 participants