duplicate-detection.yml
name: Duplicate Detection

on:
  pull_request_target:
    branches: [ main ]
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write
  issues: write

jobs:
  check-duplicates:
    name: Check for Duplicate Files
    runs-on: ubuntu-latest

    steps:
      - name: 📥 Checkout base branch
        uses: actions/checkout@v4
        with:
          ref: ${{ github.base_ref }}

      - name: 📥 Checkout PR branch
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
          path: pr-code

      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v40
        with:
          path: pr-code
          files: |
            C/**/*.c
            CPP/**/*.cpp
            Java/**/*.java
            Python/**/*.py
            JavaScript/**/*.js
            Go/**/*.go
            Rust/**/*.rs

      - name: Detect duplicate implementations
        if: steps.changed-files.outputs.any_changed == 'true'
        id: duplicate-check
        env:
          # Pass the file list through the environment instead of interpolating it
          # into the script, so PR-controlled file names are never parsed as shell code
          CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
        run: |
          echo "Checking for potential duplicates..."
          duplicates=""

          # Left unquoted on purpose: the list is space-separated
          for new_file in $CHANGED_FILES; do
            # Remove pr-code/ prefix
            file_path="${new_file#pr-code/}"
            filename=$(basename "$file_path")
            base_name="${filename%.*}"

            # Normalize filename (remove underscores and hyphens, convert to lowercase)
            normalized=$(echo "$base_name" | tr -d '_-' | tr '[:upper:]' '[:lower:]')
            echo "Checking: $file_path (normalized: $normalized)"

            # Get language directory and category
            lang_dir=$(echo "$file_path" | cut -d'/' -f1)
            category=$(echo "$file_path" | cut -d'/' -f2)

            # Search for similar files in the same category on the base branch
            echo "Searching in $lang_dir/$category/ for similar implementations..."

            # Check if directory exists in base branch
            if [ ! -d "$lang_dir/$category" ]; then
              echo "Directory $lang_dir/$category doesn't exist in base branch - this is a new contribution!"
              continue
            fi

            # Find files with similar names
            similar_files=$(find "$lang_dir/$category" -type f 2>/dev/null | while IFS= read -r existing_file; do
              # A modified file is not a duplicate of its own base-branch copy
              if [ "$existing_file" = "$file_path" ]; then
                continue
              fi

              existing_name=$(basename "$existing_file")
              existing_base="${existing_name%.*}"
              existing_normalized=$(echo "$existing_base" | tr -d '_-' | tr '[:upper:]' '[:lower:]')

              # Check if normalized names match
              if [ "$normalized" = "$existing_normalized" ]; then
                echo "$existing_file"
              fi
            done)

            if [ -n "$similar_files" ]; then
              duplicates="${duplicates}**⚠️ Potential Duplicate:** \`$file_path\`\n"
              duplicates="${duplicates}Similar file(s) already exist:\n"
              while IFS= read -r similar; do
                duplicates="${duplicates}- \`$similar\`\n"
              done <<< "$similar_files"
              duplicates="${duplicates}\n"
            fi
          done

          # Save results
          if [ -n "$duplicates" ]; then
            echo "found=true" >> "$GITHUB_OUTPUT"
            echo "duplicates<<EOF" >> "$GITHUB_OUTPUT"
            echo -e "$duplicates" >> "$GITHUB_OUTPUT"
            echo "EOF" >> "$GITHUB_OUTPUT"
          else
            echo "found=false" >> "$GITHUB_OUTPUT"
          fi

      - name: 💬 Comment on PR if duplicates found
        if: steps.duplicate-check.outputs.found == 'true'
        uses: actions/github-script@v7
        env:
          DUPLICATES_REPORT: ${{ steps.duplicate-check.outputs.duplicates }}
        with:
          script: |
            const duplicates = process.env.DUPLICATES_REPORT;

            // Check if we already commented about duplicates
            const comments = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number
            });

            const botComment = comments.data.find(comment =>
              comment.user.type === 'Bot' && comment.body.includes('Duplicate Detection Results')
            );

            const comment = `## Duplicate Detection Results

            ### ⚠️ Potential Duplicates Found

            ${ duplicates }

            ### What This Means

            We found existing implementations that appear similar to your contribution. This doesn't necessarily mean your PR will be rejected, but please review:

            1. **Is this truly a duplicate?** Check the existing files to see if they implement the same algorithm
            2. **Is your implementation different/better?** If so, explain in your PR description:
               - What makes it different
               - Why it's an improvement
               - Any unique features or optimizations
            3. **Consider improving existing code** instead of adding a duplicate

            ### ✅ What To Do Next

            - **If it's a duplicate:** Consider withdrawing this PR and improving the existing implementation
            - **If it's different:** Add a clear explanation in your PR description about how it differs
            - **If unsure:** Ask the maintainers for guidance!

            ### 💡 Quality Over Quantity

            Remember: One high-quality, unique contribution is worth more than multiple duplicates!

            ---
            *This is an automated check. Maintainers will make the final decision.*`;

            // Only post if we haven't already commented
            if (!botComment) {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                body: comment
              });
            } else {
              // Update existing comment
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: botComment.id,
                body: comment
              });
            }

      - name: ✅ Summary
        run: |
          if [ "${{ steps.duplicate-check.outputs.found }}" = "true" ]; then
            echo "⚠️ Potential duplicates detected - please review"
            echo "This is a warning, not a failure. Maintainers will review."
          else
            echo "✅ No duplicates detected - great job!"
          fi