|
| 1 | +--- |
| 2 | +description: | |
| 3 | + Daily workflow that scans the codebase for duplicate and near-duplicate code blocks, |
| 4 | + copy-paste patterns, and repeated logic sequences in TypeScript source and JavaScript |
| 5 | + container code. Files actionable issues for high-impact deduplication opportunities |
| 6 | + to prevent technical debt from accumulating silently. |
| 7 | +
|
| 8 | +on: |
| 9 | + schedule: daily |
| 10 | + workflow_dispatch: |
| 11 | + |
| 12 | +permissions: |
| 13 | + contents: read |
| 14 | + issues: read |
| 15 | + |
| 16 | +sandbox: |
| 17 | + agent: |
| 18 | + version: v0.25.29 |
| 19 | +network: |
| 20 | + allowed: |
| 21 | + - node |
| 22 | + - github |
| 23 | + |
| 24 | +tools: |
| 25 | + github: |
| 26 | + toolsets: [issues] |
| 27 | + bash: true |
| 28 | + |
| 29 | +safe-outputs: |
| 30 | + threat-detection: |
| 31 | + enabled: false |
| 32 | + create-issue: |
| 33 | + title-prefix: "[Duplicate Code] " |
| 34 | + labels: [code-quality, refactoring] |
| 35 | + max: 5 |
| 36 | + expires: 30d |
| 37 | + |
| 38 | +timeout-minutes: 20 |
| 39 | +--- |
| 40 | + |
| 41 | +# Duplicate Code Detector |
| 42 | + |
| 43 | +You are a code quality engineer analyzing the `${{ github.repository }}` codebase for duplicated and near-duplicate code. Your mission is to surface high-impact deduplication opportunities that will reduce maintenance burden and improve consistency. |
| 44 | + |
| 45 | +## Repository Context |
| 46 | + |
| 47 | +This is **gh-aw-firewall**, a network firewall for GitHub Copilot CLI. The most important source files for duplication analysis are: |
| 48 | + |
| 49 | +- `src/docker-manager.ts` — 3,900+ lines; container lifecycle, env-var construction, volume mounts |
| 50 | +- `src/cli.ts` — 1,700+ lines; argument parsing, orchestration, config merging |
| 51 | +- `containers/api-proxy/server.js` — provider-agnostic proxy server |
| 52 | +- `containers/api-proxy/providers/*.js` — per-provider adapter modules |
| 53 | + |
| 54 | +## Phase 1: Gather Codebase Metrics |
| 55 | + |
| 56 | +Run these commands to understand the scope before diving into duplication: |
| 57 | + |
| 58 | +```bash |
| 59 | +# File sizes and line counts |
| 60 | +wc -l src/*.ts src/**/*.ts containers/api-proxy/*.js containers/api-proxy/providers/*.js 2>/dev/null | sort -rn | head -30 |
| 61 | + |
| 62 | +# Total files and lines |
| 63 | +echo "=== TypeScript source ===" |
| 64 | +find src -name "*.ts" ! -name "*.test.ts" | xargs wc -l 2>/dev/null | sort -rn | head -20 |
| 65 | +echo "=== Container JS ===" |
| 66 | +find containers -name "*.js" | xargs wc -l 2>/dev/null | sort -rn | head -20 |
| 67 | +``` |
| 68 | + |
| 69 | +## Phase 2: Detect Structural Duplication |
| 70 | + |
| 71 | +Install and run the `jscpd` (JavaScript Copy/Paste Detector) tool to find literal code duplication: |
| 72 | + |
| 73 | +```bash |
| 74 | +# Install jscpd |
| 75 | +npm install -g jscpd 2>&1 | tail -3 |
| 76 | + |
| 77 | +# Run duplicate detection on TypeScript source |
| 78 | +jscpd src --min-lines 10 --min-tokens 50 --reporters json --output /tmp/jscpd-src 2>&1 | tail -20 |
| 79 | + |
| 80 | +# Run on container JS |
| 81 | +jscpd containers --min-lines 10 --min-tokens 50 --reporters json --output /tmp/jscpd-containers 2>&1 | tail -20 |
| 82 | + |
| 83 | +# Show summary |
| 84 | +cat /tmp/jscpd-src/jscpd-report.json 2>/dev/null | node -e " |
| 85 | + const d = JSON.parse(require('fs').readFileSync('/dev/stdin', 'utf8')); |
| 86 | + const clones = d.duplicates || []; |
| 87 | + console.log('Total duplicates found:', clones.length); |
| 88 | + clones.slice(0, 10).forEach(c => { |
| 89 | + const f1 = c.firstFile?.name?.replace(process.cwd() + '/', '') || 'unknown'; |
| 90 | + const f2 = c.secondFile?.name?.replace(process.cwd() + '/', '') || 'unknown'; |
| 91 | + console.log(\` \${f1}:\${c.firstFile?.start}-\${c.firstFile?.end} <-> \${f2}:\${c.secondFile?.start}-\${c.secondFile?.end} (\${c.fragment?.split('\\n').length || 0} lines)\`); |
| 92 | + }); |
| 93 | +" || echo "(jscpd report not available)" |
| 94 | +``` |
| 95 | + |
| 96 | +## Phase 3: Detect Pattern-Level Duplication |
| 97 | + |
| 98 | +Use grep to find repeated code patterns that jscpd may not catch (semantic duplication): |
| 99 | + |
| 100 | +```bash |
| 101 | +echo "=== Env-var reading/trimming patterns ===" |
| 102 | +grep -rn "process\.env\." src/ --include="*.ts" | grep -v "test" | head -40 |
| 103 | + |
| 104 | +echo "=== Docker exec/run command construction patterns ===" |
| 105 | +grep -n "execa\|execaSync\|docker.*run\|docker.*exec" src/docker-manager.ts | head -30 |
| 106 | + |
| 107 | +echo "=== Config/validation patterns in config-file.ts and schema-validator.ts ===" |
| 108 | +grep -n "throw\|error\|invalid\|validate" src/config-file.ts | head -20 |
| 109 | +grep -n "throw\|error\|invalid\|validate" src/schema-validator.ts 2>/dev/null | head -20 |
| 110 | + |
| 111 | +echo "=== Repeated try/catch error handling patterns ===" |
| 112 | +grep -n -A 3 "catch (e" src/docker-manager.ts | head -60 |
| 113 | + |
| 114 | +echo "=== Provider adapter patterns in api-proxy ===" |
| 115 | +for f in containers/api-proxy/providers/*.js; do |
| 116 | + echo "--- $f ---" |
| 117 | + grep -n "function\|const.*=.*(" "$f" | head -10 |
| 118 | +done |
| 119 | + |
| 120 | +echo "=== Repeated log construction patterns ===" |
| 121 | +grep -rn "logger\.\(debug\|info\|warn\|error\)" src/ --include="*.ts" | \ |
| 122 | + sed 's/.*logger\.\(debug\|info\|warn\|error\)(\(.*\))/\2/' | \ |
| 123 | + sort | uniq -d | head -20 |
| 124 | +``` |
| 125 | + |
| 126 | +## Phase 4: Analyze Specific Known Duplication Areas |
| 127 | + |
| 128 | +Based on codebase knowledge, deeply analyze the most likely duplication hotspots: |
| 129 | + |
| 130 | +```bash |
| 131 | +echo "=== docker-manager.ts: env-var construction ===" |
| 132 | +grep -n "env\[.*\]\s*=\|envVars\.\|\.trim()\|process\.env\." src/docker-manager.ts | head -50 |
| 133 | + |
| 134 | +echo "=== docker-manager.ts: repeated docker compose args patterns ===" |
| 135 | +grep -n "composeArgs\|dockerArgs\|\-f.*compose\|--project-name" src/docker-manager.ts | head -30 |
| 136 | + |
| 137 | +echo "=== cli.ts: option handling patterns ===" |
| 138 | +grep -n "\.option\|options\.\|program\." src/cli.ts | head -50 |
| 139 | + |
| 140 | +echo "=== API proxy provider similarity (getConfig patterns) ===" |
| 141 | +for f in containers/api-proxy/providers/openai.js containers/api-proxy/providers/anthropic.js containers/api-proxy/providers/gemini.js containers/api-proxy/providers/copilot.js containers/api-proxy/providers/opencode.js; do |
| 142 | + if [ -f "$f" ]; then |
| 143 | + echo "--- $f: exported functions ---" |
| 144 | + grep -n "^function\|^const.*=\s*function\|^module\.exports\|^exports\." "$f" | head -10 |
| 145 | + fi |
| 146 | +done |
| 147 | + |
| 148 | +echo "=== proxy-utils.js: shared utilities ===" |
| 149 | +cat containers/api-proxy/proxy-utils.js 2>/dev/null | head -60 |
| 150 | +``` |
| 151 | + |
| 152 | +## Phase 5: Check for Existing Issues |
| 153 | + |
| 154 | +Before filing new issues, check what's already been reported: |
| 155 | + |
| 156 | +1. Search for open issues with `[Duplicate Code]` prefix using the GitHub toolset |
| 157 | +2. Also search for issues with labels `code-quality` or `refactoring` that describe duplication |
| 158 | +3. Skip any finding that already has an open tracking issue |
| 159 | + |
| 160 | +## Phase 6: Prioritize and Report Findings |
| 161 | + |
| 162 | +Based on your analysis, identify the **top duplications by impact** using this scoring: |
| 163 | + |
| 164 | +| Factor | Points | |
| 165 | +|--------|--------| |
| 166 | +| >20 duplicate lines | +3 | |
| 167 | +| Affects security-critical path | +3 | |
| 168 | +| In file >1000 lines (maintenance burden) | +2 | |
| 169 | +| More than 2 copies | +2 | |
| 170 | +| Easy to extract (no complex dependencies) | +1 | |
| 171 | + |
| 172 | +Report only findings with score ≥ 4. |
| 173 | + |
| 174 | +### For each high-impact finding, create an issue with this format: |
| 175 | + |
| 176 | +**Title**: `[Duplicate Code] <brief description of what is duplicated>` |
| 177 | + |
| 178 | +**Body**: |
| 179 | +```markdown |
| 180 | +## Duplicate Code Opportunity |
| 181 | + |
| 182 | +### Summary |
| 183 | +- **Pattern**: Brief description of what is being duplicated |
| 184 | +- **Locations**: File(s) and line ranges containing duplicates |
| 185 | +- **Impact**: Lines saved / maintenance burden reduction |
| 186 | + |
| 187 | +### Evidence |
| 188 | + |
| 189 | +<Show the specific duplicated code blocks side by side> |
| 190 | + |
| 191 | +### Suggested Refactoring |
| 192 | + |
| 193 | +Describe the shared utility or abstraction that would eliminate the duplication. |
| 194 | +For example: |
| 195 | +- Extract a `parseEnvVars(obj)` helper in `src/env-utils.ts` |
| 196 | +- Create a base class or mixin for provider adapters |
| 197 | +- Add a `buildDockerArgs(config)` factory function |
| 198 | + |
| 199 | +### Affected Files |
| 200 | +- `path/to/file.ts` — lines X-Y |
| 201 | +- `path/to/other.ts` — lines A-B |
| 202 | + |
| 203 | +### Effort Estimate |
| 204 | +Low / Medium / High |
| 205 | + |
| 206 | +--- |
| 207 | +*Detected by Duplicate Code Detector workflow. Run date: $(date -u +"%Y-%m-%d")* |
| 208 | +``` |
| 209 | + |
| 210 | +## Guidelines |
| 211 | + |
| 212 | +- **Be specific**: Always include file paths and line numbers in the evidence section |
| 213 | +- **Be actionable**: Each issue should have a clear, implementable suggestion |
| 214 | +- **Avoid noise**: Only file issues for genuine duplication with real maintenance impact — not cosmetic similarities |
| 215 | +- **No duplicates**: Check existing open issues before creating new ones |
| 216 | +- **Security awareness**: Flag duplicated security-critical logic (domain validation, ACL rules, capability management) with higher urgency |
| 217 | +- **Cap at 5 issues**: File at most 5 issues per run to avoid flooding the tracker |
| 218 | + |
| 219 | +## Edge Cases |
| 220 | + |
| 221 | +- **No significant duplication found**: Exit gracefully without creating issues; print a summary to the log |
| 222 | +- **jscpd unavailable**: Fall back to grep-based pattern analysis only |
| 223 | +- **All findings already tracked**: Skip creation and log that existing issues cover the findings |
0 commit comments