---
permalink: /labs/lab-04-duplication/
title: "Lab 04: Duplication Detection"
description: Detect code duplication across the codebase using jscpd and configure detection thresholds.
---

| Duration | Level | Prerequisites |
|---|---|---|
| 30 min | Intermediate | Lab 03 |

Objectives:

- Run jscpd to detect code duplication across multiple languages
- Interpret duplication reports and understand clone types
- Configure jscpd thresholds using `.jscpd.json`
- Understand the relationship between duplication and maintainability
- Review SARIF output from jscpd

Prerequisites:

- Completed Lab 03: Complexity Analysis
- jscpd installed (`npm install -g jscpd`)

Working Directory: Run the following commands from the `code-quality-scan-demo-app` repository root.

Run jscpd across all 5 demo apps:

```powershell
jscpd cq-demo-app-001/src cq-demo-app-002/src cq-demo-app-003/src cq-demo-app-004/src cq-demo-app-005/
```

jscpd scans for blocks of duplicated code (clones) across files and languages. The output shows:

- Source and target file locations
- Lines of duplicated code
- Percentage of duplication in the codebase
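
To make the percentage figure concrete, here is a minimal Python sketch of one way to compute it. It assumes the percentage is the number of distinct duplicated lines divided by the total number of lines, with overlapping clones counted only once. The file names and numbers are hypothetical, and jscpd's own counting rules may differ in detail.

```python
# Minimal sketch: project-level duplication percentage.
# Assumption: percentage = distinct duplicated lines / total lines * 100.
# jscpd's own counting rules may differ in detail.


def duplicated_line_count(ranges):
    """Count distinct lines covered by inclusive (start, end) line ranges."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return len(covered)


def duplication_percentage(files):
    """`files` maps a file name to (total_lines, [(start, end), ...])."""
    total = sum(total_lines for total_lines, _ in files.values())
    duplicated = sum(duplicated_line_count(ranges) for _, ranges in files.values())
    return 100.0 * duplicated / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical file sizes and clone ranges, not output from the demo apps.
    sample = {
        "a.py": (200, [(10, 29), (20, 39)]),  # overlapping clones cover lines 10-39
        "b.py": (100, [(1, 20)]),
    }
    print(f"{duplication_percentage(sample):.2f}%")  # prints 16.67% (50 of 300 lines)
```

Counting each line once matters when two clones overlap. Summing the raw clone lengths here would report 60 duplicated lines instead of 50.
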

Generate a detailed HTML report for visual inspection:

```powershell
jscpd cq-demo-app-001/src cq-demo-app-002/src cq-demo-app-003/src cq-demo-app-004/src cq-demo-app-005/ --reporters html --output jscpd-report
```

Open the report:

```powershell
# In a Codespace, use the built-in preview or port forwarding
# Locally, open the file directly:
Start-Process jscpd-report/html/index.html
```

The HTML report provides:

- A summary table showing duplication percentage per file
- Side-by-side comparison of duplicated blocks
- File-level and project-level duplication metrics

Duplication findings are usually described using three standard clone types:

| Type | Name | Description | Example |
|---|---|---|---|
| Type 1 | Exact | Identical code blocks (whitespace/comments may differ) | Copy-pasted function |
| Type 2 | Renamed | Structurally identical but with renamed identifiers | Same logic with different variable names |
| Type 3 | Near-miss | Similar code with minor modifications | Same algorithm with a few extra lines |
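
The sketch below illustrates the three clone types on small Python snippets. It is a rough heuristic built on Python's standard `tokenize` and `difflib` modules, not jscpd's detection algorithm. Type 1 compares token streams with comments and layout removed. Type 2 also replaces identifiers and literals with placeholders. Type 3 accepts normalized streams whose `SequenceMatcher` similarity reaches an arbitrary, assumed cutoff of 0.8.

```python
import difflib
import io
import keyword
import tokenize

# Layout-only tokens are dropped so whitespace and comments never affect matching.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(source, normalize=False):
    """Token strings of `source`. With normalize=True, identifiers become 'ID'
    and literals become 'LIT', so renamed copies produce identical sequences."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if normalize and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif normalize and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(a, b, near_miss_ratio=0.8):
    """Illustrative heuristic only; this is not how jscpd classifies clones."""
    if tokens(a) == tokens(b):
        return "Type 1"
    na, nb = tokens(a, normalize=True), tokens(b, normalize=True)
    if na == nb:
        return "Type 2"
    ratio = difflib.SequenceMatcher(None, na, nb).ratio()
    return "Type 3" if ratio >= near_miss_ratio else "not a clone"


ORIGINAL = """
def total(prices):
    result = 0
    for p in prices:
        result += p
    return result
"""

TYPE1 = """
def total(prices):  # copied, with a new comment and blank line

    result = 0
    for p in prices:
        result += p
    return result
"""

TYPE2 = """
def sum_costs(costs):
    acc = 0
    for c in costs:
        acc += c
    return acc
"""

TYPE3 = """
def total(prices):
    result = 0
    for p in prices:
        if p > 0:
            result += p
    return result
"""

if __name__ == "__main__":
    for name, snippet in [("TYPE1", TYPE1), ("TYPE2", TYPE2), ("TYPE3", TYPE3)]:
        print(name, "->", classify(ORIGINAL, snippet))
```

The 0.8 cutoff is a tuning knob. Lowering it reports more near-miss pairs along with more false positives, which mirrors the noise trade-off you explore with `--min-lines` and `--min-tokens` later in this lab.
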

Review a detected clone in detail:

```powershell
jscpd cq-demo-app-001/src cq-demo-app-002/src --min-lines 5 --reporters consoleFull
```

jscpd can be configured with a `.jscpd.json` file, typically placed at the repository root. This repository keeps its configuration under `src/config/`. Examine the existing configuration:

```powershell
Get-Content src/config/.jscpd.json
```

A typical configuration looks like:

```json
{
  "threshold": 5,
  "reporters": ["json", "consoleFull"],
  "ignore": [
    "**/node_modules/**",
    "**/*.test.*",
    "**/*_test.go",
    "**/bin/**",
    "**/obj/**",
    "**/target/**"
  ],
  "minLines": 10,
  "minTokens": 50,
  "output": "jscpd-report"
}
```

Key configuration options:

| Option | Default | Description |
|---|---|---|
| `threshold` | `0` | Maximum allowed duplication percentage (0 = no limit) |
| `minLines` | `5` | Minimum lines for a block to be considered a clone |
| `minTokens` | `50` | Minimum tokens for a block to be considered a clone |
| `ignore` | `[]` | Glob patterns for files/directories to exclude |
| `reporters` | `["consoleFull"]` | Output formats: `console`, `consoleFull`, `json`, `html`, `sarif` |
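
As a sketch of how these options interact, the Python below overlays a `.jscpd.json` document on the defaults from the table and applies the `threshold` rule (0 means no limit). The function names are hypothetical. The assumption that jscpd fails only when duplication is strictly greater than the threshold should be checked against the jscpd documentation for your version.

```python
import copy
import json

# Defaults as listed in the options table (treated here as assumptions;
# check the jscpd documentation for your installed version).
DEFAULTS = {
    "threshold": 0,
    "minLines": 5,
    "minTokens": 50,
    "ignore": [],
    "reporters": ["consoleFull"],
}


def load_config(text):
    """Overlay the keys from a .jscpd.json document on top of the defaults."""
    config = copy.deepcopy(DEFAULTS)  # deep copy so list defaults are never shared
    config.update(json.loads(text))
    return config


def exceeds_threshold(percentage, threshold):
    """True if duplication breaks the limit; a threshold of 0 means no limit."""
    return threshold > 0 and percentage > threshold


if __name__ == "__main__":
    lab_config = load_config('{"threshold": 5, "minLines": 10, "minTokens": 50}')
    print(lab_config["minLines"], lab_config["reporters"])
    print(exceeds_threshold(6.2, lab_config["threshold"]))
```

To try it against the repository's file, pass `Path("src/config/.jscpd.json").read_text()` (from `pathlib`) to `load_config`.
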

Run jscpd with SARIF output for integration with the GitHub Security tab:

```powershell
jscpd cq-demo-app-001/src cq-demo-app-002/src cq-demo-app-003/src cq-demo-app-004/src cq-demo-app-005/ --reporters sarif --output jscpd-report
```

Count the results in the SARIF output (`-Raw` reads the file as a single string so `ConvertFrom-Json` parses it in one piece):

```powershell
Get-Content -Raw jscpd-report/sarif/jscpd-report.sarif | ConvertFrom-Json | Select-Object -ExpandProperty runs | Select-Object -ExpandProperty results | Measure-Object
```

Each SARIF result from jscpd includes:

- `ruleId`: `jscpd:duplication`
- `level`: `warning`
- `message`: Description of the duplicated block with source and target locations
- `locations`: File paths and line ranges
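
If you prefer scripting over the PowerShell pipeline, the following Python sketch reads the same file, counts results the way `Measure-Object` does, and lists each location's file and line range. It follows the SARIF 2.1.0 layout (`runs[].results[].locations[].physicalLocation`). The embedded sample log is hypothetical and is used only when `jscpd-report/sarif/jscpd-report.sarif` does not exist, so check the field placement against your actual report.

```python
import json
from pathlib import Path


def count_results(sarif):
    """Total number of results across all runs (what Measure-Object reports)."""
    return sum(len(run.get("results", [])) for run in sarif.get("runs", []))


def clone_locations(sarif):
    """Yield (ruleId, level, uri, startLine, endLine) for each result location."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # If level is absent, SARIF generally falls back to "warning".
            level = result.get("level", "warning")
            for loc in result.get("locations", []):
                physical = loc.get("physicalLocation", {})
                region = physical.get("region", {})
                yield (result.get("ruleId"), level,
                       physical.get("artifactLocation", {}).get("uri"),
                       region.get("startLine"), region.get("endLine"))


# Hypothetical SARIF 2.1.0 log; real jscpd output may place fields differently.
SAMPLE = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "jscpd"}},
        "results": [{
            "ruleId": "jscpd:duplication",
            "level": "warning",
            "message": {"text": "Duplicated block (hypothetical message)"},
            "locations": [
                {"physicalLocation": {"artifactLocation": {"uri": "a.py"},
                                      "region": {"startLine": 10, "endLine": 25}}},
                {"physicalLocation": {"artifactLocation": {"uri": "b.py"},
                                      "region": {"startLine": 3, "endLine": 18}}},
            ],
        }],
    }],
}

if __name__ == "__main__":
    report = Path("jscpd-report/sarif/jscpd-report.sarif")
    sarif = json.loads(report.read_text(encoding="utf-8")) if report.exists() else SAMPLE
    print("results:", count_results(sarif))
    for row in clone_locations(sarif):
        print(*row)
```
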

Try different detection settings (`--min-lines` and `--min-tokens`) to see their effect.

Strict (catch small clones):

```powershell
jscpd cq-demo-app-001/src --min-lines 5 --min-tokens 25
```

Relaxed (only large blocks):

```powershell
jscpd cq-demo-app-001/src --min-lines 20 --min-tokens 100
```

Observe how the number of detected clones changes with each setting. The workshop default of 10 lines / 50 tokens (the `minLines` and `minTokens` values in the configuration above) is intended to balance noise against detection coverage.
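
To reason about what the two settings do, this Python sketch filters a hypothetical list of candidate blocks with the strict, workshop-default, and relaxed values used above. It assumes a block must meet both the minimum line count and the minimum token count to be reported. The `Candidate` class and its data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical duplicated block found by a detector."""
    name: str
    lines: int
    tokens: int


def reported(candidates, min_lines, min_tokens):
    """Keep blocks meeting BOTH minimums (an assumption about how they combine)."""
    return [c for c in candidates if c.lines >= min_lines and c.tokens >= min_tokens]


CANDIDATES = [
    Candidate("small helper", lines=6, tokens=30),
    Candidate("validation block", lines=12, tokens=70),
    Candidate("large handler", lines=25, tokens=140),
]

if __name__ == "__main__":
    settings = [("strict", 5, 25), ("default", 10, 50), ("relaxed", 20, 100)]
    for label, min_lines, min_tokens in settings:
        names = [c.name for c in reported(CANDIDATES, min_lines, min_tokens)]
        print(f"{label:8} {min_lines:>2} lines / {min_tokens:>3} tokens -> {names}")
```

Raising either minimum can only shrink the reported set, which is why the relaxed run never reports more clones than the strict one.
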

Verify your work before continuing:

- jscpd ran successfully across all 5 demo apps
- You generated an HTML report showing duplication locations
- You can explain the difference between Type 1, Type 2, and Type 3 clones
- You understand the `.jscpd.json` configuration options
- You generated SARIF output from jscpd

Code duplication is a maintainability risk: when duplicated code needs to change, every copy must be updated. jscpd detects duplicated blocks across files and languages, which makes it well suited to multi-language repositories. By configuring appropriate thresholds and generating SARIF output, you can feed duplication findings into the same triage workflow as lint and complexity findings.

Remediation strategies for duplication:

- Extract shared logic into utility functions or modules
- Use base classes or traits for repeated patterns
- Apply the DRY (Don't Repeat Yourself) principle
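
A small before/after example in Python shows the first strategy, extracting shared logic into a utility function. The function names are hypothetical. After the refactor, a change to the email rules happens in exactly one place, and the callers' behavior is unchanged.

```python
# Before: the same normalization logic is copy-pasted into two functions.
def register_user_before(email):
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError(f"invalid email: {email!r}")
    return {"action": "register", "email": cleaned}


def invite_user_before(email):
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError(f"invalid email: {email!r}")
    return {"action": "invite", "email": cleaned}


# After: the shared logic lives in one utility function, so a fix happens once.
def normalize_email(email):
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError(f"invalid email: {email!r}")
    return cleaned


def register_user(email):
    return {"action": "register", "email": normalize_email(email)}


def invite_user(email):
    return {"action": "invite", "email": normalize_email(email)}
```
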

Proceed to Lab 05: Coverage Analysis.



