The V9 analyzer now includes smart file selection to optimize performance on large repositories. Instead of analyzing every file in a repository, it can select up to the 500 most relevant files based on PR context and security criticality.
Smart file selection automatically activates for:
- Large repositories: > 10,000 source files
- Enterprise codebases: > 50,000 lines of code
- Performance-critical analyses: When speed matters
For small/medium repositories (< 10,000 files AND < 50,000 LOC), the system performs full analysis by default.
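The activation rule above reduces to a simple predicate. The sketch below makes three assumptions:

- The file count and line count are already known.
- `RepoStats` and `shouldUseSmartSelection` are hypothetical names, not part of the V9 API.
- The "performance-critical" trigger is not modeled, because this document does not say how it is detected.

```typescript
// Hypothetical sketch: thresholds come from this document; the names are
// illustrative and not part of the V9 API.
interface RepoStats {
  sourceFileCount: number;
  linesOfCode: number;
}

const MAX_FILES_FOR_FULL = 10000; // more source files than this => smart selection
const MAX_LOC_FOR_FULL = 50000;   // more lines of code than this => smart selection

function shouldUseSmartSelection(stats: RepoStats, forceFullAnalysis = false): boolean {
  // An explicit full-analysis override always wins.
  if (forceFullAnalysis) {
    return false;
  }
  // Exceeding either threshold is enough; full analysis requires staying
  // under both limits.
  return stats.sourceFileCount > MAX_FILES_FOR_FULL || stats.linesOfCode > MAX_LOC_FOR_FULL;
}

console.log(shouldUseSmartSelection({ sourceFileCount: 15234, linesOfCode: 900000 }));       // true
console.log(shouldUseSmartSelection({ sourceFileCount: 800, linesOfCode: 40000 }));          // false
console.log(shouldUseSmartSelection({ sourceFileCount: 15234, linesOfCode: 900000 }, true)); // false
```

A repository with exactly 10,000 files (or exactly 50,000 LOC) is not covered by either stated rule. This sketch treats it as full analysis.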
The system uses a weighted algorithm to select files:
Priority distribution (500 files max):

| Priority | Share | Max files | What is selected |
|---|---|---|---|
| PR modified files | 60% | 300 | Files actually changed in the pull request |
| Security-critical paths | 20% | 100 | `auth*`, `security*`, `crypto*`, `api*`, `handler*` |
| Entry points | 10% | 50 | `main.*`, `Application.*`, `index.*`, `server.*` |
| Configuration files | 5% | 25 | `pom.xml`, `package.json`, `Cargo.toml`, `go.mod` |
| Test files | 5% | 25 | `*test*`, `*spec*`, `*Test.java`, `*_test.go` |
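The weighted budget above can be sketched as follows. The sketch assumes:

- Each category receives floor(maxFiles × share) slots.
- Categories are filled in priority order.
- A file already claimed by a higher-priority category is skipped.

The category keys mirror the `SelectedFiles` interface in the API reference. `allocateBudget` and `fillCategories` are hypothetical names. Whether V9 hands unused slots to other categories is not documented, so this sketch leaves them empty.

```typescript
// Hypothetical sketch of the weighted budget; not the V9 implementation.
type Category = "prChangedFiles" | "criticalFiles" | "entryPoints" | "configFiles" | "testFiles";

// Categories in priority order with their share of the budget, in percent.
const SHARES: ReadonlyArray<[Category, number]> = [
  ["prChangedFiles", 60],
  ["criticalFiles", 20],
  ["entryPoints", 10],
  ["configFiles", 5],
  ["testFiles", 5],
];

function allocateBudget(maxFiles: number): Record<Category, number> {
  const budget = {} as Record<Category, number>;
  for (const [category, percent] of SHARES) {
    // Integer arithmetic, rounded down so the total never exceeds maxFiles.
    budget[category] = Math.floor((maxFiles * percent) / 100);
  }
  return budget;
}

function fillCategories(
  candidates: Record<Category, string[]>,
  maxFiles = 500
): Record<Category, string[]> {
  const budget = allocateBudget(maxFiles);
  const taken = new Set<string>();
  const selected = {} as Record<Category, string[]>;
  for (const [category] of SHARES) {
    const bucket: string[] = [];
    for (const file of candidates[category]) {
      if (bucket.length >= budget[category]) break; // category budget used up
      if (taken.has(file)) continue; // already claimed by a higher priority
      taken.add(file);
      bucket.push(file);
    }
    selected[category] = bucket;
  }
  return selected;
}

console.log(Object.values(allocateBudget(500))); // [ 300, 100, 50, 25, 25 ]
```

With `CODEQUAL_MAX_FILES=250`, the same rule gives 150, 50, 25, 12 and 12 slots. That is 249 in total, because rounding down drops the half files.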
Each language has tailored selection patterns:
- Java
  - Critical: `*Security*.java`, `*Auth*.java`, `*Controller*.java`
  - Entry: `Application.java`, `Main.java`, `*SpringBoot*.java`
  - Config: `pom.xml`, `build.gradle`, `application.properties`
- Rust
  - Critical: `*auth*.rs`, `*crypto*.rs`, `*unsafe*.rs`, `*ffi*.rs`
  - Entry: `main.rs`, `lib.rs`, `bin/*.rs`
  - Config: `Cargo.toml`, `Cargo.lock`
- JavaScript/TypeScript
  - Critical: `*auth*.js`, `*api*.js`, `*middleware*.js`
  - Entry: `index.js`, `app.js`, `server.js`
  - Config: `package.json`, `tsconfig.json`
- Python
  - Critical: `*auth*.py`, `*security*.py`, `*api*.py`
  - Entry: `__main__.py`, `main.py`, `app.py`
  - Config: `requirements.txt`, `pyproject.toml`
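The per-language patterns above can be applied with a small glob matcher. This sketch makes four assumptions:

- `*` matches within a single path segment.
- Matching is case-sensitive.
- A file receives the first role that matches, checked in the order critical, entry, config.
- `globToRegExp` and `classifyFile` are hypothetical helpers, not V9 code.

```typescript
// Hypothetical sketch: the pattern lists are copied from this document, but
// the helpers are illustrative and do not come from V9.
type Role = "critical" | "entry" | "config" | "other";
type PatternSet = Record<Exclude<Role, "other">, string[]>;

const PATTERNS: Record<string, PatternSet> = {
  java: {
    critical: ["*Security*.java", "*Auth*.java", "*Controller*.java"],
    entry: ["Application.java", "Main.java", "*SpringBoot*.java"],
    config: ["pom.xml", "build.gradle", "application.properties"],
  },
  rust: {
    critical: ["*auth*.rs", "*crypto*.rs", "*unsafe*.rs", "*ffi*.rs"],
    entry: ["main.rs", "lib.rs", "bin/*.rs"],
    config: ["Cargo.toml", "Cargo.lock"],
  },
  javascript: {
    critical: ["*auth*.js", "*api*.js", "*middleware*.js"],
    entry: ["index.js", "app.js", "server.js"],
    config: ["package.json", "tsconfig.json"],
  },
  python: {
    critical: ["*auth*.py", "*security*.py", "*api*.py"],
    entry: ["__main__.py", "main.py", "app.py"],
    config: ["requirements.txt", "pyproject.toml"],
  },
};

// Convert a glob to a RegExp: "*" matches any characters except "/", and the
// match must start at a path-segment boundary and run to the end of the path.
function globToRegExp(glob: string): RegExp {
  const body = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^/]*");
  return new RegExp(`(^|/)${body}$`);
}

function classifyFile(path: string, language: string): Role {
  const patterns = PATTERNS[language];
  if (!patterns) {
    return "other";
  }
  // First matching role wins, checked from most to least important.
  for (const role of ["critical", "entry", "config"] as const) {
    if (patterns[role].some((glob) => globToRegExp(glob).test(path))) {
      return role;
    }
  }
  return "other";
}

console.log(classifyFile("src/main/java/com/acme/AuthController.java", "java")); // critical
console.log(classifyFile("src/bin/cli.rs", "rust"));                             // entry
console.log(classifyFile("src/domain.rs", "rust"));                              // other
```

The segment anchor keeps `main.rs` from matching `domain.rs`. Because this sketch is case-sensitive, `Auth.js` would not match `*auth*.js`. Whether V9 ignores case is not stated here.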
```bash
# Force full repository analysis (disable smart selection)
export CODEQUAL_FORCE_FULL_ANALYSIS=true

# Custom file limit (default: 500)
export CODEQUAL_MAX_FILES=1000

# Run analysis with custom settings
npx ts-node analyze-pr.ts
```

```typescript
import { V9JavaAnalyzer } from '@codequal/agents';

// Example values; replace with your repository URL and PR number
const repoUrl = 'https://github.com/your-org/your-repo';
const prNumber = 123;

const analyzer = new V9JavaAnalyzer();

// Override configuration
analyzer.analysisConfig = {
  useSmartSelection: true,  // Enable smart selection
  maxFiles: 750,            // Increase file limit
  forceFullAnalysis: false  // Don't force full analysis
};

await analyzer.analyzePR(repoUrl, prNumber); // top-level await needs an ES module
```

Smart selection is best for:
- Large enterprise repositories (10,000+ files)
- Quick PR validation
- CI/CD pipelines with time constraints
- Cost-conscious analysis
Benefits:
- ⚡ 5-10x faster analysis
- 💰 Lower computational costs
- 🎯 Focused on relevant changes
- 📊 Same blocking logic applies
Full analysis is best for:
- Security audits
- Compliance reviews
- Release candidates
- Small repositories (< 1,000 files)
Enable full analysis with `export CODEQUAL_FORCE_FULL_ANALYSIS=true`.

Performance by repository size:

| Repository size | Full analysis | Smart selection | Speed improvement |
|---|---|---|---|
| Small (< 1K files) | 30 seconds | N/A (uses full analysis) | - |
| Medium (1K-10K files) | 2-5 minutes | 30-60 seconds | 3-5x |
| Large (10K-50K files) | 10-30 minutes | 1-3 minutes | 8-10x |
| Enterprise (50K+ files) | 30-60 minutes | 2-5 minutes | 10-15x |
In the current implementation:
- Tools still run on all files
- Issues are filtered to the selected files only
- Blocking logic applies only to issues in selected files
- Modified-file tracking ensures that critical issues in PR files always block

Impact on results:
- PR-modified files are ALWAYS analyzed (highest priority)
- Security-critical files are prioritized even if not modified
- Blocking logic is unchanged (critical/high issues in modified files)
- The score includes only issues from selected files
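The filtering and blocking rules above can be sketched as a single function. The sketch assumes:

- Each issue carries a file path and a severity.
- `selectedFiles` already contains every PR-modified file, as this document states.
- `Issue`, `filterAndBlock`, and the severity names are hypothetical; V9's actual issue model is not shown here.

```typescript
// Hypothetical sketch of the rules above; Issue and filterAndBlock are
// illustrative names, not the V9 issue model.
type Severity = "critical" | "high" | "medium" | "low";

interface Issue {
  file: string;
  severity: Severity;
}

interface FilterResult {
  scoredIssues: Issue[];   // issues that count toward the score
  blockingIssues: Issue[]; // critical/high issues in PR-modified files
  blocked: boolean;
}

function filterAndBlock(
  issues: Issue[],
  selectedFiles: string[],  // assumed to include every PR-modified file
  prModifiedFiles: string[]
): FilterResult {
  const selected = new Set(selectedFiles);
  const modified = new Set(prModifiedFiles);
  // Tools report issues for every file; only those in selected files are kept.
  const scoredIssues = issues.filter((issue) => selected.has(issue.file));
  // Only critical/high issues in files changed by the PR can block.
  const blockingIssues = scoredIssues.filter(
    (issue) =>
      modified.has(issue.file) &&
      (issue.severity === "critical" || issue.severity === "high")
  );
  return { scoredIssues, blockingIssues, blocked: blockingIssues.length > 0 };
}

const result = filterAndBlock(
  [
    { file: "src/auth.ts", severity: "critical" },
    { file: "src/util.ts", severity: "high" },
    { file: "vendor/lib.ts", severity: "critical" },
  ],
  ["src/auth.ts", "src/util.ts"],
  ["src/auth.ts"]
);
console.log(result.scoredIssues.length, result.blocked); // 2 true
```

Keeping the score filter and the blocking filter separate mirrors the list above. An unselected file never affects the score, and a high-severity issue in an unmodified file counts toward the score but does not block.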
The analyzer logs selection details, for example:

```text
📊 Large repository detected (15,234 files) - using smart file selection
📁 Smart selection: 500 files selected for analysis
   - PR changes: 12
   - Critical files: 89
   - Entry points: 45
   - Configuration: 8
✅ Analysis complete: 234 issues in main, 187 issues in PR
```
Planned enhancements:
- Tool-specific file lists: pass selected files directly to tools
- Dynamic threshold: adjust the file count based on available resources
- ML-based selection: learn which files typically have issues
- Incremental analysis: analyze only changed methods/functions
- Distributed analysis: parallel processing across multiple pods
Future versions will include a web UI for configuration:
- Visual file selection preview
- Custom pattern configuration
- Performance metrics dashboard
- Selection effectiveness analytics
If smart selection does not activate, check that:
- The repository has > 10,000 files OR > 50,000 LOC
- `CODEQUAL_FORCE_FULL_ANALYSIS` is not set to `true`
- No errors occurred during file counting
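When checking the override variables, it can help to see what a runner would read from them. This sketch is hypothetical. The variable names and the 500-file default come from this document. The parsing rules are assumptions, not documented V9 behavior:

- Only the exact string `true` enables the override.
- A missing, malformed, or non-positive limit falls back to the default.

```typescript
// Hypothetical sketch: the variable names and the 500-file default are from
// this document; the parsing rules are assumptions, not documented V9 behavior.
interface SelectionEnv {
  forceFullAnalysis: boolean;
  maxFiles: number;
}

const DEFAULT_MAX_FILES = 500;

function readSelectionEnv(env: Record<string, string | undefined>): SelectionEnv {
  // Assumption: only the exact string "true" enables the override.
  const forceFullAnalysis = env.CODEQUAL_FORCE_FULL_ANALYSIS === "true";
  // Assumption: missing, malformed, or non-positive limits fall back to 500.
  const parsed = parseInt(env.CODEQUAL_MAX_FILES ?? "", 10);
  const maxFiles = Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_FILES;
  return { forceFullAnalysis, maxFiles };
}

console.log(readSelectionEnv({ CODEQUAL_MAX_FILES: "1000" }));
// { forceFullAnalysis: false, maxFiles: 1000 }

// Against the real environment in Node.js: readSelectionEnv(process.env)
```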
If important issues are missed:
- Increase `CODEQUAL_MAX_FILES` to 750 or 1000
- Add custom patterns to critical file selection
- Use full analysis for security audits
If analysis is still too slow, consider:
- Reducing the file limit to 250 for faster analysis
- Using cloud execution for large repos
- Implementing caching for repeat analyses
The selector's public interface:

```typescript
declare class SmartFileSelector {
  selectFiles(config: FileSelectionConfig): Promise<SelectedFiles>;
}

interface FileSelectionConfig {
  repository: string;
  prNumber: number;
  baseBranch: string;
  prBranch: string;
  language: string;
  maxFiles?: number;
  repoPath: string;
}

interface SelectedFiles {
  prChangedFiles: string[];
  criticalFiles: string[];
  entryPoints: string[];
  configFiles: string[];
  testFiles: string[];
  totalSelected: number;
  selectionReason: string;
}
```

Example: a large repository with a small PR:

```bash
# Repository: 25,000 files
# PR changes: 8 files

# With smart selection (default)
npm run analyze
# Result: Analyzes 500 files in 2 minutes

# With full analysis
CODEQUAL_FORCE_FULL_ANALYSIS=true npm run analyze
# Result: Analyzes 25,000 files in 45 minutes
```

Example: a full security audit:

```bash
# Force full analysis for complete security review
export CODEQUAL_FORCE_FULL_ANALYSIS=true
export CODEQUAL_MAX_FILES=999999
npm run analyze -- --security-audit
```

Example: a quick check:

```bash
# Use minimal file set for fastest results
export CODEQUAL_MAX_FILES=250
npm run analyze -- --quick
```

Note: Smart file selection is designed to maintain analysis quality while significantly improving performance on large repositories. The system ensures that all PR-modified files and security-critical paths are always analyzed.