From periodic pentests to autonomous, every-deploy security validation.
The industry gap is structural: engineering teams push thousands of lines daily, but security validation happens periodically — maybe twice a year for external pentests, or continuously-but-incompletely for in-house teams making hard triage choices about what to cover. Every change that ships untested is a version of the application that was never fully validated.
Argus already addresses much of this with its 6-phase pipeline, AI enrichment, and proof-by-exploitation. But there are specific capabilities that would close the remaining gaps between "scan on schedule" and "secure every release."
This guide maps what Argus has today, what's missing, and concrete implementation paths for each gap.
| Capability | Industry Goal | Argus Today | Gap |
|---|---|---|---|
| Diff-aware scoping | Only test what changed; skip README edits, focus on auth logic | `FileSelector.get_changed_files()` in fast mode; Phase 1 scanners scan full project | Scanners not diff-scoped |
| Per-deploy trigger | Every release triggers security validation | CI workflows on push/PR + weekly cron | No deployment-event integration |
| Multi-step attack reasoning | Agents chain cross-component vulnerabilities | `VulnerabilityChainer` with 14 rule-based chain patterns | Rule-based, not agent-driven |
| Live exploitation | Validate against running deployment | `SandboxValidator` + `ProofByExploitation` in Docker | Sandbox only, not against live targets |
| AutoFix → PR | Generate merge-ready PRs with code fixes | `RemediationEngine` generates diffs/text, no PR creation | No automated PR creation for fixes |
| Retest after fix | Automatically verify fix holds | `RegressionTester` as separate CI step | Not a closed loop within a single run |
| Persistent knowledge base | Each scan enriches cross-run intelligence | Flat-file JSONL feedback + per-scan JSON outputs | No cross-scan dedup, trending, or historical context |
| Code-to-runtime context | Source + API specs + cloud config + architecture | SAST + DAST exist separately; `sast_dast_correlator.py` bridges | No unified context model |
`scripts/orchestrator/file_selector.py` has `get_changed_files()` using `git diff --name-only HEAD^ HEAD`. When `only_changed=True`, the AI file-selection layer filters to changed files and boosts their priority by +200 points.
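The +200 boost can be sketched as a simple re-scoring pass. This is a minimal illustration, not `FileSelector`'s real API — the function name, the `score`/`path` dict shape, and the sort step are assumptions:

```python
def boost_changed(candidates: list[dict], changed: set[str], boost: int = 200) -> list[dict]:
    """Re-rank file-selection candidates: files in the changed set get a
    flat priority boost so they sort ahead of unchanged files.
    (Illustrative sketch, not FileSelector's actual implementation.)"""
    rescored = [
        {**c, "score": c["score"] + (boost if c["path"] in changed else 0)}
        for c in candidates
    ]
    return sorted(rescored, key=lambda c: c["score"], reverse=True)
```

With a changed low-priority file, the boost is enough to push it past an unchanged high-priority one, which is the intended fast-mode behavior.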
Phase 1 scanners (Semgrep, Trivy, Checkov, TruffleHog) always scan the full project path. For a 500-file repo where 3 files changed, all 4 scanners still analyze everything.
a) Semgrep diff scoping
Semgrep natively supports --include patterns. Pass changed file paths:
```python
# In scanner_runners.py, SemgrepRunner.run()
if self.only_changed and self.changed_files:
    for f in self.changed_files:
        cmd.extend(["--include", f])
```

b) Impact radius expansion
A single-line change to an auth middleware affects every protected route. Diff-scoping shouldn't be file-literal — it should expand to the blast radius:
```python
class DiffImpactAnalyzer:
    """Expand changed files to their security-relevant impact radius."""

    def expand_impact(self, changed_files: list[str], project_path: str) -> list[str]:
        """Given changed files, return the full set of files in the blast radius.

        - If a middleware/decorator changed, include all files that import it
        - If a model changed, include all routes that use that model
        - If an auth module changed, include all protected endpoints
        """
        expanded = set(changed_files)
        for f in changed_files:
            if self._is_security_critical(f):
                importers = self._find_importers(f, project_path)
                expanded.update(importers)
        return list(expanded)

    def _is_security_critical(self, filepath: str) -> bool:
        """Check if file is auth, middleware, permissions, crypto, etc."""
        security_indicators = [
            'auth', 'permission', 'middleware', 'security',
            'crypto', 'session', 'token', 'oauth', 'rbac',
            'acl', 'policy', 'guard', 'interceptor'
        ]
        name = os.path.basename(filepath).lower()
        return any(ind in name for ind in security_indicators)
```

c) Smart skip logic
The Aikido article highlights: "Updated a README and button color? Skipped." Argus should classify diffs by security relevance before deciding whether to scan at all:
```python
class DiffClassifier:
    """Classify a diff as security-relevant or skip-safe."""

    SKIP_PATTERNS = [
        r'\.md$', r'\.txt$', r'\.css$', r'\.scss$',
        r'\.svg$', r'\.png$', r'\.jpg$',
        r'CHANGELOG', r'LICENSE', r'\.gitignore',
    ]

    ALWAYS_SCAN_PATTERNS = [
        r'auth', r'login', r'session', r'token', r'password',
        r'secret', r'key', r'crypt', r'permission', r'rbac',
        r'middleware', r'guard', r'policy', r'\.env',
        r'docker', r'Dockerfile', r'\.tf$', r'\.yml$', r'\.yaml$',
    ]

    def classify(self, changed_files: list[str]) -> dict:
        security_relevant = []
        skippable = []
        for f in changed_files:
            if any(re.search(p, f, re.I) for p in self.ALWAYS_SCAN_PATTERNS):
                security_relevant.append(f)
            elif any(re.search(p, f, re.I) for p in self.SKIP_PATTERNS):
                skippable.append(f)
            else:
                security_relevant.append(f)  # Default: scan
        return {
            'security_relevant': security_relevant,
            'skippable': skippable,
            'should_scan': len(security_relevant) > 0
        }
```

- `scripts/hybrid_analyzer.py` — Phase 1 entry, before scanner invocation
- `scripts/scanner_runners.py` — in each scanner's `run()` method
- New module: `scripts/diff_impact_analyzer.py`
- Config toggle: `enable_diff_scoping=True`, `diff_expand_impact_radius=True`
CI workflows trigger on push, PR, and weekly cron. The action.yml GitHub Action is the primary integration point.
No integration with deployment events. Security validation happens pre-merge, not post-deploy. The running application is never validated against what actually shipped.
a) GitHub Deployment event webhook
```yaml
# .github/workflows/post-deploy-scan.yml
name: Post-Deploy Security Validation

on:
  deployment_status:

jobs:
  scan:
    if: github.event.deployment_status.state == 'success'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get deployment diff
        run: |
          PREV_SHA=$(git log --format='%H' -2 | tail -1)
          echo "DIFF_BASE=$PREV_SHA" >> $GITHUB_ENV

      - uses: devatsecure/Argus-Security@v1
        with:
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          review-type: security
          only-changed: true
          fail-on-blockers: true

      # If DAST target is available, also run live validation
      - name: DAST against deployment
        if: vars.DEPLOYMENT_URL != ''
        run: |
          python scripts/dast_orchestrator.py \
            --target-url "${{ vars.DEPLOYMENT_URL }}" \
            --auth-config .argus/dast-auth.yml \
            --diff-only
```

b) Container registry webhook
For Docker-based deployments, trigger on image push:
```yaml
on:
  registry_package:
    types: [published]
```

c) ArgoCD / Flux / Spinnaker integration
Expose a webhook endpoint (or use the MCP server) that deployment tools call post-deploy:
```python
# In scripts/mcp_server.py, add tool:
@mcp_server.tool("trigger_post_deploy_scan")
async def trigger_post_deploy_scan(
    deployment_url: str,
    commit_sha: str,
    previous_sha: str,
    environment: str = "staging",
) -> dict:
    """Trigger a diff-scoped security scan after deployment."""
    changed_files = get_diff_files(previous_sha, commit_sha)
    results = await run_pipeline(
        target_path=repo_path,
        changed_files=changed_files,
        dast_target=deployment_url,
        scan_mode="post-deploy",
    )
    return results
```

- New workflow: `.github/workflows/post-deploy-scan.yml`
- `scripts/mcp_server.py` — new tool registration
- Config: `scan_trigger=["push", "pr", "deploy", "schedule"]`
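The `get_diff_files` helper the MCP tool calls is not defined anywhere above. One plausible sketch shells out to git — the function name comes from the snippet, everything else here is an assumption:

```python
import subprocess

def parse_name_only(output: str) -> list[str]:
    """Parse `git diff --name-only` output into a clean file list."""
    return [line.strip() for line in output.splitlines() if line.strip()]

def get_diff_files(previous_sha: str, commit_sha: str, repo_path: str = ".") -> list[str]:
    """Files changed between the previously deployed commit and the new one."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{previous_sha}..{commit_sha}"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_name_only(result.stdout)
```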
`scripts/vulnerability_chaining_engine.py` implements `VulnerabilityChainer` with 14 pre-defined rule-based chaining patterns (IDOR → Privilege Escalation → Data Breach, XSS → Session Hijacking → Account Takeover, etc.). It uses NetworkX for graph traversal with probability-weighted edges.
Phase 3's `agent_personas.py` runs 5 specialized AI personas for multi-agent review, but these are independent reviewers — they don't collaborate on attack path discovery.
The chaining is rule-based: if finding A is category X and finding B is category Y, draw an edge. It doesn't reason about application-specific context — whether a particular XSS is actually in a position to steal a session token, or whether a specific SSRF can reach an internal metadata endpoint.
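That rule-based mechanic can be sketched in a few lines. The pattern pairs and probabilities below are illustrative stand-ins for the 14 real patterns, and the sketch uses a plain dict graph rather than the NetworkX dependency:

```python
# Illustrative chain rules: (from_category, to_category) -> link probability.
# Stand-ins for exposition, not Argus's actual 14 patterns.
CHAIN_RULES = {
    ("IDOR", "Privilege Escalation"): 0.6,
    ("Privilege Escalation", "Data Breach"): 0.7,
    ("XSS", "Session Hijacking"): 0.5,
    ("Session Hijacking", "Account Takeover"): 0.8,
}

def build_edges(findings: list[dict]) -> dict:
    """Draw a probability-weighted edge wherever a category pair matches a rule."""
    edges: dict[str, list[tuple[str, float]]] = {}
    for a in findings:
        for b in findings:
            p = CHAIN_RULES.get((a["category"], b["category"]))
            if p and a["id"] != b["id"]:
                edges.setdefault(a["id"], []).append((b["id"], p))
    return edges

def enumerate_chains(edges: dict, start: str, path=None, prob=1.0):
    """DFS out of `start`, multiplying edge probabilities along the way.
    Only paths with at least two hops count as a chain."""
    path = path or [start]
    chains = []
    for nxt, p in edges.get(start, []):
        if nxt in path:  # no cycles
            continue
        new_path, new_prob = path + [nxt], prob * p
        if len(new_path) >= 3:
            chains.append((new_path, new_prob))
        chains.extend(enumerate_chains(edges, nxt, new_path, new_prob))
    return chains
```

Note what this can and cannot do: it happily links any IDOR to any privilege-escalation finding by category alone, which is exactly the context-blindness described above.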
The Aikido article describes agents that "reason about application behavior, chain multi-step attack paths, and validate exploitability through real exploitation." That's agent-driven reasoning, not pattern matching.
a) LLM-powered chain discovery
Use the existing LLM infrastructure to ask the model to reason about cross-finding relationships:
```python
class AgentChainDiscovery:
    """Use LLM agents to discover attack chains that rule-based logic misses."""

    CHAIN_DISCOVERY_PROMPT = """You are a senior penetration tester analyzing a set of
security findings from the same application. Your job is to identify multi-step
attack paths that chain these findings together.

For each chain you discover, explain:
1. The entry point (which finding starts the chain)
2. Each step and what it enables
3. The final impact (what an attacker achieves)
4. Why these specific findings combine dangerously
5. Whether the chain requires authentication or can be triggered anonymously

Findings:
{findings_json}

Application context:
- Framework: {framework}
- Auth mechanism: {auth_type}
- Architecture: {architecture}

Return your analysis as a JSON array of chain objects."""

    async def discover_chains(
        self,
        findings: list[dict],
        app_context: dict,
        llm_client,
    ) -> list[dict]:
        prompt = self.CHAIN_DISCOVERY_PROMPT.format(
            findings_json=json.dumps(findings, indent=2),
            framework=app_context.get('framework', 'unknown'),
            auth_type=app_context.get('auth_type', 'unknown'),
            architecture=app_context.get('architecture', 'unknown'),
        )
        response = await llm_client.analyze(prompt)
        return self._parse_chains(response)
```

b) Cross-component reasoning
The key insight from the article: "Two changes that are individually safe can be dangerous in combination: a new API field here, a relaxed permission check there, and suddenly there's a cross-tenant data leak."
This requires understanding component boundaries. Extend the chaining engine to consider data flow between components:
```python
class CrossComponentAnalyzer:
    """Analyze how findings in different components interact."""

    def analyze_cross_component_risk(
        self,
        findings: list[dict],
        dependency_graph: dict,  # file → [files it imports]
    ) -> list[dict]:
        """Find findings that are individually low-risk but
        dangerous in combination across component boundaries."""
        # Group findings by component
        by_component = defaultdict(list)
        for f in findings:
            component = self._classify_component(f['file_path'])
            by_component[component].append(f)

        # For each pair of connected components,
        # check if their findings combine dangerously
        dangerous_combos = []
        for comp_a, comp_b in self._connected_pairs(dependency_graph):
            findings_a = by_component.get(comp_a, [])
            findings_b = by_component.get(comp_b, [])
            if findings_a and findings_b:
                combos = self._evaluate_combinations(findings_a, findings_b)
                dangerous_combos.extend(combos)
        return dangerous_combos
```

c) Collaborative agent reasoning
Currently Phase 3 agents review independently. Add a "red team council" step where agents share findings and collaboratively build attack narratives:
```python
# In agent_personas.py, add collaborative reasoning phase
CHAIN_COUNCIL_PROMPT = """The following security agents have independently
reviewed the codebase and found these issues:

{agent_findings}

As the Red Team Council, your job is to:
1. Identify attack paths that span multiple agents' findings
2. Determine if any combination of "medium" findings creates a "critical" chain
3. Propose exploitation sequences that chain findings end-to-end
4. Highlight any finding that is a prerequisite for exploiting another

Focus on practical, real-world attack scenarios."""
```

- Extend `scripts/vulnerability_chaining_engine.py` with LLM-powered discovery
- New module: `scripts/cross_component_analyzer.py`
- Phase 3 in `scripts/agent_personas.py` — add collaborative council step
- Config: `enable_agent_chain_discovery=True`, `enable_cross_component_analysis=True`
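`_parse_chains` is referenced in `AgentChainDiscovery` but never shown. A tolerant sketch — the name comes from the snippet, the behavior is an assumption — that survives LLM responses wrapped in prose or markdown fences:

```python
import json
import re

def parse_chains(response_text: str) -> list[dict]:
    """Extract the first JSON array from an LLM response, tolerating
    surrounding prose or markdown code fences; return [] on any failure."""
    match = re.search(r"\[.*\]", response_text, re.S)
    if not match:
        return []
    try:
        chains = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Keep only well-formed chain objects
    return [c for c in chains if isinstance(c, dict)]
```

Failing closed (empty list) matters here: a malformed LLM response should degrade to "no chains found", not crash the pipeline.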
- `RemediationEngine` generates `RemediationSuggestion` objects with `fixed_code`, `diff`, `explanation`, `testing_recommendations`
- `RegressionTester` generates language-specific test code and runs it in CI
- `automated-audit.yml` uses `peter-evans/create-pull-request` to open PRs with audit findings (reports, not code fixes)
These three capabilities are disconnected. The engine generates a fix, but nobody applies it. The regression tester exists, but isn't triggered by the fix. There's no PR-with-code-fix flow.
The Aikido article describes: "AutoFix generates a merge-ready PR with the specific code-level fix. Developers review, merge, and agents automatically retest to confirm the fix holds."
a) AutoFix PR generator
Wire RemediationEngine output into actual PR creation:
````python
class AutoFixPRGenerator:
    """Generate merge-ready PRs from remediation suggestions."""

    def create_fix_pr(
        self,
        suggestion: RemediationSuggestion,
        repo_path: str,
        base_branch: str = "main",
    ) -> dict:
        branch_name = f"argus/fix-{suggestion.vulnerability_type}-{suggestion.finding_id[:8]}"

        # Apply the diff to the actual file
        self._apply_fix(suggestion, repo_path)

        # Generate regression test for this fix
        regression_test = self.regression_tester.generate_test(
            finding=suggestion.to_finding_dict(),
            language=self._detect_language(suggestion.file_path),
        )

        # Create PR with fix + test
        pr_body = self._format_pr_body(suggestion, regression_test)
        return {
            'branch': branch_name,
            'files_changed': [suggestion.file_path],
            'test_file': regression_test.path if regression_test else None,
            'title': f"fix: {suggestion.vulnerability_type} in {os.path.basename(suggestion.file_path)}",
            'body': pr_body,
        }

    def _format_pr_body(self, suggestion, regression_test) -> str:
        return f"""## Security Fix — {suggestion.vulnerability_type}

**Finding:** {suggestion.finding_id}
**File:** `{suggestion.file_path}:{suggestion.line_number}`
**CWE:** {', '.join(suggestion.cwe_references)}
**Confidence:** {suggestion.confidence}

### What changed
{suggestion.explanation}

### Diff
```diff
{suggestion.diff}
```

### Testing recommendations
{chr(10).join(f'- {t}' for t in suggestion.testing_recommendations)}

### Regression test included
{'Yes — see ' + regression_test.path if regression_test else 'No (template not available for this vuln type)'}

---
*Generated by Argus Security — verify before merging*
"""
````
**b) Retest-on-merge workflow**
```yaml
# .github/workflows/argus-retest.yml
name: Argus Retest After Fix
on:
pull_request:
types: [closed]
jobs:
retest:
if: |
github.event.pull_request.merged == true &&
startsWith(github.event.pull_request.head.ref, 'argus/fix-')
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run targeted rescan
run: |
# Extract the finding ID from the branch name
FINDING_ID=$(echo "${{ github.event.pull_request.head.ref }}" | sed 's/argus\/fix-.*-//')
python scripts/regression_tester.py run \
--test-dir tests/security_regression \
--finding-id "$FINDING_ID"
- name: Run SAST on fixed file
run: |
CHANGED_FILES=$(gh pr view ${{ github.event.pull_request.number }} --json files -q '.files[].path')
semgrep --config=auto $CHANGED_FILES
- name: Update finding status
if: success()
run: |
python -c "
from feedback_loop import FeedbackLoop
fl = FeedbackLoop()
fl.record_feedback(
finding_id='$FINDING_ID',
automated_verdict='confirmed',
human_verdict='confirmed',
confidence=1.0,
category='fix_verified',
reasoning='Automated retest passed after merge'
)"
```

c) Full closed-loop orchestration
The holy grail: scan → find → fix → PR → merge → retest → verify, all automated:
```python
class ClosedLoopOrchestrator:
    """Orchestrate the full find → fix → verify loop."""

    async def run_closed_loop(self, scan_results: list[dict]) -> dict:
        loop_results = {
            'fixed': [],
            'fix_failed': [],
            'retest_failed': [],
            'verified': [],
        }
        fixable = [f for f in scan_results if f.get('auto_fixable')]
        for finding in fixable:
            # Step 1: Generate fix
            suggestion = self.remediation_engine.generate_fix(finding)
            if not suggestion or suggestion.confidence == 'low':
                loop_results['fix_failed'].append(finding)
                continue

            # Step 2: Create PR
            pr = self.pr_generator.create_fix_pr(suggestion)
            loop_results['fixed'].append({**finding, 'pr': pr})

            # Step 3: Run regression test against the fix (pre-merge validation)
            test_result = self.regression_tester.run_single(
                finding_id=finding['id'],
                patched_code=suggestion.fixed_code,
            )
            if test_result.passed:
                loop_results['verified'].append(finding)
            else:
                loop_results['retest_failed'].append(finding)
        return loop_results
```

- New module: `scripts/autofix_pr_generator.py`
- New workflow: `.github/workflows/argus-retest.yml`
- Extend `scripts/hybrid_analyzer.py` Phase 6 to optionally trigger AutoFix
- Config: `enable_autofix_pr=False` (opt-in), `autofix_confidence_threshold="high"`, `autofix_retest=True`
Argus stores results as flat files:
- `.argus/feedback/feedback_records.jsonl` — human TP/FP verdicts
- `.argus/feedback/confidence_adjustments.json` — pattern multipliers
- `.argus/sandbox-results/` — per-exploit outcomes
- `tests/security_regression/` — regression test cases
- Per-scan JSON/SARIF/Markdown reports
Each scan is independent. There's no way to ask "has this vulnerability been seen before?", "is this a regression?", "what's our false positive rate trending?", or "what attack patterns are most common in this codebase?"
a) SQLite-backed findings store
Lightweight, zero-infrastructure, embedded in the repo (or .argus/):
```python
class FindingsStore:
    """Persistent cross-scan findings database."""

    SCHEMA = """
    CREATE TABLE IF NOT EXISTS findings (
        id TEXT PRIMARY KEY,
        scan_id TEXT NOT NULL,
        scan_timestamp TEXT NOT NULL,
        vuln_type TEXT NOT NULL,
        severity TEXT NOT NULL,
        file_path TEXT,
        line_number INTEGER,
        cwe TEXT,
        cvss_score REAL,
        source_tool TEXT,
        status TEXT DEFAULT 'open',  -- open, fixed, false_positive, accepted_risk
        first_seen TEXT NOT NULL,
        last_seen TEXT NOT NULL,
        times_seen INTEGER DEFAULT 1,
        fix_verified BOOLEAN DEFAULT FALSE,
        fingerprint TEXT NOT NULL  -- content-based dedup key
    );

    CREATE TABLE IF NOT EXISTS scan_history (
        scan_id TEXT PRIMARY KEY,
        timestamp TEXT NOT NULL,
        commit_sha TEXT,
        branch TEXT,
        total_findings INTEGER,
        critical INTEGER,
        high INTEGER,
        medium INTEGER,
        low INTEGER,
        duration_seconds REAL,
        cost_usd REAL
    );

    CREATE TABLE IF NOT EXISTS fix_history (
        finding_id TEXT,
        fix_commit TEXT,
        fix_timestamp TEXT,
        fix_method TEXT,  -- autofix, manual, dependency_update
        retest_passed BOOLEAN,
        regression_detected BOOLEAN DEFAULT FALSE
    );
    """

    def record_scan(self, scan_results: dict) -> None:
        """Record a scan and upsert all findings."""
        ...

    def is_regression(self, finding: dict) -> bool:
        """Check if a finding was previously fixed but has reappeared."""
        ...

    def trending(self, days: int = 90) -> dict:
        """Return severity trends over time."""
        ...

    def mean_time_to_fix(self, severity: str = None) -> float:
        """Calculate MTTF across all findings or by severity."""
        ...
```

b) Cross-scan deduplication
Use content-based fingerprinting to track findings across scans:
```python
def fingerprint_finding(finding: dict) -> str:
    """Generate a stable fingerprint for cross-scan dedup.

    Uses vulnerability type + file path + code context (not line number,
    which shifts with unrelated edits).
    """
    key_parts = [
        finding.get('vuln_type', ''),
        finding.get('file_path', ''),
        finding.get('code_snippet', '')[:200],  # Normalized code context
        finding.get('cwe', ''),
    ]
    return hashlib.sha256('|'.join(key_parts).encode()).hexdigest()[:16]
```

c) Historical context injection
Feed knowledge base context into LLM enrichment prompts:
```python
# In Phase 2 AI enrichment, add historical context
historical_context = f"""
Historical context for this finding:
- First seen: {store.first_seen(fingerprint)}
- Times detected: {store.times_seen(fingerprint)}
- Previous status: {store.previous_status(fingerprint)}
- Related findings in same file: {store.related_count(file_path)}
- False positive rate for this pattern: {store.fp_rate(vuln_type)}%

Use this context to calibrate your confidence score.
"""
```

- New module: `scripts/findings_store.py`
- Integrate into `scripts/hybrid_analyzer.py` Phase 6 (record) and Phase 2 (query)
- Extend `scripts/feedback_loop.py` to write to the store
- Config: `enable_findings_store=True`, `findings_db_path=".argus/findings.db"`
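The stubbed `is_regression` can be made concrete with stdlib `sqlite3` plus the fingerprint function. This is a minimal sketch with a deliberately reduced schema — not the full `FindingsStore`:

```python
import hashlib
import sqlite3

def fingerprint_finding(finding: dict) -> str:
    """Stable dedup key: vuln type + path + code context + CWE."""
    key = "|".join([
        finding.get("vuln_type", ""),
        finding.get("file_path", ""),
        finding.get("code_snippet", "")[:200],
        finding.get("cwe", ""),
    ])
    return hashlib.sha256(key.encode()).hexdigest()[:16]

class MiniFindingsStore:
    """Reduced sketch of FindingsStore: just enough to answer
    'is this a regression?' across scans."""

    def __init__(self, db_path: str = ":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute("""
            CREATE TABLE IF NOT EXISTS findings (
                fingerprint TEXT PRIMARY KEY,
                status TEXT DEFAULT 'open',
                fix_verified INTEGER DEFAULT 0,
                times_seen INTEGER DEFAULT 1
            )""")

    def record(self, finding: dict, status: str = "open", fix_verified: bool = False):
        # Upsert: same fingerprint seen again bumps times_seen
        self.db.execute("""
            INSERT INTO findings (fingerprint, status, fix_verified)
            VALUES (?, ?, ?)
            ON CONFLICT(fingerprint) DO UPDATE SET
                times_seen = times_seen + 1,
                status = excluded.status,
                fix_verified = excluded.fix_verified
        """, (fingerprint_finding(finding), status, int(fix_verified)))
        self.db.commit()

    def is_regression(self, finding: dict) -> bool:
        """A previously fixed-and-verified fingerprint showing up again."""
        row = self.db.execute(
            "SELECT 1 FROM findings WHERE fingerprint = ? "
            "AND status = 'fixed' AND fix_verified = 1",
            (fingerprint_finding(finding),),
        ).fetchone()
        return row is not None
```

The `ON CONFLICT ... DO UPDATE` upsert is what makes cross-scan tracking cheap: every scan records everything, and the fingerprint collapses repeats into one row.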
- SAST: Semgrep, Checkov, TruffleHog, heuristic scanner
- DAST: ZAP + Nuclei via `dast_orchestrator.py`
- Correlation: `sast_dast_correlator.py` bridges SAST and DAST findings
- Auth config: `dast_auth_config.py` handles authenticated scanning
Each scanner operates with its own isolated view. There's no unified application model that says "here's the full attack surface: these API endpoints exist, they're backed by these handlers, protected by this middleware, talking to this database, deployed behind this cloud config." The correlator connects findings after the fact, but doesn't inform the scanning itself.
a) Application context model
Build a lightweight representation of the application that all phases can reference:
```python
@dataclass
class ApplicationContext:
    """Unified application context fed to all pipeline phases."""

    # Code structure
    framework: str               # django, express, spring, etc.
    language: str
    entry_points: list[str]      # Main files, route definitions
    auth_mechanism: str          # jwt, session, oauth2, api_key

    # API surface
    api_endpoints: list[dict]    # From OpenAPI spec or route discovery
    middleware_chain: list[str]  # Auth, CORS, rate limiting, etc.
    data_models: list[dict]      # Database models / schemas

    # Infrastructure
    cloud_provider: str          # aws, gcp, azure, none
    iac_files: list[str]         # Terraform, K8s manifests
    container_config: dict       # Dockerfile analysis
    secrets_management: str      # vault, env, ssm, none

    # Dependencies
    direct_deps: list[dict]      # From package.json, requirements.txt, etc.
    transitive_deps: list[dict]  # Full dependency tree

    # DAST context (if available)
    deployment_url: str | None
    authenticated_endpoints: list[str]
    discovered_endpoints: list[str]  # From crawling
```

b) Context-aware scanning
Pass the context model into scanner configuration:
```python
# Phase 1: Context-aware Semgrep rules
if context.framework == 'django':
    semgrep_rules.append('p/django')
    semgrep_rules.append('p/python-django-security')
if context.auth_mechanism == 'jwt':
    semgrep_rules.append('p/jwt')

# Phase 2: Context-enriched LLM prompts
enrichment_prompt += f"""
Application context:
- Framework: {context.framework}
- Auth: {context.auth_mechanism}
- Cloud: {context.cloud_provider}
- Known API endpoints: {len(context.api_endpoints)}
- Middleware chain: {' → '.join(context.middleware_chain)}

Use this context to determine if the finding is actually exploitable
in this specific application architecture.
"""
```

- New module: `scripts/app_context_builder.py`
- Called once at pipeline start, passed to all phases
- Feed into `scripts/config_loader.py` as enrichment context
- Config: `enable_app_context=True`
- `SandboxValidator` runs exploit PoCs in isolated Docker containers
- `dast_orchestrator.py` runs ZAP + Nuclei against live targets
- These are separate capabilities — sandbox validates code-level findings, DAST scans network-level surface
The sandbox proves "this code is theoretically exploitable in isolation." DAST finds "this endpoint responds to this payload." Neither proves "this specific finding from SAST is exploitable in the deployed application."
The article describes: "Every finding is confirmed through direct exploitation against the live target."
a) SAST-to-DAST validation pipeline
Take SAST findings and generate targeted DAST tests:
```python
class SastToDastValidator:
    """Validate SAST findings against the live deployment."""

    async def validate_finding(
        self,
        finding: dict,
        target_url: str,
        auth_config: dict,
    ) -> dict:
        """Generate and execute a targeted DAST test for a SAST finding."""
        # Map SAST finding to HTTP test
        test_case = self._generate_test_case(finding)
        if not test_case:
            return {'validated': False, 'reason': 'no_test_mapping'}

        # Execute against live target
        result = await self._execute_test(test_case, target_url, auth_config)
        return {
            'validated': result.exploitable,
            'evidence': result.response_excerpt,
            'http_status': result.status_code,
            'validation_method': 'live_dast',
        }

    def _generate_test_case(self, finding: dict) -> dict | None:
        """Map a SAST finding to a concrete HTTP test case.

        Example: SQL injection in /api/users?id= →
        GET /api/users?id=1' OR '1'='1 and check for data leak indicators
        """
        vuln_type = finding.get('vuln_type', '')
        endpoint = finding.get('endpoint') or self._infer_endpoint(finding)
        if not endpoint:
            return None

        # Generate test payloads based on vulnerability type
        payloads = self.payload_generator.for_vuln_type(vuln_type)
        return {
            'endpoint': endpoint,
            'method': finding.get('http_method', 'GET'),
            'payloads': payloads,
            'success_indicators': self._success_indicators(vuln_type),
        }
```

b) Exploit replay against deployment
Wire ProofByExploitation to optionally target a live URL instead of only the Docker sandbox:
```python
# In sandbox_validator.py, add a mode for live target validation
class LiveTargetValidator:
    """Validate findings against a live deployment (staging/preview)."""

    ALLOWED_ENVIRONMENTS = ['staging', 'preview', 'development']  # Never production

    def validate(self, finding: dict, target_url: str, environment: str) -> dict:
        if environment not in self.ALLOWED_ENVIRONMENTS:
            raise ValueError(f"Live validation not allowed against {environment}")
        ...
```

- New module: `scripts/sast_dast_validator.py`
- Extend Phase 4 to include live validation when `dast_target_url` is set
- Config: `enable_live_validation=False`, `live_validation_environment="staging"`
Ordered by impact-to-effort ratio:
| Priority | Feature | Effort | Impact |
|---|---|---|---|
| P0 | Diff-intelligent scanner scoping | Medium | High — every scan runs faster and more focused |
| P0 | AutoFix → PR generation | Medium | High — closes the most visible gap |
| P1 | Persistent findings store (SQLite) | Medium | High — enables trending, regression detection, MTTF |
| P1 | Retest-on-merge workflow | Low | Medium — completes the closed loop |
| P1 | Agent-driven chain discovery | Medium | High — biggest quality uplift for finding depth |
| P2 | Deployment-triggered scanning | Low | Medium — extends coverage to post-deploy |
| P2 | Application context model | High | High — improves everything but requires broad integration |
| P2 | SAST-to-DAST validation | High | Medium — requires live target, auth, environment setup |
All new capabilities follow Argus's existing pattern of config-driven feature flags:
```yaml
# In profiles/ or config_loader.py defaults
continuous_testing:
  enable_diff_scoping: true
  diff_expand_impact_radius: true
  scan_trigger: ["push", "pr", "deploy"]

autonomous_loop:
  enable_autofix_pr: false            # Opt-in: generates PRs with fixes
  autofix_confidence_threshold: high  # Only auto-fix high-confidence suggestions
  autofix_retest: true                # Retest after fix merge
  autofix_max_prs_per_scan: 5         # Rate limit

knowledge_base:
  enable_findings_store: true
  findings_db_path: ".argus/findings.db"
  enable_cross_scan_dedup: true
  enable_trending: true
  inject_historical_context: true     # Feed history into LLM prompts

agent_reasoning:
  enable_agent_chain_discovery: true
  enable_cross_component_analysis: true
  enable_collaborative_council: false # Expensive: multi-agent discussion

live_validation:
  enable_live_validation: false
  live_validation_environment: staging
  enable_sast_dast_validation: false
```

Today, Argus is a powerful scan-on-demand pipeline: trigger it, get results, act on them.
With these additions, Argus becomes a continuous security loop:
```
Code pushed → Diff classified → Scanners scoped to blast radius
→ AI enrichment with historical context → Agent-driven chain discovery
→ Sandbox + live validation → AutoFix PRs generated
→ Developer merges → Automated retest → Finding marked verified
→ Knowledge base updated → Next scan is smarter
```
The attackers have autonomous tools. This gives defenders the same.