Skip to content

feat(security): US2 unified report + cross-scanner consensus confidence (Spec 077)#792

Merged
Dumbris merged 6 commits into
mainfrom
077-us2-unified-report
Jul 2, 2026
Merged

feat(security): US2 unified report + cross-scanner consensus confidence (Spec 077)#792
Dumbris merged 6 commits into
mainfrom
077-us2-unified-report

Conversation

@Dumbris

@Dumbris Dumbris commented Jul 2, 2026

Copy link
Copy Markdown
Member

Spec 077 US2 — Unified report + cross-scanner consensus confidence

Supersedes the auto-closed #788 (that PR was stacked on the pre-merge US1 head 0f5838e6; GitHub auto-closed it when US1 squash-merged to main as 5b925e45). This PR is the same US2 work rebased fresh onto main, so it applies cleanly on top of the final US1 baseline.

What this adds (on top of merged US1)

  • Unified report + cross-scanner consensus (internal/security/scanner/sarif.go, engine.go): CalculateRiskScore weights consensus groups (same location + threat_type from ≥2 distinct sources), additive confidence boost capped at 1.0 (consensusConfidenceStep), Finding.Sources populated + merged across scanners, severity backfill so external/legacy SARIF findings never render blank.
  • Round-1 review fix included (5c14f790): consensus groups scored at max severity, stronger fields preserved on merge.
  • Web UI (frontend/src/views/ScanReport.vue, types/api.ts): renders unified findings with severity + source attribution.

Rebase conflict resolution

Only internal/security/scanner/sarif.go conflicted, in ClassifyThreat. US1 (final, on main) added an early if f.Tier != "" { return } so the legacy keyword classifier never rewrites authoritative detect findings; US2 added defer backfillSeverity(f). Both behaviors are kept: the defer is registered before the early-return so it runs on every exit path (including baseline findings), and it is a no-op when severity is already set — so detect findings keep their authoritative classification while legacy/external findings still get a backfilled severity. No US1 fix was reverted.

Verification (actual output)

  • go build ./... → exit 0
  • go test ./internal/security/... -count=1 → all packages ok (security, detect, detect/checks, patterns, scanner)
  • go run ./cmd/scan-eval --gate --min-recall 0.90 --max-fp 0.05 --corpus specs/065-evaluation-foundation/datasets/detect_corpus_v1.jsonGATE PASSED: recall=1.0000 (≥0.90), fp=0.0000 (≤0.05)
  • golangci-lint run --config .github/.golangci.yml ./internal/security/... → 0 issues

Related: Spec 077

Dumbris added 3 commits July 2, 2026 07:55
Spec 077 US2 (T018-T022). Collapse the per-scanner findings into a single
unified report and let independent scanners agreeing on the same issue raise
confidence and the risk score instead of producing duplicate entries.

- MergeFindings: dedup by (rule_id, location) into one entry whose Sources
  lists every contributing scanner; boost confidence when >=2 distinct sources
  agree on the same (location, threat_type) (T018/T019/T021).
- CalculateRiskScore: matched external/Docker findings on (location,
  threat_type) now ADD to consensus (counted once, weighted by the number of
  agreeing sources) instead of flattening to weight 1; legacy per-rule dedup and
  detect Signals weighting preserved for the empty-threat_type path (T020).
- AggregateReports: run MergeFindings after classification so the aggregated
  report carries populated Sources and consensus-boosted confidence (T021).
- ClassifyThreat: backfill a user-readable severity from the classified threat
  level for external/legacy SARIF findings that arrive without one (T022).

- New: dedup-to-one, consensus-confidence-boost, cross-source-consensus-score,
  severity-backfill. Existing risk-score/dedup/aggregate tests unchanged.

Related: Spec 077 (specs/077-scanner-simplification)
Spec 077 US2 (T023). Surface the merged, deduplicated finding list in the scan
report: each finding shows its severity, its contributing source(s), and a
"consensus xN" badge when two or more scanners independently agreed. Adds the
additive `sources`/`tier`/`confidence`/`signals` fields to SecurityScanFinding.

## Changes
- api.ts: extend SecurityScanFinding with sources/tier/confidence/signals.
- ScanReport.vue: severity badge + consensus badge in the finding header;
  Source(s) row (falls back to the single scanner id for legacy findings).

## Testing
- vue-tsc + vite build clean.

Related: Spec 077 (specs/077-scanner-simplification)
…er fields on merge

CalculateRiskScore consensus branch now scores each (location, threat_type)
group at the most-severe threat category and the max per-finding weight among
all agreeing findings, instead of whichever finding was encountered first. For
severity-derived threat_types (supply_chain, uncategorized, malicious_code
fallback) agreeing findings can carry different threat_levels, so the previous
first-wins behavior made the score order-dependent and could drop a genuine
high/warning finding. Extracted a threatCategory helper and precompute the
group max in the existing consensus pass for determinism.

MergeFindings phase-1 dedup now absorbs the duplicate's stronger fields: it
takes max(Confidence), the more-severe Tier (hard > soft via tierRank), and the
union of Signals, so merging a hard/high-confidence finding with a same-
(rule_id, location) soft/low-confidence duplicate no longer discards the hard
tier and higher confidence.

Adds tests locking order-independent max-severity consensus scoring and
max-confidence + hard-tier-wins on merge.

Related: Spec 077
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jul 2, 2026

Copy link
Copy Markdown

Deploying mcpproxy-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 3d7d435
Status: ✅  Deploy successful!
Preview URL: https://7b4e5635.mcpproxy-docs.pages.dev
Branch Preview URL: https://077-us2-unified-report.mcpproxy-docs.pages.dev

View logs

@codecov-commenter

codecov-commenter commented Jul 2, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 93.78531% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/security/scanner/sarif.go 93.75% 11 Missing ⚠️

📢 Thoughts on this report? Let us know!

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

📦 Build Artifacts

Workflow Run: View Run
Branch: 077-us2-unified-report

Available Artifacts

  • archive-darwin-amd64 (28 MB)
  • archive-darwin-arm64 (25 MB)
  • archive-linux-amd64 (16 MB)
  • archive-linux-arm64 (14 MB)
  • archive-windows-amd64 (28 MB)
  • archive-windows-arm64 (25 MB)
  • frontend-dist-pr (0 MB)
  • installer-dmg-darwin-amd64 (21 MB)
  • installer-dmg-darwin-arm64 (19 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page linked above
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 28570621212 --repo smart-mcp-proxy/mcpproxy-go

Note: Artifacts expire in 14 days.

@mcpproxy-gatekeeper

Copy link
Copy Markdown

🤖 Codex cross-model review — PR #792 US2 (round 1, read-only)

Post-rebase review. Confirms the sarif.go rebase-conflict resolution is correct. One MergeFindings severity-merge gap (order-dependent aggregate score). Round 1 of the 5-round cap. Posted via the gatekeeper app.


VERDICT: changes_requested

Findings

Severity File:line Issue Must-fix?
High internal/security/scanner/sarif.go:525 MergeFindings dedups by (rule_id, location) but only absorbs sources/confidence/tier/signals. If the first duplicate is low/info and a later duplicate is high/warning for the same rule/location, the stronger severity/threat_level is discarded before AggregateReports calls CalculateRiskScore, making aggregate scoring and summary order-dependent and not “max severity among agreeing findings.” Yes

Verified safe

  • ClassifyThreat conflict resolution keeps US1’s if f.Tier != "" { return }; the new defer backfillSeverity(f) still runs without keyword reclassifying baseline detect findings.
  • internal/runtime/tool_quarantine.go is untouched.
  • Source unions are sorted, so merged sources/signals output is deterministic.
  • git diff --check origin/main...HEAD passed.
  • git fetch origin && git diff origin/main...HEAD: fetch failed in the read-only sandbox: cannot open .git/worktrees/.../FETCH_HEAD; reviewed local origin/main...HEAD diff instead.
  • go build ./...: failed before compilation because Go could not create its temp build dir.
  • go test ./internal/security/... -count=1: failed before tests for the same temp-dir permission error.
  • go run ./cmd/scan-eval ...: failed before execution for the same temp-dir permission error.
  • golangci-lint ...: failed with parallel golangci-lint is running on both attempts.

MergeFindings phase-1 dedup by (rule_id, location) took max Confidence
and the most-severe Tier but kept the first occurrence's Severity and
ThreatLevel, discarding a later duplicate's if it was more severe. A
low/info finding followed by a high/warning duplicate at the same
(rule_id, location) merged at the LOWER severity, making the aggregate
CalculateRiskScore and the report summary order-dependent, contradicting
US2's "max severity among agreeing findings" intent.

Add severityRank and threatLevelRank ordering helpers (strict refinements
of threatCategory's bucketing, so they never disagree with
CalculateRiskScore) and take the more-severe Severity and ThreatLevel on
absorb. Merging the same two findings in either order now yields
identical Severity/ThreatLevel/Confidence/Tier and an identical risk
score.

Related: Spec 077
@mcpproxy-gatekeeper

Copy link
Copy Markdown

🤖 Codex cross-model review — PR #792 US2 (round 2, read-only)

Round-1 severity-merge fix verified. Round 2: a consensus-weight interaction (low-severity agreement adding dangerous-bucket weight) + a MergeFindings metadata-loss gap. Note US2 risk_score is display-only (does not gate approval/verdict). Round 2 of the 5-round cap. Posted via the gatekeeper app.


VERDICT: changes_requested

Findings

Severity File:line Issue Must-fix?
High internal/security/scanner/sarif.go:358 Cross-source consensus uses weight = max(groupMaxWeight, sourceCount) and applies it to groupMaxCat. That lets a low-severity/info external finding at the same (location, threat_type) increase the dangerous bucket for a hard baseline finding, e.g. dangerous count 1 -> 2 and score 25 -> 39. Consensus should boost confidence, but low-severity agreement must not add dangerous risk weight. Yes
Medium internal/security/scanner/sarif.go:564 MergeFindings absorbs confidence/tier/severity/threat level/signals, but drops later duplicate metadata such as HelpURI, CVSSScore, package/version fields, evidence, category/title/description improvements, and SupplyChainAudit/ScanPass. That violates the “no info loss on merge” requirement for (rule_id, location) dedup. Yes

Verified safe

  • ClassifyThreat conflict resolution preserves US1’s early return: defer backfillSeverity(f) runs, then if f.Tier != "" { return } still prevents legacy keyword reclassification of authoritative detect findings.
  • Explicit detect severity/tier/level are not overwritten by backfillSeverity; legacy/external findings still get severity backfill after classification.
  • MergeFindings source union is sorted/deduped, and consensus source counting uses maps, so the same scanner is not double-counted.
  • internal/runtime/tool_quarantine.go is not in the branch diff.
  • Determinism looks preserved for merged Sources/Signals via sorted unions and first-occurrence output order.

Verification run results:

  • git fetch origin && git diff origin/main...HEAD: failed because read-only sandbox could not write .git/.../FETCH_HEAD; reviewed local origin/main...HEAD diff.
  • go build ./...: failed before build, mkdir .../T/go-build...: operation not permitted.
  • go test ./internal/security/... -count=1: failed before tests for the same Go build work dir permission.
  • go run ./cmd/scan-eval ...: failed before run for the same permission.
  • golangci-lint: failed with code 3, parallel golangci-lint is running.

Two Codex round-2 correctness fixes in the display-only risk_score/report
merge (no gating or baseline-verdict change):

1. Consensus weight no longer lets a weaker agreeing finding inflate the
   stronger bucket. CalculateRiskScore now weights a consensus group's
   max-severity bucket by only the distinct sources (and max signal weight)
   that themselves rated the issue at that category. A dangerous + info pair
   sharing (location, threat_type) scores dangerous at weight 1 (not the raw
   source count); two genuine dangerous sources still score weight 2. The
   weaker finding can still raise confidence in MergeFindings. Order-independent.

2. MergeFindings phase-1 dedup no longer drops the absorbed duplicate's
   enrichment metadata. On absorb the kept finding backfills any empty field
   from the duplicate (HelpURI, package/version, Evidence, category, title,
   description, scan pass), takes the max CVSSScore and ORs SupplyChainAudit.
   Union semantics, order-independent.

Tests: weak-does-not-inflate-strong consensus scoring (order-independent) and
enrichment-metadata backfill (DeepEqual identical in both merge orders).

Related: Spec 077
@mcpproxy-gatekeeper

Copy link
Copy Markdown

🤖 Codex cross-model review — PR #792 US2 (round 3, read-only)

Round-2 fixes verified. Round 3 finds the aggregate path scores AFTER MergeFindings, defeating the per-source consensus fix. Real but display-only (risk_score does not gate approval/verdict; only manifests with deep-scan-on multi-scanner). Round 3 of 5 — escalating to the maintainer for a merge-vs-structural-fix decision rather than grinding the cap on a presentation field. Posted via the gatekeeper app.


VERDICT: changes_requested

Findings

Severity File:line Issue Must-fix?
High engine.go:778, sarif.go:612 AggregateReports scores after MergeFindings. The merge unions all Sources for same (rule_id, location) and promotes Severity/ThreatLevel to max, losing which scanner supplied which severity. CalculateRiskScore can then weight a dangerous merged finding by all sources, so an info/low duplicate can inflate a dangerous score. Raw scoring avoids this, but aggregate scoring feeds post-merge data. Yes

Verified safe

  • ClassifyThreat conflict resolution preserves US1’s if f.Tier != "" { return } before keyword classification. The deferred severity backfill does not overwrite existing detect severity.
  • internal/runtime/tool_quarantine.go is not in the branch diff.
  • Dedup merge keeps max confidence, most-severe tier, most-severe severity/threat level, unions signals/sources, and backfills enrichment metadata.

Actual verification status:

  • git fetch origin && git diff origin/main...HEAD: fetch failed in read-only sandbox (FETCH_HEAD: Operation not permitted); reviewed local origin/main...HEAD diff.
  • go build ./...: failed before build: cannot create Go work dir.
  • go test ./internal/security/... -count=1: failed before tests: cannot create Go work dir.
  • go run ./cmd/scan-eval ...: failed before run: cannot create Go work dir.
  • golangci-lint ...: failed before analysis: parallel golangci-lint is running.

…t (US2)

Cross-scanner agreement no longer contributes to CalculateRiskScore weight.
The score is now a pure function of each finding's OWN severity/category:
each deduped (rule_id, location) issue scores at its MAX category/signal
weight, and distinct findings each count at their own level. A weak external
finding agreeing on (location, threat_type) can never inflate a stronger
bucket, and the score is deterministic, order-independent, and identical
whether computed before or after MergeFindings has run.

Removed the cross-source consensus-weight machinery in CalculateRiskScore
(consensusKey/catKey groups, groupSources/groupMaxCat/catSources/catMaxWeight,
seenConsensus). Retained: the detect engine's own per-finding multi-signal
weight (consensusWeight), MergeFindings dedup + max severity/tier/confidence
promotion + metadata backfill + Sources union, and the consensus CONFIDENCE
boost on agreement (satisfies SC-008).

This structurally resolves the recurring Codex round-2/round-3 nuances:
low-severity agreement can't inflate a strong finding and aggregate scores
are stable pre/post merge, because consensus adds no score weight at all.

Related: Spec 077
@Dumbris Dumbris merged commit 204b4bc into main Jul 2, 2026
50 checks passed
@Dumbris Dumbris deleted the 077-us2-unified-report branch July 2, 2026 06:56
Dumbris added a commit that referenced this pull request Jul 2, 2026
…iorities, audit fixes (#797)

A multi-agent consistency audit (2026-07-02) found roadmap.yaml stale versus
merged PRs and carrying several false progress badges from wrong spec links.
Corrected CI-filtered telemetry also re-prioritized the personal-edition work.

Statuses corrected per merged PRs:
- scanner-simplification children: US1 #786 / US2 #792 / US4 #794 marked done;
  US3 #793 in_review. Epic stays in_progress. Added deep-scan trust-fix task
  and flagged docs T037-T039 as merge-blocking for #793.
- registries-official-protocol marked done (spec 071 shipped 12/12, #572).

False badges / wrong provenance fixed (per the file's own convention — link
dropped, provenance moved into the note):
- sandbox-isolation no longer links spec 054 (unrelated security-gateway spec).
- ux-audit no longer links spec 064 (unrelated agent-fleet cockpit spec).
- marketplace no longer links spec 070 (that is the registries-search-add spec).
- action-log-transparency no longer links spec 024 (shipped backend, not the
  progress driver for the at-a-glance UX epic).

New epics (telemetry- and audit-driven replan):
- upgrade-nudge (P0, spec 079): ~60% of active installs run pre-v0.40.
- connect-trust (P0, spec 078): 72.4% skip the connect step.
- telemetry-identity (P1, in_progress): hashed machine_id + CI-filter hardening.
- planning-hygiene (P2): automate the checks this audit did by hand.

Windows QA gate: new first child windows-tray-funnel-qa (downloads→actives
12:1 vs macOS 4:1); windows-tray-window now depends on it.

ROADMAP.md regenerated; gen-roadmap.py --check passes.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Jul 2, 2026
US1 (#786), US2 (#792), US4 (#794), and US3/deepscan-fixes (this branch) are all
merged. Check off every demonstrably-done task (verified against code + git log)
including the T037-T039 docs sweep. Left unchecked: T005 (contract schemas were
never copied to internal/security/scanner/testdata/ — tests validate behavior
directly), and the final validation gates T040-T042. Tally: 38/42.

Regenerate ROADMAP.md (077 now 38/42, in-flight).

Related #793

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Jul 2, 2026
…ion (Spec 077) (#793)

* feat(config): unified security.deep_scan block + deprecated-key migration

Add the opt-in security.deep_scan config group (Spec 077 US3) that subsumes the
deprecated top-level scanner_fetch_package_source / scanner_disable_no_new_privileges
keys and gates the heavy Docker scanner layer. deep_scan.enabled defaults false so
only the deterministic in-process baseline runs.

- Add DeepScanConfig{Enabled, FetchPackageSource, DisableNoNewPrivileges, Scanners}
  with swaggertype tags; add nil-safe effective accessors (IsDeepScanEnabled,
  DeepScanScanners, EffectiveFetchPackageSource, IsDisableNoNewPrivileges).
- Remove the orphaned auto_scan_quarantined key (ignored on load if present).
- migrateDeepScanConfig folds the deprecated top-level keys into deep_scan.* on
  every load/hot-reload (wired in initializeRegistries) and clears the legacy
  keys so a re-serialized config exposes only the new surface. Idempotent.
- Round-trip + ignore-removed-key tests.

Related: Spec 077 (specs/077-scanner-simplification)

* feat(security): opt-in deep scan that never blocks or degrades the baseline

Demote the Docker scanner plugins + source extraction to an opt-in "deep scan"
layer (Spec 077 US3). Off by default: only the deterministic in-process baseline
runs and no Docker is invoked; a deep-scan failure is surfaced informationally
and never changes the baseline verdict.

- engine: gate Docker (non-in-process) scanners on deep_scan.enabled via
  deepScanAllowed; when off, resolveScanners drops them entirely so no container
  runs and no failure can degrade the verdict. Optional per-scanner allow-list.
- service: SetDeepScan runtime knob; populate DeepScanDescriptor
  {enabled,ran,available,scanners_failed} from per-scanner job statuses,
  classifying baseline (in-process) vs deep scanners.
- service: remove degradeIfIncompleteCoverage — the scan verdict is now derived
  solely from baseline findings (FR-008/FR-014); a failed deep scanner no longer
  downgrades a clean baseline to "degraded".
- server: wire security.deep_scan.* into the scanner service; read the
  no-new-privileges / fetch-package-source knobs via the effective accessors so
  migrated configs resolve to the unified surface.
- Point user-facing hints/logs at the new deep_scan.* config keys.
- Tests: deep-scan-off-by-default (baseline only, no Docker) and deep-scan
  failure leaves the baseline verdict unchanged with a populated descriptor;
  update the MCP-34.4 Docker-scanner tests to enable deep scan and the former
  MCP-2401 degrade tests to assert the baseline-only verdict.

Related: Spec 077 (specs/077-scanner-simplification)

* docs(api): regenerate OpenAPI for security.deep_scan config surface

Regenerate oas/swagger.yaml + oas/docs.go for the new config.DeepScanConfig
schema and the removed auto_scan_quarantined key. make swagger-verify passes.

Related: Spec 077 (specs/077-scanner-simplification)

* feat(web): surface opt-in deep scan; render deep-scan gaps as info

Spec 077 US3 Web UI: present deep scan as an opt-in affordance and render a
failed/unavailable deep scanner as informational, never an error — the baseline
verdict is authoritative.

- Security.vue: info banner clarifying the deterministic baseline is always on
  and the Docker scanners are an opt-in deep scan that never blocks/degrades it.
- ScanReport.vue: DeepScanDescriptor info block (alert-info) listing unavailable
  deep scanners with an explicit "does not affect the baseline verdict" note.
- api.ts: DeepScanDescriptor type + optional deep_scan on the report.

Related: Spec 077 (specs/077-scanner-simplification)

* fix(security): gate source extraction on deep_scan + wire deep-scan report banner

Address adversarial review findings on Spec 077 US3 (PR #789):

- #1 (MUST): published-package-source extraction is now part of the opt-in
  deep-scan layer. server.go computes the effective fetch as
  IsDeepScanEnabled() && fetch_package_source (default true), and
  SetDeepScan(false) force-disables the resolver's fetch fallback as
  defense-in-depth. Deep scan OFF (default) => no npx/uvx source-fetch
  network egress. Added TestServiceDeepScanGatesPackageSourceFetch and
  updated the doc comments to match.

- #2 (MUST): the deep-scan report banner was dead code — AggregatedReport
  had no DeepScan field, so /scan/report and /scans/{jobId}/report never
  emitted deep_scan. Added DeepScan *DeepScanDescriptor (json deep_scan,
  omitempty) to AggregatedReport and populate it in GetScanReport and
  GetScanReportByJobID (the live report-page path).

- #3 (LOW): buildDeepScanDescriptor now inspects Pass-1 AND Pass-2 scanner
  statuses (variadic jobs, deduped) so heavy trivy/supply-chain scanner
  failures are reflected in scanners_failed/availability.

Baseline verdict logic and the quarantine state machine are unchanged.

Related: Spec 077

* fix(security): confine source resolution + Pass 2 to the opt-in deep-scan layer

Spec 077 US3 promised that with deep scan OFF (the default) only the
deterministic in-process baseline runs — no Docker, no source extraction,
no network egress. Scanner EXECUTION was already gated, but source
RESOLUTION passes still ran. Close the three gaps:

- Finding #1 (HIGH): gate Pass-1 sourceResolver.Resolve on deepScanEnabled().
  With deep scan off, source resolution is skipped entirely (the baseline
  scans tool DEFINITIONS, not source files, so resolved source is unused) —
  no Docker container lookup/extraction and no package-source fetch.
- Finding #2 (HIGH): gate the Pass-2 auto-start (startPass2 → ResolveFullSource,
  the heavy Docker/full-source pass) on deepScanEnabled() in addition to the
  existing not-dry-run/not-URL conditions. Off ⇒ Pass-1 baseline only.
- Finding #3 (MEDIUM): add deep_scan (DeepScanDescriptor: enabled/ran/
  available/scanners_failed) to contracts.SecurityScanSummary, populate it in
  the server.go enricher adapter (was dropped), remove stale "degraded"
  references from the contract comments, and regenerate OpenAPI.

Tests: SourceResolver gains atomic Resolve/ResolveFullSource call counters;
new service-level tests assert neither runs with deep scan off (baseline still
settles a deterministic verdict) and both run with deep scan on; new server
adapter test asserts deep_scan is carried onto the wire summary when populated
and omitted when off.

Invariants preserved: isBlockingFinding stays tier-driven; no
degradeIfIncompleteCoverage; determinism/offline baseline unchanged.

Related: Spec 077

* fix(security): hot-reload deep_scan config + migrate legacy keys on apply (Spec 077 US3)

Codex round-2 findings on PR #793:

Finding #1 (HIGH) — deep_scan changes now hot-reload. DetectConfigChanges
compares Config.Security (deep_scan.{enabled,fetch_package_source,
disable_no_new_privileges,scanners} + deprecated top-level keys) so a lone
security.deep_scan.* edit is reported as a change instead of "no changes
detected". The server subscribes to config.reloaded (both file-edit and
/api/v1/config/apply emit it) and re-runs the scanner wiring via a new
Service.ApplySecurityConfig, so toggling deep scan takes effect without a
restart. Startup and reload now configure the scanner through the same call.

Finding #2 (MEDIUM) — /api/v1/config/apply now normalizes identically to
LoadFromFile. ApplyConfig runs config.MigrateDeepScanConfig on the submitted
config before diffing/saving, folding the deprecated
security.scanner_fetch_package_source / scanner_disable_no_new_privileges keys
into security.deep_scan (auto_scan_quarantined has no struct field, dropped at
decode). An API apply carrying the deprecated keys now saves only the unified
deep_scan surface (SC-007).

Tests: DetectConfigChanges deep-scan detection; Service.ApplySecurityConfig
reconfigures a live service without restart (incl. legacy-key fallback);
ApplyConfig legacy-key migration asserted on the saved file + runtime config.

Constraints respected: no new dependency; tool_quarantine.go untouched; no
US1-removed behavior reintroduced; determinism/offline preserved.

Related: Spec 077

* fix(security): deep-scan trust fixes — FR-014 baseline-only verdict, always-on deep_scan descriptor, enable-time hint

Three verified audit findings against the Spec 077 US3 deep-scan layer
(re-checked against head 7e1d51a before fixing):

FIX 1 (nil-Security gating) — already fixed at HEAD by a25ae2f/7e1d51ad:
source resolution + Pass 2 are gated on deepScanEnabled() at the call
sites and ApplySecurityConfig(nil) forces the layer off via nil-safe
accessors. This commit adds defense-in-depth (the server wiring now calls
ApplySecurityConfig unconditionally, even when Config itself is nil) and a
regression test pinning the exact audit scenario: config.DefaultConfig()
never initializes Config.Security, and that nil block must still yield
deep-scan OFF, no Docker scanners resolved, package-source fetch OFF.

FIX 2 (FR-014 verdict purity) — the "warnings" level was driven by
ThreatLevel across ALL findings, so a tierless deep-scan/external finding
at threat_level=warning flipped a clean baseline to "warnings", while a
tierless threat_level=dangerous finding fell into the Info bucket (an
inversion: LESS effect than warning). GetScanSummary now derives the
verdict at EVERY level solely from baseline (tiered) findings: dangerous
requires >=1 hard-tier finding (unchanged predicate, shared with the
approval gate), warnings requires >=1 warning-level baseline (soft)
finding. Tierless findings still surface in FindingCounts — a tierless
dangerous now counts at warning prominence instead of info — and in the
merged report/RiskScore (FR-009..FR-012 consensus weighting untouched),
but they never move the verdict.

FIX 3 (silent Docker-scanner skip):
(a) buildDeepScanDescriptor no longer returns nil when the layer is off;
    it always emits {enabled:false, ran:false, skipped_scanners:[ids of
    enabled-but-skipped Docker scanners]}, making quickstart scenario 1
    (deep_scan.enabled=false) actually observable in scan/report JSON.
    Field added to scanner + contracts descriptors, REST/SSE projection,
    frontend TS type; OpenAPI regenerated.
(b) POST /security/scanners/{id}/enable now returns a "hint" when a
    Docker-based scanner is enabled while security.deep_scan.enabled is
    false, and `mcpproxy security enable <id>` prints it ("scanner
    enabled, but it will not run until security.deep_scan.enabled=true"),
    plus help-text. SecurityController grew DeepScanEnabled() (implemented
    by scanner.Service already).

Assumptions documented (zero-interruption policy):
- Tierless threat_level=dangerous findings are bucketed as Warning in
  FindingCounts (inform-without-gating prominence) so counts cannot
  contradict the tier-driven Dangerous count / verdict.
- SkippedScanners lists installed/configured (i.e. enabled) non-in-process
  scanners only; merely "available" scanners are not "enabled" and are not
  listed.
- TestGetScanSummaryBothPasses and the descriptor-omitted assertions
  encoded the pre-FR-014 behavior and were updated to the spec-mandated
  contract.

Tests: new regression tests for all three fixes (scanner service, httpapi
handler, CLI hint extraction); go test -race across security/server/
config/httpapi/contracts/cmd green; golangci-lint v2 clean; frontend
vue-tsc clean; scripts/test-api-e2e.sh 65/65.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(security): FR-014 verdict purity on the report page — share the tier-driven verdict with the report payload

Codex review follow-up. The server-list summary (GetScanSummary) derived
the verdict tier-driven/baseline-only (FR-014), but the report page badge
still read the RAW threat-level ReportSummary — with deep scan on, a
tierless Docker-scanner finding classified "dangerous" made the report say
dangerous while the server list said clean, and the same finding counted
as Warning on one surface and Dangerous on the other.

- extract deriveBaselineVerdict() as the single source of truth, shared by
  GetScanSummary and AggregateReports so the two surfaces structurally
  cannot disagree
- AggregatedReport gains Verdict + tier-driven FindingCounts (additive;
  raw Summary counts retained for transparency)
- ScanReport.vue: status badge, threat tiles, Quarantine-button gate and
  the Approve-disable gate (hasUnresolvedCritical) now read the
  tier-driven verdict/finding_counts (raw summary only as a fallback for
  pre-Spec-077 payloads); the approve gate now mirrors the backend
  hard-tier-only isBlockingFinding instead of raw summary.critical
- ServerDetail.vue: approve-confirmation count and the Security-tab
  summary strip prefer tier-driven finding_counts over the raw report
  summary, so a tierless deep-scan "dangerous" finding no longer triggers
  the "Dangerous Findings Detected" modal on a clean baseline
- frontend/src/types/api.ts: SecurityScanSummary catches up with the wire
  contract (deep_scan, scanners_run/failed/total); SecurityScanReport
  gains verdict + finding_counts

Tests: engine-level AggregateReports verdict matrix (tierless-dangerous →
clean/warning-bucket, hard → dangerous, soft → warnings) and an
end-to-end pin that GetScanReportByJobID.Verdict/FindingCounts equal
GetScanSummary for the same scan data.

Assumption documented (zero-interruption policy): ServerCard.vue was
reported as sharing the raw-summary preference but already reads only the
tier-driven security_scan.finding_counts — left unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs(security): Spec 077 truth sweep — detect engine is sole in-process detector, deep_scan opt-in

Reconcile the security docs with the post-077 code:

- tool-scanner.md: remove the deleted legacy-TPA-rule coexistence section;
  the detect engine is now the sole in-process detector. Count the actually-
  registered checks — seven (four hard incl. phrase.injection, three soft) —
  and fix the front-matter, "The seven checks", and at-a-glance table.
- security-scanner-plugins.md: drop the legacy-rules claim + six-check count
  from the tpa-descriptions row; document the security.deep_scan block as the
  primary config surface; remove the deprecated auto_scan_quarantined /
  scanner_fetch_package_source docs; note the deep-scan gate + enable hint.
- docker-isolation.md (+ top-level copy): the sandbox/none Docker-scanner skip
  no longer downgrades to security_scan.status:"degraded" (FR-008); it surfaces
  via the always-emitted deep_scan descriptor. Replace the deprecated
  scanner_disable_no_new_privileges instruction with deep_scan.disable_no_new_privileges.
- security-quarantine.md: replace the deleted keyword-heuristic Detection
  Patterns table + Security Analysis JSON with the real quarantine-block shape
  and a pointer to the seven-check detect engine.
- configuration.md: add the security.deep_scan config block + migration note.

Related #793

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs(spec-077): reconcile tasks.md checkboxes with shipped reality

US1 (#786), US2 (#792), US4 (#794), and US3/deepscan-fixes (this branch) are all
merged. Check off every demonstrably-done task (verified against code + git log)
including the T037-T039 docs sweep. Left unchecked: T005 (contract schemas were
never copied to internal/security/scanner/testdata/ — tests validate behavior
directly), and the final validation gates T040-T042. Tally: 38/42.

Regenerate ROADMAP.md (077 now 38/42, in-flight).

Related #793

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(server): close bolt DB before TempDir cleanup (Windows unlink failure)

TestScanSummaryEnricherAdapterCarriesDeepScan passed its assertions but
failed on windows-latest: t.TempDir cleanup cannot unlink config.db
while the bbolt handle is open. setupTestStorage registers no close;
add an explicit t.Cleanup in this test's seed helper.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Dumbris added a commit that referenced this pull request Jul 2, 2026
…ne, connect-trust/upgrade-nudge progress (#803)

check-github now passes with 0 errors: scanner-simplification epic
complete (#786/#792/#793/#794 incl. deep-scan trust fixes + docs
sweep); connect-trust US1 preview (#802) + backup visibility (#799)
done; upgrade-nudge status/log slice (#798) split out as done with
the banner+config remainder tracked separately; telemetry machine_id
client (#796) and hygiene check-github (#800) done. Remaining
warnings are the known windows-tray no-PR-evidence items.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants