feat(security): US2 unified report + cross-scanner consensus confidence (Spec 077) by Dumbris · Pull Request #792 · smart-mcp-proxy/mcpproxy-go

Dumbris · 2026-07-02T04:57:17Z

Spec 077 US2 — Unified report + cross-scanner consensus confidence

Supersedes the auto-closed #788 (that PR was stacked on the pre-merge US1 head 0f5838e6; GitHub auto-closed it when US1 squash-merged to main as 5b925e45). This PR is the same US2 work rebased fresh onto main, so it applies cleanly on top of the final US1 baseline.

What this adds (on top of merged US1)

Unified report + cross-scanner consensus (internal/security/scanner/sarif.go, engine.go): CalculateRiskScore weights consensus groups (same location + threat_type from ≥2 distinct sources), additive confidence boost capped at 1.0 (consensusConfidenceStep), Finding.Sources populated + merged across scanners, severity backfill so external/legacy SARIF findings never render blank.
Round-1 review fix included (5c14f790): consensus groups scored at max severity, stronger fields preserved on merge.
Web UI (frontend/src/views/ScanReport.vue, types/api.ts): renders unified findings with severity + source attribution.

Rebase conflict resolution

Only internal/security/scanner/sarif.go conflicted, in ClassifyThreat. US1 (final, on main) added an early if f.Tier != "" { return } so the legacy keyword classifier never rewrites authoritative detect findings; US2 added defer backfillSeverity(f). Both behaviors are kept: the defer is registered before the early-return so it runs on every exit path (including baseline findings), and it is a no-op when severity is already set — so detect findings keep their authoritative classification while legacy/external findings still get a backfilled severity. No US1 fix was reverted.

Verification (actual output)

go build ./... → exit 0
go test ./internal/security/... -count=1 → all packages ok (security, detect, detect/checks, patterns, scanner)
go run ./cmd/scan-eval --gate --min-recall 0.90 --max-fp 0.05 --corpus specs/065-evaluation-foundation/datasets/detect_corpus_v1.json → GATE PASSED: recall=1.0000 (≥0.90), fp=0.0000 (≤0.05)
golangci-lint run --config .github/.golangci.yml ./internal/security/... → 0 issues

Related: Spec 077

Spec 077 US2 (T018-T022). Collapse the per-scanner findings into a single unified report and let independent scanners agreeing on the same issue raise confidence and the risk score instead of producing duplicate entries. - MergeFindings: dedup by (rule_id, location) into one entry whose Sources lists every contributing scanner; boost confidence when >=2 distinct sources agree on the same (location, threat_type) (T018/T019/T021). - CalculateRiskScore: matched external/Docker findings on (location, threat_type) now ADD to consensus (counted once, weighted by the number of agreeing sources) instead of flattening to weight 1; legacy per-rule dedup and detect Signals weighting preserved for the empty-threat_type path (T020). - AggregateReports: run MergeFindings after classification so the aggregated report carries populated Sources and consensus-boosted confidence (T021). - ClassifyThreat: backfill a user-readable severity from the classified threat level for external/legacy SARIF findings that arrive without one (T022). - New: dedup-to-one, consensus-confidence-boost, cross-source-consensus-score, severity-backfill. Existing risk-score/dedup/aggregate tests unchanged. Related: Spec 077 (specs/077-scanner-simplification)

Spec 077 US2 (T023). Surface the merged, deduplicated finding list in the scan report: each finding shows its severity, its contributing source(s), and a "consensus xN" badge when two or more scanners independently agreed. Adds the additive `sources`/`tier`/`confidence`/`signals` fields to SecurityScanFinding. ## Changes - api.ts: extend SecurityScanFinding with sources/tier/confidence/signals. - ScanReport.vue: severity badge + consensus badge in the finding header; Source(s) row (falls back to the single scanner id for legacy findings). ## Testing - vue-tsc + vite build clean. Related: Spec 077 (specs/077-scanner-simplification)

…er fields on merge CalculateRiskScore consensus branch now scores each (location, threat_type) group at the most-severe threat category and the max per-finding weight among all agreeing findings, instead of whichever finding was encountered first. For severity-derived threat_types (supply_chain, uncategorized, malicious_code fallback) agreeing findings can carry different threat_levels, so the previous first-wins behavior made the score order-dependent and could drop a genuine high/warning finding. Extracted a threatCategory helper and precompute the group max in the existing consensus pass for determinism. MergeFindings phase-1 dedup now absorbs the duplicate's stronger fields: it takes max(Confidence), the more-severe Tier (hard > soft via tierRank), and the union of Signals, so merging a hard/high-confidence finding with a same- (rule_id, location) soft/low-confidence duplicate no longer discards the hard tier and higher confidence. Adds tests locking order-independent max-severity consensus scoring and max-confidence + hard-tier-wins on merge. Related: Spec 077

cloudflare-workers-and-pages · 2026-07-02T05:00:53Z

Deploying mcpproxy-docs with Cloudflare Pages

Latest commit:	`3d7d435`
Status:	✅ Deploy successful!
Preview URL:	https://7b4e5635.mcpproxy-docs.pages.dev
Branch Preview URL:	https://077-us2-unified-report.mcpproxy-docs.pages.dev

View logs

codecov-commenter · 2026-07-02T05:04:24Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 93.78531% with 11 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
internal/security/scanner/sarif.go	93.75%	11 Missing ⚠️

📢 Thoughts on this report? Let us know!

github-actions · 2026-07-02T05:10:03Z

📦 Build Artifacts

Workflow Run: View Run
Branch: 077-us2-unified-report

Available Artifacts

archive-darwin-amd64 (28 MB)
archive-darwin-arm64 (25 MB)
archive-linux-amd64 (16 MB)
archive-linux-arm64 (14 MB)
archive-windows-amd64 (28 MB)
archive-windows-arm64 (25 MB)
frontend-dist-pr (0 MB)
installer-dmg-darwin-amd64 (21 MB)
installer-dmg-darwin-arm64 (19 MB)

How to Download

Option 1: GitHub Web UI (easiest)

Go to the workflow run page linked above
Scroll to the bottom "Artifacts" section
Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 28570621212 --repo smart-mcp-proxy/mcpproxy-go

Note: Artifacts expire in 14 days.

mcpproxy-gatekeeper · 2026-07-02T06:08:58Z

🤖 Codex cross-model review — PR #792 US2 (round 1, read-only)

Post-rebase review. Confirms the sarif.go rebase-conflict resolution is correct. One MergeFindings severity-merge gap (order-dependent aggregate score). Round 1 of the 5-round cap. Posted via the gatekeeper app.

VERDICT: changes_requested

Findings

Severity	File:line	Issue	Must-fix?
High	`internal/security/scanner/sarif.go:525`	`MergeFindings` dedups by `(rule_id, location)` but only absorbs sources/confidence/tier/signals. If the first duplicate is low/info and a later duplicate is high/warning for the same rule/location, the stronger `severity`/`threat_level` is discarded before `AggregateReports` calls `CalculateRiskScore`, making aggregate scoring and summary order-dependent and not “max severity among agreeing findings.”	Yes

Verified safe

ClassifyThreat conflict resolution keeps US1’s if f.Tier != "" { return }; the new defer backfillSeverity(f) still runs without keyword reclassifying baseline detect findings.
internal/runtime/tool_quarantine.go is untouched.
Source unions are sorted, so merged sources/signals output is deterministic.
git diff --check origin/main...HEAD passed.
git fetch origin && git diff origin/main...HEAD: fetch failed in the read-only sandbox: cannot open .git/worktrees/.../FETCH_HEAD; reviewed local origin/main...HEAD diff instead.
go build ./...: failed before compilation because Go could not create its temp build dir.
go test ./internal/security/... -count=1: failed before tests for the same temp-dir permission error.
go run ./cmd/scan-eval ...: failed before execution for the same temp-dir permission error.
golangci-lint ...: failed with parallel golangci-lint is running on both attempts.

MergeFindings phase-1 dedup by (rule_id, location) took max Confidence and the most-severe Tier but kept the first occurrence's Severity and ThreatLevel, discarding a later duplicate's if it was more severe. A low/info finding followed by a high/warning duplicate at the same (rule_id, location) merged at the LOWER severity, making the aggregate CalculateRiskScore and the report summary order-dependent, contradicting US2's "max severity among agreeing findings" intent. Add severityRank and threatLevelRank ordering helpers (strict refinements of threatCategory's bucketing, so they never disagree with CalculateRiskScore) and take the more-severe Severity and ThreatLevel on absorb. Merging the same two findings in either order now yields identical Severity/ThreatLevel/Confidence/Tier and an identical risk score. Related: Spec 077

mcpproxy-gatekeeper · 2026-07-02T06:16:13Z

🤖 Codex cross-model review — PR #792 US2 (round 2, read-only)

Round-1 severity-merge fix verified. Round 2: a consensus-weight interaction (low-severity agreement adding dangerous-bucket weight) + a MergeFindings metadata-loss gap. Note US2 risk_score is display-only (does not gate approval/verdict). Round 2 of the 5-round cap. Posted via the gatekeeper app.

VERDICT: changes_requested

Findings

Severity	File:line	Issue	Must-fix?
High	`internal/security/scanner/sarif.go:358`	Cross-source consensus uses `weight = max(groupMaxWeight, sourceCount)` and applies it to `groupMaxCat`. That lets a low-severity/info external finding at the same `(location, threat_type)` increase the dangerous bucket for a hard baseline finding, e.g. dangerous count 1 -> 2 and score 25 -> 39. Consensus should boost confidence, but low-severity agreement must not add dangerous risk weight.	Yes
Medium	`internal/security/scanner/sarif.go:564`	`MergeFindings` absorbs confidence/tier/severity/threat level/signals, but drops later duplicate metadata such as `HelpURI`, `CVSSScore`, package/version fields, evidence, category/title/description improvements, and `SupplyChainAudit`/`ScanPass`. That violates the “no info loss on merge” requirement for `(rule_id, location)` dedup.	Yes

Verified safe

ClassifyThreat conflict resolution preserves US1’s early return: defer backfillSeverity(f) runs, then if f.Tier != "" { return } still prevents legacy keyword reclassification of authoritative detect findings.
Explicit detect severity/tier/level are not overwritten by backfillSeverity; legacy/external findings still get severity backfill after classification.
MergeFindings source union is sorted/deduped, and consensus source counting uses maps, so the same scanner is not double-counted.
internal/runtime/tool_quarantine.go is not in the branch diff.
Determinism looks preserved for merged Sources/Signals via sorted unions and first-occurrence output order.

Verification run results:

git fetch origin && git diff origin/main...HEAD: failed because read-only sandbox could not write .git/.../FETCH_HEAD; reviewed local origin/main...HEAD diff.
go build ./...: failed before build, mkdir .../T/go-build...: operation not permitted.
go test ./internal/security/... -count=1: failed before tests for the same Go build work dir permission.
go run ./cmd/scan-eval ...: failed before run for the same permission.
golangci-lint: failed with code 3, parallel golangci-lint is running.

Two Codex round-2 correctness fixes in the display-only risk_score/report merge (no gating or baseline-verdict change): 1. Consensus weight no longer lets a weaker agreeing finding inflate the stronger bucket. CalculateRiskScore now weights a consensus group's max-severity bucket by only the distinct sources (and max signal weight) that themselves rated the issue at that category. A dangerous + info pair sharing (location, threat_type) scores dangerous at weight 1 (not the raw source count); two genuine dangerous sources still score weight 2. The weaker finding can still raise confidence in MergeFindings. Order-independent. 2. MergeFindings phase-1 dedup no longer drops the absorbed duplicate's enrichment metadata. On absorb the kept finding backfills any empty field from the duplicate (HelpURI, package/version, Evidence, category, title, description, scan pass), takes the max CVSSScore and ORs SupplyChainAudit. Union semantics, order-independent. Tests: weak-does-not-inflate-strong consensus scoring (order-independent) and enrichment-metadata backfill (DeepEqual identical in both merge orders). Related: Spec 077

mcpproxy-gatekeeper · 2026-07-02T06:27:49Z

🤖 Codex cross-model review — PR #792 US2 (round 3, read-only)

Round-2 fixes verified. Round 3 finds the aggregate path scores AFTER MergeFindings, defeating the per-source consensus fix. Real but display-only (risk_score does not gate approval/verdict; only manifests with deep-scan-on multi-scanner). Round 3 of 5 — escalating to the maintainer for a merge-vs-structural-fix decision rather than grinding the cap on a presentation field. Posted via the gatekeeper app.

VERDICT: changes_requested

Findings

Severity	File:line	Issue	Must-fix?
High	engine.go:778, sarif.go:612	`AggregateReports` scores after `MergeFindings`. The merge unions all `Sources` for same `(rule_id, location)` and promotes `Severity`/`ThreatLevel` to max, losing which scanner supplied which severity. `CalculateRiskScore` can then weight a dangerous merged finding by all sources, so an info/low duplicate can inflate a dangerous score. Raw scoring avoids this, but aggregate scoring feeds post-merge data.	Yes

Verified safe

ClassifyThreat conflict resolution preserves US1’s if f.Tier != "" { return } before keyword classification. The deferred severity backfill does not overwrite existing detect severity.
internal/runtime/tool_quarantine.go is not in the branch diff.
Dedup merge keeps max confidence, most-severe tier, most-severe severity/threat level, unions signals/sources, and backfills enrichment metadata.

Actual verification status:

git fetch origin && git diff origin/main...HEAD: fetch failed in read-only sandbox (FETCH_HEAD: Operation not permitted); reviewed local origin/main...HEAD diff.
go build ./...: failed before build: cannot create Go work dir.
go test ./internal/security/... -count=1: failed before tests: cannot create Go work dir.
go run ./cmd/scan-eval ...: failed before run: cannot create Go work dir.
golangci-lint ...: failed before analysis: parallel golangci-lint is running.

…t (US2) Cross-scanner agreement no longer contributes to CalculateRiskScore weight. The score is now a pure function of each finding's OWN severity/category: each deduped (rule_id, location) issue scores at its MAX category/signal weight, and distinct findings each count at their own level. A weak external finding agreeing on (location, threat_type) can never inflate a stronger bucket, and the score is deterministic, order-independent, and identical whether computed before or after MergeFindings has run. Removed the cross-source consensus-weight machinery in CalculateRiskScore (consensusKey/catKey groups, groupSources/groupMaxCat/catSources/catMaxWeight, seenConsensus). Retained: the detect engine's own per-finding multi-signal weight (consensusWeight), MergeFindings dedup + max severity/tier/confidence promotion + metadata backfill + Sources union, and the consensus CONFIDENCE boost on agreement (satisfies SC-008). This structurally resolves the recurring Codex round-2/round-3 nuances: low-severity agreement can't inflate a strong finding and aggregate scores are stable pre/post merge, because consensus adds no score weight at all. Related: Spec 077

…iorities, audit fixes (#797) A multi-agent consistency audit (2026-07-02) found roadmap.yaml stale versus merged PRs and carrying several false progress badges from wrong spec links. Corrected CI-filtered telemetry also re-prioritized the personal-edition work. Statuses corrected per merged PRs: - scanner-simplification children: US1 #786 / US2 #792 / US4 #794 marked done; US3 #793 in_review. Epic stays in_progress. Added deep-scan trust-fix task and flagged docs T037-T039 as merge-blocking for #793. - registries-official-protocol marked done (spec 071 shipped 12/12, #572). False badges / wrong provenance fixed (per the file's own convention — link dropped, provenance moved into the note): - sandbox-isolation no longer links spec 054 (unrelated security-gateway spec). - ux-audit no longer links spec 064 (unrelated agent-fleet cockpit spec). - marketplace no longer links spec 070 (that is the registries-search-add spec). - action-log-transparency no longer links spec 024 (shipped backend, not the progress driver for the at-a-glance UX epic). New epics (telemetry- and audit-driven replan): - upgrade-nudge (P0, spec 079): ~60% of active installs run pre-v0.40. - connect-trust (P0, spec 078): 72.4% skip the connect step. - telemetry-identity (P1, in_progress): hashed machine_id + CI-filter hardening. - planning-hygiene (P2): automate the checks this audit did by hand. Windows QA gate: new first child windows-tray-funnel-qa (downloads→actives 12:1 vs macOS 4:1); windows-tray-window now depends on it. ROADMAP.md regenerated; gen-roadmap.py --check passes. Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

US1 (#786), US2 (#792), US4 (#794), and US3/deepscan-fixes (this branch) are all merged. Check off every demonstrably-done task (verified against code + git log) including the T037-T039 docs sweep. Left unchecked: T005 (contract schemas were never copied to internal/security/scanner/testdata/ — tests validate behavior directly), and the final validation gates T040-T042. Tally: 38/42. Regenerate ROADMAP.md (077 now 38/42, in-flight). Related #793 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ion (Spec 077) (#793) * feat(config): unified security.deep_scan block + deprecated-key migration Add the opt-in security.deep_scan config group (Spec 077 US3) that subsumes the deprecated top-level scanner_fetch_package_source / scanner_disable_no_new_privileges keys and gates the heavy Docker scanner layer. deep_scan.enabled defaults false so only the deterministic in-process baseline runs. - Add DeepScanConfig{Enabled, FetchPackageSource, DisableNoNewPrivileges, Scanners} with swaggertype tags; add nil-safe effective accessors (IsDeepScanEnabled, DeepScanScanners, EffectiveFetchPackageSource, IsDisableNoNewPrivileges). - Remove the orphaned auto_scan_quarantined key (ignored on load if present). - migrateDeepScanConfig folds the deprecated top-level keys into deep_scan.* on every load/hot-reload (wired in initializeRegistries) and clears the legacy keys so a re-serialized config exposes only the new surface. Idempotent. - Round-trip + ignore-removed-key tests. Related: Spec 077 (specs/077-scanner-simplification) * feat(security): opt-in deep scan that never blocks or degrades the baseline Demote the Docker scanner plugins + source extraction to an opt-in "deep scan" layer (Spec 077 US3). Off by default: only the deterministic in-process baseline runs and no Docker is invoked; a deep-scan failure is surfaced informationally and never changes the baseline verdict. - engine: gate Docker (non-in-process) scanners on deep_scan.enabled via deepScanAllowed; when off, resolveScanners drops them entirely so no container runs and no failure can degrade the verdict. Optional per-scanner allow-list. - service: SetDeepScan runtime knob; populate DeepScanDescriptor {enabled,ran,available,scanners_failed} from per-scanner job statuses, classifying baseline (in-process) vs deep scanners. - service: remove degradeIfIncompleteCoverage — the scan verdict is now derived solely from baseline findings (FR-008/FR-014); a failed deep scanner no longer downgrades a clean baseline to "degraded". - server: wire security.deep_scan.* into the scanner service; read the no-new-privileges / fetch-package-source knobs via the effective accessors so migrated configs resolve to the unified surface. - Point user-facing hints/logs at the new deep_scan.* config keys. - Tests: deep-scan-off-by-default (baseline only, no Docker) and deep-scan failure leaves the baseline verdict unchanged with a populated descriptor; update the MCP-34.4 Docker-scanner tests to enable deep scan and the former MCP-2401 degrade tests to assert the baseline-only verdict. Related: Spec 077 (specs/077-scanner-simplification) * docs(api): regenerate OpenAPI for security.deep_scan config surface Regenerate oas/swagger.yaml + oas/docs.go for the new config.DeepScanConfig schema and the removed auto_scan_quarantined key. make swagger-verify passes. Related: Spec 077 (specs/077-scanner-simplification) * feat(web): surface opt-in deep scan; render deep-scan gaps as info Spec 077 US3 Web UI: present deep scan as an opt-in affordance and render a failed/unavailable deep scanner as informational, never an error — the baseline verdict is authoritative. - Security.vue: info banner clarifying the deterministic baseline is always on and the Docker scanners are an opt-in deep scan that never blocks/degrades it. - ScanReport.vue: DeepScanDescriptor info block (alert-info) listing unavailable deep scanners with an explicit "does not affect the baseline verdict" note. - api.ts: DeepScanDescriptor type + optional deep_scan on the report. Related: Spec 077 (specs/077-scanner-simplification) * fix(security): gate source extraction on deep_scan + wire deep-scan report banner Address adversarial review findings on Spec 077 US3 (PR #789): - #1 (MUST): published-package-source extraction is now part of the opt-in deep-scan layer. server.go computes the effective fetch as IsDeepScanEnabled() && fetch_package_source (default true), and SetDeepScan(false) force-disables the resolver's fetch fallback as defense-in-depth. Deep scan OFF (default) => no npx/uvx source-fetch network egress. Added TestServiceDeepScanGatesPackageSourceFetch and updated the doc comments to match. - #2 (MUST): the deep-scan report banner was dead code — AggregatedReport had no DeepScan field, so /scan/report and /scans/{jobId}/report never emitted deep_scan. Added DeepScan *DeepScanDescriptor (json deep_scan, omitempty) to AggregatedReport and populate it in GetScanReport and GetScanReportByJobID (the live report-page path). - #3 (LOW): buildDeepScanDescriptor now inspects Pass-1 AND Pass-2 scanner statuses (variadic jobs, deduped) so heavy trivy/supply-chain scanner failures are reflected in scanners_failed/availability. Baseline verdict logic and the quarantine state machine are unchanged. Related: Spec 077 * fix(security): confine source resolution + Pass 2 to the opt-in deep-scan layer Spec 077 US3 promised that with deep scan OFF (the default) only the deterministic in-process baseline runs — no Docker, no source extraction, no network egress. Scanner EXECUTION was already gated, but source RESOLUTION passes still ran. Close the three gaps: - Finding #1 (HIGH): gate Pass-1 sourceResolver.Resolve on deepScanEnabled(). With deep scan off, source resolution is skipped entirely (the baseline scans tool DEFINITIONS, not source files, so resolved source is unused) — no Docker container lookup/extraction and no package-source fetch. - Finding #2 (HIGH): gate the Pass-2 auto-start (startPass2 → ResolveFullSource, the heavy Docker/full-source pass) on deepScanEnabled() in addition to the existing not-dry-run/not-URL conditions. Off ⇒ Pass-1 baseline only. - Finding #3 (MEDIUM): add deep_scan (DeepScanDescriptor: enabled/ran/ available/scanners_failed) to contracts.SecurityScanSummary, populate it in the server.go enricher adapter (was dropped), remove stale "degraded" references from the contract comments, and regenerate OpenAPI. Tests: SourceResolver gains atomic Resolve/ResolveFullSource call counters; new service-level tests assert neither runs with deep scan off (baseline still settles a deterministic verdict) and both run with deep scan on; new server adapter test asserts deep_scan is carried onto the wire summary when populated and omitted when off. Invariants preserved: isBlockingFinding stays tier-driven; no degradeIfIncompleteCoverage; determinism/offline baseline unchanged. Related: Spec 077 * fix(security): hot-reload deep_scan config + migrate legacy keys on apply (Spec 077 US3) Codex round-2 findings on PR #793: Finding #1 (HIGH) — deep_scan changes now hot-reload. DetectConfigChanges compares Config.Security (deep_scan.{enabled,fetch_package_source, disable_no_new_privileges,scanners} + deprecated top-level keys) so a lone security.deep_scan.* edit is reported as a change instead of "no changes detected". The server subscribes to config.reloaded (both file-edit and /api/v1/config/apply emit it) and re-runs the scanner wiring via a new Service.ApplySecurityConfig, so toggling deep scan takes effect without a restart. Startup and reload now configure the scanner through the same call. Finding #2 (MEDIUM) — /api/v1/config/apply now normalizes identically to LoadFromFile. ApplyConfig runs config.MigrateDeepScanConfig on the submitted config before diffing/saving, folding the deprecated security.scanner_fetch_package_source / scanner_disable_no_new_privileges keys into security.deep_scan (auto_scan_quarantined has no struct field, dropped at decode). An API apply carrying the deprecated keys now saves only the unified deep_scan surface (SC-007). Tests: DetectConfigChanges deep-scan detection; Service.ApplySecurityConfig reconfigures a live service without restart (incl. legacy-key fallback); ApplyConfig legacy-key migration asserted on the saved file + runtime config. Constraints respected: no new dependency; tool_quarantine.go untouched; no US1-removed behavior reintroduced; determinism/offline preserved. Related: Spec 077 * fix(security): deep-scan trust fixes — FR-014 baseline-only verdict, always-on deep_scan descriptor, enable-time hint Three verified audit findings against the Spec 077 US3 deep-scan layer (re-checked against head 7e1d51a before fixing): FIX 1 (nil-Security gating) — already fixed at HEAD by a25ae2f/7e1d51ad: source resolution + Pass 2 are gated on deepScanEnabled() at the call sites and ApplySecurityConfig(nil) forces the layer off via nil-safe accessors. This commit adds defense-in-depth (the server wiring now calls ApplySecurityConfig unconditionally, even when Config itself is nil) and a regression test pinning the exact audit scenario: config.DefaultConfig() never initializes Config.Security, and that nil block must still yield deep-scan OFF, no Docker scanners resolved, package-source fetch OFF. FIX 2 (FR-014 verdict purity) — the "warnings" level was driven by ThreatLevel across ALL findings, so a tierless deep-scan/external finding at threat_level=warning flipped a clean baseline to "warnings", while a tierless threat_level=dangerous finding fell into the Info bucket (an inversion: LESS effect than warning). GetScanSummary now derives the verdict at EVERY level solely from baseline (tiered) findings: dangerous requires >=1 hard-tier finding (unchanged predicate, shared with the approval gate), warnings requires >=1 warning-level baseline (soft) finding. Tierless findings still surface in FindingCounts — a tierless dangerous now counts at warning prominence instead of info — and in the merged report/RiskScore (FR-009..FR-012 consensus weighting untouched), but they never move the verdict. FIX 3 (silent Docker-scanner skip): (a) buildDeepScanDescriptor no longer returns nil when the layer is off; it always emits {enabled:false, ran:false, skipped_scanners:[ids of enabled-but-skipped Docker scanners]}, making quickstart scenario 1 (deep_scan.enabled=false) actually observable in scan/report JSON. Field added to scanner + contracts descriptors, REST/SSE projection, frontend TS type; OpenAPI regenerated. (b) POST /security/scanners/{id}/enable now returns a "hint" when a Docker-based scanner is enabled while security.deep_scan.enabled is false, and `mcpproxy security enable <id>` prints it ("scanner enabled, but it will not run until security.deep_scan.enabled=true"), plus help-text. SecurityController grew DeepScanEnabled() (implemented by scanner.Service already). Assumptions documented (zero-interruption policy): - Tierless threat_level=dangerous findings are bucketed as Warning in FindingCounts (inform-without-gating prominence) so counts cannot contradict the tier-driven Dangerous count / verdict. - SkippedScanners lists installed/configured (i.e. enabled) non-in-process scanners only; merely "available" scanners are not "enabled" and are not listed. - TestGetScanSummaryBothPasses and the descriptor-omitted assertions encoded the pre-FR-014 behavior and were updated to the spec-mandated contract. Tests: new regression tests for all three fixes (scanner service, httpapi handler, CLI hint extraction); go test -race across security/server/ config/httpapi/contracts/cmd green; golangci-lint v2 clean; frontend vue-tsc clean; scripts/test-api-e2e.sh 65/65. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(security): FR-014 verdict purity on the report page — share the tier-driven verdict with the report payload Codex review follow-up. The server-list summary (GetScanSummary) derived the verdict tier-driven/baseline-only (FR-014), but the report page badge still read the RAW threat-level ReportSummary — with deep scan on, a tierless Docker-scanner finding classified "dangerous" made the report say dangerous while the server list said clean, and the same finding counted as Warning on one surface and Dangerous on the other. - extract deriveBaselineVerdict() as the single source of truth, shared by GetScanSummary and AggregateReports so the two surfaces structurally cannot disagree - AggregatedReport gains Verdict + tier-driven FindingCounts (additive; raw Summary counts retained for transparency) - ScanReport.vue: status badge, threat tiles, Quarantine-button gate and the Approve-disable gate (hasUnresolvedCritical) now read the tier-driven verdict/finding_counts (raw summary only as a fallback for pre-Spec-077 payloads); the approve gate now mirrors the backend hard-tier-only isBlockingFinding instead of raw summary.critical - ServerDetail.vue: approve-confirmation count and the Security-tab summary strip prefer tier-driven finding_counts over the raw report summary, so a tierless deep-scan "dangerous" finding no longer triggers the "Dangerous Findings Detected" modal on a clean baseline - frontend/src/types/api.ts: SecurityScanSummary catches up with the wire contract (deep_scan, scanners_run/failed/total); SecurityScanReport gains verdict + finding_counts Tests: engine-level AggregateReports verdict matrix (tierless-dangerous → clean/warning-bucket, hard → dangerous, soft → warnings) and an end-to-end pin that GetScanReportByJobID.Verdict/FindingCounts equal GetScanSummary for the same scan data. Assumption documented (zero-interruption policy): ServerCard.vue was reported as sharing the raw-summary preference but already reads only the tier-driven security_scan.finding_counts — left unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(security): Spec 077 truth sweep — detect engine is sole in-process detector, deep_scan opt-in Reconcile the security docs with the post-077 code: - tool-scanner.md: remove the deleted legacy-TPA-rule coexistence section; the detect engine is now the sole in-process detector. Count the actually- registered checks — seven (four hard incl. phrase.injection, three soft) — and fix the front-matter, "The seven checks", and at-a-glance table. - security-scanner-plugins.md: drop the legacy-rules claim + six-check count from the tpa-descriptions row; document the security.deep_scan block as the primary config surface; remove the deprecated auto_scan_quarantined / scanner_fetch_package_source docs; note the deep-scan gate + enable hint. - docker-isolation.md (+ top-level copy): the sandbox/none Docker-scanner skip no longer downgrades to security_scan.status:"degraded" (FR-008); it surfaces via the always-emitted deep_scan descriptor. Replace the deprecated scanner_disable_no_new_privileges instruction with deep_scan.disable_no_new_privileges. - security-quarantine.md: replace the deleted keyword-heuristic Detection Patterns table + Security Analysis JSON with the real quarantine-block shape and a pointer to the seven-check detect engine. - configuration.md: add the security.deep_scan config block + migration note. Related #793 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(spec-077): reconcile tasks.md checkboxes with shipped reality US1 (#786), US2 (#792), US4 (#794), and US3/deepscan-fixes (this branch) are all merged. Check off every demonstrably-done task (verified against code + git log) including the T037-T039 docs sweep. Left unchecked: T005 (contract schemas were never copied to internal/security/scanner/testdata/ — tests validate behavior directly), and the final validation gates T040-T042. Tally: 38/42. Regenerate ROADMAP.md (077 now 38/42, in-flight). Related #793 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(server): close bolt DB before TempDir cleanup (Windows unlink failure) TestScanSummaryEnricherAdapterCarriesDeepScan passed its assertions but failed on windows-latest: t.TempDir cleanup cannot unlink config.db while the bbolt handle is open. setupTestStorage registers no close; add an explicit t.Cleanup in this test's seed helper. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

…ne, connect-trust/upgrade-nudge progress (#803) check-github now passes with 0 errors: scanner-simplification epic complete (#786/#792/#793/#794 incl. deep-scan trust fixes + docs sweep); connect-trust US1 preview (#802) + backup visibility (#799) done; upgrade-nudge status/log slice (#798) split out as done with the banner+config remainder tracked separately; telemetry machine_id client (#796) and hygiene check-github (#800) done. Remaining warnings are the known windows-tray no-PR-evidence items. Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

Dumbris added 3 commits July 2, 2026 07:55

Dumbris mentioned this pull request Jul 2, 2026

Spec 077 follow-up: phrase-injection heuristic false-positive long tail + secret-path sync #795

Open

Dumbris merged commit 204b4bc into main Jul 2, 2026
50 checks passed

Dumbris deleted the 077-us2-unified-report branch July 2, 2026 06:56

Dumbris mentioned this pull request Jul 2, 2026

feat(roadmap): gen-roadmap --check-github ground-truth validation #800

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(security): US2 unified report + cross-scanner consensus confidence (Spec 077)#792

feat(security): US2 unified report + cross-scanner consensus confidence (Spec 077)#792
Dumbris merged 6 commits into
mainfrom
077-us2-unified-report

Dumbris commented Jul 2, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Jul 2, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

mcpproxy-gatekeeper Bot commented Jul 2, 2026

Uh oh!

mcpproxy-gatekeeper Bot commented Jul 2, 2026

Uh oh!

mcpproxy-gatekeeper Bot commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Dumbris commented Jul 2, 2026

Spec 077 US2 — Unified report + cross-scanner consensus confidence

What this adds (on top of merged US1)

Rebase conflict resolution

Verification (actual output)

Uh oh!

cloudflare-workers-and-pages Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying mcpproxy-docs with Cloudflare Pages

Uh oh!

codecov-commenter commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📦 Build Artifacts

Available Artifacts

How to Download

Uh oh!

mcpproxy-gatekeeper Bot commented Jul 2, 2026

🤖 Codex cross-model review — PR #792 US2 (round 1, read-only)

Findings

Verified safe

Uh oh!

mcpproxy-gatekeeper Bot commented Jul 2, 2026

🤖 Codex cross-model review — PR #792 US2 (round 2, read-only)

Findings

Verified safe

Uh oh!

mcpproxy-gatekeeper Bot commented Jul 2, 2026

🤖 Codex cross-model review — PR #792 US2 (round 3, read-only)

Findings

Verified safe

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cloudflare-workers-and-pages Bot commented Jul 2, 2026 •

edited

Loading

codecov-commenter commented Jul 2, 2026 •

edited

Loading

github-actions Bot commented Jul 2, 2026 •

edited

Loading