ci: add vulnerability and SAST scanning to CI/CD#6441
Open
otavio wants to merge 9 commits into
Open
Conversation
|
You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool. What Enabling Code Scanning Means:
For more information about GitHub Code Scanning, check out the documentation. |
Add govulncheck, Trivy image scanning, CodeQL and Semgrep to the pipeline to satisfy CRA's "no known exploitable vulnerabilities" requirement. Scans block on findings a PR introduces (baseline + block-new): pre-existing/unfixable CVEs are tracked by a weekly full scan rather than red-walling unrelated PRs. - security.yml: govulncheck over all 8 module roots (build tags for agent/ssh) via a unit-tested allowlist wrapper, plus Trivy image scans for every shipped image; a single always-running security-gate aggregator is the one stable required-check name (no draft/fork pending-check trap). - codeql.yml / semgrep.yml: per-module CodeQL with non-empty-DB assertions, Semgrep with --baseline-ref; both fail only on new alerts. - docker-publish.yml: scan-before-push, digest-identical (nothing vulnerable reaches a registry). build-agent.yml gates the multi-arch push on an amd64 Trivy scan. - gosec: reduce exclusions to G104 only (errcheck-redundant); review the G301/G302/G304 sites, hardening gateway dir perms to 0o750 and annotating trusted-source/test-fixture/utmp cases. gateway is now linted (added to qa.yml with its own config). - Baseline files (.trivyignore, .govulncheck-allow.txt, .semgrepignore) + CODEOWNERS, and SECURITY-SCANNING.md documenting the required check and the non-blocking-first rollout. Fixes: shellhub-io/team#99
First real Actions run surfaced four plumbing bugs (no real findings): - trivy-action pinned to a non-existent tag; the action's tags are v-prefixed (0.31.0 -> v0.31.0). Affected security, docker-publish and build-agent workflows. - govulncheck wrapper invoked via `go run` on a path in the separate .github module, which failed to resolve against the scanned module. Build the wrapper binary from its own module first, then pipe into it. - CodeQL uploaded SARIF twice per job (analyze + a manual filter step) with a colliding category. Drop the redundant upload; code scanning surfaces new-only alerts on PRs natively. - Semgrep baseline fetch failed (exit 128) on dubious-ownership/shallow checkout. Use fetch-depth 0, mark the workspace safe, and diff against the PR base SHA instead of origin/master.
Replace the bespoke govulncheck wrapper (a custom JSON parser plus a homegrown allowlist file in its own .github Go module) with the official golang/govulncheck-action emitting SARIF to GitHub code scanning — the same model CodeQL and Semgrep already use here. New-vs-baseline gating and suppression are now handled natively by the code-scanning platform (dismiss with a reason in the Security tab, audited) instead of a checked-in allowlist parsed by custom code, which was a maintenance and security-control bypass surface. Build-tag coverage for agent (docker) and ssh (internal_api) is forwarded via GOFLAGS. Drops ~600 lines of bespoke Go, its tests, the extra go.mod, and .govulncheck-allow.txt.
The previous commit deleted the wrapper files but a failed `git add` pathspec dropped the security.yml/CODEOWNERS/docs edits, leaving the job pointing at the now-deleted wrapper. This commits the actual govulncheck-action + SARIF-upload job body and the doc/CODEOWNERS updates.
- semgrep: --baseline-ref is not a valid flag; use --baseline-commit with the PR base SHA so only new findings fail the PR. - qa: adding gateway to the validate matrix ran its integration smoke test (TestMain_smoke shells out to certbot) which can't pass in CI. Revert that and add a dedicated gateway-lint job so gosec/golangci enforcement stays without running gateway's integration tests.
6c85875 to
d04f1a2
Compare
Pass github.ref_name through an env var instead of interpolating it directly into the run: shell, per the semgrep run-shell-injection rule.
aquasecurity/trivy-action@v0.31.0's bundled installer fails downloading the trivy binary in CI (exits 1 before any scan), so every image scan failed at install — not on findings. Run the pinned official trivy container (ghcr.io/aquasecurity/trivy:0.63.0) against the locally loaded image via the docker socket, which installs nothing and pulls its DB from mirror.gcr.io. Same flags (CRITICAL,HIGH, ignore-unfixed, .trivyignore) and SARIF upload preserved across security, docker-publish and build-agent workflows. Verified locally: api/ssh/cli images scan clean (0 CRITICAL/HIGH).
Trivy flags a HIGH libxml2 DoS (CVE-2026-6732) inherited from nginx:1.31.1-alpine. Upgrade the package to the fixed 2.13.9-r1 in the production stage. Verified: ui image scans 0 CRITICAL/HIGH afterward.
Bump github.com/docker/docker to the 28.x backport commit that carries the CVE-2026-34040 fix (v29.3.1's fix backported to 28.x at 31a1689; v29+ is unreachable via the github.com/docker/docker module path since moby moved to github.com/moby/moby/v2). Build, tests and lint pass with no call-site changes — the agent uses docker/docker only as a client. Trivy still maps the fix to the non-existent v29.3.1 tag, so the binary is documented in .trivyignore with the backport justification (the agent never runs the vulnerable daemon AuthZ middleware). Verified: agent image scans 0 CRITICAL/HIGH.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

What
Adds vulnerability and SAST scanning across the CI/CD pipeline —
govulncheck, Trivy image scanning, CodeQL, and Semgrep — and replaces
the blanket
gosecexclusions with evidence-based, per-rule decisions.This is the CRA "no known exploitable vulnerabilities" control
(Phase 1).
Why
Closes shellhub-io/team#99 (part of the CRA compliance tracking,
#96). Today scanning is limited to Dependabot (updates only) and
gosecvia golangci-lint with four blanket exclusions — no Go vuln-DBcheck, no container image scan, no SAST.
Approach: baseline + block-new
Hard-blocking every CRITICAL/HIGH on day one would red-wall unrelated
PRs, because shared base images (nginx/alpine/node) and the Go stdlib
accrue fixable CVEs daily that no PR author can fix. Instead, scans
block on findings a PR introduces; pre-existing/unfixable CVEs are
suppressed via checked-in baselines and tracked by a weekly full scan.
Changes
govulncheckover all 8 Go module roots(build tags for
agent/ssh) via a unit-tested allowlist wrapper in.github/govulncheck/, plus Trivy image scans for every shipped image(api, ssh, gateway, cli, ui, agent). A single always-running
security-gateaggregator (if: always(), success+skipped = pass) isthe one stable required-check name — no draft/fork pending-check
trap.
non-empty-DB assertions; Semgrep with
--baseline-ref origin/master.Both fail only on new alerts; SARIF upload is fork-guarded.
digest-identical — nothing vulnerable reaches a registry.
build-agent.yml gates the multi-arch push on an amd64 Trivy scan.
G104only (redundant with thedeliberately-disabled errcheck). Reviewed the G301/G302/G304 sites:
hardened gateway dir perms to
0o750, annotated trusted-source /test-fixture / utmp (world-readable by POSIX design) cases.
gatewayis now linted (added to
qa.ymlwith its own config)..trivyignore,.govulncheck-allow.txt,.semgrepignore+ CODEOWNERS, andSECURITY-SCANNING.mddocumentingthe scan inventory, suppression conventions, and rollout.
Rollout (important — these scans must NOT be flipped to blocking yet)
SECURITY-SCANNING.mddocuments the required order: (1) land theseworkflows non-blocking and let them run to completion on
masterto populate the CodeQL/Semgrep baselines and seed the ignore/allow
files; (2) seed the baseline files from that run; (3) flip to
blocking-on-new; (4) a repo admin marks the single
security-gatecheck required in branch protection (plus "Require review from Code
Owners" for the baseline files). Until step 4, the jobs report but do
not gate merges.
Testing
0o750), thegovulncheck wrapper (called vs imported-only, duplicate-id, stale-id,
empty), and agent path-handling. Run via
./bin/docker-compose.actionlint. CodeQL's multi-module manual build,the live Trivy/Semgrep/CodeQL runs, and the multi-arch agent buildx
can only be validated on the first real Actions run on this branch —
which is exactly why the rollout lands them non-blocking first.
Reviewers: confirm the first
security.ymlrun traces all 8 modulesand that
security-gatereaches a definite conclusion on a draft anda fork PR before anyone marks it required.