Skip to content

ci: add vulnerability and SAST scanning to CI/CD#6441

Open
otavio wants to merge 9 commits into
masterfrom
feat/cra-ci-security-scanning
Open

ci: add vulnerability and SAST scanning to CI/CD#6441
otavio wants to merge 9 commits into
masterfrom
feat/cra-ci-security-scanning

Conversation

@otavio

@otavio otavio commented Jun 8, 2026

Copy link
Copy Markdown
Member

What

Adds vulnerability and SAST scanning across the CI/CD pipeline —
govulncheck, Trivy image scanning, CodeQL, and Semgrep — and replaces
the blanket gosec exclusions with evidence-based, per-rule decisions.
This is the CRA "no known exploitable vulnerabilities" control
(Phase 1).

Why

Closes shellhub-io/team#99 (part of the CRA compliance tracking,
#96). Today scanning is limited to Dependabot (updates only) and
gosec via golangci-lint with four blanket exclusions — no Go vuln-DB
check, no container image scan, no SAST.

Approach: baseline + block-new

Hard-blocking every CRITICAL/HIGH on day one would red-wall unrelated
PRs, because shared base images (nginx/alpine/node) and the Go stdlib
accrue fixable CVEs daily that no PR author can fix. Instead, scans
block on findings a PR introduces; pre-existing/unfixable CVEs are
suppressed via checked-in baselines and tracked by a weekly full scan.

Changes

  • security.yml (new): govulncheck over all 8 Go module roots
    (build tags for agent/ssh) via a unit-tested allowlist wrapper in
    .github/govulncheck/, plus Trivy image scans for every shipped image
    (api, ssh, gateway, cli, ui, agent). A single always-running
    security-gate aggregator (if: always(), success+skipped = pass) is
    the one stable required-check name — no draft/fork pending-check
    trap.
  • codeql.yml / semgrep.yml (new): per-module CodeQL with
    non-empty-DB assertions; Semgrep with --baseline-ref origin/master.
    Both fail only on new alerts; SARIF upload is fork-guarded.
  • docker-publish.yml: reworked to scan-before-push,
    digest-identical — nothing vulnerable reaches a registry.
    build-agent.yml gates the multi-arch push on an amd64 Trivy scan.
  • gosec: exclusions reduced to G104 only (redundant with the
    deliberately-disabled errcheck). Reviewed the G301/G302/G304 sites:
    hardened gateway dir perms to 0o750, annotated trusted-source /
    test-fixture / utmp (world-readable by POSIX design) cases. gateway
    is now linted (added to qa.yml with its own config).
  • Baselines: .trivyignore, .govulncheck-allow.txt,
    .semgrepignore + CODEOWNERS, and SECURITY-SCANNING.md documenting
    the scan inventory, suppression conventions, and rollout.

Rollout (important — these scans must NOT be flipped to blocking yet)

SECURITY-SCANNING.md documents the required order: (1) land these
workflows non-blocking and let them run to completion on master
to populate the CodeQL/Semgrep baselines and seed the ignore/allow
files; (2) seed the baseline files from that run; (3) flip to
blocking-on-new; (4) a repo admin marks the single security-gate
check required
in branch protection (plus "Require review from Code
Owners" for the baseline files). Until step 4, the jobs report but do
not gate merges.

Testing

  • Go changes are covered by tests: gateway dir-mode (0o750), the
    govulncheck wrapper (called vs imported-only, duplicate-id, stale-id,
    empty), and agent path-handling. Run via ./bin/docker-compose.
  • All workflows pass actionlint. CodeQL's multi-module manual build,
    the live Trivy/Semgrep/CodeQL runs, and the multi-arch agent buildx
    can only be validated on the first real Actions run on this branch —
    which is exactly why the rollout lands them non-blocking first.
    Reviewers: confirm the first security.yml run traces all 8 modules
    and that security-gate reaches a definite conclusion on a draft and
    a fork PR before anyone marks it required.

@otavio otavio requested review from a team as code owners June 8, 2026 15:48
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

Code Review

  • Gathered PR context
  • Reviewing with 5 specialized agents
  • Posting feedback

View job run

@github-advanced-security

Copy link
Copy Markdown

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

otavio added 5 commits June 8, 2026 18:27
Add govulncheck, Trivy image scanning, CodeQL and Semgrep to the
pipeline to satisfy CRA's "no known exploitable vulnerabilities"
requirement. Scans block on findings a PR introduces (baseline +
block-new): pre-existing/unfixable CVEs are tracked by a weekly full
scan rather than red-walling unrelated PRs.

- security.yml: govulncheck over all 8 module roots (build tags for
  agent/ssh) via a unit-tested allowlist wrapper, plus Trivy image
  scans for every shipped image; a single always-running security-gate
  aggregator is the one stable required-check name (no draft/fork
  pending-check trap).
- codeql.yml / semgrep.yml: per-module CodeQL with non-empty-DB
  assertions, Semgrep with --baseline-ref; both fail only on new alerts.
- docker-publish.yml: scan-before-push, digest-identical (nothing
  vulnerable reaches a registry). build-agent.yml gates the multi-arch
  push on an amd64 Trivy scan.
- gosec: reduce exclusions to G104 only (errcheck-redundant); review the
  G301/G302/G304 sites, hardening gateway dir perms to 0o750 and
  annotating trusted-source/test-fixture/utmp cases. gateway is now
  linted (added to qa.yml with its own config).
- Baseline files (.trivyignore, .govulncheck-allow.txt, .semgrepignore)
  + CODEOWNERS, and SECURITY-SCANNING.md documenting the required check
  and the non-blocking-first rollout.

Fixes: shellhub-io/team#99
First real Actions run surfaced four plumbing bugs (no real findings):

- trivy-action pinned to a non-existent tag; the action's tags are
  v-prefixed (0.31.0 -> v0.31.0). Affected security, docker-publish and
  build-agent workflows.
- govulncheck wrapper invoked via `go run` on a path in the separate
  .github module, which failed to resolve against the scanned module.
  Build the wrapper binary from its own module first, then pipe into it.
- CodeQL uploaded SARIF twice per job (analyze + a manual filter step)
  with a colliding category. Drop the redundant upload; code scanning
  surfaces new-only alerts on PRs natively.
- Semgrep baseline fetch failed (exit 128) on dubious-ownership/shallow
  checkout. Use fetch-depth 0, mark the workspace safe, and diff against
  the PR base SHA instead of origin/master.
Replace the bespoke govulncheck wrapper (a custom JSON parser plus a
homegrown allowlist file in its own .github Go module) with the
official golang/govulncheck-action emitting SARIF to GitHub code
scanning — the same model CodeQL and Semgrep already use here.

New-vs-baseline gating and suppression are now handled natively by the
code-scanning platform (dismiss with a reason in the Security tab,
audited) instead of a checked-in allowlist parsed by custom code, which
was a maintenance and security-control bypass surface. Build-tag
coverage for agent (docker) and ssh (internal_api) is forwarded via
GOFLAGS. Drops ~600 lines of bespoke Go, its tests, the extra go.mod,
and .govulncheck-allow.txt.
The previous commit deleted the wrapper files but a failed `git add`
pathspec dropped the security.yml/CODEOWNERS/docs edits, leaving the
job pointing at the now-deleted wrapper. This commits the actual
govulncheck-action + SARIF-upload job body and the doc/CODEOWNERS
updates.
- semgrep: --baseline-ref is not a valid flag; use --baseline-commit
  with the PR base SHA so only new findings fail the PR.
- qa: adding gateway to the validate matrix ran its integration smoke
  test (TestMain_smoke shells out to certbot) which can't pass in CI.
  Revert that and add a dedicated gateway-lint job so gosec/golangci
  enforcement stays without running gateway's integration tests.
@otavio otavio force-pushed the feat/cra-ci-security-scanning branch from 6c85875 to d04f1a2 Compare June 8, 2026 21:28
otavio added 4 commits June 8, 2026 18:44
Pass github.ref_name through an env var instead of interpolating it
directly into the run: shell, per the semgrep run-shell-injection rule.
aquasecurity/trivy-action@v0.31.0's bundled installer fails downloading
the trivy binary in CI (exits 1 before any scan), so every image scan
failed at install — not on findings. Run the pinned official trivy
container (ghcr.io/aquasecurity/trivy:0.63.0) against the locally loaded
image via the docker socket, which installs nothing and pulls its DB
from mirror.gcr.io. Same flags (CRITICAL,HIGH, ignore-unfixed,
.trivyignore) and SARIF upload preserved across security, docker-publish
and build-agent workflows.

Verified locally: api/ssh/cli images scan clean (0 CRITICAL/HIGH).
Trivy flags a HIGH libxml2 DoS (CVE-2026-6732) inherited from
nginx:1.31.1-alpine. Upgrade the package to the fixed 2.13.9-r1 in the
production stage. Verified: ui image scans 0 CRITICAL/HIGH afterward.
Bump github.com/docker/docker to the 28.x backport commit that carries
the CVE-2026-34040 fix (v29.3.1's fix backported to 28.x at 31a1689;
v29+ is unreachable via the github.com/docker/docker module path since
moby moved to github.com/moby/moby/v2). Build, tests and lint pass with
no call-site changes — the agent uses docker/docker only as a client.

Trivy still maps the fix to the non-existent v29.3.1 tag, so the binary
is documented in .trivyignore with the backport justification (the
agent never runs the vulnerable daemon AuthZ middleware). Verified:
agent image scans 0 CRITICAL/HIGH.
@otavio otavio requested a review from a team as a code owner June 8, 2026 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants