Kafka Connect: Add Trivy CVE scan to CI #15430

Open
rmoff wants to merge 4 commits into apache:main from rmoff:trivy-cve-scan-kafka-connect

@rmoff rmoff commented Feb 24, 2026

Summary

  • Adds a Trivy vulnerability scan to the Kafka Connect CI workflow
  • Runs as part of the existing test job, on one matrix entry only: dependency CVEs are JVM-independent, so scanning on every entry would be redundant. After check completes, it builds distZip, unpacks it, and scans the bundled jars for CRITICAL/HIGH CVEs
  • On push events (main, version branches, RC tags), uploads SARIF results to GitHub's Security tab
  • On PRs, outputs scan results to CI logs for developer visibility
  • Fails the step when CVEs are found; since the job is not a required status check, this does not block merging PRs

Context

Discussion on dev@ mailing list: https://lists.apache.org/thread/kbf98950pzstzgon92st7mh9vrbv5yhb

Confluent Marketplace requires a Trivy scan before listing connectors. This has previously caught CVEs that needed patching (e.g. #14985). Running the scan in CI catches vulnerabilities during development and — critically — on RC tags before the release vote starts, when fixes can still be applied.

This is independent of #15212 (adding the KC artifact to the release process) and can land in either order.

@github-actions github-actions Bot added the INFRA label Feb 24, 2026
@rmoff rmoff marked this pull request as draft February 24, 2026 14:12
@rmoff rmoff marked this pull request as ready for review February 24, 2026 14:36
@kevinjqliu

I think it's better to do this in the Docker image publishing step, similar to what Superset is doing.

@rmoff

rmoff commented Feb 26, 2026

@kevinjqliu I'm not sure I follow. How does this relate to publishing Docker images? This is to identify CVEs in the Kafka Connect connector itself. Thanks.

@kevinjqliu

Looking at how Trivy is used in Superset, it's scanning the Docker image only. In this PR, we're unpacking the jars to scan. I think it makes more sense to build a Kafka Connect image and then use Trivy to scan the image.

Just my preference

@rymurr

rymurr commented Feb 27, 2026

> Looking at how Trivy is used in Superset, it's scanning the Docker image only. In this PR, we're unpacking the jars to scan. I think it makes more sense to build a Kafka Connect image and then use Trivy to scan the image.
>
> Just my preference

It's not really clear why we would add the infra to build a Docker image just to use Trivy. A direct scan seems more parsimonious.

@rymurr rymurr left a comment

LGTM, I just want to see if we can avoid 2x build w/o over complicating the CI job

Comment thread .github/workflows/kafka-connect-ci.yml Outdated
Comment thread .github/workflows/kafka-connect-ci.yml
@kevinjqliu

> Not really clear why we would add the infra to build a docker image just to use trivy. Seems like a direct scan is more parsimonious

Just looking at general patterns from the Apache repos: https://grep.app/search?f.repo.pattern=apache%2F&q=uses%3A+aquasecurity%2Ftrivy-action
All the use cases I can find use Trivy for scanning images (using image-ref: ).

The only instance of scan-type: 'fs' I found has it disabled: https://github.com/apache/plc4x/blob/eb41533bfab101acb87b9acdaf81c70d2e2fa286/.github/workflows/sast.yaml#L34-L47

From the docs, it seems like Filesystem scan and Container image scan are similar in that they both scan for Vulnerabilities, Misconfigurations, Secrets, and Licenses.

I think it would be helpful here if you could run the change on your fork and see whether the fs Trivy scan catches the currently open CVE for Kafka Connect (#15440).

@rmoff

rmoff commented Mar 2, 2026

> I think it would be helpful here if you can run the change on your fork repo and see if the fs trivy scan catches the currently open CVE for kafka connect (#15440)

That's what's shown here under "Reproducing" section.

@rmoff rmoff requested a review from rymurr March 17, 2026 09:44
@rymurr rymurr left a comment

The changes themselves look good to me. I am just confused/questioning the github actions config...

My assumption is we want:

  • results to be uploaded regardless of the pass/fail state of the steps above
  • the job to fail if it detects errors
  • the 2nd scan to run regardless of whether the other failed

I think what's happening now is that the first always looks like it fails, the second always looks like it passes, and the entire job always looks like it passes.

It might be worth trying this a few times in your own fork to make sure this actually handles the edge cases we expect.

Comment thread .github/workflows/kafka-connect-ci.yml Outdated
@kevinjqliu kevinjqliu left a comment

Thanks for the PR! I think this is a great addition, however I'm concerned about supply chain risk due to the recent (second) compromise of trivy's infra.

See

I see we are using aquasecurity/trivy-action@e368e328979b113139d6f9068e03accaed98a518. Glad to see we're using the commit hash.

I believe this is still ongoing; we should wait until there's a resolution before proceeding with this PR.

EDIT: looks like trivy was just pulled from the list of allowed actions by asf-infra apache/infrastructure-actions#548
https://infra.apache.org/blog/trivy_security_incident.html

@lhotari

lhotari commented Mar 23, 2026

> EDIT: looks like trivy was just pulled from the list of allowed actions by asf-infra apache/infrastructure-actions#548
> https://infra.apache.org/blog/trivy_security_incident.html

@kevinjqliu PR to add it: apache/infrastructure-actions#573

@kevinjqliu

apache/infrastructure-actions#582 is merged.
We can try using lhotari/sandboxed-trivy-action, which is a fork running Trivy in a locked-down Docker container (https://github.com/lhotari/sandboxed-trivy-action).

ASF Infra has already allowlisted it

@rmoff

rmoff commented Apr 7, 2026

> apache/infrastructure-actions#582 is merged we can try using lhotari/sandboxed-trivy-action, which is a fork running trivy in a locked-down docker container (https://github.com/lhotari/sandboxed-trivy-action)
>
> ASF Infra has already allowlisted it

Thanks @kevinjqliu.
I've raised a PR to add rootfs support to the action so that we can use it: lhotari/sandboxed-trivy-action#1
edit: now merged, waiting on apache/infrastructure-actions#711
edit edit: apache/infrastructure-actions#718 now merged, so we can use f01374b6cc3bf7264ab238293e94f6db7ada6dd0 # v1.0.2 of the trivy action which includes rootfs.

@rmoff

rmoff commented Apr 15, 2026

OK, I think this PR is ready for folk to take another look please.
It uses the locked-down Trivy action (h/t @lhotari), which has been allowlisted by ASF.

How it works:

  • Builds distZip, unpacks it, and scans the JARs using rootfs mode
  • Runs on JVM 21 only — dependency CVEs are JVM-independent, so one scan is sufficient
  • Flag and fail: exit-code 1 fails the step and job when CVEs are found. Since this job is not a required check, it won't block merging.
  • On push to main/release branches: uploads SARIF to the GitHub Security tab
  • On PRs: prints findings to the CI log (SARIF upload skipped — GitHub only accepts results from default/protected branches)
  • ignore-unfixed: true — only CVEs with available fixes are reported
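
For readers who want to picture the flow, the bullets above could be wired together roughly as follows. This is a sketch under stated assumptions, not the workflow from this PR: the Gradle invocation, the unpack directory, the branch filter, the JVM matrix, the severity filter, and the version tags on `actions/*` and `github/codeql-action` are placeholders, and the input names on `lhotari/sandboxed-trivy-action` are assumed to mirror those of `aquasecurity/trivy-action`.

```yaml
# Illustrative sketch only; paths, task names and action inputs are assumptions.
name: kafka-connect-ci-sketch
on:
  pull_request:
  push:
    branches: [main]            # the real branch and tag filters are not shown in this thread

jobs:
  test:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write    # required by the SARIF upload step
    strategy:
      matrix:
        jvm: [17, 21]           # assumed matrix; only the 21 entry runs the scan
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: ${{ matrix.jvm }}

      - name: Build and unpack distZip
        if: matrix.jvm == 21
        run: |
          ./gradlew distZip     # assumed invocation
          mkdir -p build/trivy-root
          # assumed output location of the zip
          unzip -q -o 'build/distributions/*.zip' -d build/trivy-root

      # On PRs: print findings to the CI log; exit-code 1 turns the step red.
      - name: Trivy scan (pull request)
        if: matrix.jvm == 21 && github.event_name == 'pull_request'
        uses: lhotari/sandboxed-trivy-action@f01374b6cc3bf7264ab238293e94f6db7ada6dd0 # v1.0.2
        with:
          scan-type: rootfs
          scan-ref: build/trivy-root
          severity: CRITICAL,HIGH
          ignore-unfixed: true
          exit-code: '1'
          format: table

      # On push: same scan, written as SARIF for the Security tab.
      - name: Trivy scan (push)
        if: matrix.jvm == 21 && github.event_name == 'push'
        uses: lhotari/sandboxed-trivy-action@f01374b6cc3bf7264ab238293e94f6db7ada6dd0 # v1.0.2
        with:
          scan-type: rootfs
          scan-ref: build/trivy-root
          severity: CRITICAL,HIGH
          ignore-unfixed: true
          exit-code: '1'
          format: sarif
          output: trivy-results.sarif

      # always() lets the upload run even when the scan step above failed.
      - name: Upload SARIF
        if: always() && matrix.jvm == 21 && github.event_name == 'push'
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif
```

One caveat with this shape: because of `always()`, the upload step also runs if the build itself failed, in which case there is no SARIF file to upload and that step would fail too.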

Here are the execution paths, expected behaviours, and test evidence:

| # | Event | CVEs present? | Trivy step | SARIF upload | Job | Test run |
|---|-------|---------------|------------|--------------|-----|----------|
| 1 | pull_request | No | Green (exit 0) | Skipped | Green | apache #24563211070 |
| 2 | pull_request | Yes | Red (exit 1) | Skipped | Red (non-blocking) | apache #24562251735 |
| 3 | push | No | Green (exit 0) | Uploads empty SARIF | Green | fork #24563271199 |
| 4 | push | Yes | Red (exit 1) | Uploads SARIF | Red (non-blocking) | fork #24562836884 |

N.B. To test "no CVEs present" condition, known CVEs were temporarily suppressed via .trivyignore.

Add a Trivy vulnerability scan to the Kafka Connect CI workflow that
scans bundled JARs for known CVEs with available fixes.

The scan runs after the existing check task on JVM 21 only (dependency
CVEs are JVM-independent). It builds distZip, unpacks it, and scans
using rootfs mode via lhotari/sandboxed-trivy-action (ASF-allowlisted).

Behaviour:
- Flag, don't block: continue-on-error keeps the job green when CVEs
  are found. The step shows an orange warning icon in the GitHub UI.
- On push to main/release branches: SARIF results are uploaded to the
  GitHub Security tab for ongoing tracking.
- On PRs: SARIF upload is skipped (GitHub only accepts results from
  default/protected branches). Findings are visible in the CI log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rmoff rmoff force-pushed the trivy-cve-scan-kafka-connect branch from 9d21bb3 to 4720573 Compare April 15, 2026 17:01
@rmoff rmoff force-pushed the trivy-cve-scan-kafka-connect branch from ed906dd to 0264d75 Compare April 15, 2026 17:34
@rmoff rmoff requested review from kevinjqliu and rymurr April 15, 2026 18:10
Comment thread .github/workflows/kafka-connect-ci.yml Outdated
# Behaviour:
# - Flag, don't block: the scan step uses exit-code 1 so it
# "fails" when CVEs are found, but continue-on-error keeps
# the overall job green. GitHub Actions shows the step with
Contributor


I don't think this is true. Perhaps a hallucination.

My suggestion would be to invert the flow.

  1. remove continue-on-error and exit-code from the first run.
  2. change the if statement to always() && matrix.jvm==21

This job will fail and show 'red' in the UI, but it's a non-blocking check and the PR can still be merged. The always() clause ensures that the results are always uploaded and printed regardless of the state of the scan.

I think this has the desired UX: the user knows they have a CVE and why, but it isn't blocking for the PR (e.g. if the patched version hasn't been released yet).
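
To see the `always()` behaviour in isolation, a throwaway workflow along these lines could be run in a fork. It is a minimal sketch with hypothetical names and plain `run:` steps standing in for the scan and the upload; it does not use Trivy at all.

```yaml
# Throwaway demo for a fork; every name here is hypothetical.
name: always-demo
on: workflow_dispatch

jobs:
  demo:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        jvm: [17, 21]
    steps:
      - name: Stand-in for a scan that finds CVEs
        if: matrix.jvm == 21
        run: exit 1             # no continue-on-error, so the step and the job go red

      - name: Stand-in for the upload/print step
        if: always() && matrix.jvm == 21
        run: echo "still runs after the failure above"
```

On the `jvm: 21` entry the first step fails, the second still runs, and the job ends red. On the `jvm: 17` entry both steps are skipped and the job stays green.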

Contributor Author


Makes sense, thanks @rymurr. Sounds like a good option.
Out of interest, where can I see which checks will block a merge and which won't?

Re. exit-code: don't we still want this to be 1? It defaults to 0, so without it the step won't fail and the job won't show as red.

Contributor Author


OK, I've made this change now, and updated the above detail & testing to reflect it.
@rymurr please could you take another look?

@rmoff rmoff force-pushed the trivy-cve-scan-kafka-connect branch from c7265f0 to 6d19226 Compare April 17, 2026 11:40
@rmoff rmoff force-pushed the trivy-cve-scan-kafka-connect branch from 6d19226 to d1b2ca2 Compare April 17, 2026 13:15
@rmoff rmoff requested a review from rymurr April 17, 2026 13:27