Reduce memory accumulation for long-running scans by egibs · Pull Request #1017 · chainguard-dev/malcontent

egibs · 2025-06-27T01:06:22Z

This PR is a targeted attempt to mitigate an edge case in which long-running scans of millions of files eventually OOM the system when only the top-level directory is provided as the scan path.

This isn't an all-encompassing or holistic fix, but it helps quite a bit. If it is not sufficient, there are still improvements we can make, such as streaming paths/results.

The main improvements are replacing the sync.Pool for scanners with channels/pinning, and leveraging file descriptors for scanning files. Scanners held in a sync.Pool are relatively volatile; at high levels of concurrency, for instance, the sync.Pool GC would make them panic. go-yara supported file-descriptor scanning natively, but we have to get a bit creative with yara-x if we want to avoid reading every single file into memory via io.ReadAll or the current implementation. The downside is that we still need the file's contents to calculate its hash and pull out the match strings. If necessary, these can be optimized in a future PR.

I ran several scans of ~14 million files each on a VM with 512GB of RAM and did not OOM or panic once, though memory usage did peak at ~410GB. If all else fails, we can drop the scanner pool and run single-use scanners via yrs.Scan, which is much slower because of the scanner creation/destruction overhead but has relatively little memory impact.

Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>

eslerm

VM with 512GB of RAM and did not OOM or panic once

🤯

Small memory optimizations

b0dce31

Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>

egibs requested review from antitree and eslerm June 27, 2025 01:06

eslerm approved these changes Jun 27, 2025


egibs merged commit 17a726c into chainguard-dev:main Jun 27, 2025
12 checks passed

egibs mentioned this pull request Jul 1, 2025

Clean up Generate function in report.go #992

Merged

egibs deleted the memory-pressure branch July 2, 2025 23:08

egibs merged 1 commit into chainguard-dev:main from egibs:memory-pressure
