Reduce memory accumulation for long-running scans #1017

Merged
egibs merged 1 commit into chainguard-dev:main from egibs:memory-pressure on Jun 27, 2025
@egibs egibs commented Jun 27, 2025

This PR is a targeted attempt to mitigate an edge case: a long-running scan of millions of files will eventually OOM a system when only the top-level directory is provided as the scan path.

This isn't a holistic fix, but it helps quite a bit. If it turns out not to be sufficient, there are further improvements we can make, such as streaming paths and results.

There are two main improvements. First, the scanners are now managed with channels/pinning instead of a sync.Pool. The scanners are relatively volatile; at high levels of concurrency, for instance, the sync.Pool GC would make them panic. Second, files are now scanned via file descriptors. go-yara supported this natively, but with yara-x we have to get a bit creative if we want to avoid reading every single file into memory via io.ReadAll or the current implementation. The downside is that we still need the file's contents to calculate its hash and pull out the match strings. If necessary, these can be optimized in a future PR.

I ran several scans of ~14 million files each on a VM with 512GB of RAM and did not OOM or panic once, though memory usage did peak at ~410GB. If all else fails, we can drop the scanner pool and run single-use scanners via yrs.Scan. That is much slower because of the scanner creation/destruction overhead, but it also has relatively little memory impact.

Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@egibs egibs requested review from antitree and eslerm June 27, 2025 01:06
@eslerm eslerm left a comment

> VM with 512GB of RAM and did not OOM or panic once

🤯

@egibs egibs merged commit 17a726c into chainguard-dev:main Jun 27, 2025
12 checks passed
@egibs egibs deleted the memory-pressure branch July 2, 2025 23:08