Persist compiled rules locally rather than in-memory#1122

Merged
stevebeattie merged 2 commits into chainguard-dev:main from egibs:persist-cached-rules
Sep 11, 2025
@egibs egibs commented Sep 10, 2025

We currently keep compiled rules in memory, which only helps for the duration of a single mal invocation. When we run successive mal scans (usually when looping over specific files or directories to produce per-scan result files), every run pays the rule compilation overhead again. That overhead is at least several seconds per run, which becomes extremely slow when repeated dozens of times.

This PR instead stores the compiled rules locally in the user's cache directory, in a file named after the compiled rule hash. The file is read each time mal runs and is only recreated if it does not exist.
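
For readers who want the shape of the approach without opening the diff, here is a minimal sketch of a hash-keyed on-disk cache. It is not the code from this PR: `hashRules`, `cachePath`, `loadOrCompile`, and `defaultCacheDir` are hypothetical names, the use of SHA-256 over sorted rule names and contents is an assumption, and the write-then-rename step is an addition for illustration. Only the `rules-<hash>.cache` file name and the `malcontent` directory under the user's cache directory are taken from the test output in this description.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// hashRules (hypothetical) returns a hex SHA-256 digest over rule names and
// contents. Names are sorted so the digest does not depend on map order.
func hashRules(rules map[string][]byte) string {
	names := make([]string, 0, len(rules))
	for name := range rules {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha256.New()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{0}) // separator between name and content
		h.Write(rules[name])
		h.Write([]byte{0}) // separator between entries
	}
	return hex.EncodeToString(h.Sum(nil))
}

// cachePath mirrors the "rules-<hash>.cache" naming seen in the test output.
func cachePath(dir, hash string) string {
	return filepath.Join(dir, "rules-"+hash+".cache")
}

// defaultCacheDir (hypothetical) resolves a per-user cache directory.
// On Linux this is typically $HOME/.cache/malcontent.
func defaultCacheDir() (string, error) {
	base, err := os.UserCacheDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(base, "malcontent"), nil
}

// loadOrCompile returns the cached bytes if the cache file exists. Otherwise
// it calls compile, stores the result, and reports a cache miss.
func loadOrCompile(dir, hash string, compile func() ([]byte, error)) ([]byte, bool, error) {
	path := cachePath(dir, hash)
	if data, err := os.ReadFile(path); err == nil {
		return data, true, nil
	}
	data, err := compile()
	if err != nil {
		return nil, false, err
	}
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return nil, false, err
	}
	// Assumption: write to a temp file and rename, so a concurrent reader
	// never sees a partially written cache file.
	tmp, err := os.CreateTemp(dir, "rules-*.tmp")
	if err != nil {
		return nil, false, err
	}
	defer os.Remove(tmp.Name()) // harmless after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return nil, false, err
	}
	if err := tmp.Close(); err != nil {
		return nil, false, err
	}
	if err := os.Rename(tmp.Name(), path); err != nil {
		return nil, false, err
	}
	return data, false, nil
}

func main() {
	// A temp dir keeps the demo side-effect free; a real tool would call
	// defaultCacheDir() instead.
	dir, err := os.MkdirTemp("", "rule-cache-demo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(dir)

	rules := map[string][]byte{"example.yara": []byte("rule example { condition: true }")}
	hash := hashRules(rules)
	for i := 1; i <= 2; i++ {
		_, hit, err := loadOrCompile(dir, hash, func() ([]byte, error) {
			return []byte("compiled:" + hash), nil // stand-in for real compilation
		})
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("run %d: cache hit = %v\n", i, hit)
	}
}
```

Because the file name includes the hash, changing any rule produces a new cache file rather than reusing a stale one. This sketch does not clean up cache files left behind by older rule sets.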

I also added tests and benchmarks so we can validate that this works and is faster (which it is, by a factor of 10-12x):

# go test -v ./pkg/compile/...
=== RUN   TestRecursive
--- PASS: TestRecursive (4.32s)
=== RUN   TestGetRulesHash
    compile_test.go:102: Rules hash: 868c8cdf7ef4fe0636048239ba7a45c4f4bc1ff61ec3d1725d7400cda6b08988
--- PASS: TestGetRulesHash (0.02s)
=== RUN   TestCacheOperations
--- PASS: TestCacheOperations (4.78s)
=== RUN   TestRecursiveCached
    compile_test.go:161: First compilation (cache miss) took: 4.471298854s
    compile_test.go:175: Second compilation (cache hit) took: 369.288203ms
    compile_test.go:181: Cache speedup: 12.1x faster
--- PASS: TestRecursiveCached (4.85s)
=== RUN   TestRecursiveCachedFallback
--- PASS: TestRecursiveCachedFallback (0.32s)
=== RUN   TestGetCacheDir
    compile_test.go:227: Cache directory: /root/.cache/malcontent
--- PASS: TestGetCacheDir (0.00s)
=== RUN   TestCacheFileSize
    compile_test.go:261: Cache file: /tmp/TestCacheFileSize1908032901/001/rules-868c8cdf7ef4fe0636048239ba7a45c4f4bc1ff61ec3d1725d7400cda6b08988.cache
    compile_test.go:262: Cache file size: 63494871 bytes (60.55 MB)
--- PASS: TestCacheFileSize (4.28s)
PASS
ok      github.com/chainguard-dev/malcontent/pkg/compile        18.601s
# go test -bench=. ./pkg/compile/
goos: linux
goarch: arm64
pkg: github.com/chainguard-dev/malcontent/pkg/compile
BenchmarkRecursive-12                                  1        4659537986 ns/op
BenchmarkRecursiveCachedFirstRun-12                    1        4291621075 ns/op
BenchmarkRecursiveCachedSubsequentRuns-12              4         317808442 ns/op
BenchmarkGetRulesHash-12                             127           9397723 ns/op
BenchmarkCacheOperations/Save-12                      13          96535091 ns/op
BenchmarkCacheOperations/Load-12                       4         314001626 ns/op
BenchmarkCompareCompilation/Uncached-12                1        4260990381 ns/op
BenchmarkCompareCompilation/CachedFirstRun-12                  1        4484801731 ns/op
BenchmarkCompareCompilation/CachedSubsequentRuns-12            3         337301952 ns/op
PASS
ok      github.com/chainguard-dev/malcontent/pkg/compile        67.588s

BenchmarkRecursive takes ~4.66 seconds per operation, whereas BenchmarkRecursiveCachedSubsequentRuns takes ~0.32 seconds.

@egibs egibs requested a review from stevebeattie September 10, 2025 15:07
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@egibs egibs force-pushed the persist-cached-rules branch from 1b2afbb to 124fd1e Compare September 10, 2025 15:11

@stevebeattie stevebeattie left a comment

Ooh, this is very nice! Thanks for this, it's very helpful for repetitive, investigative scans.

@stevebeattie stevebeattie merged commit fe2b5f5 into chainguard-dev:main Sep 11, 2025
12 checks passed
@egibs egibs deleted the persist-cached-rules branch October 31, 2025 17:51