Address extraction edge cases re: duplicate file names #967

Merged
egibs merged 3 commits into chainguard-dev:main from egibs:extraction-edge-cases on May 29, 2025

@egibs commented on May 29, 2025

More fallout from scanning Sonarqube.

There are five files of interest in this package:

./linux-x64-musl/node.xz
./linux-x64/node.xz
./win-x64/node.exe.xz
./darwin-arm64/node.xz
./sonarjs-1.0.0.tgz

Our original extraction logic processed only one of the extracted node files because the extracted file map was keyed by file name alone (node), so files with the same name in different directories collided. Extraction also failed with legitimate EOF errors, both for these files and for sonarjs-1.0.0.tgz. There was also an opportunity to replace the awkward ranging over the extractedFiles map with filepath.WalkDir.
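A minimal sketch of the collision described above, not the project's actual code: it builds a throwaway directory tree with three same-named node files, collects them with filepath.WalkDir keyed by path relative to the extraction root, and compares that with keying by base name. The names collectByRelPath, baseNames, and demoCounts are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path"
	"path/filepath"
)

// collectByRelPath walks root and returns every regular file keyed by its
// slash-separated path relative to root, so same-named files in different
// directories stay distinct.
func collectByRelPath(root string) (map[string]bool, error) {
	files := make(map[string]bool)
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // directories, symlinks, etc. are not recorded
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		files[filepath.ToSlash(rel)] = true
		return nil
	})
	return files, err
}

// baseNames collapses relative paths to bare file names, which reproduces
// the collision: every "<dir>/node" becomes the single key "node".
func baseNames(files map[string]bool) map[string]bool {
	names := make(map[string]bool)
	for rel := range files {
		names[path.Base(rel)] = true
	}
	return names
}

// demoCounts creates a temporary tree shaped like the decompressed node
// binaries and reports how many entries survive under each keying scheme.
func demoCounts() (int, int, error) {
	root, err := os.MkdirTemp("", "extract-demo-")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)

	for _, dir := range []string{"linux-x64-musl", "linux-x64", "darwin-arm64"} {
		if err := os.MkdirAll(filepath.Join(root, dir), 0o755); err != nil {
			return 0, 0, err
		}
		if err := os.WriteFile(filepath.Join(root, dir, "node"), []byte("stub"), 0o644); err != nil {
			return 0, 0, err
		}
	}

	files, err := collectByRelPath(root)
	if err != nil {
		return 0, 0, err
	}
	return len(baseNames(files)), len(files), nil
}

func main() {
	byBase, byRelPath, err := demoCounts()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("keyed by base name: %d, keyed by relative path: %d\n", byBase, byRelPath)
}
```

Keying by relative path keeps all three files, while keying by base name keeps one, which matches the symptom of only a single node file being processed.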

This PR addresses this edge case, decompresses files so that their parent directory is preserved, and removes archive types from supportedKind, where they caused weird behavior. Instead, we return a non-nil FileType if a given path's extension is present in the archive map.
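The two ideas in this paragraph can be sketched as follows, under stated assumptions. FileType here is a hypothetical stand-in for the real type, archiveExts is an illustrative map rather than the project's actual archive map, and decompressedPath and archiveFileType are invented names; the real implementation may differ in all of these.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// FileType is a hypothetical stand-in for the scanner's file type descriptor.
type FileType struct {
	Ext string
}

// archiveExts is an illustrative archive map; the real map may list other
// extensions. Multi-part extensions such as ".tar.gz" are not handled here
// because filepath.Ext returns only the final suffix.
var archiveExts = map[string]bool{
	".tar": true,
	".tgz": true,
	".xz":  true,
	".zip": true,
}

// archiveFileType returns a non-nil FileType when the path's extension is
// present in the archive map, and nil otherwise.
func archiveFileType(path string) *FileType {
	ext := strings.ToLower(filepath.Ext(path))
	if !archiveExts[ext] {
		return nil
	}
	return &FileType{Ext: strings.TrimPrefix(ext, ".")}
}

// decompressedPath computes where a single compressed file should be written:
// the same directory relative to srcRoot, placed under dstRoot, with the
// compression extension removed. Keeping the parent directory prevents
// linux-x64/node and darwin-arm64/node from overwriting each other.
func decompressedPath(srcRoot, dstRoot, src string) (string, error) {
	rel, err := filepath.Rel(srcRoot, src)
	if err != nil {
		return "", err
	}
	return filepath.Join(dstRoot, strings.TrimSuffix(rel, filepath.Ext(rel))), nil
}

func main() {
	for _, src := range []string{
		"pkg/linux-x64-musl/node.xz",
		"pkg/linux-x64/node.xz",
		"pkg/win-x64/node.exe.xz",
		"pkg/darwin-arm64/node.xz",
	} {
		dst, err := decompressedPath("pkg", "out", src)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s -> %s (archive: %t)\n", src, dst, archiveFileType(src) != nil)
	}
}
```

Each source file maps to a distinct destination because the directory component is carried over, and the archive check depends only on the extension rather than on content sniffing.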

The erased test files are expected: those files are actually tar files for some reason (?), and removing tar from supportedKind means they are no longer scanned directly.

Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@egibs requested a review from eslerm on May 29, 2025 01:48
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@eslerm left a comment

appreciate simplifications :)

Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@egibs enabled auto-merge (squash) on May 29, 2025 02:10
@egibs merged commit 6a43323 into chainguard-dev:main on May 29, 2025
12 checks passed
@egibs deleted the extraction-edge-cases branch on June 25, 2025 13:17