Detect zstd compressed objects in automatic compression mode #6909

Open
bagmarnikhil wants to merge 1 commit into opensearch-project:main from bagmarnikhil:fix/automatic-zstd-compression-detection

@bagmarnikhil bagmarnikhil commented Jun 5, 2026

Description

CompressionOption.fromFileName() backs the automatic compression option for the S3 source (both the SQS-notification and scan paths) and the S3 enrich processor. It only recognized .gz (GZIP) and .snappy (SNAPPY) and silently fell back to NONE for everything else. As a result, zstd-compressed objects were passed to the codec without decompression and failed to parse, even though ZSTD is a valid CompressionOption with a working ZstdDecompressionEngine already wired in.

This change maps the .zst (canonical, per the zstd CLI and RFC 8478) and .zstd (informal alternate spelling) extensions to CompressionOption.ZSTD, so automatic detection decompresses these objects correctly.

Behavior is unchanged unless a user explicitly configures compression: automatic; the shipped default is NONE. No new compression engine or dependency is introduced: the fix reuses the existing ZstdDecompressionEngine. Because the fix lives in the shared common enum, all three consumers (s3-source SQS path, s3-source scan path, and s3-enrich-processor) are corrected at once.
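
For reviewers who want the gist without opening the diff, the extension-to-option mapping described above can be sketched roughly as follows. This is a standalone illustration, not the Data Prepper source: the class name CompressionDetectionSketch, the nested Compression enum, and the lookup-table layout are all hypothetical, and case-sensitive suffix matching is an assumption rather than something this PR states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the actual CompressionOption enum.
final class CompressionDetectionSketch {

    // Simplified mirror of the options named in the PR description.
    enum Compression { NONE, GZIP, SNAPPY, ZSTD }

    // Extension-to-option table. The two zstd entries are what this PR adds.
    private static final Map<String, Compression> BY_EXTENSION = new LinkedHashMap<>();
    static {
        BY_EXTENSION.put(".gz", Compression.GZIP);
        BY_EXTENSION.put(".snappy", Compression.SNAPPY);
        BY_EXTENSION.put(".zst", Compression.ZSTD);   // canonical extension
        BY_EXTENSION.put(".zstd", Compression.ZSTD);  // informal alternate spelling
    }

    // Returns the option whose extension ends the object key, or NONE if nothing matches.
    // Matching is case-sensitive here; that is an assumption of this sketch.
    static Compression fromFileName(final String fileName) {
        for (final Map.Entry<String, Compression> entry : BY_EXTENSION.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return Compression.NONE;
    }

    public static void main(final String[] args) {
        final String[] keys = {"logs/app.json.gz", "logs/app.json.zst", "logs/app.json.zstd", "logs/app.json"};
        for (final String key : keys) {
            System.out.println(key + " -> " + fromFileName(key));
        }
    }
}
```

The fallback to NONE is kept in the sketch because the description says unrecognized extensions continue to be read without decompression.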

Unit tests were added covering both the .zst and .zstd extensions.

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

CompressionOption.fromFileName() backs the "automatic" compression option
for the S3 source (both SQS-notification and scan paths) and the S3 enrich
processor. It only recognized ".gz" (GZIP) and ".snappy" (SNAPPY) and
silently fell back to NONE for everything else, so zstd-compressed objects
were passed to the codec without decompression and failed to parse — even
though ZSTD is a valid CompressionOption with a working
ZstdDecompressionEngine already wired in.

Map the ".zst" (canonical, per the zstd CLI and RFC 8478) and ".zstd"
(informal alternate spelling) extensions to CompressionOption.ZSTD so
automatic detection decompresses these objects correctly. Behavior is
unchanged unless a user explicitly configures compression: automatic.

Add unit tests covering both extensions.

Signed-off-by: Nikhil Bagmar <nikhilbagmar73@gmail.com>

@kiran536 kiran536 left a comment

LGTM
