Skip to content

fix(parquet): check compression codec availability#656

Open
wgtmac wants to merge 2 commits into
apache:mainfrom
wgtmac:codex/parquet-codec-availability
Open

fix(parquet): check compression codec availability#656
wgtmac wants to merge 2 commits into
apache:mainfrom
wgtmac:codex/parquet-codec-availability

Conversation

@wgtmac
Copy link
Copy Markdown
Member

@wgtmac wgtmac commented May 18, 2026

Summary

  • Check Parquet compression codec availability with Arrow before opening the file writer.
  • Return a clear InvalidArgument when a requested Parquet codec is not built into Arrow.
  • Add a regression test that exercises an unavailable codec in the current Arrow build.

Test Plan

  • cmake --build build --target parquet_test data_test && ctest --test-dir build -R 'parquet_test|data_test' --output-on-failure
  • git diff --check

Copy link
Copy Markdown
Collaborator

@zhjwpku zhjwpku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

} else {
return InvalidArgument("Unsupported Parquet compression codec: {}", compression_name);
}
return compression;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unreachable

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's reachable. The returns were changed to assignments except the final else branch. But I have no idea if that's a better style :-)

Copy link
Copy Markdown
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor nit! Thanks @wgtmac

::arrow::Compression::type compression;
};

std::optional<ParquetCodec> UnavailableParquetCodec() {
Copy link
Copy Markdown
Member

@raulcd raulcd May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM should this function return the list of unavailable codecs instead of the first one? Otherwise I feel like the name could be slightly misleading, maybe FirstUnavailableParquetCodec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants