Reject non-archive inputs on archive entry points (#77) #163
Merged
- Stop registering libarchive's "raw" format handler in list_archive_files, list_archive_entries, uncompress_archive, uncompress_archive_file, ArchiveIterator, and their _with_encoding and async variants, so non-archive input now errors instead of yielding a single "data" entry.
- Add ArchiveIteratorBuilder::raw_format(bool) so callers that relied on the old permissive behavior can opt back in explicitly.
- Keep uncompress_data unchanged since handling raw compressed streams (gzip, xz, ...) is its purpose.
- Document the stricter behavior in the crate docs and CHANGES.md, update affected tests to opt in where needed, and bump the version to 0.16.0 to reflect the breaking change.
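The opt-in described in the commit message can be pictured with a toy model that uses only the standard library. `ToyIteratorBuilder` and `registrations` are hypothetical names for illustration, not the crate's code, and the assumption that the archive paths otherwise register formats through `archive_read_support_format_all` is a guess about the internals. What is certain is that libarchive's `archive_read_support_format_all` does not enable the raw handler, which has to be requested separately through `archive_read_support_format_raw`.

```rust
/// Toy stand-in for the real builder; it only models the raw-format switch.
#[derive(Debug, Default)]
struct ToyIteratorBuilder {
    // false by default, mirroring the stricter behavior described for 0.16.0
    raw_format: bool,
}

impl ToyIteratorBuilder {
    /// Opt back in to the permissive "raw" handling.
    fn raw_format(mut self, enabled: bool) -> Self {
        self.raw_format = enabled;
        self
    }

    /// Names of the libarchive registration calls this configuration implies.
    /// (Assumed call set; the crate may register handlers differently.)
    fn registrations(&self) -> Vec<&'static str> {
        let mut calls = vec!["archive_read_support_format_all"];
        if self.raw_format {
            calls.push("archive_read_support_format_raw");
        }
        calls
    }
}

fn main() {
    let strict = ToyIteratorBuilder::default();
    let permissive = ToyIteratorBuilder::default().raw_format(true);
    println!("strict:     {:?}", strict.registrations());
    println!("permissive: {:?}", permissive.registrations());
}
```

With the raw handler left out, input that matches no registered format has nothing to fall back on, which is why non-archive bytes now surface as an error rather than a single "data" entry.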
Coverage report for CI build 24838989502 (Coveralls): coverage increased (+0.5%) to 78.373%.
Coverage regressions: 1 previously-covered line in 1 file lost coverage.
libarchive's mtree format handler is permissive: it matches free-form text such as a plain gunzip'd text file and yields bogus entries for input that isn't really an mtree specification. The default behavior preserves libarchive's output (including on the raw entry points), but strict callers iterating with ArchiveIterator may want to reject the match instead of acting on the invalid entries.
- Add ArchiveIteratorBuilder::mtree_format(bool); default true preserves libarchive's permissive behavior, false rejects entries whose archive_format base mask indicates ARCHIVE_FORMAT_MTREE
- Gate the rejection inside ArchiveIterator::unsafe_next_header on the new flag, delegating to a module-private reject_mtree_format helper only when the caller opted out
- Expose archive_format plus ARCHIVE_FORMAT_BASE_MASK / ARCHIVE_FORMAT_MTREE through the FFI bindings and generate-ffi script so the rejection can consult libarchive's format code
- Extend CHANGES.md and the crate-level docs to describe the iterator-only opt-out
- Add an integration test that feeds tests/fixtures/file.txt.gz through the iterator with mtree_format(false) and asserts that the surface errors out
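The base-mask check behind the opt-out can be sketched without the crate. The constant values below are the ones defined in libarchive's `archive.h`. `is_mtree` and `check_header` are hypothetical names used only for illustration, and the crate's module-private `reject_mtree_format` helper may well be structured differently.

```rust
// Format codes as defined in libarchive's archive.h.
const ARCHIVE_FORMAT_BASE_MASK: i32 = 0xff0000;
const ARCHIVE_FORMAT_TAR: i32 = 0x30000;
const ARCHIVE_FORMAT_TAR_USTAR: i32 = ARCHIVE_FORMAT_TAR | 1;
const ARCHIVE_FORMAT_MTREE: i32 = 0x80000;

/// Hypothetical helper: true when the format code's base family is mtree.
/// Masking strips the low bits that encode a variant within a family.
fn is_mtree(format_code: i32) -> bool {
    (format_code & ARCHIVE_FORMAT_BASE_MASK) == ARCHIVE_FORMAT_MTREE
}

/// Hypothetical gate: reject mtree matches only when the caller opted out
/// (`mtree_format == false`); the default `true` lets every format through.
fn check_header(format_code: i32, mtree_format: bool) -> Result<(), String> {
    if !mtree_format && is_mtree(format_code) {
        return Err("input matched libarchive's mtree handler".to_string());
    }
    Ok(())
}

fn main() {
    // Default (permissive): an mtree match is accepted.
    assert!(check_header(ARCHIVE_FORMAT_MTREE, true).is_ok());
    // Opted out: the same match is rejected.
    assert!(check_header(ARCHIVE_FORMAT_MTREE, false).is_err());
    // Other formats are unaffected by the flag.
    assert!(check_header(ARCHIVE_FORMAT_TAR_USTAR, false).is_ok());
}
```

Comparing against the masked value rather than the raw code keeps the check stable if a format family gains variant bits, as the tar family already has.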
acaeb31 to 8ae5b66
Summary
list_archive_files / uncompress_archive / ArchiveIterator couldn't distinguish real archives from plain files: given non-archive input, they would yield a single entry called data. This PR stops registering the raw handler on the archive code paths and bumps the crate to 0.16.0 for the breaking change.
- One commit (feat!: require real archives on archive entry points) removes the raw handler registration across list_archive_files, list_archive_entries, uncompress_archive, uncompress_archive_file, ArchiveIterator (and their _with_encoding and async variants). uncompress_data is unchanged, since handling raw compressed streams is its purpose. Adds ArchiveIteratorBuilder::raw_format(true) as an opt-in escape hatch for callers that relied on the old permissive behavior.
- A second commit (feat: expose ArchiveIteratorBuilder::mtree_format opt-out) adds a matching opt-out for the mtree half. libarchive's mtree handler also matches free-form text (a plain gunzip'd text file is enough), but it remains registered by default to preserve libarchive's behavior on the top-level entry points. Strict callers iterating with ArchiveIterator can pass .mtree_format(false) to reject entries whose format base mask indicates mtree. Exposes archive_format + ARCHIVE_FORMAT_BASE_MASK + ARCHIVE_FORMAT_MTREE in the FFI bindings (and the generate-ffi script so regeneration stays in sync).
Test plan
- cargo test --features tokio_support,futures_support,async_support: 62 integration + 14 doctests pass
- cargo clippy --features tokio_support,futures_support,async_support --tests -- -D warnings: clean
- cargo test --test disk_full_test: passes
- list_archive_files_rejects_non_archive_bytes, uncompress_archive_rejects_non_archive_bytes, iterator_default_rejects_non_archive_bytes, iterator_raw_format_opt_in_accepts_non_archive_bytes, iterator_mtree_format_opt_out_rejects_gzip_text