Reject non-archive inputs on archive entry points (#77) #163

Merged
otavio merged 2 commits into master from issue-77-reject-non-archives
Apr 23, 2026

@otavio
Member

otavio commented Apr 23, 2026

Summary

  • Closes #77 (Option to disable "raw" libarchive format). libarchive's "raw" format handler matches arbitrary bytes, so callers of list_archive_files / uncompress_archive / ArchiveIterator couldn't distinguish real archives from plain files: a plain file would yield a single entry called data. This PR stops registering the raw handler on the archive code paths and bumps the crate to 0.16.0 for the breaking change.
  • Commit 1 (feat!: require real archives on archive entry points) removes the raw handler registration across list_archive_files, list_archive_entries, uncompress_archive, uncompress_archive_file, ArchiveIterator (and their _with_encoding and async variants). uncompress_data is unchanged — handling raw compressed streams is its purpose. Adds ArchiveIteratorBuilder::raw_format(true) as an opt-in escape hatch for callers that relied on the old permissive behavior.
  • Commit 2 (feat: expose ArchiveIteratorBuilder::mtree_format opt-out) adds a matching opt-out for the mtree half. libarchive's mtree handler also matches free-form text (a plain gunzip'd text file is enough), but it remains registered by default to preserve libarchive's behavior on the top-level entry points. Strict callers iterating with ArchiveIterator can pass .mtree_format(false) to reject entries whose format base mask indicates mtree. Exposes archive_format + ARCHIVE_FORMAT_BASE_MASK + ARCHIVE_FORMAT_MTREE in the FFI bindings (and the generate-ffi script so regeneration stays in sync).

Test plan

  • cargo test --features tokio_support,futures_support,async_support — 62 integration + 14 doctests pass
  • cargo clippy --features tokio_support,futures_support,async_support --tests -- -D warnings is clean
  • cargo test --test disk_full_test passes
  • Regression tests cover both halves: list_archive_files_rejects_non_archive_bytes, uncompress_archive_rejects_non_archive_bytes, iterator_default_rejects_non_archive_bytes, iterator_raw_format_opt_in_accepts_non_archive_bytes, iterator_mtree_format_opt_out_rejects_gzip_text
  • CI status checks (all 5 required checks run on this branch)

- Stop registering libarchive's "raw" format handler in list_archive_files,
  list_archive_entries, uncompress_archive, uncompress_archive_file,
  ArchiveIterator, and their _with_encoding and async variants, so
  non-archive input now errors instead of yielding a single "data" entry.
- Add ArchiveIteratorBuilder::raw_format(bool) so callers that relied on
  the old permissive behavior can opt back in explicitly.
- Keep uncompress_data unchanged since handling raw compressed streams
  (gzip, xz, ...) is its purpose.
- Document the stricter behavior in the crate docs and CHANGES.md, update
  affected tests to opt in where needed, and bump the version to 0.16.0
  to reflect the breaking change.
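
To illustrate why dropping the raw handler changes the outcome for non-archive input, here is a toy model in Rust. It is not libarchive's or this crate's implementation: the `Format` enum and the `detect` function are hypothetical names invented for the sketch, and only a zip magic check stands in for the real format handlers. The assumption modeled is that the raw handler acts as a catch-all for any non-empty input.

```rust
/// Toy stand-in for the set of formats a reader might recognize.
#[derive(Debug, PartialEq)]
enum Format {
    Zip,
    Raw,
}

/// Hypothetical detection routine: real handlers are tried first, then the
/// raw catch-all if (and only if) it has been registered.
fn detect(bytes: &[u8], raw_enabled: bool) -> Result<Format, String> {
    // Zip local file header magic: "PK\x03\x04".
    if bytes.starts_with(b"PK\x03\x04") {
        return Ok(Format::Zip);
    }
    // Assumption for this sketch: the raw handler accepts any non-empty input.
    if raw_enabled && !bytes.is_empty() {
        return Ok(Format::Raw);
    }
    Err("unrecognized archive format".to_string())
}

fn main() {
    let plain = b"just some text, not an archive";

    // Old permissive behavior: plain bytes are reported as a "raw" archive.
    println!("raw registered:     {:?}", detect(plain, true));

    // New default: without the raw handler, plain bytes are rejected.
    println!("raw not registered: {:?}", detect(plain, false));

    // Real archives are recognized either way.
    println!("zip input:          {:?}", detect(b"PK\x03\x04....", false));
}
```

In the crate itself, the equivalent of `raw_enabled = true` is the opt-in described above, ArchiveIteratorBuilder::raw_format(true).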
@coveralls

coveralls commented Apr 23, 2026

Coverage Report for CI Build 24838989502

Coverage increased (+0.5%) to 78.373%

Details

  • Coverage increased (+0.5%) from the base build.
  • Patch coverage: 8 uncovered changes across 2 files (86 of 94 lines covered, 91.49%).
  • 1 coverage regression across 1 file.

Uncovered Changes

| File | Changed | Covered | % |
|---|---|---|---|
| src/iterator.rs | 43 | 39 | 90.7% |
| tests/integration_test.rs | 51 | 47 | 92.16% |

Coverage Regressions

1 previously-covered line in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|---|---|---|
| src/async_support.rs | 1 | 70.03% |

Coverage Stats

Relevant Lines: 2557
Covered Lines: 2004
Line Coverage: 78.37%
Coverage Strength: 88.45 hits per line

💛 - Coveralls

libarchive's mtree format handler is permissive — it matches free-form text such as a
plain gunzip'd text file and yields bogus entries for input that isn't really an mtree
specification. The default behavior preserves libarchive's output (including on the raw
entry points), but strict callers iterating with ArchiveIterator may want to reject the
match instead of acting on the invalid entries.

- Add ArchiveIteratorBuilder::mtree_format(bool); default true preserves libarchive's
  permissive behavior, false rejects entries whose archive_format base mask indicates
  ARCHIVE_FORMAT_MTREE
- Gate the rejection inside ArchiveIterator::unsafe_next_header on the new flag,
  delegating to a module-private reject_mtree_format helper only when the caller opted
  out
- Expose archive_format plus ARCHIVE_FORMAT_BASE_MASK / ARCHIVE_FORMAT_MTREE through the
  FFI bindings and generate-ffi script so the rejection can consult libarchive's format
  code
- Extend CHANGES.md and the crate-level docs to describe the iterator-only opt-out
- Add an integration test that feeds tests/fixtures/file.txt.gz through the iterator
  with mtree_format(false) and asserts that the surface errors out
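
The base-mask check described in the list above can be sketched as follows. The constant values are the ones defined in libarchive's archive.h; the function here only borrows the name of the module-private helper, and its signature, error type, and message are assumptions made for illustration, not the code merged in this PR.

```rust
// Format codes as defined in libarchive's archive.h.
const ARCHIVE_FORMAT_BASE_MASK: i32 = 0xff0000;
const ARCHIVE_FORMAT_TAR: i32 = 0x30000;
const ARCHIVE_FORMAT_TAR_USTAR: i32 = ARCHIVE_FORMAT_TAR | 1;
const ARCHIVE_FORMAT_MTREE: i32 = 0x80000;

/// Hypothetical version of the rejection helper. `format_code` is the value
/// that libarchive's archive_format() would report after a header is read;
/// `mtree_allowed` mirrors the builder flag (true by default).
fn reject_mtree_format(format_code: i32, mtree_allowed: bool) -> Result<(), String> {
    // Masking strips the variant bits (e.g. ustar vs. pax) and leaves the family.
    let is_mtree = (format_code & ARCHIVE_FORMAT_BASE_MASK) == ARCHIVE_FORMAT_MTREE;
    if is_mtree && !mtree_allowed {
        return Err("input was only recognized by the mtree handler".to_string());
    }
    Ok(())
}

fn main() {
    // Default (flag = true): an mtree match is passed through unchanged.
    println!("{:?}", reject_mtree_format(ARCHIVE_FORMAT_MTREE, true));

    // Opt-out (flag = false): the same match becomes an error.
    println!("{:?}", reject_mtree_format(ARCHIVE_FORMAT_MTREE, false));

    // Other formats are unaffected by the flag.
    println!("{:?}", reject_mtree_format(ARCHIVE_FORMAT_TAR_USTAR, false));
}
```

Comparing against the masked value rather than the full code means any variant bits in the low part of the format code do not affect the decision.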
otavio force-pushed the issue-77-reject-non-archives branch from acaeb31 to 8ae5b66 April 23, 2026 13:48
otavio merged commit 715e03e into master Apr 23, 2026
27 checks passed
otavio deleted the issue-77-reject-non-archives branch April 23, 2026 13:50