Skip to content

♻️ Extract SplitArchiveReader to unify memmap/streaming dispatch#2704

Merged
ChanTsune merged 1 commit into
mainfrom
cli/refactor-split-archive-reader
Feb 8, 2026
Merged

♻️ Extract SplitArchiveReader to unify memmap/streaming dispatch#2704
ChanTsune merged 1 commit into
mainfrom
cli/refactor-split-archive-reader

Conversation

@ChanTsune
Copy link
Copy Markdown
Owner

@ChanTsune ChanTsune commented Feb 7, 2026

Introduce SplitArchiveReader in cli/src/command/core/archive_source.rs to encapsulate #[cfg(feature = "memmap")] branching into a single location.

The struct provides three methods (transform_entries, for_each_entry, for_each_read_entry) with dual cfg'd implementations, replacing the 6-line mmap boilerplate that was duplicated across 11 command files.

  • Migrated: delete, chmod, chown, strip, migrate, acl, xattr, diff, sort, list
  • Removed: run_list_archive_mem, non-memmap run_entries (dead code)
  • Made private: run_transform_entry (only used by archive_source.rs)

Summary by CodeRabbit

  • Refactor
    • Unified archive processing across ACL, chmod, chown, delete, diff, list, migrate, sort, strip, xattr, stdio, and update commands. This streamlines entry reading and transformation, improves lifecycle and resource handling, and reduces platform-specific code paths. No user-facing options changed; command behavior and output remain the same while reliability and maintainability are improved.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 7, 2026

📝 Walkthrough

Walkthrough

Introduces a new SplitArchiveReader abstraction and refactors many CLI commands to use it, replacing direct archive/memmap handling and prior public run_entries/run_transform_entry exports with source-based iteration and transformation APIs across the CLI command modules.

Changes

Cohort / File(s) Summary
New Archive Source Abstraction
cli/src/command/core/archive_source.rs
Adds SplitArchiveReader (feature-gated memmap vs file) with constructor and methods: for_each_entry, for_each_read_entry, and transform_entries.
Core module & re-exports
cli/src/command/core.rs
Adds mod archive_source and pub(crate) use ...::SplitArchiveReader; makes run_transform_entry private and adjusts signatures; removes some public run_entries wrappers.
Transforming commands
cli/src/command/acl.rs, cli/src/command/chmod.rs, cli/src/command/chown.rs, cli/src/command/delete.rs, cli/src/command/migrate.rs, cli/src/command/strip.rs, cli/src/command/xattr.rs
Replace direct archive/mmap handling with SplitArchiveReader::new(...); use source.transform_entries(...) for UnSolid/KeepSolid branches; remove memmap setup/teardown and drop source for cleanup.
Entry iteration & listing
cli/src/command/diff.rs, cli/src/command/list.rs, cli/src/command/sort.rs
Unify memmap/non-memmap flows to SplitArchiveReader.for_each_entry / for_each_read_entry; remove memmap-specific public interfaces (e.g., run_list_archive_mem) and collect/process via the reader.
Update / stdio integration
cli/src/command/update.rs, cli/src/command/stdio.rs
Change run_update_archive to accept &mut SplitArchiveReader; call sites construct SplitArchiveReader and use for_each_read_entry; remove memmap variables and teardown.
Imports & public export adjustments
multiple cli/src/command/*.rs
Add imports for SplitArchiveReader from crate::command::core; remove or replace external uses of run_entries and public run_transform_entry where applicable.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as CLI Command
  participant SAR as SplitArchiveReader
  participant Core as core helpers
  participant FS as Files/Mmap

  CLI->>SAR: SplitArchiveReader::new(collect_split_archives(...))
  CLI->>SAR: source.for_each_entry(password, processor) / transform_entries(...)
  SAR->>FS: iterate files or mmaps (cfg-specific) 
  SAR->>Core: call run_read_entries* / run_transform_entry(...)
  Core->>SAR: return processed entries/results
  SAR->>CLI: yield entries / write transformed output
  CLI->>SAR: drop(source)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through bytes and mmap dreams,
Replaced the scattered with tidy streams,
One reader now guides each entry's flight,
I nibble bugs and tidy the night,
A happy hop — code neat and bright!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main refactoring objective: extracting SplitArchiveReader to centralize and unify the memmap/streaming dispatch logic that was previously scattered across multiple command files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cli/refactor-split-archive-reader

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @ChanTsune, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the archive processing logic within the CLI by introducing a SplitArchiveReader struct. This new abstraction encapsulates the conditional compilation (#[cfg(feature = "memmap")]) required for handling archives either through memory-mapping or standard file I/O. By centralizing this logic, the PR eliminates substantial code duplication across numerous command implementations, leading to a cleaner, more maintainable, and easier-to-understand codebase for archive operations.

Highlights

  • New Abstraction: Introduced SplitArchiveReader to centralize #[cfg(feature = "memmap")] conditional logic for archive processing, encapsulating the differences between memory-mapped and streaming archive access.
  • Unified Interface: The SplitArchiveReader provides a consistent set of methods (transform_entries, for_each_entry, for_each_read_entry) for interacting with archive entries, abstracting away the underlying reading mechanism.
  • Code Simplification: Replaced repetitive 6-line mmap boilerplate that was duplicated across 11 command files with calls to the new SplitArchiveReader methods, significantly reducing code duplication.
  • Refactored Commands: Commands such as delete, chmod, chown, strip, migrate, acl, xattr, diff, sort, and list have been updated to utilize this new abstraction.
  • Code Cleanup: Removed run_list_archive_mem and non-memmap run_entries (identified as dead code), and made run_transform_entry private as it's now an internal helper for archive_source.rs.
Changelog
  • cli/src/command/acl.rs
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_entries calls with SplitArchiveReader::new() and source.for_each_entry().
    • Migrated run_transform_entry calls to source.transform_entries().
    • Removed drop(mmaps) in favor of drop(source).
  • cli/src/command/chmod.rs
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_transform_entry calls with SplitArchiveReader::new() and source.transform_entries().
    • Removed drop(mmaps) in favor of drop(source).
  • cli/src/command/chown.rs
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_transform_entry calls with SplitArchiveReader::new() and source.transform_entries().
    • Removed drop(mmaps) in favor of drop(source).
  • cli/src/command/core.rs
    • Added archive_source module.
    • Exported SplitArchiveReader from the archive_source module.
    • Changed visibility of run_transform_entry from pub(crate) to fn (private).
    • Removed the run_entries function for non-memmap configurations.
  • cli/src/command/core/archive_source.rs
    • New file created to define the SplitArchiveReader struct.
    • Implemented new constructor to handle memmap feature branching for initializing internal mmaps or files.
    • Implemented transform_entries, for_each_entry, and for_each_read_entry methods, each with #[cfg(feature = "memmap")] and #[cfg(not(feature = "memmap"))] implementations to abstract archive entry processing.
  • cli/src/command/delete.rs
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_transform_entry calls with SplitArchiveReader::new() and source.transform_entries().
    • Removed drop(mmaps) in favor of drop(source).
  • cli/src/command/diff.rs
    • Removed direct imports for run_entries and run_process_archive.
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_entries calls with SplitArchiveReader::new() and source.for_each_entry().
  • cli/src/command/list.rs
    • Removed import for run_read_entries_mem.
    • Updated imports to include SplitArchiveReader.
    • Refactored list_archive function to use SplitArchiveReader::new() and source.for_each_read_entry() for processing archive entries.
    • Removed the now-unused run_list_archive_mem function.
  • cli/src/command/migrate.rs
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_transform_entry calls with SplitArchiveReader::new() and source.transform_entries().
    • Removed drop(mmaps) in favor of drop(source).
  • cli/src/command/sort.rs
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_entries calls with SplitArchiveReader::new() and source.for_each_entry().
    • Removed drop(mmaps) in favor of drop(source).
  • cli/src/command/strip.rs
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_transform_entry calls with SplitArchiveReader::new() and source.transform_entries().
    • Removed drop(mmaps) in favor of drop(source).
  • cli/src/command/xattr.rs
    • Updated imports to include SplitArchiveReader.
    • Replaced direct memory-map handling and run_entries calls with SplitArchiveReader::new() and source.for_each_entry().
    • Migrated run_transform_entry calls to source.transform_entries().
    • Removed drop(mmaps) in favor of drop(source).
Activity
  • No specific human activity (comments, reviews, progress updates) is mentioned in the provided context.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request does a great job of refactoring the archive reading logic by introducing the SplitArchiveReader. This significantly reduces boilerplate code across multiple command files and unifies the handling for memmap and streaming reads. The changes are clean and follow the stated goal. I've found a couple of areas for improvement regarding the consistency of the new abstraction and some potential dead code.

I am having trouble creating individual review comments. Click here to see my feedback.

cli/src/command/core/archive_source.rs (33-52)

high

The use of self.files.drain(..) makes this method destructive for the not(feature = "memmap") case, meaning it can only be called once. This is inconsistent with the memmap version of this function, which is reusable. This could lead to unexpected behavior if the reader is used multiple times when the memmap feature is disabled.

To ensure consistent behavior, the method should be made non-destructive. This can be achieved by iterating over references to the files (.iter()) and seeking to the beginning of each file before use.

You'll need to bring std::io::Seek into scope to use file.seek(...).

This change should also be applied to for_each_entry and for_each_read_entry for consistency.

    pub(crate) fn transform_entries<W, F, S>(
        &mut self,
        writer: W,
        password: Option<&[u8]>,
        processor: F,
        strategy: S,
    ) -> anyhow::Result<()>
    where
        W: io::Write,
        F: FnMut(io::Result<NormalEntry>) -> io::Result<Option<NormalEntry>>,
        S: TransformStrategy,
    {
        for file in &mut self.files {
            file.seek(io::SeekFrom::Start(0))?;
        }
        super::run_transform_entry(
            writer,
            self.files.iter(),
            || password,
            processor,
            strategy,
        )
    }

cli/src/command/list.rs (519-530)

medium

After this refactoring, the run_list_archive function appears to be unused. The logic has been moved directly into the list_archive function using the new SplitArchiveReader. Consider removing run_list_archive to clean up dead code.

@github-actions github-actions Bot added the cli This issue is about cli application label Feb 7, 2026
@ChanTsune ChanTsune force-pushed the cli/refactor-split-archive-reader branch from d656d42 to bccd204 Compare February 7, 2026 06:35
Introduce SplitArchiveReader in cli/src/command/core/archive_source.rs
to encapsulate #[cfg(feature = "memmap")] branching into a single location.

The struct provides three methods (transform_entries, for_each_entry,
for_each_read_entry) with dual cfg'd implementations, replacing the
6-line mmap boilerplate that was duplicated across 11 command files.

- Migrated: delete, chmod, chown, strip, migrate, acl, xattr, diff, sort, list, update, stdio
- Unified: run_update_archive from two cfg'd copies into one (takes SplitArchiveReader)
- Removed: run_list_archive_mem, non-memmap run_entries, run_read_entries/run_read_entries_mem imports in update.rs
- Made private: run_transform_entry (only used by archive_source.rs)
@ChanTsune ChanTsune force-pushed the cli/refactor-split-archive-reader branch from bccd204 to 33c165b Compare February 7, 2026 10:58
@ChanTsune ChanTsune merged commit 74989f3 into main Feb 8, 2026
111 of 112 checks passed
@ChanTsune ChanTsune deleted the cli/refactor-split-archive-reader branch February 8, 2026 00:28
@coderabbitai coderabbitai Bot mentioned this pull request Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli This issue is about cli application

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant