Skip to content

✨ Serial creation#2635

Merged
ChanTsune merged 3 commits into
mainfrom
cli/serial-creation
Jan 24, 2026
Merged

✨ Serial creation#2635
ChanTsune merged 3 commits into
mainfrom
cli/serial-creation

Conversation

@ChanTsune
Copy link
Copy Markdown
Owner

@ChanTsune ChanTsune commented Jan 22, 2026

Summary by CodeRabbit

  • New Features

    • Archive entries now reliably preserve the CLI-specified order across create, append, and update flows, even when files are processed concurrently; applies to both solid and split archives (including concatenation).
  • Tests

    • Added comprehensive integration tests verifying entry-order preservation across create, append, and update with multiple files/directories and solid/split modes.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 22, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Threads numeric indices through the entry-result pipeline, adds a ReorderByIndex iterator to reorder (idx, EntryResult) pairs, updates spawn/drain to emit/consume indexed results, and updates create/update code and tests to ensure deterministic archive entry ordering.

Changes

Cohort / File(s) Summary
Core ordering infra
cli/src/command/core.rs, cli/src/command/core/iter.rs
Add pub(crate) mod iter and re-export ReorderByIndex; implement ReorderByIndex<I,T> to reorder (usize, T) inputs; change spawn_entry_results to emit Receiver<(usize, EntryResult)>; update drain_entry_results to accept IntoIterator<Item=(usize, EntryResult)> and consume via ReorderByIndex.
Create command
cli/src/command/create.rs
Import and use ReorderByIndex::new(rx.into_iter()) for non-solid path to preserve CLI argument ordering when flattening entry results.
Update command
cli/src/command/update.rs
Enumerate target_items and propagate indices through creation/add paths; emit (idx, EntryResult) for worker outputs and skips; consume worker results via ReorderByIndex in both memmap and non-memmap flows.
Tests — entry-order suites
cli/tests/cli/create/entry_order.rs, cli/tests/cli/append/entry_order.rs, cli/tests/cli/update/entry_order.rs
Add integration tests asserting CLI-argument ordering, directory ordering, deterministic cross-run behavior, and solid/split interactions using fixtures with varying file sizes.
Test module declarations
cli/tests/cli/create.rs, cli/tests/cli/append.rs, cli/tests/cli/update.rs
Add mod entry_order; declarations to include the new test modules.
Manifest / test harness
Cargo.toml (implicit)
Test additions referenced by diffs; no public API changes reported.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI/Input
    participant Spawn as spawn_entry_results
    participant Chan as Channel
    participant Reorder as ReorderByIndex
    participant Drain as drain_entry_results
    participant Writer as ArchiveWriter

    Note over CLI,Writer: Indexed ordering pipeline
    CLI->>Spawn: enumerate items -> (idx, item)
    Spawn->>Chan: send (idx, EntryResult)
    Chan->>Reorder: stream of (idx, EntryResult)
    Reorder->>Reorder: buffer out-of-order entries (HashMap)
    Reorder->>Drain: emit EntryResult in ascending idx order
    Drain->>Writer: write deterministic entries
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • @ support #2523: Modifies the same entry-result pipeline (spawn_entry_results/drain_entry_results and ordering) to preserve input ordering via indexed emits and reordering.

Suggested labels

break

Poem

🐰 I hopped through indices, one by one,
I kept stray pieces till ordering was done,
I buffered the bustle in a tidy map,
Then released them in order — no more gap,
Hooray for orderly archives and fun! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title ':sparkles: Serial creation' is vague and uses an emoji with generic terminology that doesn't clearly convey what changed in the codebase. Replace with a more descriptive title that explains the main change, such as 'Preserve entry order in archive operations using ReorderByIndex iterator' or 'Add index-preserving iteration to maintain original processing order in create, update, and append commands'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @ChanTsune, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical aspect of archive creation and modification: ensuring deterministic ordering of entries. By introducing the OrderedByIndex iterator, the system can now maintain the original input order of files and directories when creating, appending to, or updating archives, regardless of the parallel processing execution path. This enhancement significantly improves the predictability and consistency of archive operations, which is vital for reproducibility and reliable data management.

Highlights

  • Introduced OrderedByIndex Iterator: A new generic iterator OrderedByIndex has been added to cli/src/command/core/iter.rs. This iterator reorders (usize, T) items to ensure they are processed in ascending index order, buffering out-of-order items.
  • Deterministic Entry Ordering: The spawn_entry_results function now emits (usize, EntryResult) tuples, and drain_entry_results consumes these using OrderedByIndex. This guarantees that entries are processed and added to archives in the order they were provided as input, even when parallel processing is involved.
  • Applied to create, append, and update commands: The OrderedByIndex mechanism has been integrated into the create, append, and update commands to ensure consistent entry ordering for both single files and directories, and also for split archives.
  • Comprehensive Test Coverage: New test modules (entry_order.rs) have been added for append, create, and update commands. These tests specifically validate that CLI argument order, walkdir traversal order, and multiple directory argument order are preserved in the resulting archives, even with varying file sizes designed to stress parallel processing.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions Bot added the cli This issue is about cli application label Jan 22, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an OrderedByIndex iterator to ensure entries are processed in the order they were initially provided, addressing potential issues with parallel processing disrupting the intended sequence. This change affects the create and update commands, as well as their corresponding tests, to guarantee consistent entry ordering.

Comment thread cli/src/command/core.rs
Comment thread cli/src/command/core.rs
Comment thread cli/src/command/update.rs
Comment thread cli/src/command/update.rs Outdated
Comment thread cli/src/command/update.rs
@ChanTsune
Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 22, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 22, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #2636

coderabbitai Bot added a commit that referenced this pull request Jan 22, 2026
Docstrings generation was requested by @ChanTsune.

* #2635 (comment)

The following files were modified:

* `cli/src/command/core.rs`
* `cli/src/command/core/iter.rs`
* `cli/src/command/create.rs`
* `cli/src/command/update.rs`
* `cli/tests/cli/append/entry_order.rs`
* `cli/tests/cli/create/entry_order.rs`
* `cli/tests/cli/update/entry_order.rs`
@github-actions github-actions Bot added the break API braking change label Jan 22, 2026
@ChanTsune
Copy link
Copy Markdown
Owner Author

ChanTsune added a commit that referenced this pull request Jan 23, 2026
* 📝 Add docstrings to `cli/serial-creation`

Docstrings generation was requested by @ChanTsune.

* #2635 (comment)

The following files were modified:

* `cli/src/command/core.rs`
* `cli/src/command/core/iter.rs`
* `cli/src/command/create.rs`
* `cli/src/command/update.rs`
* `cli/tests/cli/append/entry_order.rs`
* `cli/tests/cli/create/entry_order.rs`
* `cli/tests/cli/update/entry_order.rs`

* ♻️ Keep only iter.rs docstrings and add unit tests

- Revert docstring changes to core.rs, create.rs, update.rs, and test files
- Keep OrderedByIndex documentation in iter.rs
- Replace doc examples with actual unit tests in mod tests

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: ChanTsune <41658782+ChanTsune@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cli/src/command/update.rs`:
- Around line 539-543: target_files_mapping creation currently uses enumerate on
target_items which allows duplicate EntryName keys to overwrite earlier entries
and leave index gaps that break OrderedByIndex; deduplicate the input paths
before indexing or rebuild contiguous indices when constructing
target_files_mapping: ensure you iterate target_items preserving first
occurrence of each EntryName (e.g., track seen EntryName values and skip
duplicates) or collect into a Vec of unique (EntryName, item) and then
re-enumerate to assign contiguous indices before creating the IndexMap used by
OrderedByIndex; apply the same dedup/reindex approach to the other mappings
mentioned (the remaining-items and non-memmap cases) so no index gaps remain.
🧹 Nitpick comments (2)
cli/src/command/core/iter.rs (2)

76-85: Add a fail-fast signal when indices are missing.

If the underlying iterator ends while buffer still holds items, gaps cause silent drops. A debug assertion (or log) makes this easier to detect during development.

💡 Suggested guard
-            None => return None,
+            None => {
+                debug_assert!(
+                    self.buffer.is_empty(),
+                    "OrderedByIndex: missing index {} while buffered items remain",
+                    self.next
+                );
+                return None;
+            }

94-129: Optional: add a test for missing/duplicate indices.

Locking the documented gap/duplicate behavior would make future refactors safer.

Comment thread cli/src/command/update.rs
ChanTsune added a commit that referenced this pull request Jan 23, 2026
* 📝 Add docstrings to `cli/serial-creation`

Docstrings generation was requested by @ChanTsune.

* #2635 (comment)

The following files were modified:

* `cli/src/command/core.rs`
* `cli/src/command/core/iter.rs`
* `cli/src/command/create.rs`
* `cli/src/command/update.rs`
* `cli/tests/cli/append/entry_order.rs`
* `cli/tests/cli/create/entry_order.rs`
* `cli/tests/cli/update/entry_order.rs`

* ♻️ Keep only iter.rs docstrings and add unit tests

- Revert docstring changes to core.rs, create.rs, update.rs, and test files
- Keep OrderedByIndex documentation in iter.rs
- Replace doc examples with actual unit tests in mod tests

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: ChanTsune <41658782+ChanTsune@users.noreply.github.com>
@ChanTsune ChanTsune force-pushed the cli/serial-creation branch from 438c91b to 407c9fb Compare January 23, 2026 07:24
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cli/src/command/core/iter.rs`:
- Around line 15-19: The iterator currently buffers items whose index is less
than self.next and later rewinds to emit them, which breaks the ascending-order
guarantee; change the logic in the iterator implementation (the code paths that
handle incoming idx, check self.next, and insert into the buffer) to drop/ignore
any item where idx < self.next instead of buffering it, and update the
top-of-file doc comment to explicitly state that late duplicates are ignored
(last-wins only for items seen before emission) so consumers know no rewinding
occurs. Ensure references to self.next, the buffer insertion code, and any
emit/flush routines are adjusted to skip late indices and do not trigger
rewinding.

Comment thread cli/src/command/core/iter.rs
@github-actions github-actions Bot removed the break API braking change label Jan 23, 2026
coderabbitai Bot and others added 2 commits January 23, 2026 16:50
* 📝 Add docstrings to `cli/serial-creation`

Docstrings generation was requested by @ChanTsune.

* #2635 (comment)

The following files were modified:

* `cli/src/command/core.rs`
* `cli/src/command/core/iter.rs`
* `cli/src/command/create.rs`
* `cli/src/command/update.rs`
* `cli/tests/cli/append/entry_order.rs`
* `cli/tests/cli/create/entry_order.rs`
* `cli/tests/cli/update/entry_order.rs`

* ♻️ Keep only iter.rs docstrings and add unit tests

- Revert docstring changes to core.rs, create.rs, update.rs, and test files
- Keep OrderedByIndex documentation in iter.rs
- Replace doc examples with actual unit tests in mod tests

---------

Co-Authored-By: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-Authored-By: ChanTsune <41658782+ChanTsune@users.noreply.github.com>
@ChanTsune ChanTsune force-pushed the cli/serial-creation branch from 407c9fb to 5f8f38d Compare January 23, 2026 07:50
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@cli/tests/cli/create/entry_order.rs`:
- Around line 333-405: The test create_split_preserves_entry_order may never
trigger splitting because test payloads compress too well; modify the test so
the split is actually exercised by (a) writing less-compressible data (e.g.,
random bytes) for large_file and medium_file instead of repeated bytes, and/or
(b) adding an assertion after creating the split archive (via
cli::Cli::try_parse_from(... "c" ... "--split" "400KB" ...).execute()) that
multiple split parts exist (inspect the filesystem for part files or check
archive metadata) before running cli::Cli::try_parse_from(... "concat"
...).execute(), and keep references to the same file variables (large_file,
medium_file, small_file) and the final validation using archive::for_each_entry.
- Around line 68-143: The test create_preserves_walkdir_order currently only
checks that entry_names == entry_names2 (determinism) but doesn't assert the
actual expected traversal; update the test to compute or hardcode the expected
walkdir depth-first order (based on the CLI's traversal policy) and add an
assert_eq!(entry_names, expected_order) (and optionally the same for
entry_names2) before or alongside the existing determinism check; reference the
variables entry_names/entry_names2, the test function
create_preserves_walkdir_order, the CLI invocation via cli::Cli::try_parse_from
and the archive reader archive::for_each_entry to locate where to insert the
expected_order assertion so the test verifies both correctness and determinism.
♻️ Duplicate comments (1)
cli/src/command/core/iter.rs (1)

15-19: Late duplicate indices break ascending-order guarantee.

The documentation states duplicate indices result in "last item retained," but if a duplicate index arrives after it has already been emitted (idx < self.next), it's buffered and may be emitted later via the exhausted-branch rewind (lines 82-86), breaking the ascending-order contract.

Also applies to: 88-95

🧹 Nitpick comments (1)
cli/src/command/core/iter.rs (1)

102-174: Tests provide good coverage of ordering scenarios.

The test suite covers key edge cases including gaps and reverse ordering. Consider adding a test for duplicate indices to document the expected behavior when the same index appears multiple times.

🧪 Suggested test for duplicate indices
#[test]
fn handles_duplicate_indices() {
    // Second item with index 1 should overwrite the first
    let items = vec![(0, 'a'), (1, 'b'), (1, 'B'), (2, 'c')];
    let out: Vec<_> = ReorderByIndex::new(items.into_iter()).collect();
    assert_eq!(out, vec!['a', 'B', 'c']);
}

Comment thread cli/tests/cli/create/entry_order.rs
Comment thread cli/tests/cli/create/entry_order.rs
@ChanTsune ChanTsune merged commit 139cb27 into main Jan 24, 2026
100 checks passed
ChanTsune added a commit that referenced this pull request Jan 24, 2026
* 📝 Add docstrings to `cli/serial-creation`

Docstrings generation was requested by @ChanTsune.

* #2635 (comment)

The following files were modified:

* `cli/src/command/core.rs`
* `cli/src/command/core/iter.rs`
* `cli/src/command/create.rs`
* `cli/src/command/update.rs`
* `cli/tests/cli/append/entry_order.rs`
* `cli/tests/cli/create/entry_order.rs`
* `cli/tests/cli/update/entry_order.rs`

* ♻️ Keep only iter.rs docstrings and add unit tests

- Revert docstring changes to core.rs, create.rs, update.rs, and test files
- Keep OrderedByIndex documentation in iter.rs
- Replace doc examples with actual unit tests in mod tests

---------

Co-Authored-By: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-Authored-By: ChanTsune <41658782+ChanTsune@users.noreply.github.com>
@ChanTsune ChanTsune deleted the cli/serial-creation branch January 24, 2026 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli This issue is about cli application

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant