
fix: preserve members order in ExtractorProcessor.__call__() #537

Open
terminalchai wants to merge 1 commit into fatiando:main from terminalchai:fix/extractor-preserve-members-order


@terminalchai

Description

Closes #457.

When `members` is supplied to `Unzip` or `Untar`, the returned list of file paths now follows the *same order as `members`* instead of the arbitrary order produced by `os.walk()`.

Root cause

The previous implementation iterated over `os.walk(extract_dir)` and appended a file path whenever any member prefix matched. The walk order depends on the filesystem and OS, so the order of the returned list was unpredictable and unrelated to the caller-specified `members`.
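The loop described above can be paraphrased as follows. This is a simplified sketch, not pooch's exact code: the helper name `collect_members_walk_order` is hypothetical, and it assumes `members` are paths relative to `extract_dir`. Note that the order of `members` only decides *whether* a file is kept, never *where* it lands in the output.

```python
import os


def collect_members_walk_order(extract_dir, members):
    """Hypothetical paraphrase of the pre-fix loop: filter during one os.walk()."""
    fnames = []
    for path, _, files in os.walk(extract_dir):
        for filename in files:
            full = os.path.join(path, filename)
            rel = os.path.relpath(full, extract_dir)
            # Keep the file if it starts with any requested member prefix.
            # The output order is whatever order os.walk() yields.
            if any(rel.startswith(os.path.normpath(m)) for m in members):
                fnames.append(full)
    return fnames
```

Calling it with `members=["c.txt", "a.txt"]` and with `members=["a.txt", "c.txt"]` on the same directory returns the same list, which is the behaviour reported in #457.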

Fix

Replaced the single `os.walk()` loop with two code paths:

| Case | Behaviour |
| --- | --- |
| `members is None` | walk and collect all files (unchanged) |
| `members` provided | walk once into `dict[member → full_path]`, then return `[d[m] for m in self.members]` to guarantee caller order |

Walking the directory only once preserves the existing O(N) complexity (N = number of extracted files), as suggested by @santisoler in the issue.
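The "`members` provided" row can be sketched in plain Python. `collect_members_in_order` is a hypothetical helper, not a pooch function, and it departs from the description in one respect by assumption: each member maps to a *list* of paths rather than a single path, so that a member naming a directory (which the prefix matching described under Root cause allows) still works. The directory is walked once; each file is then checked against the members, costing up to `len(members)` comparisons per file, as in the prefix check it replaces.

```python
import os


def collect_members_in_order(extract_dir, members):
    """Hypothetical sketch: walk once, then emit paths in the order of ``members``.

    A file is assigned to the first member that names it exactly or names
    one of its parent directories.
    """
    targets = [os.path.normpath(m) for m in members]
    # dict keeps insertion order, so iterating it later follows the caller's
    # order (duplicate members collapse into one key).
    by_member = {m: [] for m in targets}
    for path, _, files in os.walk(extract_dir):
        for filename in files:
            full = os.path.join(path, filename)
            rel = os.path.relpath(full, extract_dir)
            for m in targets:
                # Exact file match, or file inside a directory member.
                if rel == m or rel.startswith(m + os.sep):
                    by_member[m].append(full)
                    break
    # Re-emit grouped by member, in the order the caller asked for.
    return [p for m in by_member for p in by_member[m]]
```

Files inside a directory member keep `os.walk()` order, and a file matching several members goes to the first one; both are choices made for this sketch only.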

Changes

  • `pooch/processors.py`: refactored the file-collection loop in `ExtractorProcessor.__call__()`.
  • `pooch/tests/test_processors.py`: added `test_unpacking_members_order_preserved` for both `Unzip` and `Untar`.

Test

34 passed, 7 deselected in 25.30s

Both `Unzip` and `Untar` variants of the new test confirm that requesting members in forward and then reverse order produces correspondingly ordered results.
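The forward/reverse check can be sketched with the standard library alone, assuming `members` are file paths relative to the archive root. It builds a small `.zip` and `.tar.gz` with `zipfile` and `tarfile`, extracts only the requested members, and collects paths with the walk-once-then-index approach from the table. The file names and helpers (`NAMES`, `ordered_paths`, `build_archives`, `extract_members`) are hypothetical; the actual test in `pooch/tests/test_processors.py` goes through `Unzip` and `Untar` instead.

```python
import os
import tarfile
import zipfile

# Hypothetical archive contents used only for this sketch.
NAMES = ["a.txt", "sub/b.txt", "c.txt"]


def ordered_paths(extract_dir, members):
    """Walk once into {relative path: full path}, then index it by members."""
    found = {}
    for path, _, files in os.walk(extract_dir):
        for filename in files:
            full = os.path.join(path, filename)
            found[os.path.relpath(full, extract_dir)] = full
    # Assumes every member is a file path relative to the archive root.
    return [found[os.path.normpath(m)] for m in members]


def build_archives(workdir):
    """Write NAMES to disk and pack them into both a .zip and a .tar.gz."""
    src = os.path.join(workdir, "src")
    for name in NAMES:
        path = os.path.join(src, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(f"content of {name}\n")
    zip_path = os.path.join(workdir, "data.zip")
    tar_path = os.path.join(workdir, "data.tar.gz")
    with zipfile.ZipFile(zip_path, "w") as zf, tarfile.open(tar_path, "w:gz") as tf:
        for name in NAMES:
            zf.write(os.path.join(src, name), arcname=name)
            tf.add(os.path.join(src, name), arcname=name)
    return zip_path, tar_path


def extract_members(zip_path, tar_path, workdir, members, tag):
    """Extract only ``members`` from each archive; return relative paths in order."""
    # Opt into the safer "data" extraction filter where Python supports it.
    tar_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    zip_dir = os.path.join(workdir, f"zip-{tag}")
    tar_dir = os.path.join(workdir, f"tar-{tag}")
    with zipfile.ZipFile(zip_path) as zf:
        for m in members:
            zf.extract(m, path=zip_dir)
    with tarfile.open(tar_path) as tf:
        for m in members:
            tf.extract(m, path=tar_dir, **tar_kwargs)
    return {
        "zip": [os.path.relpath(p, zip_dir) for p in ordered_paths(zip_dir, members)],
        "tar": [os.path.relpath(p, tar_dir) for p in ordered_paths(tar_dir, members)],
    }
```

Running `extract_members` once with `["sub/b.txt", "a.txt", "c.txt"]` and once with the reversed list should give results in exactly those two orders for both archive types.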

When the 'members' argument is supplied to Unzip or Untar, the returned list
of extracted file paths now follows the same order as 'members' instead of
the arbitrary order produced by os.walk().

Previously the code walked the extract directory and appended files whenever
any member prefix matched, meaning the order of results was determined by
the filesystem rather than the caller.

The fix replaces the single os.walk() loop with two code paths:
- members is None  -> walk and collect all files (behaviour unchanged)
- members provided -> walk once into a dict {member: full_path}, then
  return [dict[m] for m in self.members] to guarantee caller-specified order.
  Walking only once preserves O(N) complexity.

Add test_unpacking_members_order_preserved for both Unzip and Untar.

Closes fatiando#457

Development

Successfully merging this pull request may close these issues.

ExtractorProcessor.__call__() does not retain the order given by the members filter
