
Commit eebe867

Leo Ji committed
refactor: simplify _dedup_central_directory using NameToInfo
NameToInfo is already a {filename -> latest ZipInfo} mapping because each writestr/write call does NameToInfo[name] = zinfo, so later writes overwrite earlier ones (confirmed via the CPython zipfile source).

Replace the manual seen/deduped loop with a two-liner that uses the public API:

- namelist() to get all filenames (preserving insertion order)
- dict.fromkeys() to deduplicate while keeping first-seen order
- getinfo() to retrieve the latest ZipInfo for each unique name

Only filelist needs to be updated; NameToInfo is already correct and does not need to be rewritten. The new list is computed before assignment, so the ZipFile is never left in an inconsistent state on exception.

Made-with: Cursor
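To see the mapping the commit message relies on, here is a minimal sketch that uses only the standard-library zipfile module on a plain in-memory archive (not zarr's ZipStore). It assumes CPython's behavior of warning about, but still accepting, a duplicate name in write mode; the archive contents are made up for illustration.

```python
import io
import warnings
import zipfile

# Build an in-memory archive that writes "a.txt" twice. zipfile emits a
# UserWarning("Duplicate name: ...") for the second write; silence it here.
buf = io.BytesIO()
zf = zipfile.ZipFile(buf, mode="w")
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    zf.writestr("a.txt", b"old")
    zf.writestr("b.txt", b"b")
    zf.writestr("a.txt", b"new")

# namelist() is built from filelist, so it still contains the duplicate name.
names = zf.namelist()

# getinfo() reads NameToInfo, where the later write overwrote the earlier one.
latest_a = zf.getinfo("a.txt")

# The two-liner from the commit: dedupe names in first-seen order,
# then map each name to its latest ZipInfo.
unique_names = dict.fromkeys(zf.namelist())
deduped = [zf.getinfo(name) for name in unique_names]

print(names)                                # ['a.txt', 'b.txt', 'a.txt']
print(latest_a is zf.filelist[-1])          # True
print([info.filename for info in deduped])  # ['a.txt', 'b.txt']
zf.close()
```

Note that `deduped` is never assigned back to `zf.filelist` here, so the archive written on `close()` still carries both "a.txt" entries; the patched method is what performs that assignment.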
1 parent 9add837 commit eebe867


1 file changed (+13, -15 lines)


src/zarr/storage/_zip.py

Lines changed: 13 additions & 15 deletions
```diff
@@ -136,24 +136,22 @@ def _dedup_central_directory(self) -> None:
         *last* (most recent) entry for every filename so that the on-disk central
         directory is clean.
 
-        Both ``filelist`` and ``NameToInfo`` are computed upfront and then
-        swapped in together so that the ZipFile object is never left in an
-        inconsistent state if an exception occurs mid-update.
+        ``NameToInfo`` is already a ``{filename: latest_ZipInfo}`` mapping because
+        each ``writestr``/``write`` call does ``NameToInfo[name] = zinfo``, so
+        later writes overwrite earlier ones. We therefore only need to rebuild
+        ``filelist`` from the values of ``NameToInfo``; ``NameToInfo`` itself
+        requires no update.
+
+        The new list is computed before being assigned so that the ZipFile object
+        is never left in an inconsistent state if an exception is raised.
         """
         if self._zf.mode not in ("w", "a", "x"):
             return
-        seen: set[str] = set()
-        deduped: list[zipfile.ZipInfo] = []
-        for info in reversed(self._zf.filelist):
-            if info.filename not in seen:
-                seen.add(info.filename)
-                deduped.append(info)
-        new_filelist = list(reversed(deduped))
-        new_name_to_info = {info.filename: info for info in new_filelist}
-        # Swap both attributes together; if anything above raised, the
-        # original filelist/NameToInfo are still intact.
-        self._zf.filelist = new_filelist
-        self._zf.NameToInfo = new_name_to_info
+        # namelist() and getinfo() are public API.
+        # dict.fromkeys preserves first-seen order while deduplicating names;
+        # getinfo() returns the latest ZipInfo for each name (NameToInfo[name]).
+        unique_names = dict.fromkeys(self._zf.namelist())
+        self._zf.filelist = [self._zf.getinfo(name) for name in unique_names]
 
     async def clear(self) -> None:
         # docstring inherited
```
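A side-by-side sketch of the removed loop and the new two-liner may help when reviewing this change. It runs both on a plain stdlib `ZipFile`; the free functions `dedup_old` and `dedup_new` are hypothetical stand-ins for the method body before and after the commit, not zarr API. It also writes the archive out and reopens it to check that the central directory lists each name once.

```python
import io
import warnings
import zipfile


def dedup_old(zf: zipfile.ZipFile) -> list[zipfile.ZipInfo]:
    # Hypothetical stand-in for the removed loop: walk filelist backwards
    # and keep the last entry seen for each name.
    seen: set[str] = set()
    deduped: list[zipfile.ZipInfo] = []
    for info in reversed(zf.filelist):
        if info.filename not in seen:
            seen.add(info.filename)
            deduped.append(info)
    return list(reversed(deduped))


def dedup_new(zf: zipfile.ZipFile) -> list[zipfile.ZipInfo]:
    # Hypothetical stand-in for the new two-liner: first-seen name order,
    # latest ZipInfo (via NameToInfo) for each name.
    unique_names = dict.fromkeys(zf.namelist())
    return [zf.getinfo(name) for name in unique_names]


buf = io.BytesIO()
zf = zipfile.ZipFile(buf, mode="w")
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)  # "Duplicate name" warning
    zf.writestr("a.txt", b"old")
    zf.writestr("b.txt", b"b")
    zf.writestr("a.txt", b"new")

old_result = dedup_old(zf)
new_result = dedup_new(zf)

# Both keep the same ZipInfo objects; only the order differs.
print([i.filename for i in old_result])  # ['b.txt', 'a.txt']
print([i.filename for i in new_result])  # ['a.txt', 'b.txt']

# Apply the new list, as the patched method does, then write the archive.
zf.filelist = new_result
zf.close()

# Reopen: each name appears once and "a.txt" resolves to the latest data.
with zipfile.ZipFile(buf) as reread:
    print(reread.namelist())     # ['a.txt', 'b.txt']
    print(reread.read("a.txt"))  # b'new'
```

In this example the old loop orders entries by each name's *last* write, while the new version orders them by each name's *first* write. The set of retained entries is identical, and readers look entries up by name, so the difference only shows up in `namelist()` order.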
