Add SOZip (Seek-Optimized ZIP) support #1038

Draft
Copilot wants to merge 10 commits into master from copilot/add-so-optimized-zip-support

Conversation

Contributor

Copilot AI commented Nov 26, 2025

Adds support for SOZip (Seek-Optimized ZIP), a ZIP profile that enables random access within DEFLATE-compressed files by storing sync flush point offsets in hidden index files.
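The mechanism described above can be sketched in a few lines of Python with the standard `zlib` module. This is only a conceptual illustration, not the code in this PR: the function names `compress_chunked` and `read_chunk` are hypothetical, and the sketch keeps the chunk offsets in a plain list rather than in a SOZip index file. It assumes raw Deflate (`wbits=-15`) and a full flush at each chunk boundary, so that a decompressor can start at any recorded offset.

```python
import zlib


def compress_chunked(data: bytes, chunk_size: int):
    """Raw-Deflate `data`, flushing at every chunk boundary.

    Returns the compressed bytes and the compressed offset at which
    each chunk starts (hypothetical helper, for illustration only).
    """
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15 selects raw Deflate
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(out))  # where this chunk begins in the output
        out += comp.compress(data[start:start + chunk_size])
        # A full flush byte-aligns the output and resets the dictionary,
        # so the next chunk can be decoded without the previous ones.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush(zlib.Z_FINISH)
    return bytes(out), offsets


def read_chunk(compressed: bytes, offsets, index: int, chunk_size: int) -> bytes:
    """Decompress only chunk `index`, starting at its recorded offset."""
    decomp = zlib.decompressobj(-15)
    return decomp.decompress(compressed[offsets[index]:], chunk_size)
```

A reader that ignores the offsets still sees an ordinary Deflate stream, which is why such archives remain readable by tools that know nothing about SOZip.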

Changes Made

Core SOZip Implementation

  • Added SOZipIndex class to read/write index files containing chunk offsets, sizes, and metadata
  • Added SOZipExtraField class to parse SOZip extra field (0x564B) linking entries to their index files
  • Added SOZipDeflateStream for Deflate compression that inserts sync flush points at chunk boundaries
  • Added SOZip extra data type (0x564B) to ExtraDataType enum
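As a rough illustration of how an entry can be linked to SOZip through its extra data, the sketch below walks a ZIP extra-field block, which is a sequence of records made of a 2-byte little-endian header ID, a 2-byte little-endian length, and a payload. The ID `0x564B` is the one named in this PR; the function names are hypothetical and the payload is left uninterpreted, since its layout is not described here.

```python
import struct

SOZIP_EXTRA_ID = 0x564B  # header ID named in this PR


def iter_extra_fields(extra: bytes):
    """Yield (header_id, payload) pairs from a ZIP extra-field block."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        payload = extra[pos:pos + size]
        if len(payload) < size:
            break  # truncated record: stop rather than guess
        yield header_id, payload
        pos += size


def has_sozip_extra(extra: bytes) -> bool:
    """True if any record in the block carries the SOZip header ID."""
    return any(hid == SOZIP_EXTRA_ID for hid, _ in iter_extra_fields(extra))
```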

ZipEntry Extensions

  • Added IsSozip property to detect entries with SOZip extra field
  • Added IsSozipIndexFile property to identify hidden index files by name pattern (.filename.sozip.idx)
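The name pattern can be illustrated with a small Python sketch. It assumes the index file sits in the same directory as the data file and that entry names use `/` separators; both helper names are hypothetical and are not the properties added in this PR.

```python
import posixpath

INDEX_SUFFIX = ".sozip.idx"


def index_name_for(entry_name: str) -> str:
    """Build the hidden index name for an entry, e.g. a/b.txt -> a/.b.txt.sozip.idx."""
    directory, base = posixpath.split(entry_name)
    return posixpath.join(directory, "." + base + INDEX_SUFFIX)


def is_sozip_index_name(entry_name: str) -> bool:
    """Check whether an entry name matches the .filename.sozip.idx pattern."""
    base = posixpath.basename(entry_name)
    return (
        base.startswith(".")
        and base.endswith(INDEX_SUFFIX)
        and len(base) > len(INDEX_SUFFIX) + 1  # require a non-empty filename part
    )
```

A name-based check like this only identifies candidate index files; it does not prove that the matching data entry was actually written with flush points.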

Writer Options

  • Added EnableSOZip, SOZipChunkSize, and SOZipMinFileSize configuration options on ZipWriterOptions
  • Added per-entry EnableSOZip override on ZipWriterEntryOptions
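One way these options might interact is sketched below in Python. The option names mirror the ones listed above, but the default values (32 KiB chunks, 1 MiB minimum size) and the rule that a per-entry override replaces only the global on/off switch are assumptions made for illustration, not behavior confirmed by this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriterOptions:
    enable_sozip: bool = False
    sozip_chunk_size: int = 32 * 1024        # assumed default
    sozip_min_file_size: int = 1024 * 1024   # assumed default


def should_use_sozip(
    options: WriterOptions,
    uncompressed_size: int,
    entry_override: Optional[bool] = None,
) -> bool:
    """Decide whether an entry gets SOZip treatment (hypothetical policy)."""
    # A per-entry setting, when present, wins over the archive-wide switch.
    enabled = options.enable_sozip if entry_override is None else entry_override
    # Small files gain little from seek optimization, so they are skipped.
    return enabled and uncompressed_size >= options.sozip_min_file_size
```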

Test Archive and Tests

  • Created Zip.sozip.zip test archive with proper SOZip structure (data file with index file containing chunk offsets)
  • Added comprehensive tests for SOZip reading with both ZipReader and ZipArchive
  • Added tests for SOZipIndex serialization/deserialization
  • Added round-trip compression and decompression tests
  • Removed old foo.zip test archive

Documentation

  • Updated FORMATS.md to document SOZip detection support for reading

Testing

  • ✅ All 19 SOZip tests passing
  • ✅ SOZip archive creation and reading validated with ZipReader and ZipArchive
  • ✅ Index file structure validated (magic bytes, chunk offsets, metadata)
  • ✅ Data integrity verified for compressed and decompressed content
  • ✅ Backward compatibility maintained (regular ZIP files work as expected)

Notes

Reading and detection of SOZip archives are fully implemented. Writer configuration options are in place, and writing SOZip archives with sync-flush compression is implemented and available for testing.

Original prompt

This section details the original issue you should resolve.

<issue_title>Seek Optimized Zip </issue_title>
<issue_description>Are there plans to support reading and writing SOZip (Seek-Optimized ZIP) archives?</issue_description>

Comments on the Issue (you are @copilot in this section)

@adamhathcock What is this? A new Zip style/feature?

PRs are welcome.</comment_new>
<comment_new>@adamhathcock

Version

  • Version: 0.5.0
  • Date: 2023-Jan-06

License

This specification document is (C) 2022-2023 Even Rouault and licensed under the
CC-BY-4.0 terms.

Note: the scope of the copyrighted material does, of course, not extend onto
any source or binary code derived from the specification.

What is SOZip ?

A Seek-Optimized ZIP file (SOZip) is a
ZIP file that contains one
or several Deflate-compressed files
that are organized and annotated such that a SOZip-aware reader can perform
very fast random access (seek) within a compressed file.

SOZip makes it possible to access large compressed files directly from a .zip
file without prior decompression. It is not a new file format, but a profile
of the existing ZIP format, done in a fully backward compatible way. ZIP
readers that are non-SOZip aware can read a SOZip-enabled file
normally and ignore the extended features that support efficient seek
capability.

Use cases

This specification is intended to be general purpose / not domain specific.

SOZip was first developed to serve geospatial use cases, which commonly
have large compressed files inside of ZIP archives. In particular, it makes it
possible for users to read large Geographic Information Systems (GIS) files using the
Shapefile,
GeoPackage or
FlatGeobuf formats (which have no native provision
for compression) compressed in .zip files without prior decompression.

Efficient random access and selective decompression are a requirement to provide
acceptable performance in many usage scenarios: spatial index filtering, access to a
feature by its identifier, etc.

High-level specification

The SOZip optimization relies on two independent and combined mechanisms:

  • The first mechanism is the generation of a [Deflate](htt...</comment_new>

Copilot AI and others added 3 commits November 26, 2025 08:23
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for seek optimized zip archives" to "Add SOZip (Seek-Optimized ZIP) support" Nov 26, 2025
Copilot AI requested a review from adamhathcock November 26, 2025 08:34
@adamhathcock
Owner

@copilot make the tests pass when reading with ZipReader and ZipArchive

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Contributor Author

Copilot AI commented Nov 27, 2025

@copilot make the tests pass when reading with ZipReader and ZipArchive

Fixed the SOZip tests to work correctly with ZipReader and ZipArchive. The tests now:

  • Verify regular zip files don't have SOZip properties set
  • Test SOZip index file detection by name pattern with both ZipReader and ZipArchive

All 15 SOZip tests passing in commit 7339567.

@adamhathcock
Owner

@copilot update from master, resolve conflicts and finish the work

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 5, 2026

@copilot update from master, resolve conflicts and finish the work

Updated from master and resolved conflicts. The FORMATS.md merge conflict was resolved by combining the SOZip documentation with the new Zip format notes. All 16 SOZip tests passing in commit 0dc6322.

…mized-zip-support

# Conflicts:
#	Directory.Packages.props
#	FORMATS.md
#	build/packages.lock.json
@adamhathcock
Owner

@copilot remove foo.zip and create a different SOZip file that is used in a test

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 6, 2026

@copilot remove foo.zip and create a different SOZip file that is used in a test

Removed foo.zip and created Zip.sozip.zip test archive with proper SOZip structure (data file with index file containing chunk offsets). Added two new tests:

  • SOZip_Archive_ReadSOZipFile: Tests reading the archive with ZipArchive
  • SOZip_Reader_ReadSOZipFile: Tests reading the archive with ZipReader

The test archive contains:

  • data.txt (5KB, SOZip-optimized)
  • .data.txt.sozip.idx (68 bytes index file with SOZip magic "SOZo")
  • small.txt (13 bytes, not optimized)

All 19 SOZip tests passing in commit b3ce90a.

Development

Successfully merging this pull request may close these issues.

Seek Optimized Zip

2 participants