[WIP] Core: Add Vortex format to Iceberg #15915

Draft
robert3005 wants to merge 50 commits into apache:main from spiraldb:main

@robert3005


This PR isn't meant to be merged in its current state, but given that the File Format API has landed on main, I wanted to open it to show what a Vortex integration in Iceberg would look like and to follow up on the community blog post. I had to make a couple of adjustments:

  1. Extend the FileFormat enum
  2. Add a notion of logical splits instead of physical ones
    Vortex can be split on arbitrary row offsets; splitOffsets as defined in the API are purely a read concern and are up to the engine/user
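
As a rough illustration of what "logical" splits could mean here, the sketch below divides a file's row range into contiguous row-offset ranges of a target size. `LogicalSplits`, `RowRange`, and `plan` are hypothetical names, not part of the Iceberg or Vortex APIs, and the PR may compute splits differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Iceberg or Vortex.
final class LogicalSplits {
  /** Half-open row range [start, end) within a single file. */
  static final class RowRange {
    final long start;
    final long end;

    RowRange(long start, long end) {
      this.start = start;
      this.end = end;
    }
  }

  /** Splits [0, totalRows) into contiguous ranges of at most targetRows rows each. */
  static List<RowRange> plan(long totalRows, long targetRows) {
    if (totalRows < 0 || targetRows <= 0) {
      throw new IllegalArgumentException("totalRows must be >= 0 and targetRows > 0");
    }
    List<RowRange> splits = new ArrayList<>();
    long start = 0;
    while (start < totalRows) {
      // Clamp the last range so it never runs past the end of the file.
      long end = start + Math.min(targetRows, totalRows - start);
      splits.add(new RowRange(start, end));
      start = end;
    }
    return splits;
  }
}
```

Because the ranges are defined purely by row offsets, the target size can be chosen by the engine at read time rather than being fixed when the file is written.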

a10y and others added 30 commits March 25, 2026 11:42
If either half of an OR is an unconvertible expression,
we reduce the entire filter to an ALWAYS_TRUE, which means
that no filtering is pushed down.

This was discovered with the following TPC-DS query:

```
select * from customer_address
where
  (ca_country = 'United States' AND ca_state IN( 'SC', 'IN', 'VA'))
  OR
  (ca_country = 'United States' AND ca_state IN( 'WA', 'KS', 'KY'))
```

The IN expression is not currently pushable, but we can fix that by
implementing it in Vortex.
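
The reason an OR is all-or-nothing can be sketched with a toy converter. Dropping an unconvertible side of an AND only makes the pushed filter less selective, which is safe as long as the engine re-applies the full filter afterwards. Dropping a side of an OR would make the pushed filter more selective and could discard matching rows, so the whole OR has to fall back to always-true. The `Expr` types and the `pushable` function below are hypothetical and do not mirror Iceberg's `Expression` classes or the Vortex bindings.

```java
import java.util.Optional;

// Toy model of filter conversion; hypothetical, not Iceberg's or Vortex's API.
final class PushdownSketch {
  interface Expr {}

  static final class Leaf implements Expr {
    final String text;
    final boolean convertible; // whether the scan can evaluate this predicate

    Leaf(String text, boolean convertible) {
      this.text = text;
      this.convertible = convertible;
    }
  }

  static final class And implements Expr {
    final Expr left;
    final Expr right;

    And(Expr left, Expr right) {
      this.left = left;
      this.right = right;
    }
  }

  static final class Or implements Expr {
    final Expr left;
    final Expr right;

    Or(Expr left, Expr right) {
      this.left = left;
      this.right = right;
    }
  }

  /** Returns the pushable part of expr, or empty when nothing can be pushed (always-true). */
  static Optional<String> pushable(Expr expr) {
    if (expr instanceof Leaf) {
      Leaf leaf = (Leaf) expr;
      return leaf.convertible ? Optional.of(leaf.text) : Optional.empty();
    }
    if (expr instanceof And) {
      And and = (And) expr;
      Optional<String> l = pushable(and.left);
      Optional<String> r = pushable(and.right);
      if (l.isPresent() && r.isPresent()) {
        return Optional.of("(" + l.get() + " AND " + r.get() + ")");
      }
      // Keeping only one side is safe: it matches a superset of the rows.
      return l.isPresent() ? l : r;
    }
    Or or = (Or) expr;
    Optional<String> l = pushable(or.left);
    Optional<String> r = pushable(or.right);
    if (l.isPresent() && r.isPresent()) {
      return Optional.of("(" + l.get() + " OR " + r.get() + ")");
    }
    // Keeping only one side would wrongly exclude rows matched by the other side.
    return Optional.empty();
  }
}
```

In this model, making the previously unconvertible leaf convertible is what lets the OR be pushed down at all, which matches the motivation for implementing IN.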
* fix: properly close native resources in Vortex scans

* fix system closing

* format, bump vortex dep

* bump vortex to correct release

* remove print
Signed-off-by: Robert Kruszewski <github@robertk.io>
robert3005 and others added 18 commits May 25, 2026 23:11
Add generic end-to-end tests for Vortex
Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Support case-insensitive table scans, and enable TestLocalScan for Vortex
…mn (#31)

Support row-idx pushdown into Vortex scans matching the metadata column
Add support for position deletes in Vortex

Pushes position deletes into the Vortex scan so deleted rows are excluded natively instead of being read and filtered out afterwards.

DeleteFilter exposes the deleted positions for pushdown (skipped when the _is_deleted column is projected, since those rows must be marked rather than removed). GenericReader forwards them only when the reader advertises support via the new ReadBuilder.supportsPositionDeletes(), so Parquet/ORC/Avro keep applying deletes post-scan. VortexIterable serializes the positions as a portable 64-bit Roaring bitmap and applies EXCLUDE_ROARING row selection.

Also adds a Vortex position-delete writer (PositionDeleteVortexWriter, VortexFormatModel.forPositionDeletes) for writing path/pos delete files, plus TestVortexPositionDeletes.
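
The capability check described in this commit message can be sketched roughly as follows. Only `supportsPositionDeletes()` is named in the message; its default value, the shape of `ReadBuilder` shown here, and the `read` helper are assumptions made for illustration. A plain set of positions stands in for the Roaring bitmap, and marking rows for `_is_deleted` is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; only supportsPositionDeletes() is named in the commit message.
final class DeletePushdownSketch {
  interface ReadBuilder {
    // Assumed default: formats that do not opt in keep post-scan filtering.
    default boolean supportsPositionDeletes() {
      return false;
    }

    /** Returns the row positions the scan emits for a file with rowCount rows. */
    List<Long> scan(long rowCount, Set<Long> pushedDeletes);
  }

  /** Reader that ignores pushed deletes, standing in for Parquet/ORC/Avro. */
  static final class PostFilterReader implements ReadBuilder {
    @Override
    public List<Long> scan(long rowCount, Set<Long> pushedDeletes) {
      List<Long> rows = new ArrayList<>();
      for (long pos = 0; pos < rowCount; pos++) {
        rows.add(pos);
      }
      return rows;
    }
  }

  /** Reader that excludes deleted positions during the scan, standing in for Vortex. */
  static final class PushdownReader implements ReadBuilder {
    @Override
    public boolean supportsPositionDeletes() {
      return true;
    }

    @Override
    public List<Long> scan(long rowCount, Set<Long> pushedDeletes) {
      List<Long> rows = new ArrayList<>();
      for (long pos = 0; pos < rowCount; pos++) {
        if (!pushedDeletes.contains(pos)) {
          rows.add(pos);
        }
      }
      return rows;
    }
  }

  /** Forwards deletes only to readers that advertise support; otherwise filters afterwards. */
  static List<Long> read(
      ReadBuilder reader, long rowCount, Set<Long> deletes, boolean isDeletedProjected) {
    // When _is_deleted is projected, rows must be kept (and marked), so nothing is pushed.
    boolean push = reader.supportsPositionDeletes() && !isDeletedProjected;
    Set<Long> pushed = push ? deletes : Collections.<Long>emptySet();
    List<Long> rows = reader.scan(rowCount, pushed);
    if (!push && !isDeletedProjected) {
      rows.removeIf(deletes::contains); // post-scan filtering path
    }
    return rows;
  }
}
```

Both readers return the same live rows in this sketch; the difference is only whether deleted rows are materialized before being dropped.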

4 participants