
Support partitionBy in VortexSparkDataSource #7218

Merged
robert3005 merged 5 commits into develop from rk/partitionby
Apr 1, 2026

Conversation


@robert3005 robert3005 commented Mar 31, 2026

Support partitionBy in spark writer

Signed-off-by: Robert Kruszewski <github@robertk.io>
@robert3005 robert3005 added the changelog/feature A new feature label Mar 31, 2026
@robert3005 robert3005 requested a review from a10y March 31, 2026 15:52

codspeed-hq Bot commented Mar 31, 2026

Merging this PR will not alter performance

✅ 1106 untouched benchmarks
⏩ 1522 skipped benchmarks [1]


Comparing rk/partitionby (4b0c14b) with develop (3ea259e)


Footnotes

  1. 1522 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived in CodSpeed to remove them from the performance reports.

Comment on lines +159 to +173
private String getPartitionPath(InternalRow row) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < resolvedTransforms.length; i++) {
        if (i > 0) {
            sb.append("/");
        }
        ResolvedTransform rt = resolvedTransforms[i];
        sb.append(URLEncoder.encode(rt.directoryKey, StandardCharsets.UTF_8));
        sb.append("=");

        String value = evaluateTransform(rt, row);
        sb.append(URLEncoder.encode(value, StandardCharsets.UTF_8));
    }
    return sb.toString();
}
Contributor
is this code really not in spark somewhere

Contributor Author
I spent a long time looking; let me look again. In all fairness, none of this logic is datasource-specific, and I am confused why Spark doesn't have shared handling. It could also be that we would have to be a FileSource; while that is initially simpler, it makes some things harder in the long term.

Contributor Author
All of this logic exists only for file datasources, but then you're married to Hadoop. I think we are fine to reimplement it.
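Since the path-building logic above is being reimplemented rather than reused from Spark, a minimal standalone sketch of the same Hive-style `key=value` path construction may help illustrate the behavior. The class and method names here are hypothetical, not from the actual VortexSparkDataSource code; note that `URLEncoder` applies form-style encoding (space becomes `+`), matching the snippet in the diff rather than Hive's percent-encoding convention.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathSketch {
    // Builds a Hive-style partition path such as "date=2026-04-01/city=New+York".
    // Keys and values are form-encoded so that '/', '=', and spaces cannot
    // corrupt the directory structure.
    static String partitionPath(Map<String, String> partitionValues) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : partitionValues.entrySet()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8));
            sb.append('=');
            sb.append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("date", "2026-04-01");
        values.put("city", "New York");
        System.out.println(partitionPath(values)); // date=2026-04-01/city=New+York
    }
}
```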

@robert3005
Contributor Author

We don't implement filter pushdown yet, so even though we can read and write partitions, we don't prune them. Also, we don't remove the partition columns from the data yet.
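For the read side, recovering partition values from a Hive-style directory path is the inverse of the write-side encoding. A hypothetical sketch (names are illustrative, not from the codebase), assuming the same `URLEncoder` form-style encoding was used when writing:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathParser {
    // Parses a path like "date=2026-04-01/city=New+York" back into an ordered
    // key/value map, undoing the form-encoding applied when the path was written.
    static Map<String, String> parse(String partitionPath) {
        Map<String, String> values = new LinkedHashMap<>();
        if (partitionPath.isEmpty()) {
            return values;
        }
        for (String segment : partitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("not a key=value segment: " + segment);
            }
            String key = URLDecoder.decode(segment.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(segment.substring(eq + 1), StandardCharsets.UTF_8);
            values.put(key, value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("date=2026-04-01/city=New+York"));
        // {date=2026-04-01, city=New York}
    }
}
```

Partition pruning would then amount to evaluating pushed-down predicates against these decoded values before opening any files in a directory.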

@robert3005
Contributor Author

This PR needs more work: we need to remove partition values from the data.

Signed-off-by: Robert Kruszewski <github@robertk.io>
@robert3005 robert3005 merged commit 8060ae0 into develop Apr 1, 2026
60 checks passed
@robert3005 robert3005 deleted the rk/partitionby branch April 1, 2026 13:55
lwwmanning pushed a commit that referenced this pull request Apr 1, 2026
Support partitionBy in spark reader/writer

---------

Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Will Manning <will@willmanning.io>
@kesavkolla

It would be nice to see full support for Hive-style partitioning with Vortex, for both read/write and filter pushdown.

@robert3005
Contributor Author

This PR added everything but filter pushdown. We don't have filter pushdown in the Spark data source at all right now.


Labels

changelog/feature A new feature


3 participants