Support partitionBy in VortexSparkDataSource#7218
Merging this PR will not alter performance
private String getPartitionPath(InternalRow row) {
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < resolvedTransforms.length; i++) {
    if (i > 0) {
      sb.append("/");
    }
    ResolvedTransform rt = resolvedTransforms[i];
    sb.append(URLEncoder.encode(rt.directoryKey, StandardCharsets.UTF_8));
    sb.append("=");
    String value = evaluateTransform(rt, row);
    sb.append(URLEncoder.encode(value, StandardCharsets.UTF_8));
  }
  return sb.toString();
}
Is this code really not in Spark somewhere?
I spent a long time looking; let me look again. In fairness, none of this logic is datasource specific, and I'm confused why Spark doesn't have shared handling. It could also be that we'd have to be a FileSource; while that's initially simpler, it makes some things harder in the long term.
All of this logic exists only for file datasources, but then you're married to Hadoop. I think we are fine to reimplement it.
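The escaping behavior under discussion can be sketched as a standalone snippet. The class name, the array-based signature, and the sample keys below are invented for illustration; the PR's real method operates on an `InternalRow` via resolved transforms, as shown in the diff above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of Hive-style partition path construction:
// each (key, value) pair becomes a URL-encoded "key=value" directory segment.
public class PartitionPathSketch {
    static String partitionPath(String[][] pairs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pairs.length; i++) {
            if (i > 0) {
                sb.append("/"); // separate directory levels
            }
            sb.append(URLEncoder.encode(pairs[i][0], StandardCharsets.UTF_8));
            sb.append("=");
            sb.append(URLEncoder.encode(pairs[i][1], StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // URLEncoder escapes spaces as '+' and '/' as %2F, so values cannot
        // break out of their directory segment.
        System.out.println(partitionPath(new String[][] {{"year", "2024"}, {"city", "New York"}}));
        // year=2024/city=New+York
    }
}
```

Note that `URLEncoder` uses form encoding (space becomes `+`), which differs from the escaping Spark's built-in file sources apply to partition directories; that is one reason reimplementing this outside the file-source path is a real design decision rather than copy-paste.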
We don't implement filter pushdown yet, so even though we can read and write partitions, we don't prune them. Also, we don't remove the partition columns from the data yet.
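Partition pruning would eventually require mapping directory names back to values, i.e. the inverse of the path construction above. A minimal, hypothetical sketch (the class and method names are invented; the PR does not include this code):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: decode a Hive-style partition path such as
// "year=2024/city=New+York" back into an ordered key -> value map.
public class PartitionPathParser {
    static Map<String, String> parse(String path) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value directory; skip it
            }
            String key = URLDecoder.decode(segment.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(segment.substring(eq + 1), StandardCharsets.UTF_8);
            values.put(key, value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("year=2024/city=New+York"));
        // {year=2024, city=New York}
    }
}
```

With something like this, a pushed-down predicate on a partition column could be evaluated against the decoded values to skip whole directories before any file is opened.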
This PR needs more work: we need to remove partition values from the data.
Support partitionBy in spark reader/writer --------- Signed-off-by: Robert Kruszewski <github@robertk.io> Signed-off-by: Will Manning <will@willmanning.io>
It would be nice to see full support for Hive-style partitioning with Vortex, for both read/write and also filter pushdown.
This PR added everything but filter pushdown. We don't have filter pushdown in the Spark data source at all right now.
Support partitionBy in spark writer