[core] FormatTable supports Blob Format #8191
    enum Format {
        ORC,
        PARQUET,
        BLOB,
Adding BLOB here also exposes format-table projection paths. For a table like (payload BLOB, ds INT) PARTITIONED BY (ds), projecting only ds makes FormatReadBuilder remove partition columns before creating the file reader, so the projectedRowType passed to BlobFileFormat is empty. BlobFileFormat currently requires a BLOB field and throws, whereas other format tables can satisfy partition-only projections. Please handle this case, for example by reading only the blob file metadata to get the row count and then appending partition columns, or by adding an explicit supported projection path with a test.
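To make the first suggestion concrete, here is a minimal sketch of a partition-only projection that never touches blob payloads. It assumes the row count of a blob file can be obtained from its metadata; `PartitionOnlyProjection` and `project` are hypothetical names for illustration, not Paimon APIs, and rows are modeled as plain lists rather than `InternalRow`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: none of these names exist in Paimon. */
public class PartitionOnlyProjection {

    /**
     * Builds the result rows for a projection that contains partition columns only.
     *
     * @param rowCount number of blobs in the file, assumed to come from file metadata
     * @param partitionValues values of the projected partition columns for this file
     * @return rowCount rows, each holding just the partition values
     */
    public static List<List<Object>> project(long rowCount, List<Object> partitionValues) {
        // One immutable copy is shared by every row, since all rows in a file
        // belong to the same partition.
        List<Object> row = Collections.unmodifiableList(new ArrayList<>(partitionValues));
        List<List<Object>> rows = new ArrayList<>();
        for (long i = 0; i < rowCount; i++) {
            rows.add(row);
        }
        return rows;
    }
}
```

For the `(payload BLOB, ds INT)` example, a file holding three blobs in one `ds` partition would yield three single-column rows carrying that `ds` value.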
    }
    if (writer instanceof FileAwareFormatWriter) {
        FileAwareFormatWriter fileAwareFormatWriter = (FileAwareFormatWriter) writer;
        fileAwareFormatWriter.setFile(path);
setFile(path) is not enough for the withBlobConsumer path. BlobFormatWriter invokes the consumer while writing and the emitted descriptor points at this target path, but this writer is backed by a TwoPhaseOutputStream, so the target file is not visible until FormatTableCommit commits it; if a later write/commit fails, abort()/FormatTableCommit.abort() discards it anyway. This violates the TableWrite.withBlobConsumer contract that these files are left for the caller to clean up, and leaves already-emitted descriptors dangling. Please either make the consumer path use visible/non-deleted files like SingleFileWriter does with deleteFileUponAbort(), or defer/avoid emitting descriptors until the file has actually been committed, and add a failure-path test.
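One way to picture the "defer emitting descriptors until the file has actually been committed" option is the sketch below. It buffers descriptors during the write and only hands them to the consumer on commit. `DeferredDescriptorEmitter` is a hypothetical name, and descriptors are modeled as plain strings; the real `BlobDescriptor` type and the hooks in `BlobFormatWriter` and `FormatTableCommit` are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers blob descriptors until the backing file is committed. */
public class DeferredDescriptorEmitter {

    private final Consumer<String> consumer;
    private final List<String> pending = new ArrayList<>();

    public DeferredDescriptorEmitter(Consumer<String> consumer) {
        this.consumer = consumer;
    }

    /** Called while writing: the descriptor is buffered, not emitted. */
    public void onBlobWritten(String descriptor) {
        pending.add(descriptor);
    }

    /** Called once the file is visible at its target path: emit everything buffered. */
    public void commit() {
        for (String descriptor : pending) {
            consumer.accept(descriptor);
        }
        pending.clear();
    }

    /** Called when the write is discarded: the caller never sees dangling descriptors. */
    public void abort() {
        pending.clear();
    }
}
```

The trade-off is that the consumer no longer sees descriptors while the write is in progress, which changes the timing of `withBlobConsumer` callbacks compared with the non-format-table path.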
@JingsongLi Thanks for your review! This scenario is a bit tricky, though.
Currently, FormatTable on DFS uses RENAME to implement two-phase commit, so the path that is set is not real yet: the file only exists there after commit. In that case, if the commit fails and is aborted, it is meaningless to retain the written files, because they sit in a temp dir and do not match the path stored in the BlobDescriptors.
(In Python, however, no two-phase commit is implemented, so I still retain written files on abort.)
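The rename-based behavior described above can be illustrated with a small local-filesystem sketch. This is not the `TwoPhaseOutputStream` implementation; `RenameCommitSketch` and the `_tmp_` prefix are assumptions for illustration, and only standard `java.nio.file` calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a rename-based two-phase write on a local filesystem. */
public class RenameCommitSketch {

    private final Path tempFile;
    private final Path targetFile;

    public RenameCommitSketch(Path tempFile, Path targetFile) {
        this.tempFile = tempFile;
        this.targetFile = targetFile;
    }

    /** Creates a writer whose temp and target files live in a fresh temp directory. */
    public static RenameCommitSketch inTempDir(String fileName) {
        try {
            Path dir = Files.createTempDirectory("rename-commit-sketch");
            return new RenameCommitSketch(dir.resolve("_tmp_" + fileName), dir.resolve(fileName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Phase one: bytes go to the temp file, so the target path does not exist yet. */
    public void write(byte[] data) {
        try {
            Files.write(tempFile, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Phase two: the rename makes the file visible at the target path. */
    public void commit() {
        try {
            Files.move(tempFile, targetFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Abort discards the temp file; the target path never comes into existence. */
    public void abort() {
        try {
            Files.deleteIfExists(tempFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean targetVisible() {
        return Files.exists(targetFile);
    }
}
```

A descriptor that records the target path while the writer is still in phase one therefore points at a file that does not exist until `commit()` runs, and never will if `abort()` runs instead.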
Here are my thoughts:
- Maybe we could explicitly warn users that in FormatTable, returned blobDescriptors are only valid after commit? Or maybe introduce a PendingBlobDescriptor for format tables, identical to BlobDescriptor except that BlobRef could warn users that the descriptor is still pending, rather than throwing "path not exists".
- I think this "visible after commit" behavior is acceptable for batch scenarios. For example, in Spark/Ray, the FormatTable commit is part of the job, so exported descriptors become visible only after the job has finished successfully.
- Or maybe we skip two-phase commit for BlobFormatTables and just filter out the broken files on read?
Thanks again for your review! I'll close this PR and find another way if you think this scenario is not suitable for Paimon FormatTable.
@steFaiz Why not just use a Paimon table to store objects?
@JingsongLi Thanks for your question! Let me explain this. My scenario is:
Why is an append table not suitable?
I'm exploring using a Paimon Format Table to replace OSS, acting purely as an archive for blobs. Users always refer to blobs by descriptor only (never a full scan) and can take advantage of Paimon's blob packing, partition management, and table management.
Purpose
Support the Blob format in FormatTable.
The goal is to replace the object store with Paimon on DFS, unifying storage engines. Consider this situation:
The key advantages are:
Restriction
Currently, a Blob format table may have only one non-partition column.
Tests
See
org.apache.paimon.table.format.FormatTableBlobTest