Problem
When creating a table with 'file_format_version' = '2.2' and '<column>.lance.encoding' = 'blob', the write fails with:
Error: Invalid user input: Legacy blob columns (field metadata key "lance-encoding:blob") are not supported
for file version >= 2.2. Use the blob v2 extension type (ARROW:extension:name = "lance.blob.v2") and the
new blob APIs (e.g. lance::blob::blob_field / lance::blob::BlobArrayBuilder).
Root Cause
SchemaConverter.addBlobMetadata() unconditionally sets the legacy lance-encoding:blob field metadata. For file format versions >= 2.2, lance-core instead requires the blob v2 Arrow extension type (ARROW:extension:name = "lance.blob.v2") with a Struct<data: LargeBinary?, uri: Utf8?> schema.
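To make the difference between the two layouts concrete, here is a minimal sketch of what each blob column looks like. It deliberately avoids the Arrow Java library: SketchField and BlobLayouts are hypothetical stand-ins, not lance-spark or Arrow APIs. The metadata keys, the extension name, and the struct children come from the error message and root cause above; the legacy value "true" is an assumption based on how Lance documents the legacy blob flag.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal stand-in for an Arrow field (name, type, nullability,
// metadata, children). Real code would build org.apache.arrow.vector.types.pojo.Field.
final class SketchField {
    final String name;
    final String type;               // e.g. "LargeBinary", "Utf8", "Struct"
    final boolean nullable;
    final Map<String, String> metadata;
    final List<SketchField> children;

    SketchField(String name, String type, boolean nullable,
                Map<String, String> metadata, List<SketchField> children) {
        this.name = name;
        this.type = type;
        this.nullable = nullable;
        this.metadata = metadata;
        this.children = children;
    }
}

final class BlobLayouts {
    // Legacy layout (file_format_version < 2.2): a LargeBinary column tagged with
    // the "lance-encoding:blob" field metadata key (value "true" is assumed).
    static SketchField legacyBlob(String name, boolean nullable) {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("lance-encoding:blob", "true");
        return new SketchField(name, "LargeBinary", nullable, md, List.of());
    }

    // Blob v2 layout (file_format_version >= 2.2): Struct<data: LargeBinary?, uri: Utf8?>
    // tagged with the Arrow extension name "lance.blob.v2" instead of the legacy key.
    static SketchField blobV2(String name, boolean nullable) {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("ARROW:extension:name", "lance.blob.v2");
        List<SketchField> children = List.of(
                new SketchField("data", "LargeBinary", true, Map.of(), List.of()),
                new SketchField("uri", "Utf8", true, Map.of(), List.of()));
        return new SketchField(name, "Struct", nullable, md, children);
    }
}
```

The key point is that blob v2 changes the physical type (binary to struct), not just the metadata, which is why the read path in LanceArrowUtils.scala also needs to change.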
Proposed Changes
- When file_format_version >= 2.2 and blob encoding is requested, emit the blob v2 extension type metadata and struct schema instead of the legacy metadata.
- Update BlobUtils to add blob v2 constants and detection.
- Update LanceArrowUtils.scala to handle the blob v2 struct type on the read path.
- Keep the legacy behavior for file_format_version < 2.2.
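The version gate and the BlobUtils detection could look roughly like the sketch below. BlobEncodingSelector and its methods are hypothetical names, not the actual BlobUtils API; the constants are copied from the error message. It assumes file_format_version holds a numeric "major.minor" string. If the connector also accepts named aliases, those would need to be resolved to a number first.

```java
import java.util.Map;

// Hypothetical helper illustrating the proposed version gate and detection logic.
final class BlobEncodingSelector {
    // Keys and values quoted from the lance-core error message.
    static final String LEGACY_BLOB_KEY = "lance-encoding:blob";
    static final String EXTENSION_NAME_KEY = "ARROW:extension:name";
    static final String BLOB_V2_EXTENSION = "lance.blob.v2";

    // Compare "major.minor" numerically so that "2.10" counts as newer than "2.2";
    // a plain String.compareTo would order them the other way.
    static boolean atLeast(String version, int major, int minor) {
        String[] parts = version.trim().split("\\.");
        int maj = Integer.parseInt(parts[0]);
        int min = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return maj > major || (maj == major && min >= minor);
    }

    // Write path: emit blob v2 for >= 2.2, keep legacy metadata otherwise.
    static boolean useBlobV2(String fileFormatVersion) {
        return atLeast(fileFormatVersion, 2, 2);
    }

    // Read path: a column is blob v2 if it carries the lance.blob.v2 extension name.
    static boolean isBlobV2(Map<String, String> fieldMetadata) {
        return BLOB_V2_EXTENSION.equals(fieldMetadata.get(EXTENSION_NAME_KEY));
    }

    // Read path: legacy blob columns are identified by the field metadata key alone.
    static boolean isLegacyBlob(Map<String, String> fieldMetadata) {
        return fieldMetadata.containsKey(LEGACY_BLOB_KEY);
    }
}
```

Parsing the version numerically rather than comparing strings keeps the gate correct if a two-digit minor version ever appears.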
Workaround
Use 'file_format_version' = '2.1' with legacy blob encoding.
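The table properties involved can be checked up front. The sketch below is hypothetical (LegacyBlobCheck is not part of lance-spark). It reads the '<column>.lance.encoding' = 'blob' and 'file_format_version' properties described in this issue and lists the columns that would hit the legacy-blob error. With the workaround version '2.1', the list is empty. When no version is set, it returns an empty list because it makes no assumption about the connector's default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check over table properties; not an existing lance-spark API.
final class LegacyBlobCheck {
    static final String ENCODING_SUFFIX = ".lance.encoding";

    // Numeric "major.minor" comparison (assumes numeric version strings).
    static boolean atLeast(String version, int major, int minor) {
        String[] parts = version.trim().split("\\.");
        int maj = Integer.parseInt(parts[0]);
        int min = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return maj > major || (maj == major && min >= minor);
    }

    // Columns that request legacy blob encoding on a format version (>= 2.2) that rejects it.
    static List<String> rejectedLegacyBlobColumns(Map<String, String> props) {
        List<String> rejected = new ArrayList<>();
        String version = props.get("file_format_version");
        if (version == null || !atLeast(version, 2, 2)) {
            return rejected; // legacy path (e.g. the 2.1 workaround) or version unknown
        }
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = e.getKey();
            if (key.endsWith(ENCODING_SUFFIX) && "blob".equals(e.getValue())) {
                rejected.add(key.substring(0, key.length() - ENCODING_SUFFIX.length()));
            }
        }
        return rejected;
    }
}
```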