Appending dataset corrupts blob columns


## Repro

When appending rows to an existing Lance dataset that has `large_binary` blob columns (tagged with `lance-encoding:blob` metadata), the appended fragment stores those columns as `struct<position: uint64, size: uint64>` instead of `large_binary`. This happens even when the correct schema is passed explicitly with `data_storage_version="stable"`.

## Consequences:

- `ds.take_blobs()` fails on rows from the appended fragment.
- `ds.optimize.compact_files()` fails with a schema mismatch between the original and appended fragments.
- The dataset looks fine superficially (row counts are correct and `to_table()` works), but blob reads are broken.

Repro: write a dataset with blob columns using `mode="overwrite"`, then append more rows with `mode="append"` using the same schema. Inspect `ds.schema.field("rgb").type`: it reports `large_binary` (from the first fragment), but rows in the second fragment are actually struct-encoded.
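
The steps above could be scripted roughly as follows. This is a sketch and has not been checked against a specific Lance release. The column name `rgb` comes from the report. The `id` column, the 16-byte payloads, the temp directory, the metadata value `"true"`, and the helper names `make_rows`, `make_table`, and `run_repro` are assumptions made for illustration. The keyword form of `take_blobs` is also an assumption, since its signature has differed between pylance versions.

```python
import tempfile

# Assumed metadata value; the report only names the "lance-encoding:blob" key.
BLOB_META = {"lance-encoding:blob": "true"}


def make_rows(n, start=0):
    """Return plain-Python column data: sequential ids and 16-byte payloads."""
    ids = list(range(start, start + n))
    return {"id": ids, "rgb": [bytes([i % 256]) * 16 for i in ids]}


def make_table(pa, n, start=0):
    """Build a pyarrow table whose 'rgb' column is large_binary tagged as a blob."""
    schema = pa.schema([
        pa.field("id", pa.int64()),
        pa.field("rgb", pa.large_binary(), metadata=BLOB_META),
    ])
    return pa.table(make_rows(n, start), schema=schema)


def run_repro():
    """Write one fragment with overwrite, a second with append, then inspect."""
    # Imported lazily so the helpers above can be used without lance installed.
    import lance
    import pyarrow as pa

    uri = tempfile.mkdtemp() + "/blobs.lance"
    lance.write_dataset(make_table(pa, 4), uri,
                        mode="overwrite", data_storage_version="stable")
    lance.write_dataset(make_table(pa, 4, start=4), uri,
                        mode="append", data_storage_version="stable")

    ds = lance.dataset(uri)
    print("dataset-level type:", ds.schema.field("rgb").type)
    print("fragments:", len(ds.get_fragments()))

    # Row index 4 is the first row of the appended fragment.
    # Assumed call shape; adjust to the take_blobs signature of your version.
    try:
        blobs = ds.take_blobs("rgb", indices=[4])
        print("read", len(blobs[0].read()), "bytes from appended fragment")
    except Exception as exc:
        print("take_blobs failed on appended row:", exc)
```

Call `run_repro()` in an environment with `pylance` and `pyarrow` installed. If the `take_blobs` call raises a `TypeError` about its arguments, that points at the assumed signature rather than at the bug described here.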

Workaround: always write all rows in a single `write_dataset(mode="overwrite")` call. Never use `mode="append"` with blob columns.
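
One way to enforce the workaround in a pipeline is a small guard that refuses `mode="append"` whenever the schema carries blob-tagged columns. The sketch below is hypothetical: `blob_columns` and `check_write_mode` are illustrative names, not part of Lance. It assumes the schema behaves like a pyarrow schema (iterable fields with `.name` and a `.metadata` dict keyed by bytes) and that blob columns are tagged with the value `true`. The `FakeField` stand-in exists only so the sketch runs without pyarrow.

```python
from collections import namedtuple

BLOB_KEY = b"lance-encoding:blob"  # pyarrow exposes field metadata as bytes


def blob_columns(schema):
    """Return names of fields tagged as Lance blob columns."""
    names = []
    for field in schema:
        meta = field.metadata or {}  # metadata is None when nothing is set
        if meta.get(BLOB_KEY) == b"true":
            names.append(field.name)
    return names


def check_write_mode(schema, mode):
    """Raise if an append is requested for a schema with blob columns."""
    cols = blob_columns(schema)
    if mode == "append" and cols:
        raise ValueError(
            f"refusing mode='append' with blob columns {cols}; "
            "rewrite all rows with mode='overwrite' instead"
        )
    return mode


# Stand-in for pyarrow fields, for demonstration only.
FakeField = namedtuple("FakeField", ["name", "metadata"])
demo_schema = [
    FakeField("id", None),
    FakeField("rgb", {BLOB_KEY: b"true"}),
]
```

With a real dataset, the guard would be called as `check_write_mode(table.schema, mode)` just before `write_dataset`.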
