Commit a0d4293
fix(sqlite): reconstruct all basic payload types + validate at build time (#27)
* fix(sqlite): reconstruct LargeUtf8 and Utf8View on read-back
* fix(sqlite): reconstruct List<Utf8View> on read-back
* fix(sqlite): use LargeStringBuilder for List<LargeUtf8> read-back
* fix(sqlite): reconstruct LargeList payloads on read-back
* test(sqlite): cover null and empty list cells on read-back
* feat(sqlite): validate payload types at build time
PR #27 closed the reachable LargeUtf8/Utf8View/list read-back gaps, but
the underlying defect class remained: the write side accepted any Arrow
type (silent TEXT/NULL fallback) while the read side reconstructed only a
subset, so an unsupported payload column built a "successful" sidecar that
then failed on the first query (runtimedb#631).
Close the class structurally:
- Add `payload_type_supported` as the single source of truth for which
Arrow types round-trip, and `validate_payload_schema` to gate every
build entry point (`SqliteSidecarBuilder::begin`, `open_or_build`). An
unsupported column now fails at index-build time with the column named,
instead of at query time, so runtimedb's index-create step is where the
error will surface.
- Widen actual support to the common parquet/DuckDB scalar types that
previously fell through: Boolean, Date32, Date64, and Timestamp (all
units, time zone preserved on read-back).
Tests: reject a Decimal128 payload at build time; round-trip Boolean,
Date32, and a tz-carrying Timestamp.
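
A minimal sketch of the gate described above, not the crate's actual code. `DataType` and `Field` are hypothetical stand-ins for the Arrow types, the variant list is illustrative rather than complete, the error type is simplified to `String`, and the function signatures are assumptions. The key is taken to be output field 0, as the last fix in this commit describes.

```rust
// Hypothetical stand-in for Arrow's `DataType`; the real enum has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Boolean,
    Date32,
    Date64,
    Timestamp { tz: Option<String> },
    Decimal128,
}

// Hypothetical stand-in for a schema field.
struct Field {
    name: &'static str,
    data_type: DataType,
}

/// Single source of truth: does this type round-trip through the sidecar?
fn payload_type_supported(dt: &DataType) -> bool {
    match dt {
        DataType::Utf8
        | DataType::LargeUtf8
        | DataType::Utf8View
        | DataType::Boolean
        | DataType::Date32
        | DataType::Date64
        | DataType::Timestamp { .. } => true,
        // Rejected at build time instead of failing on the first query.
        DataType::Decimal128 => false,
    }
}

/// Validate every payload column of the output schema; field 0 is the key.
fn validate_payload_schema(output_fields: &[Field]) -> Result<(), String> {
    for field in output_fields.iter().skip(1) {
        if !payload_type_supported(&field.data_type) {
            // Name the offending column so the build error is actionable.
            return Err(format!(
                "unsupported payload column `{}` ({:?})",
                field.name, field.data_type
            ));
        }
    }
    Ok(())
}
```

Because the match is exhaustive, adding a new variant to the stand-in enum forces a decision about whether it is supported, which is the point of keeping one predicate.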
* feat(sqlite): support all basic scalar payload types
Round out the supported set with the remaining primitive Arrow types so a
parquet/DuckDB column of any basic type round-trips instead of being
rejected at build time:
- small integers: Int8, Int16, UInt8, UInt16 (→ INTEGER)
- time-of-day: Time32 (Second/Millisecond), Time64 (Microsecond/Nanosecond)
(→ INTEGER, unit restored from the schema on read-back)
- binary: Binary, LargeBinary (→ BLOB)
These add match arms only; each column hits a single arm, so there is no
runtime cost for types a given sidecar doesn't use. Float16 (needs the
`half` crate) and Decimal128/256 (need lossless text encoding) remain out
of scope and are still rejected early by `validate_payload_schema`.
Test: round-trip all eight new types with null cells, asserting both the
reconstructed Arrow type and the values.
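
The type-to-storage mapping listed above can be sketched as a single match. This is an illustration under assumptions: `DataType`, `TimeUnit`, `SqliteStorage`, and `storage_class` are hypothetical names, and only the types named in this change are modelled.

```rust
// Hypothetical stand-ins for Arrow's `TimeUnit` and `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Int16,
    UInt8,
    UInt16,
    Time32(TimeUnit),
    Time64(TimeUnit),
    Binary,
    LargeBinary,
    Float16,
    Decimal128,
}

// Hypothetical name for the SQLite column storage class chosen on write.
#[derive(Debug, PartialEq)]
enum SqliteStorage {
    Integer,
    Blob,
}

/// Map a payload type to its SQLite storage class, or `None` if unsupported.
fn storage_class(dt: &DataType) -> Option<SqliteStorage> {
    match dt {
        // Small integers widen losslessly into SQLite's 64-bit INTEGER.
        DataType::Int8 | DataType::Int16 | DataType::UInt8 | DataType::UInt16 => {
            Some(SqliteStorage::Integer)
        }
        // Time-of-day is stored as a raw count; the unit is not stored per
        // cell and is restored from the schema on read-back.
        DataType::Time32(TimeUnit::Second | TimeUnit::Millisecond)
        | DataType::Time64(TimeUnit::Microsecond | TimeUnit::Nanosecond) => {
            Some(SqliteStorage::Integer)
        }
        DataType::Binary | DataType::LargeBinary => Some(SqliteStorage::Blob),
        // Float16, Decimal128, and invalid unit pairings stay unsupported.
        _ => None,
    }
}
```

Each column is matched once against this table, which is why adding arms costs nothing for sidecars that never use the new types.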
* fix(sqlite): key off output field 0 in payload validation
`validate_payload_schema` was passed `begin`'s `key_col_index`, but that
indexes the input batch while the function skips that position in the
*output* schema — whose key is always field 0. For any `key_col_index != 0`
this skipped a real payload column (leaving an unsupported type to fail at
query time, the exact defect this guards against) and validated the key
column instead.
Drop the parameter and always skip output field 0, matching how `finish()`
and `open_or_build` derive the key from `schema.field(0)`. Add a regression
test driving `begin` with a non-zero `key_col_index` and an unsupported
payload column.
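
The index mix-up can be reproduced with column names alone. In this sketch both helpers are hypothetical, and the assumption that the output schema is the key followed by the remaining columns in input order goes beyond what the commit states (it only says the key is output field 0).

```rust
/// Hypothetical: build the output column order, with the key moved to field 0.
fn output_schema<'a>(input: &[&'a str], key_col_index: usize) -> Vec<&'a str> {
    let mut out = vec![input[key_col_index]];
    out.extend(
        input
            .iter()
            .enumerate()
            .filter(|(i, _)| *i != key_col_index)
            .map(|(_, name)| *name),
    );
    out
}

/// Hypothetical: the columns a validator inspects when it skips position `skip`.
fn checked_columns<'a>(output_fields: &[&'a str], skip: usize) -> Vec<&'a str> {
    output_fields
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != skip)
        .map(|(_, name)| *name)
        .collect()
}
```

With input columns `["price", "id", "note"]` and `key_col_index = 1`, the output order is `["id", "price", "note"]`. Skipping position 1 (the old behaviour) checks `id` and `note` and never looks at `price`; skipping position 0 checks `price` and `note`, as intended.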
---------
Co-authored-by: Anoop Narang <anoop@hotdata.dev>
1 parent 3d8c7a2 commit a0d4293
2 files changed
Lines changed: 1095 additions & 60 deletions