feat: add NamespaceClientTableContext for cached namespace info#96

Closed
jackye1995 wants to merge 186 commits into main from ns-cached-info

Conversation

@jackye1995
Owner

Summary

  • Introduce NamespaceClientTableContext (Rust struct, Python class, Java class) that holds cached describe_table/declare_table response data (location, storage_options, managed_versioning)
  • Context is passed all the way to Rust layer where all decisions about storage options merging and managed versioning are made — Python and Java make no decisions
  • Remove deprecated Dataset.create/open overloads and createWithFfiSchema JNI path in Java

esteban and others added 30 commits March 10, 2026 02:15
…6146)

fix CI error: `FAILED
python/tests/test_integration.py::test_duckdb_pushdown_extension_types -
_duckdb.Error: DeprecationWarning: fetch_arrow_table() is deprecated,
use to_arrow_table() instead.`
20%+ faster for 2GB index, could be more for larger index
)

This PR fixes the regression benchmarks workflow failing to resolve the
pinned `google-github-actions/auth` action. The workflow had quoted the
entire `uses` value, which caused the trailing `# v2` comment to be
parsed as part of the action ref.
There was a conflict table in transaction.rs but this was incomplete
(some rows/columns missing) and seemed to be imprecise or incorrect in a
few spots. I've attempted to more thoroughly document this in
transaction.md instead.
…ance-format#6160)

Previously, `adjust_child_validity` would call `ArrayData::try_new` with
a null bitmap on a `DataType::Null` array, causing an `.unwrap()` panic
with `InvalidArgumentError("Arrays of type Null cannot contain a null
bitmask")`.

The trigger: when a user inserts rows where a struct sub-field has only
null values, Arrow infers `DataType::Null` for that column. If a
subsequent fragment omits that nullable sub-field, Lance inserts a
`NullReader` to fill it in. `MergeStream` then merges the real batch
(with null struct rows) and the `NullReader` batch (all-null struct),
recursing into the struct where `adjust_child_validity` is called with
the `Null`-typed child and a non-empty parent validity — triggering the
panic.

Fix: skip the bitmask operation when `child.data_type() ==
DataType::Null`. A `Null` array is always entirely null by definition
and needs no validity adjustment.

Closes lance-format#6159

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…e-format#6163)

Previously, when `FragReuseIndexDetails` exceeded 204800 bytes
(triggered by large compactions with many fragments), the code wrote the
details to an external file (`details.binpb`). On local filesystems,
`ObjectStore::create` returns a `LocalWriter` that atomically renames a
temp file to the final path in `Writer::shutdown`. However,
`frag_reuse.rs` imported `tokio::io::AsyncWriteExt` but not
`lance_io::traits::Writer`, so `writer.shutdown()` resolved to
`AsyncWriteExt::shutdown` (flush/close only) — the temp file was deleted
on drop without being persisted. Any subsequent `load_indices` call
would fail with `Not found: .../details.binpb`.

Fixed by using UFCS `Writer::shutdown(writer.as_mut()).await?` to
explicitly call the lance trait method, matching the existing pattern in
`ivf.rs` and `blob.rs`.

Fixes lance-format#6161

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This breaks the "build_partitions" stage into "build_partitions" and
"merge_partitions", and also updates the progress reporting on the
shuffle phase to be in terms of rows instead of batches.
This PR moves a few unrelated clippy cleanups out of lance-format#6168 so the blob
empty-range fix can stay focused on the regression it addresses. The
changes here are all mechanical simplifications with no intended
behavior change.
…t#6175)

This PR moves the Linux and Windows workflows that currently run on Warp
onto GitHub-hosted runners. The goal is to reduce reliance on custom
runners and take advantage of the sponsored larger GitHub-hosted
machines for the slowest CI paths.

This is focused on the current CI bottlenecks we observed in recent
successful PR runs, especially Rust ARM and Python Windows jobs, while
keeping the existing macOS and benchmark-specific runners unchanged
until we verify equivalent GitHub-hosted options for them.

Context:
- Recent PR history shows Rust `linux-arm` and Python `windows` as the
dominant critical-path jobs.
- This change upgrades those jobs to larger GitHub-hosted runners where
available (`ubuntu-24.04-8x`, `ubuntu-24.04-arm64-8x`,
`windows-latest-4x`) and aligns the remaining Linux/Windows workflows
with the same runner family.
- I validated the workflow YAML locally after the runner migration; no
product code or test logic changed.

---

Updates:

- Rust linux-arm: 40.7 -> 19.4, about -52%
- Rust windows-build: 27.7 -> 21.0, about -24%
- Python windows: 36.5 -> 23.1, about -37%
- Python Linux 3.13 ARM: 26.9 -> 20.7, about -23%
- Python Linux 3.13 x86_64: 26.8 -> 19.1, about -29%
- Python Linux 3.9 x86_64: 25.9 -> 19.2, about -26%
Improves the alicloud storage config doc (lance-format#4247).

Signed-off-by: FarmerChillax <farmerchillax@outlook.com>
Blob reads should return empty bytes when the logical blob is empty or
the cursor is already at EOF. Today `BlobFile::read` / `read_up_to` can
still issue a `get_range(start..end)` request with `start == end`, which
is tolerated by local readers but rejected by cloud object stores.

This showed up while investigating `random_blob` failures on the
original-scale `laion10m-full` dataset, where legacy blob reads on S3
failed with errors like `Range started at 1 and ended at 1`. The fix
short-circuits empty reads and restores the cursor to blob-relative
semantics after `read()`, and adds regression coverage for both the
empty-range case and packed-blob cursor behavior.
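
The short-circuit described above can be sketched roughly as follows.
This is a minimal illustration in Python, not the Rust `BlobFile`
implementation: `BlobCursor`, `strict_store`, and their parameters are
hypothetical names, and the store only mimics an object store that
rejects `start == end` ranges.

```python
class BlobCursor:
    """Hypothetical stand-in for a blob reader; not the Lance BlobFile API."""

    def __init__(self, get_range, position, size):
        self._get_range = get_range  # callable(start, end) -> bytes, absolute offsets
        self._position = position    # absolute offset of the blob in the file
        self._size = size            # logical blob length in bytes
        self._cursor = 0             # blob-relative read cursor

    def read(self, n=-1):
        remaining = self._size - self._cursor
        want = remaining if n < 0 else min(n, remaining)
        if want <= 0:
            # Empty blob or cursor at EOF: return without issuing a
            # zero-length range request.
            return b""
        start = self._position + self._cursor
        data = self._get_range(start, start + want)
        self._cursor += len(data)  # the cursor stays blob-relative
        return data


def strict_store(payload):
    """Mimic an object store that rejects empty ranges (assumed behavior)."""
    def get_range(start, end):
        if start >= end:
            raise ValueError(f"Range started at {start} and ended at {end}")
        return payload[start:end]
    return get_range


store = strict_store(b"xhello")
blob = BlobCursor(store, position=1, size=5)
first = blob.read()    # reads the whole blob
second = blob.read()   # cursor at EOF, no request is issued
```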
<img width="1340" height="800" alt="image"
src="https://github.com/user-attachments/assets/355caf26-14cb-4823-9474-6e4c9e780823"
/>

- FTS indexing is ~2.5x faster: this removes the merge phase and
produces large partitions directly.
- Memory footprint is reduced by ~60%: posting lists are compressed
while they are being built, which saves a lot of memory and reduces
fragmented objects in memory.

This also bumps the default worker memory budget from 256MiB to 1GiB
because we need to produce larger partitions directly, but the memory
footprint is still much lower.

This adds a new param `memory_limit` so that users can control how the
indexing should work.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: LuQQiu <luqiujob@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
…#6187)

This fixes the reader panic in lance-format#6185 when a page keeps nullable rep/def
layer metadata but does not materialize any definition levels. The
decoder now treats that page-local state as all-valid and includes a
regression test that reproduces the mixed-page case before the fix.

Closes lance-format#6185.
)

This fixes the merge-insert fast path for delete-by-source operations
while preserving the existing `UpdateIf` semantics. It also keeps
full-schema `FixedSizeList` merges on the optimized path so target-side
payload columns are pruned from the join build side.

Fix lancedb/lancedb#3094
This updates the benchmark TPC-H datagen path to use DuckDB's
`to_arrow_reader()` API instead of the deprecated `fetch_arrow_reader()`
call.

The benchmark CI treats `DeprecationWarning` as an error, so this
removes the warning that was breaking the random access benchmark job. I
also dropped a leftover `print(ds.count_rows())` debug statement to keep
benchmark logs clean.
In retrospect the old name was somewhat presumptuous. It would probably
be good to get the Arrow project's permission before taking up cargo
real estate. This also adds a README which was preventing the publish.
…mat#6145)

## Summary

Closes lance-format#6138

This PR extends `index_matches_criteria()` in
`rust/lance/src/index/scalar.rs` to handle vector index types in
addition to scalar indices.

## Problem

Previously, `index_matches_criteria()` contained an early return at
lines 464-467 that rejected all non-scalar (vector) indices. This made
it impossible to use `describe_indices` to filter for vector indices on
a specific column.

## Solution

- Removed the early return that rejected all vector indices
- Refactored FTS and exact equality checks to only apply to scalar
indices (these checks are not relevant for vector indices)
- Vector indices now pass through when matching basic criteria (name and
column filters)

## Changes

- 1 file modified: `rust/lance/src/index/scalar.rs`
- 15 lines added, 16 lines removed
- Updated existing test `test_index_matches_criteria_vector_index()` to
reflect the new expected behavior

## Testing

- Updated the existing unit test for vector index criteria matching
- The test now correctly expects vector indices to match basic criteria
instead of being rejected

## AI Disclosure

This contribution was developed with the assistance of Claude (AI by
Anthropic). The implementation approach, code, and PR description were
AI-assisted. All changes are focused on resolving the specific issue
described above.

Co-Authored-By: AI Assistant (Claude) <ai-assistant@contributor-bot.dev>

Signed-off-by: ndpvt-web <ndpvt-web@users.noreply.github.com>
Co-authored-by: ndpvt-web <ndpvt-web@users.noreply.github.com>
Co-authored-by: AI Assistant (Claude) <ai-assistant@contributor-bot.dev>
…er (lance-format#6197)

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
…rmat#6194)

This PR makes two changes to ensure stale credentials are not used:
(1) In the Directory namespace if either vending is not enabled or a
credential vendor is not configured we return `None` for storage
options.
(2) The `DynamicStorageOptionsCredentialProvider` falls back to the
default credential provider (lazily loaded) if it is not able to
retrieve credentials.
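
A rough sketch of the fallback in (2), written in Python for brevity.
`FallbackCredentialProvider` and both callables are hypothetical and do
not mirror the Rust `DynamicStorageOptionsCredentialProvider`; the only
behavior taken from the description is "use the default provider,
built lazily, when no credentials can be retrieved".

```python
class FallbackCredentialProvider:
    """Hypothetical sketch; names do not correspond to Lance's Rust types."""

    def __init__(self, fetch_dynamic, make_default):
        self._fetch_dynamic = fetch_dynamic  # callable() -> dict or None
        self._make_default = make_default    # callable() -> provider callable
        self._default = None                 # built on first fallback only

    def get_credentials(self):
        creds = self._fetch_dynamic()
        if creds:
            return creds
        # Nothing was vended: fall back to the default provider rather than
        # reusing credentials that were handed out earlier.
        if self._default is None:
            self._default = self._make_default()
        return self._default()
```

The default provider is constructed at most once, and only if a lookup
actually fails.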

Closes lance-format/lance-spark#292

---------

Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
…ance-format#6119)

SimpleIndex (HNSW over centroids) previously only supported fp32
centroids, causing fp16 vector workloads to fall back to brute-force
partition assignment — O(K×D) per vector instead of O(log K × D). For
31K centroids × 1024 dims this is a ~600x difference per vector.

Cast fp16 centroids to fp32 at HNSW construction time (one-time cost)
and cast fp16 query vectors at search time (1024 floats per query,
negligible vs the distance computations saved).

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
lance-format#6142)

Previously we would use the default file version when creating new index
files. This was originally done to get some testing of the 2.0 format
before it was made the default. However, this led to a bit of a
potential compatibility problem. If we change the default file version
then the files created by the new release would become unreadable on
very old versions that didn't know how to read that file, even if the
dataset itself had an older file version and the old version knew how to
handle the index otherwise.

To avoid this we change things in this PR so that new index files use
the same format version as the dataset. This should mean the indexes are
always readable if the dataset is readable, regardless of what version
was used to write the index.

---

Parts of this PR were written with Claude (Opus 4.6) and I take full
responsibility for its contents.
jackye1995 and others added 28 commits April 8, 2026 13:30
…t#6415)

Before this fix, it was possible for the table URI to have a trailing
`?` because we did not clear the query component.
…tion (lance-format#6439)

Source rows with NULL ON key columns were silently dropped because the
action assignment logic used `ON_col IS NOT NULL` as a proxy for "source
row is present in the join output". This conflates a legitimate NULL key
with a NULL introduced by the outer join on the target side.

Fix by injecting a `lit(true)` sentinel column into the source DataFrame
before the join. After the join the sentinel is non-null for every
source row and null only for target-only rows, making source row
detection independent of ON column values.

Strip the sentinel in `prepare_stream_schema` before writing and
propagate it through projection pushdown in `necessary_children_exprs`.

Before the join, inject a constant `lit(true)` column
(`__merge_source_sentinel`) into every source row. After the join:

- Source rows (whether matched or unmatched) → sentinel = true
- Target-only rows (no source match) → sentinel = NULL (outer join
NULL-fill)

[assign_action](https://github.com/lance-format/lance/blob/6112a34bfe38618f07c099217dc3d89fd39ca6bb/rust/lance/src/dataset/write/merge_insert/assign_action.rs#L77)
now uses sentinel IS NOT NULL to detect source row presence, making it
correct regardless of what values the ON columns hold.

The sentinel is a pure logical column — it never touches disk. It's
stripped in
[prepare_stream_schema](https://github.com/lance-format/lance/blob/6112a34bfe38618f07c099217dc3d89fd39ca6bb/rust/lance/src/dataset/write/merge_insert/exec/write.rs#L383)
before any data is written, and
[necessary_children_exprs](https://github.com/lance-format/lance/blob/6112a34bfe38618f07c099217dc3d89fd39ca6bb/rust/lance/src/dataset/write/merge_insert/logical_plan.rs#L148)
is updated to propagate it through DataFusion's projection pushdown.

Example that was broken before:

```
Target: (id=1, record_type="A") and (id=0, record_type=NULL)
Source: (id=2, record_type=NULL) — new row, should be inserted
ON: ["id", "record_type"]
Old behavior: source row silently dropped (Action::Nothing)
New behavior: source row correctly inserted (Action::Insert)

```
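
The same example can be replayed with a toy join. This is only a
sketch in plain Python, not the DataFusion plan used by the PR:
`full_outer_join`, `flatten`, and the `t.`/`s.` column prefixes are
made up for illustration; only the sentinel column name comes from the
description above.

```python
SENTINEL = "__merge_source_sentinel"


def flatten(t, s, columns):
    """Build one join-output row; a missing side is NULL-filled with None."""
    row = {f"t.{c}": (t[c] if t is not None else None) for c in columns}
    row.update({f"s.{c}": (s[c] if s is not None else None) for c in columns})
    # The sentinel is injected on the source side, so it is non-null for
    # every source row and None only for target-only rows.
    row[SENTINEL] = True if s is not None else None
    return row


def full_outer_join(target, source, on):
    rows, matched = [], set()
    for s in source:
        s_key = tuple(s[c] for c in on)
        # SQL semantics: a key that contains NULL matches nothing.
        hits = [i for i, t in enumerate(target)
                if None not in s_key and tuple(t[c] for c in on) == s_key]
        matched.update(hits)
        if hits:
            rows.extend(flatten(target[i], s, on) for i in hits)
        else:
            rows.append(flatten(None, s, on))
    rows.extend(flatten(t, None, on)
                for i, t in enumerate(target) if i not in matched)
    return rows


def source_present_old(row, on):
    # Old proxy: every ON column on the source side is non-null.
    return all(row[f"s.{c}"] is not None for c in on)


def source_present_new(row):
    return row[SENTINEL] is not None


on = ["id", "record_type"]
target = [{"id": 1, "record_type": "A"}, {"id": 0, "record_type": None}]
source = [{"id": 2, "record_type": None}]
rows = full_outer_join(target, source, on)
```

With the old proxy the source row in `rows[0]` looks absent because its
`record_type` is NULL; the sentinel check still sees it.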

Fixes: lance-format#4644

---------

Signed-off-by: Pratik <pratikrocks.dey11@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…t#6440)

Most workflows lacked a `permissions` block, causing GitHub security
warnings. Added `permissions: contents: read` at the top level for all
affected workflows.

Special cases:
- `benchmark-comment-trigger`: also needs `pull-requests: read` to call
the pulls REST API
- `nightly_run`: `run` job needs `actions: write` to dispatch
`file_verification.yml`
- `rust`: `clippy` job-level permissions updated to include `contents:
read` alongside `checks: write`
- `cargo-publish`: `build` job updated to include `contents: read`
alongside `id-token: write`

Workflows already having correct permissions (`claude.yml`,
`claude-code-review.yml`, `pr-title.yml`, `stale.yml`,
`rust-benchmark.yml`, `docs-deploy.yml`, `codex-fix-ci.yml`,
`codex-backport-pr.yml`, `file_verification.yml`, `cargo-publish.yml`)
were left unchanged or minimally updated.
## Feature

### What is the new feature?
This PR adds a first-class blob-aware `to_pandas()` API to the Lance
Python bindings on `LanceDataset`, `LanceScanner`, and `LanceFragment`.

### Why do we need this feature?
Today, pandas export goes through `to_table().to_pandas()`, which is
constrained by Arrow blob representations. That means blob columns
either surface as descriptor structs or must be eagerly materialized as
bytes before pandas conversion. For large blobs, eager materialization
is the wrong default because it can pull a large amount of binary data
into memory unexpectedly.

### How does it work?
The new `to_pandas(*, blob_mode=...)` API keeps Arrow-facing behavior
unchanged and adds pandas-specific blob handling:

- `blob_mode="lazy"` (default) returns `lance.BlobFile` objects in
pandas object columns.
- `blob_mode="bytes"` eagerly reads blobs into Python `bytes`.
- `blob_mode="descriptions"` preserves the old `to_table().to_pandas()`
behavior.

Implementation details:
- Add Python-side blob helpers to detect top-level blob columns and map
direct alias projections back to source blob columns.
- Snapshot Python scanner builder options so `LanceScanner.to_pandas()`
can reconstruct the same scan with `_rowaddr` and blob descriptions.
- Rebuild the scan internally with `with_row_address=True` and
`blob_handling="blobs_descriptions"`, convert non-blob columns through
Arrow's `to_pandas()`, and backfill blob columns via `take_blobs(...,
addresses=...)`.
- Preserve Arrow APIs (`to_table()` / `blob_handling`) unchanged.
- Raise a clear `NotImplementedError` for transformed blob projections
that cannot be mapped back to a single source blob column.

## Testing

- `cd python && uv run pytest python/tests/test_blob.py -q`
- `cd python && uv run --extra dev ruff check python/lance/dataset.py
python/lance/fragment.py python/tests/test_blob.py`
## Summary
- add `PrewarmOptions` and `FtsPrewarmOptions` on the Rust side, with
dataset plumbing for `prewarm_index_with_options`
- add Python `prewarm_index(..., *, with_position=False)` support for
FTS indices while keeping the default prewarm path unchanged
Unreleased version after creating v5.0.0-rc.1
…ance-format#6389)

## Summary

- Replace `panic!()` in `initial_upload_size()` with a warn-and-clamp
fallback when `LANCE_INITIAL_UPLOAD_SIZE` is set outside the valid
`[5MB, 5GB]` range, so misconfiguration can't crash the process
- Extract `MAX_UPLOAD_PART_SIZE` constant for the 5GB upper bound
- Extract `clamp_initial_upload_size` as a pure helper and add boundary
unit tests

## Motivation

Setting `LANCE_INITIAL_UPLOAD_SIZE` to a value outside the valid range
previously crashed the entire process via `panic!()` — a
disproportionate response to a perf-tuning env var misconfiguration. Per
review feedback (lance-format#6389 (comment)), a crash (or even a propagated
`Result`) forces every caller to handle a purely operator-side mistake.
Clamping to the valid range and emitting a single warning lets the
workload proceed and surfaces the misconfiguration to operators. This
also matches the silent-fallback behavior of the sibling env vars
`LANCE_UPLOAD_CONCURRENCY` and `LANCE_CONN_RESET_RETRIES`.

## What Changed

**`initial_upload_size()`**: Return type stays `usize`. Out-of-range
values are clamped into `[5MB, 5GB]` and a single `tracing::warn!` is
emitted with `requested` and `clamped` fields. The existing `OnceLock`
cache guarantees the warning fires at most once per process, so no
separate rate-limiting logic is needed. Non-numeric and unset values
continue to fall back silently to the 5MB default.

**`clamp_initial_upload_size(raw) -> (usize, bool)`**: Pure helper
extracted for testability. Returns the clamped value and whether
clamping occurred.

**`MAX_UPLOAD_PART_SIZE`**: New constant for the 5GB upper bound.

## Behavioral Equivalence

| Input | Before | After |
|-------|--------|-------|
| Env not set | 5MB default | 5MB default |
| Non-numeric (e.g. `"abc"`) | 5MB default | 5MB default |
| Valid integer in `[5MB, 5GB]` | Returns the value | Returns the value |
| Integer `< 5MB` | **`panic!()`** | Clamped to 5MB + `warn!` (once) |
| Integer `> 5GB` | **`panic!()`** | Clamped to 5GB + `warn!` (once) |

No API changes — `ObjectWriter::new()` signature is unchanged.
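
The table can be summarized with a small Python sketch of the
clamp-and-warn logic. It is not the Rust code: `MIN_UPLOAD_PART_SIZE`
is a hypothetical name, the bounds assume binary units (5 * 1024^2 and
5 * 1024^3 bytes), the env value is passed in as an argument, and the
once-per-process caching is omitted.

```python
import logging

logger = logging.getLogger(__name__)

MIN_UPLOAD_PART_SIZE = 5 * 1024 * 1024         # assumed 5MB lower bound
MAX_UPLOAD_PART_SIZE = 5 * 1024 * 1024 * 1024  # assumed 5GB upper bound


def clamp_initial_upload_size(raw):
    """Pure helper: return (clamped value, whether clamping occurred)."""
    clamped = max(MIN_UPLOAD_PART_SIZE, min(raw, MAX_UPLOAD_PART_SIZE))
    return clamped, clamped != raw


def initial_upload_size(env_value):
    """Resolve the upload size from the raw env string (or None if unset)."""
    try:
        raw = int(env_value)
    except (TypeError, ValueError):
        # Unset or non-numeric: fall back silently to the default.
        return MIN_UPLOAD_PART_SIZE
    value, was_clamped = clamp_initial_upload_size(raw)
    if was_clamped:
        logger.warning("initial upload size out of range: requested=%d clamped=%d",
                       raw, value)
    return value
```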

## Test plan

- [x] New boundary unit tests: below min, min boundary, in-range, max
boundary, above max, `usize::MAX`
- [x] `cargo test -p lance-io --lib object_writer` — 7 tests pass
- [x] `cargo clippy -p lance-io --tests -- -D warnings` — clean
- [x] `cargo fmt -p lance-io -- --check` — clean
- [x] `cargo check --workspace --tests` — full workspace compiles
Upgrades the pinned Rust toolchain from 1.91.0 to 1.94.0.

The only code change needed was boxing two futures in
`build_partial_fixture` in the `distributed_vector_build` bench, where
1.94's stricter layout computation overflowed the default recursion
limit. Boxing makes the awaited futures' sizes constant (a pointer),
breaking the recursion.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Feature

### What is the new feature?
This PR adds native `float16` and `float64` support for `IVF_FLAT` and
`IVF_HNSW_FLAT`.

### Why do we need this feature?
Flat IVF indexing previously only worked end to end for `float32`. That
meant users could not build, merge, reload, and query flat IVF indexes
on `float16` or `float64` vectors without running into
`Float32`-specific assumptions in flat storage, writer initialization,
and merge/query paths.

### How does it work?
The implementation makes flat IVF paths dispatch on the actual Arrow
element type from stored flat data instead of assuming `Float32`.

- `FlatFloatStorage` now dispatches distance calculators for `float16`,
`float32`, and `float64`.
- Query/training helpers that previously special-cased `Float32` now
accept the native float dtype where needed.
- Tests now cover flat storage distance, partition serde roundtrip, IVF
create/query/remap, and distributed merge behavior for `float16` /
`float64`.

## Validation

- `cargo fmt --all`
- `cargo check -p lance-index --lib`
- `cargo check -p lance --lib`
- `cargo test -p lance-index test_flat_float_storage_distance_f16 --
--nocapture`
- `cargo test -p lance-index
test_merge_ivf_flat_preserves_float64_schema -- --nocapture`
- `cargo test -p lance test_build_ivf_flat -- --nocapture`
- `cargo test -p lance test_create_ivf_hnsw_flat -- --nocapture`
- `cargo test -p lance test_create_ivf_flat_f16 -- --nocapture`

## Benchmark Note

I also benchmarked float32 `IVF_FLAT` before and after this change and
saw no obvious performance difference.
…at#6428)

## Summary

Stacked on lance-format#6388. Please merge that PR first.

- Adds `batch_size_bytes: Option<u64>` to `FileReaderOptions` and
propagates it through all 6 `SchedulerDecoderConfig` creation sites in
the file reader
- Adds `batch_size_bytes` field + setter to `Scanner`, wired through
both `scan_fragments` (via `LanceScanConfig`) and `pushdown_scan` (via
`FileReaderOptions` in `ScanConfig`)
- Adds `batch_size_bytes` to `LanceScanConfig`, with `try_new_v2`
injecting it into `FragReadConfig` via `FileReaderOptions`
- Exposes `batch_size_bytes` in the Python API:
`LanceDataset.scanner()`, `to_table()`, `to_batches()`, `ScannerBuilder`

## Test plan

- [x] `cargo check -p lance-file -p lance --tests` — clean
- [x] `cargo clippy -p lance-file -p lance --tests -- -D warnings` —
clean
- [x] `cargo fmt --all` — applied
- [x] `cargo test -p lance-encoding -- byte_sized` — 3/3 pass
- [x] `cargo test -p lance -- test_scan` — 38/38 pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rmat#6344)

`arrow-cast` 57.0.0 added native support for `FixedSizeList →
FixedSizeList` casting, which was the only reason
`lance_arrow::cast::cast_with_options` existed. This removes the wrapper
and updates all call sites to use `arrow_cast::cast_with_options`
directly.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, when a manifest commit failed (conflict, error, or retry
exhaustion), the `.txn` file written to `_transactions/` was left
orphaned. These files would accumulate until the GC cleanup interval (7+
days by default).

This PR adds `cleanup_transaction_file()` — a best-effort delete helper
— and calls it from all three commit failure paths
(`do_commit_new_dataset`, `do_commit_detached_transaction`,
`commit_transaction`). Failures to delete are logged as warnings and do
not surface to the caller.

In the retry loop of `commit_transaction`, the previous iteration's
transaction file is cleaned up before each retry attempt, since a new
transaction file is written on each iteration.
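
As a loose illustration of the best-effort cleanup and the retry-loop
behavior, here is a Python sketch against the local filesystem. Only
the name `cleanup_transaction_file` comes from the description;
`commit_with_retries`, its callables, and the retry count are
hypothetical, and the real code works through the object store.

```python
import logging
import os

logger = logging.getLogger(__name__)


def cleanup_transaction_file(path):
    """Best-effort delete: failures are logged, never raised."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # already gone, nothing to do
    except OSError as exc:
        logger.warning("failed to clean up transaction file %s: %s", path, exc)


def commit_with_retries(write_txn_file, try_commit, max_retries=3):
    """write_txn_file(attempt) -> path; try_commit(path) -> bool (both assumed)."""
    previous = None
    for attempt in range(max_retries):
        if previous is not None:
            # Each iteration writes a new file, so drop the previous one first.
            cleanup_transaction_file(previous)
        previous = write_txn_file(attempt)
        if try_commit(previous):
            return previous
    if previous is not None:
        cleanup_transaction_file(previous)  # retries exhausted
    raise RuntimeError("commit failed after retries")
```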

Fixes lance-format#6125

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Bug Fix

### What is the bug?
FTS v2 indices built with `with_position=true` can reconstruct cached
posting lists with the wrong shared-position codec during prewarm. The
prewarm path rebuilds `PostingList` values from projected `RecordBatch`
objects and re-infers the positions codec from batch metadata, even
though the reader has already resolved the correct partition-level
`positions_layout`.

### What issues or incorrect behavior does the bug cause?
After `prewarm_index(..., with_position=True)`, phrase queries can fail
even though the same query succeeds before prewarm. In practice this
shows up as shared position stream decode failures in the cached path
because `PackedDelta` data can be interpreted as the legacy codec.

### How does this PR fix the problem?
This PR threads `positions_layout` from `PostingListReader` through the
prewarm reconstruction path into `CompressedPostingList::from_batch`, so
cached postings reuse the already-parsed shared-position codec instead
of guessing from projected batch metadata. It also adds a regression
test that covers the V2 + positions + tail-remainder case and verifies
prewarm preserves correct phrase-query behavior.

## Validation

- `cargo test -p lance-index test_prewarm_with_ -- --nocapture`
…:DeepSizeOf (lance-format#6480)

## Summary

- Fix `CachedFileMetadata::DeepSizeOf` to include `column_metadatas` and
`column_infos` — the two largest fields that were previously omitted
(marked TODO since initial implementation)
- This caused the moka cache weigher to underestimate entry sizes by
~100x, preventing eviction and causing unbounded memory growth on
random-access workloads

## Problem

`LanceCache` uses moka with a weighted capacity of 1 GB
(`DEFAULT_METADATA_CACHE_SIZE`). The weigher calls `DeepSizeOf` on
`CachedFileMetadata`, but the implementation only counted `file_schema`
and `file_buffers` — ignoring `column_metadatas` (protobuf
`ColumnMetadata`) and `column_infos` (`Vec<Arc<ColumnInfo>>` containing
page encodings).

Each cache entry's true size is hundreds of KB, but was reported as ~1
KB. Moka never reached the 1 GB limit, so entries accumulated
indefinitely.

## Profiling Evidence

Tested on a 221M-row dataset with random `ds.take()` (64 rows per call):

| Metric | Before | After |
|--------|--------|-------|
| RSS growth (30 iters) | **+7,503 MB** | **+535 MB** |
| Growth rate | 250 MB/iter (linear, no plateau) | 18 MB/iter (plateaus ~500 MB) |

jemalloc heap profiling (debug build, 243K symbols) showed 99.9% of
leaked memory in `LanceCache::get_or_insert_with_key` →
`FileReader::meta_to_col_infos` and `prost::encoding::message::merge`.

## Approach

Since protobuf-generated types (`pbfile::ColumnMetadata`,
`pb::ColumnEncoding`, etc.) don't implement `DeepSizeOf`, we use
`prost::Message::encoded_len() * 4` as an approximation for in-memory
size. The 4x multiplier accounts for heap allocations in
repeated/string/bytes fields that are larger in memory than on the wire.
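
To see why an undercounting weigher defeats eviction, consider this
toy Python model. It is a sketch under stated assumptions:
`WeightedCache` is a simple FIFO stand-in rather than moka, and the
entry sizes are invented for illustration; only the `encoded_len * 4`
rule comes from the approach above.

```python
from collections import OrderedDict

SIZE_MULTIPLIER = 4  # wire-to-memory factor quoted above


def estimated_entry_size(schema_bytes, buffer_bytes, encoded_column_lens):
    """Exact parts plus 4x the encoded length of the column metadata."""
    return schema_bytes + buffer_bytes + SIZE_MULTIPLIER * sum(encoded_column_lens)


class WeightedCache:
    """Toy FIFO cache with a byte budget (hypothetical; moka's policy differs)."""

    def __init__(self, capacity_bytes, weigher):
        self._capacity = capacity_bytes
        self._weigher = weigher
        self._entries = OrderedDict()
        self._used = 0

    def insert(self, key, entry):
        if key in self._entries:
            self._used -= self._entries.pop(key)[1]
        weight = self._weigher(entry)
        self._entries[key] = (entry, weight)
        self._used += weight
        # Evict oldest entries until the *reported* weight fits the budget.
        while self._used > self._capacity and len(self._entries) > 1:
            _, (_, evicted_weight) = self._entries.popitem(last=False)
            self._used -= evicted_weight

    def __len__(self):
        return len(self._entries)


# Invented sizes: 1 KiB of schema/buffers, 75,000 bytes of encoded metadata.
entry = {"schema": 512, "buffers": 512, "columns": [25_000, 50_000]}


def weigh_before(e):
    return e["schema"] + e["buffers"]


def weigh_after(e):
    return estimated_entry_size(e["schema"], e["buffers"], e["columns"])


before = WeightedCache(1_000_000, weigh_before)
after = WeightedCache(1_000_000, weigh_after)
for i in range(100):
    before.insert(i, entry)
    after.insert(i, entry)
```

With the old weigher all 100 entries stay resident because their
reported weight never approaches the budget; with the corrected
weigher the cache evicts once the estimate exceeds it.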

## Test plan

- [x] Added `test_deep_size_of_includes_column_metadata` for V2_0 and
V2_1 file formats
- [x] Verified fix reduces memory growth from 250 MB/iter to 18 MB/iter
on a production dataset
- [x] `cargo test -p lance-file` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-format#6488)

## Summary

- Add missing `(SubIndexType::Flat, QuantizationType::FlatBin)` match
arm in `optimize_vector_indices_v2`

The v2 function handles all other sub-index/quantization combinations
but misses the FlatBin case for binary vector IVF_FLAT indices, hitting
the catch-all `unimplemented!` panic during incremental indexing
(`optimize_indices`). The v1 function already handles this correctly.
…t#6435)

This teaches `merge_insert` to keep the delete-by-source fast path even
when a scalar index exists on the join key. The actual indexed join path
is still only used when unmatched target rows are kept, so the presence
of index metadata should not force these operations back to the legacy
full-join path.

This also adds regression coverage for full-schema `FixedSizeList`
merges with `when_not_matched_by_source(Delete)` both with and without a
scalar index. That closes the gap behind lance-format#6195 and preserves the earlier
fix for lancedb/lancedb#3094.
…lance-format#6477)

## Summary

- Change `DataFile.fields` and `DataFile.column_indices` from `Vec<i32>`
to `Arc<[i32]>` so that fragments with identical field lists share a
single heap allocation
- Add `DataFileFieldInterner` that deduplicates these slices during
manifest deserialization
- In homogeneous tables (the common case), every fragment carries the
same field list, so at 20M fragments this saves **~2.4 GB** of redundant
heap allocations

## Motivation

When dataset manifests grow large (>1 GB with millions of fragments),
opening the dataset becomes very expensive in terms of memory. Each
`DataFile` previously owned its own `Vec<i32>` for `fields` and
`column_indices`, even though in most tables every fragment has the
exact same field list. This PR deduplicates those allocations at
deserialization time.

### Per-fragment memory breakdown (before)

| Field | Size per fragment |
|-------|------------------|
| `fields: Vec<i32>` (10 fields) | ~64 bytes |
| `column_indices: Vec<i32>` (10 cols) | ~64 bytes |
| **Total redundant** | **~128 bytes x 20M = ~2.4 GB** |

### After this change

With interning, all 20M fragments share a single `Arc<[i32]>` allocation
(~80 bytes total instead of 2.4 GB).
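
The interning idea, and the arithmetic in the table above, can be
sketched in Python. `FieldListInterner` is a hypothetical analogue of
`DataFileFieldInterner`, with tuples standing in for `Arc<[i32]>`; it
is not the Rust implementation.

```python
class FieldListInterner:
    """Return one shared tuple per distinct field list."""

    def __init__(self):
        self._seen = {}

    def intern(self, fields):
        key = tuple(fields)
        # setdefault stores the first tuple and hands it back for every
        # later equal list, so identical lists share one object.
        return self._seen.setdefault(key, key)

    def unique_count(self):
        return len(self._seen)


interner = FieldListInterner()
fragments = [interner.intern(list(range(10))) for _ in range(1000)]

# Redundant bytes from the table above: ~128 bytes per fragment, 20M fragments.
redundant_bytes = (64 + 64) * 20_000_000
redundant_gib = redundant_bytes / 2**30  # roughly 2.4 GiB
```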

## Changes

- **`lance-table/src/format/fragment.rs`** — Core struct change
(`Vec<i32>` → `Arc<[i32]>`), custom `Serialize`/`Deserialize` impls, and
`DataFileFieldInterner`
- **`lance-table/src/format/manifest.rs`** — Use interner during
manifest deserialization
- **`lance/src/dataset/fragment.rs`**, **`merge_insert.rs`**,
**`io/commit.rs`** — Tombstoning and field-remapping rebuilt as new
`Arc<[i32]>` instead of in-place mutation
- **`python/src/fragment.rs`**, **`java/lance-jni/src/fragment.rs`** —
FFI boundary conversions
- Various test files — Updated struct literals and assertions

## Compatibility

- No format change — protobuf schema is unchanged
- Serde JSON output is identical (custom impl serializes `Arc<[i32]>` as
`[i32]`)
- All public API signatures that take `Vec<i32>` (e.g.,
`DataFile::new()`, `Fragment::add_file()`) still accept `Vec<i32>` and
convert internally

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mory (lance-format#6499)

## Summary

- Change `RowDatasetVersionMeta::Inline` from `Vec<u8>` to `Arc<[u8]>`
so that fragments with identical version metadata share a single heap
allocation
- Extend `DataFileFieldInterner` to deduplicate these inline byte
payloads during manifest deserialization
- Introduce `InternCache<T>`: a hybrid cache that uses Vec linear scan
for ≤16 entries and upgrades to HashMap for larger caches
- Add custom `Serialize`/`Deserialize` impls for `RowDatasetVersionMeta`
to handle `Arc<[u8]>` transparently

## Motivation

Follow-up to lance-format#6477 (interning `DataFile.fields`/`column_indices`). After
a compaction, all fragments are stamped with the same version metadata
(both `last_updated_at_version_meta` and `created_at_version_meta`), but
each fragment previously owned its own `Vec<u8>` copy.

### Per-fragment memory breakdown (before)

| Field | Size per fragment |
|-------|------------------|
| `last_updated_at_version_meta: Inline(Vec<u8>)` | ~24 bytes + payload |
| `created_at_version_meta: Inline(Vec<u8>)` | ~24 bytes + payload |
| **Total redundant at 20M fragments** | **~480 MB+** |

### After this change

With interning, all 20M fragments share a single `Arc<[u8]>` allocation
per unique payload.

## Benchmark results

Microbenchmark at 100K fragments (10 fields per fragment):

| Scenario | No interning | With interning | Delta |
|----------|-------------|----------------|-------|
| **Uniform (1 unique version)** | 24.5 ms | 17.9 ms | **27% faster** |
| **Diverse (10 unique)** | 25.7 ms | 19.7 ms | **23% faster** |
| **Diverse (100 unique)** | 26.0 ms | 23.4 ms | **10% faster** |
| **Diverse (500 unique)** | 26.0 ms | 22.8 ms | **12% faster** |

| Memory (100K fragments) | No interning | With interning | Savings |
|------------------------|-------------|----------------|---------|
| **10 fields** | 39.47 MB | 29.74 MB | **24.6%** |
| **50 fields** | 69.99 MB | 29.74 MB | **57.5%** |

Both memory and speed improve across all scenarios. The hybrid
`InternCache` uses fast Vec scan for the common case (1-3 unique values)
and upgrades to HashMap when diversity exceeds 16 entries.
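
The hybrid strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `fragment.rs`: the enum layout, the `intern` and `is_hashed` methods, and the use of a `HashSet` for the large case (the description says HashMap) are all choices made for this example; only the 16-entry threshold and the Vec-then-hash upgrade come from the description.

```rust
use std::collections::HashSet;
use std::hash::Hash;
use std::sync::Arc;

/// Threshold taken from the "≤16 entries" figure in the description.
const LINEAR_SCAN_LIMIT: usize = 16;

/// Illustrative interning cache: linear scan while small, hashed once large.
enum InternCache<T> {
    Small(Vec<Arc<[T]>>),
    Large(HashSet<Arc<[T]>>),
}

impl<T: Clone + Eq + Hash> InternCache<T> {
    fn new() -> Self {
        InternCache::Small(Vec::new())
    }

    /// True once the cache has switched to the hashed representation.
    fn is_hashed(&self) -> bool {
        matches!(self, InternCache::Large(_))
    }

    /// Return a shared handle for `values`, reusing an existing allocation
    /// when an equal slice has been interned before.
    fn intern(&mut self, values: &[T]) -> Arc<[T]> {
        match self {
            InternCache::Small(entries) => {
                for existing in entries.iter() {
                    if existing[..] == *values {
                        return Arc::clone(existing);
                    }
                }
                let fresh: Arc<[T]> = Arc::from(values);
                entries.push(Arc::clone(&fresh));
                if entries.len() > LINEAR_SCAN_LIMIT {
                    // Move the existing handles into a hash set; the
                    // underlying allocations are unchanged.
                    let set: HashSet<Arc<[T]>> = entries.drain(..).collect();
                    *self = InternCache::Large(set);
                }
                fresh
            }
            InternCache::Large(set) => {
                if let Some(existing) = set.get(values) {
                    return Arc::clone(existing);
                }
                let fresh: Arc<[T]> = Arc::from(values);
                set.insert(Arc::clone(&fresh));
                fresh
            }
        }
    }
}
```

The lookup in the hashed case relies on `Arc<[T]>` implementing `Borrow<[T]>`, so a plain slice can be used as the key without allocating.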

Run with: `cargo bench -p lance-table --bench manifest_intern`

## Changes

- **`rust/lance-table/src/rowids/version.rs`** — `Inline(Vec<u8>)` →
`Inline(Arc<[u8]>)`, custom serde impls, updated protobuf conversions
- **`rust/lance-table/src/format/fragment.rs`** — `InternCache<T>`
(Vec/HashMap hybrid), extended `DataFileFieldInterner` with version meta
interning
- **`rust/lance-table/benches/manifest_intern.rs`** — Microbenchmark
covering uniform and diverse scenarios

## Compatibility

- No format change — protobuf schema is unchanged
- Serde JSON output is identical (custom impl serializes `Arc<[u8]>` as
`[u8]`)
- `from_sequence()` still works as before (converts internally)

## Test plan

- [x] `cargo check --workspace --tests` passes
- [x] `cargo clippy -p lance-table -p lance -- -D warnings` passes
- [x] All 88 `lance-table` tests pass
- [x] `cargo fmt --all -- --check` passes
- [x] Microbenchmark validates performance across uniform and diverse
scenarios
- [ ] CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rmat#6308)

- `list_all_tables`
- `restore_table`
- `update_table_schema_metadata`
- `get_table_stats`
- `explain_table_query_plan`
- `analyze_table_query_plan`

---------

Co-authored-by: zhangyue19921010 <zhangyue.1010@bytedance.com>
## Summary

- Adds `#[instrument]` attributes from the `tracing` crate to key
functions across the `mem_wal` module
- Covers write path (`RegionWriter::open`, `put`, `close`), flush path
(`MemTableFlusher::flush`, `flush_with_indexes`), WAL operations,
manifest store, memtable inserts, scanner/planner, point lookups, and
vector search
- Uses appropriate trace levels (`info` for high-level operations,
`debug` for internals) with relevant fields (region_id, epoch, row
counts, batch counts)

## Test plan

- [x] `cargo check` passes — no functional changes, only attribute
additions
- [x] Existing `mem_wal` tests continue to pass
- [ ] Tracing output verified with `RUST_LOG=debug` showing instrumented
spans

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

## Summary

Refactor `FullZipScheduler::create_page_load_task` to accept a
pre-submitted I/O future instead of deferring I/O submission until the
async task executes. This allows the I/O requests to be submitted
immediately during scheduling, enabling the object store layer to batch
and parallelize them. Closes lance-format#6504.

## I/O Model Change

### Before: Lazy I/O submission (serialized)

Previously, `create_page_load_task` received a
`FullZipReadSource::Remote(io)` along with byte ranges and priority. The
actual `io.submit_request()` call happened **inside** the async block,
meaning the I/O request was not submitted until the future was first
polled.

When decoding multiple pages (e.g. across many fragments), this created
a sequential I/O pattern:

```
Page 1: [schedule] -> [poll] -> [submit I/O] -> [wait response] -> [decode]
Page 2:                                          [schedule] -> [poll] -> [submit I/O] -> [wait response] -> [decode]
Page 3:                                                                                   [schedule] -> [poll] -> ...
```

Each page's I/O request could only be submitted after the previous task
started executing. The I/O scheduler had no visibility into upcoming
requests, preventing it from batching or parallelizing them effectively.

### After: Eager I/O submission (pipelined)

Now, `io.submit_request()` is called **before** constructing the
`PageLoadTask`, and the resulting future is passed into
`create_page_load_task`. All I/O requests for all pages are submitted
upfront during the scheduling phase:

```
[schedule all pages] --> submit I/O page 1 -+
                     --> submit I/O page 2 -+
                     --> submit I/O page 3 -+  (all in-flight concurrently)
                     --> submit I/O page N -+
                                            |
                     [poll] -> [await page 1 response] -> [decode]
                     [poll] -> [await page 2 response] -> [decode]
                     [poll] -> [await page 3 response] -> [decode]
```

The object store layer can now see all pending requests at once and
optimize I/O through batching, connection multiplexing, and parallel
fetches. The async tasks only await the already-in-flight I/O futures.
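
The ordering difference can be reproduced with a small standard-library sketch. Everything here is a stand-in: `submit_request` is a hypothetical free function that only logs the submission and returns an already-completed future, and `block_on` is a toy executor that runs tasks one at a time. It is not the `lance-encoding` scheduler, and it only models when the request is submitted relative to when the task is polled.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Log = Rc<RefCell<Vec<String>>>;
type Task = Pin<Box<dyn Future<Output = usize>>>;

/// Hypothetical I/O submission: records the submission time in the log and
/// returns a future that resolves to the page bytes.
fn submit_request(log: &Log, page: usize) -> std::future::Ready<Vec<u8>> {
    log.borrow_mut().push(format!("submit {page}"));
    std::future::ready(vec![0u8; 4])
}

/// Before: the request is submitted inside the async block, i.e. on first poll.
fn lazy_task(log: Log, page: usize) -> Task {
    Box::pin(async move {
        let bytes = submit_request(&log, page).await;
        log.borrow_mut().push(format!("decode {page}"));
        bytes.len()
    })
}

/// After: the request is submitted while scheduling; the task only awaits it.
fn eager_task(log: Log, page: usize) -> Task {
    let io = submit_request(&log, page);
    Box::pin(async move {
        let bytes = io.await;
        log.borrow_mut().push(format!("decode {page}"));
        bytes.len()
    })
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls a single task until it completes.
fn block_on(mut task: Task) -> usize {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = task.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

/// Schedule one task per page, then run them in order; return the event log.
fn run(make: fn(Log, usize) -> Task, pages: usize) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let tasks: Vec<Task> = (1..=pages).map(|p| make(Rc::clone(&log), p)).collect();
    for task in tasks {
        block_on(task);
    }
    let events = log.borrow().clone();
    events
}
```

With this toy executor, `run(lazy_task, 3)` interleaves each submission with the previous decode, while `run(eager_task, 3)` records all three submissions before any decode, which is the property the refactor relies on.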

## Changes

- `rust/lance-encoding/src/encodings/logical/primitive.rs`:
  - Changed `create_page_load_task` signature to accept
    `BoxFuture<'static, Result<Vec<Bytes>>>` instead of
    `FullZipReadSource` + byte ranges + priority
  - Moved `io.submit_request()` calls to happen eagerly at both call sites
    (`schedule_ranges_with_rep_index` and the non-rep-index path), before
    constructing the page load task

## Performance

Tested with a multi-fragment dataset containing fixed-width columns
(768-dim float32 vectors, 40 fragments, 50 rows/fragment):

| Benchmark | Before (p50) | After (p50) | Speedup |
|---|---|---|---|
| Fixed-width column scan | 3453 ms | 523 ms | **6.6x** |

The improvement comes entirely from I/O pipelining — the decoding logic
itself is unchanged. The effect is most pronounced with many fragments
or pages, where the serialized I/O submission was the dominant
bottleneck.
## Summary
- Add `blob_max_pack_file_bytes` to `WriteParams`, allowing users to
override the default 1 GiB maximum pack (`.blob`) sidecar file size
- Thread the configuration through the full write path: `WriteParams` ->
`WriterGenerator` -> `WriterOptions` -> `BlobPreprocessor` ->
`PackWriter`
- Expose the option in Python (`write_dataset`) and Java
(`WriteParams.Builder`) bindings
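
A minimal sketch of how an optional override can fall back to the 1 GiB default by the time it reaches the writer. The types `SketchWriteParams` and `SketchPackWriter`, the `Option<u64>` field type, and the `needs_rollover` check are assumptions made for illustration; only the option name `blob_max_pack_file_bytes` and the 1 GiB default come from the summary.

```rust
/// Default maximum pack file size stated in the summary: 1 GiB.
const DEFAULT_BLOB_MAX_PACK_FILE_BYTES: u64 = 1 << 30;

/// Hypothetical stand-in for the user-facing write parameters.
#[derive(Default)]
struct SketchWriteParams {
    /// `None` means "use the default"; the real field type is an assumption.
    blob_max_pack_file_bytes: Option<u64>,
}

/// Hypothetical stand-in for the component that writes pack files.
struct SketchPackWriter {
    max_pack_file_bytes: u64,
}

impl SketchPackWriter {
    /// Resolve the override once, at the end of the configuration chain.
    fn from_params(params: &SketchWriteParams) -> Self {
        Self {
            max_pack_file_bytes: params
                .blob_max_pack_file_bytes
                .unwrap_or(DEFAULT_BLOB_MAX_PACK_FILE_BYTES),
        }
    }

    /// Assumed policy: start a new pack file when appending `incoming` bytes
    /// to a file already holding `current` bytes would exceed the limit.
    fn needs_rollover(&self, current: u64, incoming: u64) -> bool {
        current.saturating_add(incoming) > self.max_pack_file_bytes
    }
}
```

Keeping the value optional until the last step means intermediate layers only forward it, and the default lives in one place.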

## Test plan
- [x] All 37 existing blob tests pass (`cargo test -p lance blob`)
- [x] Clippy clean on `lance` and `lance-jni` crates
- [x] Verify Python binding works end-to-end with
`blob_max_pack_file_bytes` kwarg
- [x] Verify Java binding compiles with `./mvnw compile`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s Rust, Python, Java

Introduce NamespaceClientTableContext struct/class that holds cached
describe_table/declare_table response data (location, storage_options,
managed_versioning). This is passed all the way to Rust where all
decisions about storage options merging and managed versioning are made.

- Rust: NamespaceClientTableContext in lance-namespace crate, with
  from_describe_table_response/from_declare_table_response constructors.
  DatasetBuilder::from_namespace_context and
  Dataset::write_into_namespace_context accept the context.
- Python: NamespaceClientTableContext class in namespace module. All APIs
  (dataset, write_dataset, fragments, file reader/writer/session, TF)
  accept namespace_client_table_context parameter. PyO3 binding extracts
  context fields in Rust.
- Java: NamespaceClientTableContext class with static factory methods.
  All builders (OpenDatasetBuilder, WriteDatasetBuilder, CommitBuilder,
  WriteFragmentBuilder) accept the context. JNI binding extracts context
  fields in Rust.

Also removes deprecated Dataset.create/open overloads and
createWithFfiSchema JNI path in Java.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the enhancement New feature or request label Apr 15, 2026
@jackye1995 jackye1995 closed this Apr 15, 2026