Distributed scalar-index build races on per-fragment zonemap.lance

## Background

`AddIndexExec`'s `FragmentBasedIndexJob` spawns one Spark task per fragment, all sharing a single index UUID via `IndexOptions.withIndexUUID(uuid)`. On the lance-core side this maps each task's per-fragment build to the same `indices_dir().join(uuid)` directory (`rust/lance/src/index/scalar.rs:287` — `LanceIndexStore::from_dataset_for_new(dataset, uuid)`).

For index types that emit a **single artifact per build**, all N executors race to write the same object-store path. For ZONEMAP that artifact is `zonemap.lance` (`rust/lance-index/src/scalar/zonemap.rs:50,745`); whichever task commits last wins, and the read path (`getZonemapStats`) returns only one fragment's worth of zone stats.

## Reproduction

`AddIndexTest.testCreateZonemapIndex` on a 20-fragment table (Spark `local[10]`, 20 single-row inserts):

```java
Set<Integer> fragmentIdsWithStats =
    zonemapStats("id").stream()
        .map(ZoneStats::getFragmentId)
        .collect(Collectors.toSet());
// expected: 20, actual: 1
```

This was the empirical confirmation that surfaced during review of #513. The current PR keeps the loose `assertFalse(zonemapStats("id").isEmpty())` and documents the limitation.

## Why this is also a BTree concern

The same code path is used for `USING btree` (default fragment build_mode). The race is functionally invisible there because btree's user-visible behavior is correctness-preserving: a fragment without an index entry simply gets a full scan — slower, never wrong. That's why the existing `testCreateIndexDistributed` has not caught it. The race is real, just silent.

## Fix options

**(A) Single-call driver-side build for affected types.** Have AddIndexExec call `dataset.createIndex(opts)` once on the driver with all fragment IDs. `build_scalar_index` then loads training data across every fragment and produces a single consistent artifact. Loses per-fragment parallelism (acceptable for ZONEMAP — small payload — possibly not for large btree builds).

**(B) Scalar-index segment commit path.** Mirror lance-core's vector segment-commit pattern (`commit_existing_index_segments`, `rust/lance/src/dataset/index.rs:230`) for scalar indexes. Each task gets a fresh UUID and writes to its own directory; the driver collects N IndexMetadata segments and commits them as separate entries with the same name. `getZonemapStats` already supports multi-segment reads (`java/lance-jni/src/blocking_dataset.rs:3437–3470`). Requires a lance-core API extension.

**(C) Per-task UUID with manifest-side join.** Have lance-spark commit N IndexMetadata entries (different UUIDs, same name) directly via `AddIndexOperation.withNewIndices(List)`. The lance-core `describe_indices` already groups by name (`rust/lance/src/index.rs:957–962`), so consumers see one logical index. Most code-local to lance-spark.

I attempted (A) first but ran into manifest-visibility issues that need deeper Lance-runtime debugging (driver-side `createIndex` returns successfully but `getIndexes` reports zero entries — possibly session-cache opacity). (C) is the most attractive given existing lance-core infrastructure.

## Scope of this issue

- Investigate (A) vs (C); pick the smaller-blast-radius fix.
- Apply to ZONEMAP first (visible symptom), then BTree fragment-mode (silent symptom).
- Add a strict per-fragment-coverage test once the fix lands: `assertEquals(fragmentsIndexed, fragmentIdsWithStats.size())`.

## Related

- #513 — adds `USING zonemap` SQL support; documents this limitation in the PR but does not fix.
- #511 — uses `getZonemapStats` to feed Spark CBO column statistics; the race directly bounds the achievable coverage on un-ANALYZEd tables.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Distributed scalar-index build races on per-fragment zonemap.lance #514

Background

Reproduction

Why this is also a BTree concern

Fix options

Scope of this issue

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Distributed scalar-index build races on per-fragment zonemap.lance #514

Description

Background

Reproduction

Why this is also a BTree concern

Fix options

Scope of this issue

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions