Skip to content

chore: migrate daft.io.lance to daft_lance#6957

Merged
rchowell merged 6 commits into
mainfrom
rchowell/lance-update
May 19, 2026
Merged

chore: migrate daft.io.lance to daft_lance#6957
rchowell merged 6 commits into
mainfrom
rchowell/lance-update

Conversation

@rchowell
Copy link
Copy Markdown
Contributor

Changes Made

Removes ALL daft.io.lance logic in favor of the daft_lance extension. This simply re-exports all symbols and points to the same underlying logic. The pylance version comes from daft_lance which uses 6.0.0

Related Issues

@rchowell rchowell requested a review from a team as a code owner May 18, 2026 21:59
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 18, 2026

Rust Dependency Diff

Head: 7f70f1e21f9b51f448efb5f065103f1c4dac1000 vs Base: 1b68109cd91c9d2456ccb9cd6005c9f73699023b.

OK: Within budget.

  • New Crates: 4
  • Removed Crates: 0

Added

  • chacha20: 0.10.0
  • cpufeatures: 0.3.0
  • rand: 0.10.1
  • rand_core: 0.10.1

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR migrates daft.io.lance from owning its own Lance implementation to being a thin shim over the new daft-lance extension package. All implementation code (~1,500 lines across six modules) is replaced with re-exports from daft_lance, and the pylance dependency is swapped out for daft-lance==0.3.2.

  • read_lance is re-exported directly (no deprecation); merge_columns, merge_columns_df, compact_files, and create_scalar_index are wrapped with DeprecationWarning directing users to call them from daft_lance instead.
  • All tests updated to patch the new daft_lance.* module paths, and explain-output assertions updated to reflect the new factory-function module location (daft_lance.lance_scan:_lancedb_table_factory_function).

Confidence Score: 4/5

The change is safe to merge; it is a mechanical replacement of implementation code with re-exports and all logic now lives in the versioned daft-lance package.

Every public function still works via re-export, tests are updated to patch the correct module paths, and the dependency swap is clean. The only nits are stylistic: inline imports inside four deprecated wrapper functions violate the project import-placement rule, and read_lance lacks a deprecation warning while the other four functions have one.

daft/io/lance/_lance.py — inline imports and the missing deprecation warning on read_lance are the only items worth a second look.

Important Files Changed

Filename Overview
daft/io/lance/_lance.py Replaced ~600 lines of implementation with a thin shim: re-exports read_lance directly from daft_lance._lance and wraps the other four public functions with DeprecationWarnings; inline imports inside the wrapper bodies violate the project import style rule.
daft/io/lance/init.py Adds an early ImportError guard for daft_lance, then re-exports the five public symbols from _lance; straightforward and correct.
daft/io/lance/lance_scan.py Now a thin re-export shim for LanceDBScanOperator and both private factory functions from daft_lance.lance_scan; tests and explain output updated to match the new module path.
daft/io/lance/lance_data_sink.py Replaced ~260 lines of LanceDataSink implementation with a single re-export from daft_lance.lance_data_sink; test patches updated to target daft_lance.lance_data_sink.lance.
pyproject.toml Replaced pylance<0.40.0 with daft-lance in the lance extra and pylance==0.39.0 with daft-lance==0.3.2 in dev dependencies; consistent with the migration intent.
tests/io/lancedb/test_lance_batching.py Updated four patch targets from patch.object(LanceDataSink, '_import_lance') to patch('daft_lance.lance_data_sink.lance', ...) to match the new implementation's module-level lance reference.
tests/io/lancedb/test_lancedb_factory_function.py Removed test_importerror_when_lance_missing (which tested daft_lance internals, not Daft code) per the project rule about not testing external library behavior.
tests/dataframe/test_explain.py Updated expected explain output from daft.io.lance.lance_scan: to daft_lance.lance_scan: to match the new module location of the factory functions.
docs/connectors/lance.md Updated code examples and prose to reference daft_lance.merge_columns and daft_lance.compact_files instead of the deprecated daft.io.lance namespace; package installation link added.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["daft.io.lance (public API)"] -->|"import guard\n(ImportError if missing)"| B["daft_lance package"]
    A --> C["daft.io.lance._lance"]
    C -->|"direct re-export"| D["daft_lance._lance.read_lance"]
    C -->|"DeprecationWarning wrapper"| E["daft_lance._lance.merge_columns"]
    C -->|"DeprecationWarning wrapper"| F["daft_lance._lance.merge_columns_df"]
    C -->|"DeprecationWarning wrapper"| G["daft_lance._lance.compact_files"]
    C -->|"DeprecationWarning wrapper"| H["daft_lance._lance.create_scalar_index"]
    A2["daft.io.lance.lance_scan"] -->|"re-export"| I["daft_lance.lance_scan\nLanceDBScanOperator"]
    A3["daft.io.lance.lance_data_sink"] -->|"re-export"| J["daft_lance.lance_data_sink.LanceDataSink"]
    A4["Other daft.io.lance.* modules"] -->|"re-export"| K["daft_lance.* equivalents"]
Loading

Reviews (1): Last reviewed commit: "chore: migrate daft.io.lance to daft_lan..." | Re-trigger Greptile

Comment thread daft/io/lance/_lance.py Outdated
Comment thread daft/io/lance/_lance.py Outdated
@blacksmith-sh

This comment has been minimized.

@rchowell
Copy link
Copy Markdown
Contributor Author

Fixing up the checkpoint signature code/test. The daft_lance read_lance method does indeed have a checkpoint kwarg.

@PublicAPI
def read_lance(
    uri: str | pathlib.Path,
    io_config: IOConfig | None = None,
    version: str | int | None = None,
    asof: str | None = None,
    block_size: int | None = None,
    commit_lock: object | None = None,
    index_cache_size: int | None = None,
    default_scan_options: dict[str, str] | None = None,
    metadata_cache_size_bytes: int | None = None,
    fragment_group_size: int | None = None,
    include_fragment_id: bool | None = None,
    checkpoint: CheckpointConfig | None = None,  # <--- v0.3.2

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.57%. Comparing base (1b68109) to head (1884cc4).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
daft/io/lance/lance_compaction.py 0.00% 2 Missing ⚠️
daft/io/lance/lance_scalar_index.py 0.00% 2 Missing ⚠️
daft/io/lance/utils.py 0.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6957      +/-   ##
==========================================
- Coverage   75.70%   75.57%   -0.14%     
==========================================
  Files        1139     1142       +3     
  Lines      162472   161042    -1430     
==========================================
- Hits       123006   121704    -1302     
+ Misses      39466    39338     -128     
Files with missing lines Coverage Δ
daft/io/lance/__init__.py 100.00% <100.00%> (ø)
daft/io/lance/_lance.py 100.00% <100.00%> (+9.21%) ⬆️
daft/io/lance/lance_data_sink.py 100.00% <100.00%> (+13.83%) ⬆️
daft/io/lance/lance_merge_column.py 100.00% <100.00%> (+59.03%) ⬆️
daft/io/lance/lance_scan.py 100.00% <100.00%> (+10.31%) ⬆️
daft/io/lance/point_lookup.py 100.00% <100.00%> (+25.00%) ⬆️
daft/io/lance/lance_compaction.py 0.00% <0.00%> (-90.63%) ⬇️
daft/io/lance/lance_scalar_index.py 0.00% <0.00%> (-83.15%) ⬇️
daft/io/lance/utils.py 0.00% <0.00%> (-59.58%) ⬇️

... and 47 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Member

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this deviates from our lance integration plan slightly. AFAIK, the plan was to keep read_lance and write_lance as first class integrations, but everything else goes through daft_lance.

I also noticed that this PR skips updating write_lance all together. So we should update this PR to have write_lance delegate to daft_lance.write_lance

Comment thread daft/io/lance/_lance.py Outdated
@rchowell rchowell changed the title chore!: migrate daft.io.lance to daft_lance chore: migrate daft.io.lance to daft_lance May 19, 2026
@rchowell rchowell enabled auto-merge (squash) May 19, 2026 19:29
@rchowell rchowell merged commit 8ae8dfc into main May 19, 2026
40 checks passed
@rchowell rchowell deleted the rchowell/lance-update branch May 19, 2026 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: support pylance 4.0.1

2 participants