| 1 | 1 | # lance-c |
| 2 | | -The C binding to Lance. |
| 2 | + |
| 3 | +The C/C++ binding to [Lance](https://github.com/lancedb/lance), providing native access to the Lance columnar format via a stable C ABI and header-only C++ RAII wrappers. |
| 4 | + |
| 5 | +- **C header:** [`include/lance.h`](include/lance.h) |
| 6 | +- **C++ wrappers:** [`include/lance.hpp`](include/lance.hpp) (header-only, RAII, exceptions) |
| 7 | +- **Data exchange:** [Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html) for zero-copy interop |
| 8 | + |
| 9 | +## Roadmap |
| 10 | + |
| 11 | +Based on the [liblance RFC](https://github.com/lance-format/lance/discussions/6035). |
| 12 | + |
| 13 | +### Phase 1: Core Read Path + C++ Wrappers (MVP) |
| 14 | + |
| 15 | +| Status | Component | Description | |
| 16 | +|--------|-----------|-------------| |
| 17 | +| [x] | Infrastructure | `lance-c` crate with Cargo.toml, Tokio runtime initialization | |
| 18 | +| [x] | Error handling | Thread-local error codes/messages for cross-FFI safety | |
| 19 | +| [x] | C header | `lance.h` with Arrow C Data Interface structs | |
| 20 | +| [x] | Dataset operations | Open/close with URI + storage options + version support | |
| 21 | +| [x] | Schema export | Arrow C Data Interface for zero-copy schema exchange | |
| 22 | +| [x] | Scanner builder | Column projection, SQL filters, limit/offset, batch size, row ID, fragment filtering | |
| 23 | +| [x] | ArrowArrayStream export | `lance_scanner_to_arrow_stream()` blocking API | |
| 24 | +| [x] | Batch iteration | `lance_scanner_next()` blocking function | |
| 25 | +| [x] | Poll + waker iteration | `lance_scanner_poll_next()` for async engines (Velox, Presto) | |
| 26 | +| [x] | Random access | Index-based row retrieval via `lance_dataset_take()` | |
| 27 | +| [x] | C++ wrappers | Header-only RAII library (`lance::Dataset`, `lance::Scanner`, `lance::Batch`) | |
| 28 | +| [x] | Builder pattern | Fluent Scanner API (`.limit().offset().batch_size().with_row_id()`) | |
| 29 | + |
| 30 | +### Phase 2: Vector Search & Indexing |
| 31 | + |
| 32 | +| Status | Component | Description | |
| 33 | +|--------|-----------|-------------| |
| 34 | +| [ ] | Vector search | Nearest-neighbor via scanner with metric/k/nprobes | |
| 35 | +| [ ] | Full-text search | FTS queries through scanner interface | |
| 36 | +| [ ] | Vector index creation | IVF_PQ, IVF_FLAT, IVF_SQ, HNSW variants | |
| 37 | +| [ ] | Scalar index creation | BTree, Bitmap, Inverted, Label-List indexes | |
| 38 | +| [ ] | Index management | List and drop index operations | |
| 39 | +| [ ] | C++ wrappers | `create_vector_index()` and `create_scalar_index()` methods | |
| 40 | + |
| 41 | +### Phase 3: Write Path & Mutations |
| 42 | + |
| 43 | +| Status | Component | Description | |
| 44 | +|--------|-----------|-------------| |
| 45 | +| [ ] | Dataset write | Create / append / overwrite from ArrowArrayStream | |
| 46 | +| [x] | Fragment writer | Batch-at-a-time fragment file writing (no commit) via `lance_write_fragments()` | |
| 47 | +| [ ] | Delete operations | Predicate-based deletion | |
| 48 | +| [ ] | Update operations | Expression-based row updates | |
| 49 | +| [ ] | Merge-insert | Upsert functionality with builder pattern | |
| 50 | +| [ ] | Schema evolution | Add/drop/alter columns with expressions | |
| 51 | +| [ ] | Version management | Checkout, restore, list version operations | |
| 52 | + |
| 53 | +### Phase 4: Advanced Features |
| 54 | + |
| 55 | +| Status | Component | Description | |
| 56 | +|--------|-----------|-------------| |
| 57 | +| [x] | Fragment-level access | Fragment enumeration, ID listing, scanner fragment filtering | |
| 58 | +| [ ] | Compaction | Fragment consolidation operations | |
| 59 | +| [ ] | Statistics export | Row counts, column stats for query planning | |
| 60 | +| [x] | Cloud storage | S3, GCS, Azure via storage options pass-through | |
| 61 | +| [ ] | Package distribution | vcpkg and Conan recipe packaging | |
| 62 | + |
| 63 | +### Additional (not in RFC) |
| 64 | + |
| 65 | +| Status | Component | Description | |
| 66 | +|--------|-----------|-------------| |
| 67 | +| [x] | Async scan | Callback-based `lance_scanner_scan_async()` for non-blocking scans | |
| 68 | +| [x] | Dataset metadata | `lance_dataset_version()`, `lance_dataset_count_rows()`, `lance_dataset_latest_version()` | |
| 69 | + |
| 70 | +## Building |
| 71 | + |
| 72 | +```bash |
| 73 | +cargo build --release |
| 74 | +``` |
| 75 | + |
| 76 | +The build produces `liblance_c.{so,dylib,dll}` and the headers in `include/`. |
| 77 | + |
| 78 | +## Usage |
| 79 | + |
| 80 | +### C |
| 81 | + |
| 82 | +```c |
| 83 | +#include "lance.h" |
| 84 | + |
| 85 | +LanceDataset* ds = lance_dataset_open("data.lance", NULL, 0); |
| 86 | +if (!ds) { |
| 87 | + printf("Error: %s\n", lance_last_error_message()); |
| 88 | + return 1; |
| 89 | +} |
| 90 | + |
| 91 | +struct ArrowArrayStream stream; |
| 92 | +LanceScanner* scanner = lance_scanner_new(ds, NULL, NULL); |
| 93 | +lance_scanner_to_arrow_stream(scanner, &stream); |
| 94 | +// consume stream... |
| 95 | + |
| 96 | +lance_scanner_close(scanner); |
| 97 | +lance_dataset_close(ds); |
| 98 | +``` |
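The `// consume stream...` step above follows the Arrow C Stream Interface: call `get_next` until it returns a batch whose `release` pointer is `NULL`, release each batch after use, then release the stream. The sketch below is one way to do that and is not taken from this repository. `drain_stream` and the `mock_*` helpers are hypothetical names used only for illustration, and the struct definitions are copied from the Arrow specification so the block compiles on its own. In real code they would presumably come from `lance.h`, assuming it uses the standard `ARROW_C_DATA_INTERFACE` and `ARROW_C_STREAM_INTERFACE` guards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Arrow C Data / Stream Interface structs, as given in the Arrow spec. */
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char* format; const char* name; const char* metadata;
  int64_t flags; int64_t n_children;
  struct ArrowSchema** children; struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};
struct ArrowArray {
  int64_t length; int64_t null_count; int64_t offset;
  int64_t n_buffers; int64_t n_children;
  const void** buffers;
  struct ArrowArray** children; struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif
#ifndef ARROW_C_STREAM_INTERFACE
#define ARROW_C_STREAM_INTERFACE
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};
#endif

/* Hypothetical helper: reads every batch, sums the row counts, and releases
 * the stream. Returns 0 on success or the errno-style code from get_next. */
static int drain_stream(struct ArrowArrayStream* stream, int64_t* total_rows) {
  assert(stream != NULL && stream->get_next != NULL && total_rows != NULL);
  int rc = 0;
  *total_rows = 0;
  for (;;) {
    struct ArrowArray batch;
    rc = stream->get_next(stream, &batch);
    if (rc != 0) {
      /* The message is only valid until the next call on the stream. */
      const char* msg = stream->get_last_error(stream);
      fprintf(stderr, "stream error %d: %s\n", rc, msg ? msg : "(no message)");
      break;
    }
    if (batch.release == NULL) break; /* released batch marks end of stream */
    *total_rows += batch.length;
    batch.release(&batch);            /* the consumer owns each batch */
  }
  stream->release(stream);            /* the consumer owns the stream too */
  return rc;
}

/* Hypothetical in-memory producer used only to exercise drain_stream:
 * private_data points to an int holding the number of 3-row batches left. */
static void mock_batch_release(struct ArrowArray* a) { a->release = NULL; }
static int mock_get_next(struct ArrowArrayStream* s, struct ArrowArray* out) {
  int* remaining = (int*)s->private_data;
  *out = (struct ArrowArray){0};      /* release == NULL unless set below */
  if (*remaining > 0) {
    (*remaining)--;
    out->length = 3;
    out->release = mock_batch_release;
  }
  return 0;
}
static const char* mock_last_error(struct ArrowArrayStream* s) { (void)s; return NULL; }
static void mock_stream_release(struct ArrowArrayStream* s) { s->release = NULL; }
```

With the snippet in the C section, the same helper would be called as `drain_stream(&stream, &rows)` once `lance_scanner_to_arrow_stream()` has filled `stream`. Because the helper releases the stream itself, the caller only needs to close the scanner and dataset afterwards.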
| 99 | + |
| 100 | +### C++ |
| 101 | + |
| 102 | +```cpp |
| 103 | +#include "lance.hpp" |
| 104 | +
|
| 105 | +auto ds = lance::Dataset::open("data.lance"); |
| 106 | +printf("rows: %llu, version: %llu\n", static_cast<unsigned long long>(ds.count_rows()), static_cast<unsigned long long>(ds.version())); |
| 107 | + |
| 108 | +ArrowArrayStream stream; |
| 109 | +ds.scan() |
| 110 | + .limit(100) |
| 111 | + .batch_size(1024) |
| 112 | + .to_arrow_stream(&stream); |
| 113 | +// consume stream... |
| 114 | +``` |
| 115 | + |
| 116 | +## License |
| 117 | + |
| 118 | +Apache-2.0 |