Commit c91adb4

1 parent 8382878 commit c91adb4

156 files changed

Lines changed: 240 additions & 201 deletions

.github/workflows/java.yml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ on:
2323
- ready_for_review
2424
- reopened
2525
paths:
26-
- docs/src/rest.yaml
26+
- docs/src/spec.yaml
2727
- java/**
2828
- .github/workflows/java.yml
2929

Makefile

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
1212

1313
.PHONY: lint
1414
lint:
15-
uv run openapi-spec-validator --errors all docs/src/rest.yaml
15+
uv run openapi-spec-validator --errors all docs/src/spec.yaml
1616

1717
.PHONY: clean-rust
1818
clean-rust:

docs/Makefile

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
1212

1313
# Java model docs source and destination
1414
JAVA_DOCS_SRC := ../java/lance-namespace-apache-client/docs
15-
MODELS_DEST := src/client/operations/models
15+
MODELS_DEST := src/namespace/operations/models
1616

1717
# API files to exclude (Java-specific, not data models)
1818
API_FILES := DataApi.md IndexApi.md MetadataApi.md NamespaceApi.md TableApi.md TagApi.md TransactionApi.md

docs/mkdocs.yml

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
1-
site_name: Lance Namespace
2-
site_description: open specification on top of the storage-based Lance data format to standardize access to a collection of Lance tables
1+
site_name: Lance Catalog & Namespace
2+
site_description: Open specification for managing collections of Lance tables through catalog specs (Directory and REST) and a unified Namespace SDK
33
site_url: https://lance.org/format/namespace/
44
docs_dir: src
55

docs/src/.pages

Lines changed: 2 additions & 5 deletions
@@ -1,7 +1,4 @@
11
nav:
22
- index.md
3-
- Client Spec: client
4-
- Directory Namespace: dir
5-
- REST Namespace: rest
6-
- Catalog Integrations: integrations
7-
- Partitioning Spec: partitioning-spec.md
3+
- Catalog Specs: catalog
4+
- Namespace Client Spec: namespace

docs/src/catalog/.pages

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
title: Catalog Specs
2+
nav:
3+
- Overview: index.md
4+
- Directory Catalog: dir
5+
- REST Catalog: rest
Lines changed: 17 additions & 12 deletions
@@ -1,7 +1,10 @@
1-
# Lance Directory Namespace Catalog Spec
1+
# Directory Catalog Format Specification
22

3-
**Lance directory namespace** is a catalog that stores tables in a directory structure
4-
on any local or remote storage system. It has gone through 2 major spec versions so far:
3+
The **Lance Directory Catalog** is a storage-native catalog format that stores tables in a directory structure on any local or remote storage system. It requires no external metadata service — only a filesystem or object store.
4+
5+
Machine learning workloads frequently operate on datasets stored in object storage and favor minimal operational dependencies, even in production environments. However, existing lakehouse formats typically require an external catalog service, while storage-only approaches lack the transactional guarantees required for reliable production use. The Directory Catalog addresses this gap by providing a catalog built directly on top of the Lance table format.
6+
7+
The Directory Catalog has gone through 2 major spec versions:
58

69
- **V1 (Directory Listing)**: A lightweight, simple 1-level namespace that discovers tables by scanning the directory.
710
- **V2 (Manifest)**: A more advanced implementation backed by a manifest table (a Lance table) that supports nested namespaces and better performance at scale.
@@ -13,11 +16,11 @@ This mode is ideal for getting started quickly with Lance tables.
1316

1417
### Directory Layout
1518

16-
A directory namespace maps to a directory on storage, called the **namespace directory**.
17-
A Lance table corresponds to a subdirectory in the namespace directory that has the format `<table_name>.lance`,
19+
A directory catalog maps to a directory on storage, called the **catalog directory**.
20+
A Lance table corresponds to a subdirectory in the catalog directory that has the format `<table_name>.lance`,
1821
called a **table directory**.
1922

20-
Consider the following example namespace directory layout:
23+
Consider the following example catalog directory layout:
2124

2225
```
2326
.
@@ -38,15 +41,15 @@ Consider the following example namespace directory layout:
3841
└── .lance-reserved # Marker: table4 is reserved but not created
3942
```
4043

41-
This describes a Lance directory namespace with the namespace directory at `/my/dir1/`.
44+
This describes a Lance Directory Catalog with the catalog directory at `/my/dir1/`.
4245
It contains active tables `table1` and `table2` at table directories
4346
`/my/dir1/table1.lance` and `/my/dir1/table2.lance`.
4447
Table `table3` exists on storage but is deregistered (excluded from table listings).
4548
Table `table4` is reserved but not yet created with data.
4649

4750
### Table Existence
4851

49-
In V1, a table exists in a Lance directory namespace if a table directory of the specific name exists
52+
In V1, a table exists in a Lance Directory Catalog if a table directory of the specific name exists
5053
and the table is not marked as deregistered.
5154
In object store terms, this means the prefix `<table_name>.lance/` has at least one file in it
5255
and the file `<table_name>.lance/.lance-deregistered` does not exist.
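
The V1 existence rule in this hunk can be sketched in Python for a catalog directory on a local filesystem. This is an illustrative sketch, not the reference implementation: the function names `table_exists_v1` and `list_tables_v1` are hypothetical, and handling of the `.lance-reserved` marker is deliberately not modeled because the visible text does not define it.

```python
from pathlib import Path

# Marker file name taken from the spec text above.
DEREGISTERED_MARKER = ".lance-deregistered"


def table_exists_v1(catalog_dir: Path, table_name: str) -> bool:
    """Apply the V1 rule: the table directory holds at least one file
    and the deregistration marker is absent."""
    table_dir = catalog_dir / f"{table_name}.lance"
    if not table_dir.is_dir():
        return False
    # "At least one file under the prefix", checked recursively.
    has_file = any(p.is_file() for p in table_dir.rglob("*"))
    deregistered = (table_dir / DEREGISTERED_MARKER).exists()
    return has_file and not deregistered


def list_tables_v1(catalog_dir: Path) -> list:
    """Discover tables by scanning the catalog directory (hypothetical helper)."""
    names = []
    for entry in sorted(catalog_dir.iterdir()):
        if entry.is_dir() and entry.name.endswith(".lance"):
            name = entry.name[: -len(".lance")]
            if table_exists_v1(catalog_dir, name):
                names.append(name)
    return names
```

On an object store the same check would become a prefix listing plus a lookup of the marker key, since directories are not real entities there.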
@@ -65,16 +68,18 @@ is created inside the table directory. This causes the table to be excluded from
6568
and to return "not found" for `DescribeTable` and `TableExists` operations, while preserving the table data
6669
for potential re-registration.
6770

68-
## V2: Manifest
71+
## V2: Manifest
6972

70-
V2 uses a special `__manifest` table (a Lance table) stored in the namespace directory to track all tables
73+
V2 uses a special `__manifest` table (a Lance table) stored in the catalog directory to track all tables
7174
and namespaces. This provides several advantages over V1:
7275

7376
- **Nested namespaces**: Support for hierarchical namespace organization
7477
- **Better performance**: Table discovery queries the manifest table instead of scanning the directory and leverages Lance's random access capability.
7578
- **Metadata support**: All operations can be supported, e.g. namespaces can have associated properties/metadata, tables can be renamed.
7679
- **Optimized directory path**: Hash-based directory naming prevents conflicts and maximizes throughput in object storage.
7780

81+
Because the catalog metadata is itself stored as a Lance table, the catalog inherits the transactional semantics, snapshot isolation, and schema evolution guarantees of the table format, while also benefiting from Lance's random-access-friendly file layout and table-level indexing capabilities.
82+
7883
### Directory Layout
7984

8085
```
@@ -145,7 +150,7 @@ In [compatibility mode](#compatibility-mode), root namespace tables use `<table_
145150

146151
### Table Version Management
147152

148-
V2 optionally supports managed table versioning, where table versions are tracked in the `__manifest` table instead of relying on Lance's native version management. When enabled, the directory namespace acts as an [external manifest store](https://lance.org/format/table/transaction/#external-manifest-store). This feature must be enabled for the entire namespace.
153+
V2 optionally supports managed table versioning, where table versions are tracked in the `__manifest` table instead of relying on Lance's native version management. When enabled, the directory catalog acts as an [external manifest store](https://lance.org/format/table/transaction/#external-manifest-store). This feature must be enabled for the entire catalog.
149154

150155
#### Enabling Table Version Management
151156

@@ -186,7 +191,7 @@ Example metadata JSON:
186191

187192
## Compatibility Mode
188193

189-
By default, the directory namespace operates in compatibility mode, supporting both V1 and V2 tables simultaneously. This allows gradual migration from V1 to V2 without disrupting existing workflows.
194+
By default, the directory catalog operates in compatibility mode, supporting both V1 and V2 tables simultaneously. This allows gradual migration from V1 to V2 without disrupting existing workflows.
190195

191196
In compatibility mode:
192197

docs/src/catalog/index.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
# Lance Catalog Specs
2+
3+
A **catalog** manages collections of tables and provides table discovery, management, and transactional coordination. Catalog implementations vary widely across deployments, ranging from lightweight environments to enterprise platforms integrating with authorization systems or metadata services such as Apache Hive metastores.
4+
5+
To support this range of environments, Lance provides two catalog approaches:
6+
7+
## Directory Catalog
8+
9+
The **[Directory Catalog](dir/index.md)** is a storage-native catalog format that requires only a filesystem or object store — no additional services are needed. This makes it suitable for lightweight deployments, or even embedded in-process databases.
10+
11+
Key characteristics:
12+
13+
- **Zero infrastructure**: Requires only storage (local filesystem, S3, GCS, Azure, etc.)
14+
- **Transactional guarantees**: Catalog metadata is stored as a Lance table, inheriting transactional semantics, snapshot isolation, and schema evolution guarantees
15+
- **Simple deployment**: Ideal for ML/AI workloads that favor minimal operational dependencies
16+
17+
## REST Catalog
18+
19+
The **[REST Catalog](rest/index.md)** is an OpenAPI-based protocol that enables reading, writing, and managing Lance tables through a REST API. This is ideal for enterprise environments that require integration with existing governance, access control, and compliance systems.
20+
21+
Key characteristics:
22+
23+
- **Enterprise integration**: Connect to existing metadata services and authorization systems
24+
- **Standardized API**: OpenAPI specification enables consistent client/server implementations
25+
- **External manifest store**: Table version management APIs can act as an external manifest store for governance policies
26+
27+
## Supported Catalogs
28+
29+
Beyond the natively maintained catalog specs, Lance supports integration with external catalog systems through the [Namespace Client Spec](../namespace/index.md). Namespace Client implementation specs for systems like Apache Polaris, Unity Catalog, Apache Hive Metastore, and Apache Iceberg REST Catalog are maintained separately and can be found in the [Supported Catalogs](../namespace/supported-catalogs/index.md) section.
Lines changed: 26 additions & 22 deletions
@@ -1,22 +1,26 @@
1-
# Lance REST Namespace Catalog Spec
1+
# REST Catalog API Specification
22

3-
In an enterprise environment, typically there is a requirement to store tables in a metadata service
4-
for more advanced governance features around access control, auditing, lineage tracking, etc.
5-
**Lance REST Namespace** is an OpenAPI catalog protocol that enables reading, writing and managing Lance tables
6-
by connecting those metadata services or building a custom metadata server in a standardized way.
7-
The REST server definition can be found in the [OpenAPI specification](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/lance-format/lance-namespace/refs/heads/main/docs/src/rest.yaml).
3+
In enterprise environments, ML teams often must integrate with existing catalog systems to satisfy governance, access control, and compliance requirements. The **Lance REST Catalog** is an OpenAPI protocol that enables reading, writing, and managing Lance tables by connecting to metadata services or building a custom metadata server in a standardized way.
84

9-
## Duality with Client-Side Access Spec
5+
The REST Catalog specification, defined as an OpenAPI document, describes the data models and metadata operations needed to discover and manage Lance tables. It also defines data operations such as `QueryTable` and `InsertIntoTable` which exchange Arrow record batches via Apache Arrow IPC streams for efficient data transfer and interoperability with Arrow-native compute engines.
106

11-
The Lance Namespace client-side access spec defines request and response models using OpenAPI.
12-
The REST namespace spec leverages this fact — the REST API is largely identical to the client-side access spec,
7+
The REST server definition can be found in the [OpenAPI specification](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/lance-format/lance-namespace/refs/heads/main/docs/src/spec.yaml).
8+
9+
## External Manifest Store
10+
11+
The REST Catalog also exposes table version management APIs that can act as an external manifest store. When used, table commits are coordinated through the catalog before the resulting table metadata is written to storage. This enables organizations to enforce governance policies such as auditing, access control, and commit validation while still preserving the Lance table format as the authoritative source of table state.
12+
13+
## Duality with Namespace Client Spec
14+
15+
The Lance Namespace Client spec defines request and response models using OpenAPI.
16+
The REST Catalog spec leverages this fact — the REST API is largely identical to the Namespace Client spec,
1317
with the request and response schemas directly used as HTTP request and response bodies.
1418

1519
This duality minimizes data conversion between client and server:
1620
a client can serialize its request model directly to JSON for the HTTP body,
1721
and deserialize the HTTP response body directly into the response model.
1822

19-
There are a few exceptions where the REST spec diverges from the client-side access spec.
23+
There are a few exceptions where the REST spec diverges from the Namespace Client spec.
2024
For example, for some operations like `InsertIntoTable`, `CreateTable`, `MergeInsertIntoTable`,
2125
the HTTP request body is used for transmitting Arrow IPC binary data,
2226
and the operation request fields are transmitted through query parameters instead.
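
The duality described in this hunk can be sketched in Python. Everything named here is an assumption made for illustration: the model classes, their fields (`id`, `location`), the `mode` parameter and the helper functions are hypothetical and are not taken from the OpenAPI document. The sketch only shows the two shapes a request can take: a model serialized directly as the JSON body, and Arrow IPC bytes as the body with the request fields moved into query parameters.

```python
import json
from dataclasses import asdict, dataclass
from urllib.parse import urlencode


@dataclass
class DescribeTableRequest:  # hypothetical model, fields assumed
    id: list


@dataclass
class DescribeTableResponse:  # hypothetical model, fields assumed
    location: str


def to_json_body(request) -> bytes:
    """Serialize a request model directly into the HTTP request body."""
    return json.dumps(asdict(request)).encode("utf-8")


def from_json_body(model_cls, body: bytes):
    """Deserialize the HTTP response body directly into a response model."""
    return model_cls(**json.loads(body))


def to_arrow_request(fields: dict, arrow_ipc_stream: bytes):
    """For data operations, the body carries the Arrow IPC bytes and the
    request fields travel as query parameters instead."""
    query = urlencode({k: v for k, v in fields.items() if v is not None})
    return query, arrow_ipc_stream
```

In the first shape, no intermediate conversion layer sits between the client model and the wire format. The second shape keeps the binary payload untouched and avoids wrapping it in JSON.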
@@ -48,7 +52,7 @@ When the information in the request body is missing, the server must use the inf
4852
## Identity Header Mapping
4953

5054
All request schemas include an optional `identity` field for authentication.
51-
For REST Namespace, the identity fields are mapped to HTTP headers:
55+
For REST Catalog, the identity fields are mapped to HTTP headers:
5256

5357
| Identity Field | REST Form | Location |
5458
|----------------|-----------------|----------|
@@ -65,7 +69,7 @@ All request schemas include an optional `context` field for passing arbitrary ke
6569
This allows clients to send implementation-specific context that can be used by the server
6670
or forwarded to downstream services.
6771

68-
For REST Namespace, context entries are mapped to HTTP headers using the naming convention:
72+
For REST Catalog, context entries are mapped to HTTP headers using the naming convention:
6973

7074
| Context Entry | REST Form | Location |
7175
|----------------------------|-------------------------------|----------|
@@ -297,33 +301,33 @@ Both request and response bodies are direct objects (map of string to string) in
297301
|----------------|---------------|-------------------------------------------------------------------|
298302
| `metadata` | Response body | Direct object `{"key": "value", ...}` (not `{"metadata": {...}}`) |
299303

300-
## Namespace Server and Adapter
304+
## REST Catalog Server and Adapter
301305

302-
Any REST HTTP server that implements this OpenAPI protocol is called a **Lance Namespace server**.
303-
If you are a metadata service provider that is building a custom implementation of Lance namespace,
306+
Any REST HTTP server that implements this OpenAPI protocol is called a **Lance REST Catalog server**.
307+
If you are a metadata service provider that is building a custom implementation of Lance catalog,
304308
building a REST server gives you standardized integration to Lance
305309
without the need to worry about tool support and
306310
continuously distribute newer library versions compared to using an implementation.
307311

308312
If the main purpose of this server is to be a proxy on top of an existing metadata service,
309313
converting back and forth between Lance REST API models and native API models of the metadata service,
310-
then this Lance namespace server is called a **Lance Namespace adapter**.
314+
then this Lance REST Catalog server is called a **Lance Catalog adapter**.
311315

312316
## Choosing between an Adapter vs an Implementation
313317

314-
Any adapter can always be directly a Lance namespace implementation bypassing the REST server,
318+
Any adapter can always be directly a Lance catalog implementation bypassing the REST server,
315319
and vice versa. In fact, an implementation is basically the backend of an adapter.
316-
For example, we natively support a Lance HMS Namespace implementation,
317-
as well as a Lance namespace adapter for HMS by using the HMS Namespace implementation to fulfill requests in the Lance REST server.
320+
For example, we natively support a Lance HMS Catalog implementation,
321+
as well as a Lance catalog adapter for HMS by using the HMS Catalog implementation to fulfill requests in the Lance REST server.
318322

319-
If you are considering between a Lance namespace adapter vs implementation to build or use in your environment,
323+
If you are considering between a Lance catalog adapter vs implementation to build or use in your environment,
320324
here are some criteria to consider:
321325

322326
1. **Multi-Language Feasibility & Maintenance Cost**: If you want a single strategy that works across all Lance language bindings, an adapter is preferred.
323327
Sometimes it is not even possible for an integration to go with the implementation approach since it cannot support all the languages.
324328
Sometimes an integration is popular or important enough that it is viable to build an implementation and maintain one library per language.
325-
2. **Tooling Support**: each tool needs to declare the Lance namespace implementations it supports.
326-
That means there will be a preference for tools to always support a REST namespace,
329+
2. **Tooling Support**: each tool needs to declare the Lance catalog implementations it supports.
330+
That means there will be a preference for tools to always support a REST catalog,
327331
but it might not always support a specific implementation. This favors the adapter approach.
328332
3. **Security**: if you have security concerns about the adapter being a man-in-the-middle, you should choose an implementation
329333
4. **Performance**: after all, adapter adds one layer of indirection and is thus not the most performant solution.

docs/src/client/index.md

Lines changed: 0 additions & 48 deletions
This file was deleted.
