docs/src/catalog/dir/index.md
17 additions & 12 deletions
@@ -1,7 +1,10 @@
1
-
# Lance Directory Namespace Catalog Spec
1
+
# Directory Catalog Format Specification
2
2
3
-
**Lance directory namespace** is a catalog that stores tables in a directory structure
4
-
on any local or remote storage system. It has gone through 2 major spec versions so far:
3
+
The **Lance Directory Catalog** is a storage-native catalog format that stores tables in a directory structure on any local or remote storage system. It requires no external metadata service — only a filesystem or object store.
4
+
5
+
Machine learning workloads frequently operate on datasets stored in object storage and favor minimal operational dependencies, even in production environments. However, existing lakehouse formats typically require an external catalog service, while storage-only approaches lack the transactional guarantees required for reliable production use. The Directory Catalog addresses this gap by providing a catalog built directly on top of the Lance table format.
6
+
7
+
The Directory Catalog has gone through 2 major spec versions:
5
8
6
9
- **V1 (Directory Listing)**: A lightweight, simple 1-level namespace that discovers tables by scanning the directory.
7
10
- **V2 (Manifest)**: A more advanced implementation backed by a manifest table (a Lance table) that supports nested namespaces and better performance at scale.
@@ -13,11 +16,11 @@ This mode is ideal for getting started quickly with Lance tables.
13
16
14
17
### Directory Layout
15
18
16
-
A directory namespace maps to a directory on storage, called the **namespace directory**.
17
-
A Lance table corresponds to a subdirectory in the namespace directory that has the format `<table_name>.lance`,
19
+
A directory catalog maps to a directory on storage, called the **catalog directory**.
20
+
A Lance table corresponds to a subdirectory in the catalog directory that has the format `<table_name>.lance`,
18
21
called a **table directory**.
19
22
20
-
Consider the following example namespace directory layout:
23
+
Consider the following example catalog directory layout:
21
24
22
25
```
23
26
.
@@ -38,15 +41,15 @@ Consider the following example namespace directory layout:
38
41
└── .lance-reserved # Marker: table4 is reserved but not created
39
42
```
40
43
41
-
This describes a Lance directory namespace with the namespace directory at `/my/dir1/`.
44
+
This describes a Lance Directory Catalog with the catalog directory at `/my/dir1/`.
42
45
It contains active tables `table1` and `table2` at table directories
43
46
`/my/dir1/table1.lance` and `/my/dir1/table2.lance`.
44
47
Table `table3` exists on storage but is deregistered (excluded from table listings).
45
48
Table `table4` is reserved but not yet created with data.
46
49
47
50
### Table Existence
48
51
49
-
In V1, a table exists in a Lance directory namespace if a table directory of the specific name exists
52
+
In V1, a table exists in a Lance Directory Catalog if a table directory of the specific name exists
50
53
and the table is not marked as deregistered.
51
54
In object store terms, this means the prefix `<table_name>.lance/` has at least one file in it
52
55
and the file `<table_name>.lance/.lance-deregistered` does not exist.
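The V1 existence rule above can be sketched for a local filesystem as follows. This is a minimal illustration, not the reference implementation: the function name `table_exists` is hypothetical, and the sketch applies only the two stated conditions (at least one file under the table directory, and no `.lance-deregistered` marker). It does not handle the `.lance-reserved` marker.

```python
from pathlib import Path

# Marker file name taken from the spec text above.
DEREGISTERED_MARKER = ".lance-deregistered"


def table_exists(catalog_dir, table_name):
    """Return True if the V1 existence rule holds for `table_name`.

    Hypothetical helper: checks a local directory rather than an object store.
    """
    table_dir = Path(catalog_dir) / f"{table_name}.lance"
    if not table_dir.is_dir():
        return False
    # "The prefix has at least one file in it": look for any file at any depth.
    has_files = any(p.is_file() for p in table_dir.rglob("*"))
    # The deregistration marker must be absent.
    deregistered = (table_dir / DEREGISTERED_MARKER).exists()
    return has_files and not deregistered
```

On an object store, the same check would typically be a prefix listing plus a lookup of the marker object. How that is done depends on the storage client in use.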
@@ -65,16 +68,18 @@ is created inside the table directory. This causes the table to be excluded from
65
68
and to return "not found" for `DescribeTable` and `TableExists` operations, while preserving the table data
66
69
for potential re-registration.
67
70
68
-
## V2: Manifest
71
+
## V2: Manifest
69
72
70
-
V2 uses a special `__manifest` table (a Lance table) stored in the namespace directory to track all tables
73
+
V2 uses a special `__manifest` table (a Lance table) stored in the catalog directory to track all tables
71
74
and namespaces. This provides several advantages over V1:
72
75
73
76
- **Nested namespaces**: Support for hierarchical namespace organization
74
77
- **Better performance**: Table discovery queries the manifest table instead of scanning the directory and leverages Lance's random access capability.
75
78
- **Metadata support**: All operations can be supported, e.g. namespaces can have associated properties/metadata, and tables can be renamed.
76
79
- **Optimized directory path**: Hash-based directory naming prevents conflicts and maximizes throughput in object storage.
77
80
81
+
Because the catalog metadata is itself stored as a Lance table, the catalog inherits the transactional semantics, snapshot isolation, and schema evolution guarantees of the table format, while also benefiting from Lance's random-access-friendly file layout and table-level indexing capabilities.
82
+
78
83
### Directory Layout
79
84
80
85
```
@@ -145,7 +150,7 @@ In [compatibility mode](#compatibility-mode), root namespace tables use `<table_
145
150
146
151
### Table Version Management
147
152
148
-
V2 optionally supports managed table versioning, where table versions are tracked in the `__manifest` table instead of relying on Lance's native version management. When enabled, the directory namespace acts as an [external manifest store](https://lance.org/format/table/transaction/#external-manifest-store). This feature must be enabled for the entire namespace.
153
+
V2 optionally supports managed table versioning, where table versions are tracked in the `__manifest` table instead of relying on Lance's native version management. When enabled, the directory catalog acts as an [external manifest store](https://lance.org/format/table/transaction/#external-manifest-store). This feature must be enabled for the entire catalog.
149
154
150
155
#### Enabling Table Version Management
151
156
@@ -186,7 +191,7 @@ Example metadata JSON:
186
191
187
192
## Compatibility Mode
188
193
189
-
By default, the directory namespace operates in compatibility mode, supporting both V1 and V2 tables simultaneously. This allows gradual migration from V1 to V2 without disrupting existing workflows.
194
+
By default, the directory catalog operates in compatibility mode, supporting both V1 and V2 tables simultaneously. This allows gradual migration from V1 to V2 without disrupting existing workflows.
A **catalog** manages collections of tables and provides table discovery, management, and transactional coordination. Catalog implementations vary widely across deployments, ranging from lightweight environments to enterprise platforms integrating with authorization systems or metadata services such as Apache Hive metastores.
4
+
5
+
To support this range of environments, Lance provides two catalog approaches:
6
+
7
+
## Directory Catalog
8
+
9
+
The **[Directory Catalog](dir/index.md)** is a storage-native catalog format that requires only a filesystem or object store — no additional services are needed. This makes it suitable for lightweight deployments, or even embedded in-process databases.
- **Transactional guarantees**: Catalog metadata is stored as a Lance table, inheriting transactional semantics, snapshot isolation, and schema evolution guarantees
15
+
- **Simple deployment**: Ideal for ML/AI workloads that favor minimal operational dependencies
16
+
17
+
## REST Catalog
18
+
19
+
The **[REST Catalog](rest/index.md)** is an OpenAPI-based protocol that enables reading, writing, and managing Lance tables through a REST API. This is ideal for enterprise environments that require integration with existing governance, access control, and compliance systems.
20
+
21
+
Key characteristics:
22
+
23
+
- **Enterprise integration**: Connect to existing metadata services and authorization systems
- **External manifest store**: Table version management APIs can act as an external manifest store for governance policies
26
+
27
+
## Supported Catalogs
28
+
29
+
Beyond the natively maintained catalog specs, Lance supports integration with external catalog systems through the [Namespace Client Spec](../namespace/index.md). Namespace Client implementation specs for systems like Apache Polaris, Unity Catalog, Apache Hive Metastore, and Apache Iceberg REST Catalog are maintained separately and can be found in the [Supported Catalogs](../namespace/supported-catalogs/index.md) section.
docs/src/catalog/rest/index.md
26 additions & 22 deletions
@@ -1,22 +1,26 @@
1
-
# Lance REST Namespace Catalog Spec
1
+
# REST Catalog API Specification
2
2
3
-
In an enterprise environment, typically there is a requirement to store tables in a metadata service
4
-
for more advanced governance features around access control, auditing, lineage tracking, etc.
5
-
**Lance REST Namespace** is an OpenAPI catalog protocol that enables reading, writing and managing Lance tables
6
-
by connecting those metadata services or building a custom metadata server in a standardized way.
7
-
The REST server definition can be found in the [OpenAPI specification](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/lance-format/lance-namespace/refs/heads/main/docs/src/rest.yaml).
3
+
In enterprise environments, ML teams often must integrate with existing catalog systems to satisfy governance, access control, and compliance requirements. The **Lance REST Catalog** is an OpenAPI protocol that enables reading, writing, and managing Lance tables by connecting to metadata services or building a custom metadata server in a standardized way.
8
4
9
-
## Duality with Client-Side Access Spec
5
+
The REST Catalog specification, defined as an OpenAPI document, describes the data models and metadata operations needed to discover and manage Lance tables. It also defines data operations such as `QueryTable` and `InsertIntoTable`, which exchange Arrow record batches via Apache Arrow IPC streams for efficient data transfer and interoperability with Arrow-native compute engines.
10
6
11
-
The Lance Namespace client-side access spec defines request and response models using OpenAPI.
12
-
The REST namespace spec leverages this fact — the REST API is largely identical to the client-side access spec,
7
+
The REST server definition can be found in the [OpenAPI specification](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/lance-format/lance-namespace/refs/heads/main/docs/src/spec.yaml).
8
+
9
+
## External Manifest Store
10
+
11
+
The REST Catalog also exposes table version management APIs that can act as an external manifest store. When used, table commits are coordinated through the catalog before the resulting table metadata is written to storage. This enables organizations to enforce governance policies such as auditing, access control, and commit validation while still preserving the Lance table format as the authoritative source of table state.
12
+
13
+
## Duality with Namespace Client Spec
14
+
15
+
The Lance Namespace Client spec defines request and response models using OpenAPI.
16
+
The REST Catalog spec leverages this fact — the REST API is largely identical to the Namespace Client spec,
13
17
with the request and response schemas directly used as HTTP request and response bodies.
14
18
15
19
This duality minimizes data conversion between client and server:
16
20
a client can serialize its request model directly to JSON for the HTTP body,
17
21
and deserialize the HTTP response body directly into the response model.
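As a rough illustration of this duality, the sketch below serializes a request model straight to a JSON body and parses a JSON body straight into a response model. The model classes and their fields (`id`, `location`, `properties`) are hypothetical stand-ins chosen for the example. The actual schemas are defined by the OpenAPI specification.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DescribeTableRequest:
    # Hypothetical field: the table identifier as a list of name parts.
    id: list


@dataclass
class DescribeTableResponse:
    # Hypothetical fields, for illustration only.
    location: str
    properties: dict = field(default_factory=dict)


def to_http_body(request):
    """Serialize a request model directly into a JSON HTTP body."""
    return json.dumps(asdict(request)).encode("utf-8")


def from_http_body(body):
    """Deserialize a JSON HTTP response body directly into the response model."""
    return DescribeTableResponse(**json.loads(body))
```

No intermediate translation layer is needed in either direction, which is the point of sharing the schemas between the two specs.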
18
22
19
-
There are a few exceptions where the REST spec diverges from the client-side access spec.
23
+
There are a few exceptions where the REST spec diverges from the Namespace Client spec.
20
24
For example, for some operations like `InsertIntoTable`, `CreateTable`, `MergeInsertIntoTable`,
21
25
the HTTP request body is used for transmitting Arrow IPC binary data,
22
26
and the operation request fields are transmitted through query parameters instead.
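A client might assemble such a request roughly as follows. This is a sketch under stated assumptions: the endpoint path, the `mode` parameter, and the helper name `build_insert_request` are hypothetical, and the use of the IANA-registered Arrow stream media type as the `Content-Type` is an assumption rather than something confirmed by the text above. The OpenAPI specification is authoritative for the real routes and parameters.

```python
from urllib.parse import urlencode


def build_insert_request(base_url, table_path, fields, arrow_ipc_bytes):
    """Build (url, headers, body) for an operation that sends Arrow IPC data.

    The operation's request fields travel as query parameters because the
    HTTP body is occupied by the binary Arrow IPC payload.
    """
    query = urlencode(fields)
    url = f"{base_url}{table_path}?{query}"
    # Assumed content type: the registered media type for Arrow IPC streams.
    headers = {"Content-Type": "application/vnd.apache.arrow.stream"}
    return url, headers, arrow_ipc_bytes


# Hypothetical usage; the path and parameter are illustrative only.
url, headers, body = build_insert_request(
    "http://localhost:8080", "/tables/t/insert", {"mode": "append"}, b"\x00"
)
```

The returned triple can be passed to any HTTP client. Producing the Arrow IPC bytes themselves is left to an Arrow library and is outside this sketch.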
@@ -48,7 +52,7 @@ When the information in the request body is missing, the server must use the inf
48
52
## Identity Header Mapping
49
53
50
54
All request schemas include an optional `identity` field for authentication.
51
-
For REST Namespace, the identity fields are mapped to HTTP headers:
55
+
For REST Catalog, the identity fields are mapped to HTTP headers:
52
56
53
57
| Identity Field | REST Form | Location |
54
58
|----------------|-----------------|----------|
@@ -65,7 +69,7 @@ All request schemas include an optional `context` field for passing arbitrary ke
65
69
This allows clients to send implementation-specific context that can be used by the server
66
70
or forwarded to downstream services.
67
71
68
-
For REST Namespace, context entries are mapped to HTTP headers using the naming convention:
72
+
For REST Catalog, context entries are mapped to HTTP headers using the naming convention:
|`metadata`| Response body | Direct object `{"key": "value", ...}` (not `{"metadata": {...}}`) |
299
303
300
-
## Namespace Server and Adapter
304
+
## REST Catalog Server and Adapter
301
305
302
-
Any REST HTTP server that implements this OpenAPI protocol is called a **Lance Namespace server**.
303
-
If you are a metadata service provider that is building a custom implementation of Lance namespace,
306
+
Any REST HTTP server that implements this OpenAPI protocol is called a **Lance REST Catalog server**.
307
+
If you are a metadata service provider that is building a custom implementation of a Lance catalog,
304
308
building a REST server gives you standardized integration to Lance
305
309
without the need to worry about tool support and
306
310
continuously distributing newer library versions, as is required when using an implementation.
307
311
308
312
If the main purpose of this server is to be a proxy on top of an existing metadata service,
309
313
converting back and forth between Lance REST API models and native API models of the metadata service,
310
-
then this Lance namespace server is called a **Lance Namespace adapter**.
314
+
then this Lance REST Catalog server is called a **Lance Catalog adapter**.
311
315
312
316
## Choosing between an Adapter vs an Implementation
313
317
314
-
Any adapter can always be directly a Lance namespace implementation bypassing the REST server,
318
+
Any adapter can also be used directly as a Lance catalog implementation, bypassing the REST server,
315
319
and vice versa. In fact, an implementation is basically the backend of an adapter.
316
-
For example, we natively support a Lance HMS Namespace implementation,
317
-
as well as a Lance namespace adapter for HMS by using the HMS Namespace implementation to fulfill requests in the Lance REST server.
320
+
For example, we natively support a Lance HMS Catalog implementation,
321
+
as well as a Lance catalog adapter for HMS by using the HMS Catalog implementation to fulfill requests in the Lance REST server.
318
322
319
-
If you are considering between a Lance namespace adapter vs implementation to build or use in your environment,
323
+
If you are deciding between a Lance catalog adapter and an implementation to build or use in your environment,
320
324
here are some criteria to consider:
321
325
322
326
1. **Multi-Language Feasibility & Maintenance Cost**: If you want a single strategy that works across all Lance language bindings, an adapter is preferred.
323
327
Sometimes it is not even possible for an integration to go with the implementation approach since it cannot support all the languages.
324
328
Sometimes an integration is popular or important enough that it is viable to build an implementation and maintain one library per language.
325
-
2. **Tooling Support**: each tool needs to declare the Lance namespace implementations it supports.
326
-
That means there will be a preference for tools to always support a REST namespace,
329
+
2. **Tooling Support**: each tool needs to declare the Lance catalog implementations it supports.
330
+
That means there will be a preference for tools to always support a REST catalog,
327
331
but they might not always support a specific implementation. This favors the adapter approach.
328
332
3. **Security**: if you have security concerns about the adapter being a man-in-the-middle, you should choose an implementation.
329
333
4. **Performance**: an adapter adds one layer of indirection and is thus not the most performant solution.