Commit 94b5e16

feat: add spec for Iceberg namespace
1 parent e81eb3d commit 94b5e16

4 files changed

Lines changed: 54 additions & 4 deletions

README.md

Lines changed: 3 additions & 2 deletions
@@ -4,7 +4,8 @@
 
 **Lance Namespace** is an open specification on top of the storage-based Lance data format
 to standardize access to a collection of Lance tables (a.k.a. Lance datasets).
-It describes how a metadata service like Apache Hive MetaStore (HMS), Apache Gravitino, Unity Catalog, etc.
-should store and use Lance tables, as well as how ML/AI tools and analytics compute engines should integrate with Lance tables.
+It describes how a metadata service like Apache Hive MetaStore (HMS), Apache Iceberg REST Catalog (IRC),
+Apache Gravitino, Unity Catalog, etc. should store and use Lance tables,
+as well as how ML/AI tools and analytics compute engines should integrate with Lance tables.
 
 For more details, please visit the [documentation website](https://lancedb.github.io/lance-namespace).

docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -77,5 +77,6 @@ nav:
 - REST: spec/impls/rest.md
 - Directory: spec/impls/dir.md
 - Apache Hive MetaStore: spec/impls/hive.md
+- Apache Iceberg REST Catalog: spec/impls/iceberg.md
 - Contributing: contributing.md
 - Lance: https://lancedb.github.io/lance

docs/src/index.md

Lines changed: 3 additions & 2 deletions
@@ -6,8 +6,9 @@
 
 **Lance Namespace Specification** is an open specification on top of the storage-based Lance data format
 to standardize access to a collection of Lance tables (a.k.a. Lance datasets).
-It describes how a metadata service like Apache Hive MetaStore (HMS), Apache Gravitino, Unity Catalog, etc.
-should store and use Lance tables, as well as how ML/AI tools and analytics compute engines should integrate with Lance tables.
+It describes how a metadata service like Apache Hive MetaStore (HMS), Apache Iceberg REST Catalog (IRC),
+Apache Gravitino, Unity Catalog, etc. should store and use Lance tables,
+as well as how ML/AI tools and analytics compute engines should integrate with Lance tables.
 
 ## Why _Namespace_ not _Catalog_?
 

docs/src/spec/impls/iceberg.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Lance Iceberg Namespace
+
+**Lance Iceberg Namespace** is an implementation using Apache Iceberg REST Catalog (IRC).
+For more details about IRC, please read the [IRC Specification](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml).
+
+!!! note
+
+    This implementation is designed against the IRC spec as of Iceberg release version 1.9.0.
+
+## Namespace Mapping
+
+An IRC server can be viewed as the root Lance namespace.
+The Iceberg multi-level namespaces map to the multi-level child namespaces.
+Whether the namespace is leveled and the number of levels depend on the specific IRC provider.
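
Reviewer sketch (not part of the commit): one way the namespace mapping above could look in client code. It assumes a Lance namespace ID is available as a list of level names, which is a hypothetical representation, and it relies on the IRC 1.9.0 convention that multipart namespace parts in a URL path are joined with the unit separator byte `0x1F`. The function name `to_irc_namespace_path` is made up for illustration.

```python
from urllib.parse import quote

# Unit separator that the IRC spec (Iceberg 1.9.0) uses between namespace levels.
IRC_NAMESPACE_SEPARATOR = "\x1f"


def to_irc_namespace_path(lance_namespace_id: list[str]) -> str:
    """Encode a multi-level Lance namespace ID as a single IRC URL path segment.

    `lance_namespace_id` is a hypothetical list-of-levels form, e.g. ["prod", "ml"].
    The root namespace (empty list) is the IRC server itself, so it has no segment.
    """
    if not lance_namespace_id:
        raise ValueError("the root namespace maps to the IRC server, not to a path segment")
    joined = IRC_NAMESPACE_SEPARATOR.join(lance_namespace_id)
    # Percent-encode everything that is not URL-safe, including the separator itself.
    return quote(joined, safe="")
```

Whether more than one level is accepted still depends on the IRC provider, so a caller would need to handle a rejection from the server for deeper namespaces.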
+
+## Table Definition
+
+A Lance table should appear as a table object in IRC in the shape of Iceberg [TableMetadata](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2482),
+with the following requirements:
+
+1. the [`location`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2494) must point to the root location of the Lance table
+2. the [`properties`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2499) must follow:
+    1. there is a key `table_type` set to `lance` (case insensitive)
+    2. there is a key `managed_by` set to either `storage` or `impl` (case insensitive). If not set, default to `storage`
+3. the [`current-snapshot-id`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2535) is set to the latest numeric version number of the table. This field will only be respected if `managed_by=impl`
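
Reviewer sketch (not part of the commit): the three requirements above could be checked on the client side roughly as follows. The function name `lance_table_info` and the returned dict shape are hypothetical, and the sketch assumes "case insensitive" applies to the property values rather than the keys, which the spec text does not spell out.

```python
from typing import Optional


def lance_table_info(table_metadata: dict) -> Optional[dict]:
    """Extract Lance-relevant fields from an IRC TableMetadata dict.

    Returns None when the table is not marked as a Lance table.
    """
    props = table_metadata.get("properties") or {}

    # Requirement 2.1: `table_type` must be `lance` (value compared case-insensitively).
    if props.get("table_type", "").lower() != "lance":
        return None

    # Requirement 2.2: `managed_by` is `storage` or `impl`, defaulting to `storage`.
    managed_by = props.get("managed_by", "storage").lower()
    if managed_by not in ("storage", "impl"):
        raise ValueError(f"unsupported managed_by value: {managed_by!r}")

    # Requirement 3: `current-snapshot-id` is only respected for managed_by=impl.
    version = table_metadata.get("current-snapshot-id") if managed_by == "impl" else None

    return {
        # Requirement 1: `location` is the root location of the Lance table.
        "location": table_metadata["location"],
        "managed_by": managed_by,
        "version": version,
    }
```

For a storage-managed table the returned `version` is `None`, signalling that the reader should resolve the latest version from the table location itself.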
+
+## Requirement for Implementation Managed Table
+
+An update to the implementation managed table must go through IRC [UpdateTable](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L997) API
+or [CommitTransaction](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L1336) API
+with a requirement that the [`assert-ref-snapshot-id`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L3051) is the current Lance table version.
+If the commit fails due to unresolvable concurrent commits, the IRC server must fail with `409 Conflict` according to the IRC spec.
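
Reviewer sketch (not part of the commit): the body of an UpdateTable request carrying the requirement described above might be assembled like this. The helper names are hypothetical, the choice of `main` as the ref is an assumption (the spec text does not name a ref), and the contents of `updates` are left to the caller because the commit does not specify them.

```python
def build_commit_request(expected_version: int, updates: list, ref: str = "main") -> dict:
    """Build an IRC commit body that only succeeds if the table is still at `expected_version`.

    `ref="main"` is an assumption for illustration; `updates` is passed through untouched.
    """
    return {
        "requirements": [
            {
                "type": "assert-ref-snapshot-id",
                "ref": ref,
                # The Lance table version the writer based its changes on.
                "snapshot-id": expected_version,
            }
        ],
        "updates": updates,
    }


def is_commit_conflict(status_code: int) -> bool:
    """True when the server reported an unresolvable concurrent commit (409 Conflict)."""
    return status_code == 409
```

On a conflict, a writer would typically reload the table, rebase its changes on the new version, and retry with the updated `expected_version`.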
+
+## Using Lance Table in IRC with Iceberg Tooling
+
+In order to use the table with Iceberg tooling (e.g. `pyiceberg`), the implementation can optionally set the following
+in Iceberg [TableMetadata](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2482):
+
+1. there is at least one schema in the list of [`schemas`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2504)
+    1. the schema reflects the latest schema of the Lance table
+    2. the schema has ID `1`
+    3. the data type conversion follows Apache Arrow to Apache Iceberg data type conversion.
+2. the [`current-schema-id`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2508C9-L2508C26) is set to `1`
+3. there is at least one snapshot in the list of [`snapshots`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2529).
+    1. the snapshot should have [`snapshot-id`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2399) set to the latest numeric version number of the table.
+4. there is at least one snapshot log in the list of [`snapshot-log`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2542)
+    1. the snapshot log should have [`snapshot-id`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2461) set to the latest numeric version number of the table.
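
Reviewer sketch (not part of the commit): the optional fields listed above could be populated along these lines. Everything named here is hypothetical: the function `iceberg_tooling_fields`, the `(name, arrow_type)` column tuples, the sequential field IDs, and `required: False` are assumptions for illustration. The type table covers only a few primitive Arrow types, and other fields the IRC schema requires on a snapshot are omitted because the commit does not say how to fill them.

```python
import time

# Partial, illustrative Arrow-to-Iceberg primitive type mapping (not exhaustive).
ARROW_TO_ICEBERG = {
    "bool": "boolean",
    "int32": "int",
    "int64": "long",
    "float": "float",
    "double": "double",
    "string": "string",
    "binary": "binary",
}


def iceberg_tooling_fields(columns, latest_version, timestamp_ms=None):
    """Build the optional TableMetadata fields so Iceberg tooling can read the entry.

    `columns` is a list of (name, arrow_type_name) tuples describing the latest
    Lance schema; `latest_version` is the latest numeric Lance table version.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)

    fields = [
        # Field IDs and `required` are assumptions; the spec text does not define them.
        {"id": i, "name": name, "required": False, "type": ARROW_TO_ICEBERG[arrow_type]}
        for i, (name, arrow_type) in enumerate(columns, start=1)
    ]
    return {
        # Items 1 and 2: a single schema with ID 1, selected as the current schema.
        "schemas": [{"type": "struct", "schema-id": 1, "fields": fields}],
        "current-schema-id": 1,
        # Items 3 and 4: one snapshot and one log entry at the latest Lance version.
        "snapshots": [{"snapshot-id": latest_version, "timestamp-ms": timestamp_ms}],
        "snapshot-log": [{"snapshot-id": latest_version, "timestamp-ms": timestamp_ms}],
    }
```

An unmapped Arrow type raises `KeyError` here, which keeps the sketch from silently emitting a schema that does not reflect the Lance table.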
