feat: add spec for Iceberg namespace by jackye1995 · Pull Request #118 · lance-format/lance-namespace

jackye1995 · 2025-07-08T05:08:37Z

No description provided.

bryanck · 2025-07-08T13:59:19Z

+1. the [`location`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2494) must point to the root location of the Lance table
+2. the [`properties`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2499) must follow:
+    1. there is a key `table_type` set to `lance` (case insensitive)
+    2. there is a key `managed_by` set to either `storage` or `impl` (case insensitive). If not set, default to `storage`


What is this property for?

for this: https://lancedb.github.io/lance-namespace/spec/implementations/#implementation-and-storage

if the default is storage, should we add an equivalent section to "Requirement for Implementation Managed Table"?

yes good point, let me do that

Ended up not adding a section, since there is no particular requirement I can think of beyond the ones in the table definition. Will add one later if there are additional points unique to storage managed tables.
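The property rules discussed in this thread can be sketched as a small validator. This is a minimal illustration, not code from the PR: the function name `parse_lance_table_properties` is hypothetical, and it assumes "case insensitive" applies to the property values rather than the keys.

```python
from typing import Mapping

# Allowed values for managed_by, per the spec text under review.
VALID_MANAGED_BY = {"storage", "impl"}


def parse_lance_table_properties(properties: Mapping[str, str]) -> str:
    """Return the normalized managed_by value, or raise if the table is not a Lance table.

    Hypothetical helper; assumes values (not keys) are compared case-insensitively.
    """
    table_type = properties.get("table_type", "")
    if table_type.lower() != "lance":
        raise ValueError(f"not a Lance table: table_type={table_type!r}")

    # managed_by is optional and defaults to "storage" when not set.
    managed_by = properties.get("managed_by", "storage").lower()
    if managed_by not in VALID_MANAGED_BY:
        raise ValueError(f"unsupported managed_by: {managed_by!r}")
    return managed_by
```

A caller would pass the `properties` map taken from the table metadata and branch on the returned value.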

bryanck · 2025-07-08T14:17:56Z

+## Requirement for Implementation Managed Table
+
+An update to the implementation managed table must go through IRC [UpdateTable](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L997) API 
+or [CommitTransaction](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L1336) API


Is the plan to support all IRC metadata updates? Currently Lance doesn't have branching, partitioning, etc. Maybe you could add how incompatibilities will be handled.

I guess this part is still hypothetical. I think we will need to start with storage managed tables, which only leverage LoadTable and CreateTable, with all update commits going directly against the storage.

bryanck · 2025-07-08T14:26:21Z

+An update to the implementation managed table must go through IRC [UpdateTable](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L997) API 
+or [CommitTransaction](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L1336) API
+with a requirement that the [`assert-ref-snapshot-id`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L3051) is the current Lance table version.
+If the commit fails due to unresolvable concurrent commits, the IRC server must fail with `409 Conflict` according to the IRC spec.


Does Lance support concurrent commits or are all commits serialized using the version number?

> Does Lance support concurrent commits or are all commits serialized using the version number?

I think it's both; it's basically the same as Iceberg. There is a rebase process among concurrent commits, followed by the commit to the next version (equivalent to Iceberg's commit to the catalog). If that commit fails, it retries the rebase and commit against the new concurrent commit.
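The rebase-then-commit loop described above, together with the version assertion and `409 Conflict` from the diff, can be sketched with an in-memory stand-in. Everything here is hypothetical (`InMemoryTable`, `CommitConflict`, `commit_with_retry`); it is not the Lance or Iceberg implementation, and the "rebase" step only re-reads the latest version without resolving conflicting changes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class CommitConflict(Exception):
    """Stands in for the IRC server's 409 Conflict response."""
    status_code = 409


@dataclass
class InMemoryTable:
    """Toy table: a version counter plus a log of committed changes."""
    version: int = 1
    log: List[str] = field(default_factory=list)

    def commit(self, expected_version: int, change: str) -> int:
        # Mirrors the assert-ref-snapshot-id idea: the caller must name the current version.
        if expected_version != self.version:
            raise CommitConflict(f"expected {expected_version}, current {self.version}")
        self.log.append(change)
        self.version += 1
        return self.version


def commit_with_retry(
    table: InMemoryTable,
    change: str,
    max_attempts: int = 3,
    before_attempt: Optional[Callable[[int], None]] = None,
) -> int:
    """Retry the commit against the latest version until it succeeds or attempts run out."""
    for attempt in range(max_attempts):
        base = table.version  # "rebase": pick up the latest committed version
        if before_attempt is not None:
            before_attempt(attempt)  # test hook to simulate a concurrent writer
        try:
            return table.commit(base, change)
        except CommitConflict:
            continue  # another writer won; rebase and try again
    raise CommitConflict(f"gave up after {max_attempts} attempts")
```

The `before_attempt` hook exists only so a concurrent writer can be simulated deterministically; a real client would have no such parameter.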

bryanck · 2025-07-08T15:38:05Z

+
+1. the [`location`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2494) must point to the root location of the Lance table
+2. the [`properties`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.0/open-api/rest-catalog-open-api.yaml#L2499) must follow:
+    1. there is a key `table_type` set to `lance` (case insensitive)


When a request comes in it seems we would need to know the type before we load the metadata?

In my mind, the IRC server should be able to control what is returned as the LoadTable response. For example, if it's a Lance table, it can just generate a TableMetadata JSON payload without reading any actual Iceberg metadata JSON file, and provide this information in the properties section. Is that feasible?

HMS with Iceberg works like this. It looks up the generic table object and then sees what the table_type property is.

this also reminds me of Polaris generic table
https://polaris.apache.org/in-dev/unreleased/generic-table/

> it can just generate a TableMetadata JSON payload without the need to read any actual Iceberg metadata JSON file, and provide this information in the properties section. Is that feasible?

I think it's feasible as long as the reader uses the TableMetadata and does not re-read the metadata.json file (I think Spark does this today).

> HMS with Iceberg works like this. It looks up the generic table object and then sees what the table_type property is.

yes exactly, this is derived from the HMS experience.

> I think it's feasible as long as the reader uses the TableMetadata and does not re-read the metadata.json file (I think Spark does this today).

Yes. Maybe I should add a line that the metadata-location must not be set.
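As a rough sketch of the idea in this thread, a server could synthesize the LoadTable response for a Lance table and a client could dispatch on `table_type`. The field names (`metadata`, `metadata-location`, `config`, `format-version`, `table-uuid`, `location`, `properties`) follow the Iceberg REST OpenAPI as I understand it; the function names are hypothetical, and the payload is deliberately minimal rather than a complete TableMetadata.

```python
from typing import Any, Dict


def build_load_table_response(
    location: str, table_uuid: str, managed_by: str = "storage"
) -> Dict[str, Any]:
    """Synthesize a LoadTable-style response for a Lance table (hypothetical helper).

    No Iceberg metadata JSON file is read, so "metadata-location" is left unset.
    """
    return {
        "metadata": {
            "format-version": 2,
            "table-uuid": table_uuid,
            "location": location,  # root location of the Lance table
            "properties": {"table_type": "lance", "managed_by": managed_by},
        },
        "config": {},
    }


def is_lance_table(load_table_response: Dict[str, Any]) -> bool:
    """Client-side check: inspect table_type instead of re-reading a metadata file."""
    props = load_table_response.get("metadata", {}).get("properties", {})
    return props.get("table_type", "").lower() == "lance"
```

A reader that re-reads the file at `metadata-location` instead of using the returned `metadata` would not work with this approach, which is the concern raised above.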

kevinjqliu

cool!

kevinjqliu · 2025-07-08T22:40:29Z

-It describes how a metadata service like Apache Hive MetaStore (HMS), Apache Gravitino, Unity Catalog, etc.
-should store and use Lance tables, as well as how ML/AI tools and analytics compute engines should integrate with Lance tables.
+It describes how a metadata service like Apache Hive MetaStore (HMS), Apache Iceberg REST Catalog (IRC),
+Apache Gravitino, Unity Catalog, etc. should store and use Lance tables, 


are the "Apache Gravitino, Unity Catalog" integrations via IRC or custom?

I think we are considering 2 approaches: (1) through HMS, since that has been the most mature integration route in both Unity and Gravitino, and (2) through REST, for which we will add another REST server in these projects.

See apache/gravitino#7358 for example.

For IRC, I am actually not sure. Maybe we could integrate through this Iceberg way, but in both Gravitino and Unity, IRC is purpose-built for Iceberg tables at the moment. It would require some paradigm shift. But if we think this is a good direction, I'm happy to push for that approach in both communities.

kevinjqliu · 2025-07-08T22:42:50Z

+
+## Namespace Mapping
+
+An IRC server can be viewed as the root Lance namespace.


nit: Iceberg tables in IRC are addressed as (namespace, table), where the namespace can be "nested"/"leveled"

It's not clear to me how Lance tables will be presented here.

I think the mapping is:

- IRC server: Lance root namespace
- IRC multi-level namespace: Lance multi-level namespace
- IRC tables in a namespace: Lance tables in a namespace

Do you think the wording is not clear? Or did I misunderstand the comment?
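To make the mapping concrete, here is a minimal sketch that converts between an IRC (namespace levels, table) pair and a flat identifier. Representing a Lance table identifier as a list of strings with the table name last is an assumption made for illustration, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_lance_table_id(irc_namespace: Sequence[str], table: str) -> List[str]:
    """Map an IRC multi-level namespace plus table name to one identifier list.

    The IRC server itself plays the role of the root namespace, so it adds no level.
    """
    if not table:
        raise ValueError("table name must be non-empty")
    return [*irc_namespace, table]


def from_lance_table_id(lance_id: Sequence[str]) -> Tuple[List[str], str]:
    """Split an identifier list back into (IRC namespace levels, table name)."""
    if not lance_id:
        raise ValueError("identifier must contain at least a table name")
    *namespace, table = lance_id
    return namespace, table
```

For example, the IRC table `events` in the nested namespace `prod.ml` would map to `["prod", "ml", "events"]` under this assumption.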

kevinjqliu · 2025-07-08T22:46:58Z

HMS with Iceberg works like this. It looks up the generic table object and then sees what the table_type property is.

this also reminds me of Polaris generic table
https://polaris.apache.org/in-dev/unreleased/generic-table/

kevinjqliu · 2025-07-08T22:47:44Z

> it can just generate a TableMetadata JSON payload without the need to read any actual Iceberg metadata JSON file, and provide this information in the properties section. Is that feasible?

I think it's feasible as long as the reader uses the TableMetadata and does not re-read the metadata.json file (I think Spark does this today).

kevinjqliu · 2025-07-08T22:50:30Z

if the default is storage, should we add an equivalent section to "Requirement for Implementation Managed Table"?

jackye1995 · 2025-07-09T16:54:17Z

Merging to proceed with the next step

github-actions bot added the enhancement (New feature or request) and spec (Restful openapi spec) labels on Jul 8, 2025

bryanck reviewed Jul 8, 2025

kevinjqliu reviewed Jul 8, 2025

jackye1995 added 2 commits July 8, 2025 17:25

feat: add spec for Iceberg namespace

0e11724

add no metadata-location requirement

1b02c83

jackye1995 force-pushed the iceberg branch from 94b5e16 to 1b02c83 on July 9, 2025 16:48

jackye1995 merged commit 6f2f3f3 into lance-format:main Jul 9, 2025
9 checks passed


3 participants