 is a fully managed, unified metastore service for data lakes on Google Cloud.

 To use Google BigLake Metastore with Lance, you can leverage BigLake's [Iceberg REST Catalog](https://docs.cloud.google.com/biglake/docs/blms-rest-catalog),
 which exposes an Apache Iceberg REST Catalog-compatible interface.

-## Configuration
-
-Configure your Lance Iceberg namespace to connect to the BigLake Metastore endpoint:
 **[Google Dataproc Metastore](https://docs.cloud.google.com/dataproc-metastore/docs/overview)** is a fully managed,
 highly available, autohealing, serverless metastore that runs on Google Cloud.

 To use Google Dataproc Metastore with Lance, you can leverage Dataproc's [Hive metastore](https://cloud.google.com/dataproc-metastore/docs/hive-metastore),
-which exposes a Hive MetaStore-compatible interface.
+which exposes an Apache Hive MetaStore-compatible interface.

-Simply configure your Lance Hive namespace to connect to Dataproc's Hive MetaStore endpoint.
-All the features and configurations of the Lance Hive Namespace ([V2](hive2.md) or [V3](hive3.md)) apply when using Dataproc Metastore.
+See Lance Namespace integration with Hive metastore ([V2](hive2.md) or [V3](hive3.md)) for more details.
 # AWS Glue Data Catalog Lance Namespace Implementation Spec

-This document describes how the AWS Glue Data Catalog implements the Lance Namespace client spec.
+This document describes how the AWS Glue Data Catalog
+implements the Lance Namespace client spec.

 ## Background

-AWS Glue Data Catalog is a fully managed metadata repository that stores structural and operational metadata for data assets. It is compatible with the Apache Hive Metastore API and can be used as a central metadata repository for data lakes. For details on AWS Glue, see the [AWS Glue Data Catalog Documentation](https://docs.aws.amazon.com/glue/).
+AWS Glue Data Catalog is a fully managed metadata repository that stores structural and operational metadata for data assets.
+It is compatible with the Apache Hive Metastore API and can be used as a central metadata repository for data lakes.
+For details on AWS Glue, see the [AWS Glue Data Catalog Documentation](https://docs.aws.amazon.com/glue/latest/dg/manage-catalog.html).
@@ -22,9 +25,15 @@ The **secret_access_key** property is optional and specifies the AWS secret acce

 The **session_token** property is optional and specifies the AWS session token for temporary credentials.

-The **root** property is optional and specifies the storage root location of the lakehouse on Glue catalog. Default value is the current working directory.
+The **assume_role_arn** property is optional and specifies the ARN of the IAM role to assume for Glue operations.

-The **storage.*** prefix properties are optional and specify additional storage configurations to access tables (e.g., `storage.region=us-west-2`).
+The **assume_role_region** property is optional and specifies the AWS region for the STS client when assuming a role.
+
+The **assume_role_external_id** property is optional and specifies the external ID for cross-account role assumption. For more details, see [AWS external ID documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html).
+
+The **assume_role_session_name** property is optional and specifies the session name for the assumed role session. For more details, see [AWS role session name documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_iam-condition-keys.html#ck_rolesessionname).
+
+The **assume_role_timeout_sec** property is optional and specifies the duration in seconds for which the assumed role session is valid (default: 3600). At the end of the timeout, a new set of role session credentials will be fetched through the STS client.
 1. **Default AWS credential provider chain**: When no explicit credentials are provided, the client uses the default AWS credential provider chain
 2. **Static credentials**: Set `access_key_id` and `secret_access_key` for basic AWS credentials
 3. **Session credentials**: Additionally provide `session_token` for temporary AWS credentials
+4. **Assume role credentials**: Set `assume_role_arn` to assume an IAM role. Optionally configure `assume_role_region`, `assume_role_external_id`, `assume_role_session_name`, and `assume_role_timeout_sec` to customize the role assumption behavior
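
The four credential modes listed above can be told apart from the property map alone. The following is a minimal sketch of that decision, not the actual client code: the function names and the returned labels are hypothetical, and checking `assume_role_arn` before the static keys is an assumption of this example. Only the property names and the 3600-second default come from the spec.

```python
from typing import Mapping

DEFAULT_ASSUME_ROLE_TIMEOUT_SEC = 3600  # default stated in the property list


def select_auth_mode(props: Mapping[str, str]) -> str:
    """Return a label for the credential mode implied by the properties.

    The check order (assume role first) is an assumption of this sketch.
    """
    if props.get("assume_role_arn"):
        return "assume_role"
    has_static = bool(props.get("access_key_id")) and bool(props.get("secret_access_key"))
    if has_static and props.get("session_token"):
        return "session"
    if has_static:
        return "static"
    return "default_chain"


def assume_role_timeout_sec(props: Mapping[str, str]) -> int:
    """Read `assume_role_timeout_sec`, falling back to the documented default."""
    return int(props.get("assume_role_timeout_sec", DEFAULT_ASSUME_ROLE_TIMEOUT_SEC))
```
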

 ## Object Mapping

 ### Namespace

-The **root namespace** is represented by the AWS Glue Data Catalog itself.
+AWS Glue Data Catalog supports a recursive catalog structure through the [GetCatalog](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalog.html) and [GetCatalogs](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalogs.html) APIs.
+This allows for multi-level namespace hierarchies.
+
+The **root namespace** is represented by the default AWS Glue Data Catalog, whose catalog ID is either unset (`None`) or equal to the caller's AWS account ID.
+
+A **child catalog** within the root catalog forms a child namespace. The [GetCatalogs](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalogs.html) API supports a `ParentCatalogId` parameter to traverse the catalog hierarchy.

-A **child namespace** is a database in Glue, forming a 2-level namespace hierarchy.
+A **database** within a catalog represents the leaf namespace level. Databases are created within a specific catalog using the `CatalogId` parameter in the [CreateDatabase](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateDatabase.html) API.

-The **namespace identifier** is the database name.
+The **namespace identifier** follows a hierarchical pattern:
+- For catalogs: the catalog name (e.g., `my_catalog`)
+- For databases: the catalog chain joined with database name using the `$` delimiter (e.g., `catalog$database` or `parent_catalog$child_catalog$database`)
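
To make the `$`-delimited identifier pattern concrete, here is a small sketch that splits a database-level identifier into its catalog chain and database name. The helper names are hypothetical, and the sketch assumes the caller already knows the identifier refers to a database rather than a nested catalog, since the string alone does not say which.

```python
from typing import List, Tuple

DELIMITER = "$"


def split_identifier(identifier: str) -> List[str]:
    """Split an identifier on `$`, rejecting empty segments."""
    parts = identifier.split(DELIMITER)
    if any(part == "" for part in parts):
        raise ValueError(f"invalid namespace identifier: {identifier!r}")
    return parts


def parse_database_identifier(identifier: str) -> Tuple[List[str], str]:
    """Return (catalog_chain, database) for a database-level identifier."""
    parts = split_identifier(identifier)
    if len(parts) < 2:
        raise ValueError(f"expected at least catalog$database, got {identifier!r}")
    return parts[:-1], parts[-1]
```
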

-**Namespace properties** are stored in the Glue Database object's parameters map.
+**Namespace properties** are stored in:
+- Catalog's `Parameters` map for catalog-level namespaces
+- Database's `Parameters` map for database-level namespaces

 ### Table

 A **table** is represented as a [Table](https://docs.aws.amazon.com/glue/latest/webapi/API_Table.html) object in AWS Glue with `TableType` set to `EXTERNAL_TABLE`.

-The **table identifier** is constructed by joining database and table name with the `$` delimiter (e.g., `database$table`).
+The **table identifier** is constructed by joining the full namespace path and table name with the `$` delimiter (e.g., `catalog$database$table`).

 The **table location** is stored in the [`StorageDescriptor.Location`](https://docs.aws.amazon.com/glue/latest/webapi/API_StorageDescriptor.html#Glue-Type-StorageDescriptor-Location) field, pointing to the root location of the Lance table.

@@ -60,6 +79,189 @@ The **table location** is stored in the [`StorageDescriptor.Location`](https://d

 A table in AWS Glue is identified as a Lance table when it meets the following criteria: the `TableType` is `EXTERNAL_TABLE`, and the `Parameters` map contains a key `table_type` with value `lance` (case insensitive). The `StorageDescriptor.Location` must point to a valid Lance table root directory.
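
The identification rule above translates into a short predicate over a Glue `Table` structure represented as a dictionary. This is an illustrative sketch with a hypothetical function name. It checks only the `TableType` and `table_type` criteria, and it does not verify that `StorageDescriptor.Location` points to a valid Lance table.

```python
from typing import Any, Mapping


def is_lance_table(table: Mapping[str, Any]) -> bool:
    """Check the TableType and table_type criteria for a Glue Table dict."""
    if table.get("TableType") != "EXTERNAL_TABLE":
        return False
    params = table.get("Parameters") or {}
    # The parameter value is compared case-insensitively, as the spec requires.
    return str(params.get("table_type", "")).lower() == "lance"
```
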

-## Optimistic Concurrency Control
+## Basic Operations
+
+### CreateNamespace
+
+Creates a new catalog or database in AWS Glue.
+
+The implementation:
+
+1. Parse the namespace identifier to determine if it is a catalog or database level
+2. For catalog-level namespace:
+   - Construct a [CreateCatalog](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateCatalog.html) request with name and properties
+   - Set the `Parameters` map with the provided namespace properties
+3. For database-level namespace:
+   - Verify the parent catalog exists
+   - Construct a [CreateDatabase](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateDatabase.html) request with database name and `CatalogId`
+   - Set the `Parameters` map with the provided namespace properties
+
+**Error Handling:**
+
+If the namespace already exists and mode is CREATE, return error code `2` (NamespaceAlreadyExists).
+
+If the parent catalog does not exist, return error code `1` (NamespaceNotFound).
+
+If access is denied, return error code `16` (Forbidden).
+
+If the Glue service is unavailable, return error code `17` (ServiceUnavailable).
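
The two request shapes described for CreateNamespace can be sketched as plain dictionaries. The builder names are hypothetical. The field layout (`Name` and `CatalogInput` for catalogs, `CatalogId` and `DatabaseInput` for databases) reflects my reading of the linked Glue API pages and should be checked against them. How a nested catalog chain maps to a `CatalogId` is left to the caller here.

```python
from typing import Any, Dict, Mapping, Optional


def build_create_catalog_request(
    name: str, properties: Optional[Mapping[str, str]] = None
) -> Dict[str, Any]:
    """Request body for a catalog-level namespace (assumed CreateCatalog shape)."""
    return {"Name": name, "CatalogInput": {"Parameters": dict(properties or {})}}


def build_create_database_request(
    catalog_id: str, database: str, properties: Optional[Mapping[str, str]] = None
) -> Dict[str, Any]:
    """Request body for a database-level namespace (assumed CreateDatabase shape)."""
    return {
        "CatalogId": catalog_id,
        "DatabaseInput": {"Name": database, "Parameters": dict(properties or {})},
    }
```
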
+
+### ListNamespaces
+
+Lists catalogs or databases in AWS Glue.
+
+The implementation:
+
+1. Parse the parent namespace identifier
+2. For root namespace (no parent):
+   - Use [GetCatalogs](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalogs.html) with `IncludeRoot=true` to list all catalogs
+   - Use `ParentCatalogId` set to account ID and `Recursive=false` for direct children
+3. For catalog-level namespace:
+   - Use [GetDatabases](https://docs.aws.amazon.com/glue/latest/webapi/API_GetDatabases.html) with the catalog's `CatalogId`
+   - Additionally use [GetCatalogs](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalogs.html) with `ParentCatalogId` to list child catalogs
+4. Sort the results and apply pagination using `NextToken`
+
+**Error Handling:**
+
+If the parent namespace does not exist, return error code `1` (NamespaceNotFound).
+
+If access is denied, return error code `16` (Forbidden).
+
+If the Glue service is unavailable, return error code `17` (ServiceUnavailable).
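
The final sort-and-paginate step can be illustrated independently of the Glue calls. In the sketch below, the page token is simply the last name returned on the previous page. That token format and the function name are assumptions of this example, and the spec does not say how the token is encoded.

```python
import bisect
from typing import Iterable, List, Optional, Tuple


def paginate_sorted(
    names: Iterable[str], page_token: Optional[str] = None, limit: Optional[int] = None
) -> Tuple[List[str], Optional[str]]:
    """Sort names, skip everything up to page_token, and return one page."""
    ordered = sorted(set(names))
    start = bisect.bisect_right(ordered, page_token) if page_token else 0
    end = len(ordered) if limit is None else start + limit
    page = ordered[start:end]
    # A next token is only returned when more results remain.
    next_token = page[-1] if page and end < len(ordered) else None
    return page, next_token
```
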
+
+### DescribeNamespace
+
+Retrieves properties and metadata for a catalog or database.
+
+The implementation:
+
+1. Parse the namespace identifier to determine the level
+2. For catalog-level namespace:
+   - Use [GetCatalog](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalog.html) with the catalog ID
+   - Extract properties from the `Parameters` map
+3. For database-level namespace:
+   - Use [GetDatabase](https://docs.aws.amazon.com/glue/latest/webapi/API_GetDatabase.html) with the database name and `CatalogId`
+   - Extract properties from the Database's `Parameters` map
+
+**Error Handling:**
+
+If the namespace does not exist, return error code `1` (NamespaceNotFound).
+
+If access is denied, return error code `16` (Forbidden).
+
+If the Glue service is unavailable, return error code `17` (ServiceUnavailable).
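
Extracting the properties is the same for both levels once the response is in hand. The sketch below assumes the response is a dictionary with the entity under a top-level `Catalog` or `Database` key, which is how I understand the GetCatalog and GetDatabase responses to be shaped. The function name is hypothetical.

```python
from typing import Any, Dict, Mapping


def namespace_properties(response: Mapping[str, Any]) -> Dict[str, str]:
    """Return a copy of the Parameters map from a GetCatalog or GetDatabase response."""
    entity = response.get("Catalog") or response.get("Database") or {}
    return dict(entity.get("Parameters") or {})
```
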
+
+### DropNamespace
+
+Removes a catalog or database from AWS Glue. Only RESTRICT mode is supported; CASCADE mode is not implemented.
+
+The implementation:
+
+1. Parse the namespace identifier to determine the level
+2. Check if the namespace exists (handle SKIP mode if not)
+3. For catalog-level namespace:
+   - Verify the catalog has no child catalogs or databases
+   - Use [DeleteCatalog](https://docs.aws.amazon.com/glue/latest/webapi/API_DeleteCatalog.html) with the catalog ID
+4. For database-level namespace:
+   - Verify the database is empty (no tables)
+   - Use [DeleteDatabase](https://docs.aws.amazon.com/glue/latest/webapi/API_DeleteDatabase.html) with the database name and `CatalogId`
+
+**Error Handling:**
+
+If the namespace does not exist and mode is FAIL, return error code `1` (NamespaceNotFound).
+
+If the namespace is not empty, return error code `3` (NamespaceNotEmpty).
+
+If access is denied, return error code `16` (Forbidden).
+
+If the Glue service is unavailable, return error code `17` (ServiceUnavailable).
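
The pre-delete checks for RESTRICT behavior can be expressed as a small decision function. This is a sketch only: `NamespaceError` and `plan_drop` are hypothetical names, and the caller is assumed to have already looked up whether the namespace exists and how many children (child catalogs, databases, or tables) it has.

```python
class NamespaceError(Exception):
    """Carries one of the numeric error codes listed in this spec."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def plan_drop(exists: bool, child_count: int, mode: str = "FAIL") -> bool:
    """Return True if a delete call should be issued, False if the drop is a no-op."""
    if not exists:
        if mode.upper() == "SKIP":
            return False  # missing namespace is tolerated in SKIP mode
        raise NamespaceError(1, "NamespaceNotFound")
    if child_count > 0:
        # RESTRICT semantics: never delete a non-empty namespace.
        raise NamespaceError(3, "NamespaceNotEmpty")
    return True
```
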
+
+### DeclareTable
+
+Declares a new Lance table in AWS Glue without creating the underlying data.
+
+The implementation:
+
+1. Parse the table identifier to extract catalog, database, and table name
+2. Verify the parent namespace (database) exists using [GetDatabase](https://docs.aws.amazon.com/glue/latest/webapi/API_GetDatabase.html)
+3. Construct a [CreateTable](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateTable.html) request with:
+   - `CatalogId`: the catalog ID from the namespace
+   - `DatabaseName`: the database name
+   - `TableInput.Name`: the table name
+   - `TableInput.TableType`: `EXTERNAL_TABLE`
+   - `TableInput.Parameters`: include `table_type=lance` and other properties
+   - `TableInput.StorageDescriptor.Location`: the specified table location
+4. POST the CreateTable request to Glue
+
+**Error Handling:**
+
+If the parent namespace does not exist, return error code `1` (NamespaceNotFound).
+
+If the table already exists, return error code `5` (TableAlreadyExists).
+
+If access is denied, return error code `16` (Forbidden).
+
+If the Glue service is unavailable, return error code `17` (ServiceUnavailable).
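
Steps 1 and 3 above can be sketched as two pure functions: one splits a `$`-delimited table identifier, the other assembles the CreateTable request fields listed in step 3. The function names are hypothetical, and treating the first segments of the identifier as the catalog chain is an assumption about how the catalog ID is derived.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple


def parse_table_identifier(identifier: str) -> Tuple[List[str], str, str]:
    """Return (catalog_chain, database, table) from e.g. `catalog$database$table`."""
    parts = identifier.split("$")
    if len(parts) < 3 or any(part == "" for part in parts):
        raise ValueError(f"expected catalog$database$table, got {identifier!r}")
    return parts[:-2], parts[-2], parts[-1]


def build_create_table_request(
    catalog_id: str,
    database: str,
    table: str,
    location: str,
    properties: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble the CreateTable fields named in the spec."""
    params = dict(properties or {})
    params["table_type"] = "lance"  # marks the entry as a Lance table
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "Parameters": params,
            "StorageDescriptor": {"Location": location},
        },
    }
```

With boto3, a dictionary of this shape could be passed as keyword arguments to the Glue client's `create_table` call, though that wiring is outside this sketch.
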
+
+### ListTables
+
+Lists all Lance tables in a database.
+
+The implementation:
+
+1. Parse the namespace identifier to extract catalog and database
+2. Verify the namespace exists using [GetDatabase](https://docs.aws.amazon.com/glue/latest/webapi/API_GetDatabase.html)
+3. Use [GetTables](https://docs.aws.amazon.com/glue/latest/webapi/API_GetTables.html) with `CatalogId` and `DatabaseName`
+4. Filter tables where `Parameters.table_type=lance` (case insensitive)
+5. Sort the results and apply pagination using `NextToken`
+
+**Error Handling:**
+
+If the namespace does not exist, return error code `1` (NamespaceNotFound).
+
+If access is denied, return error code `16` (Forbidden).
+
+If the Glue service is unavailable, return error code `17` (ServiceUnavailable).
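
The filter and sort steps can be shown over already-fetched GetTables response pages. The sketch assumes each page is a dictionary with a `TableList` of table entries, and the function name is hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def lance_table_names(pages: Iterable[Mapping[str, Any]]) -> List[str]:
    """Collect the names of Lance tables from GetTables response pages, sorted."""
    names: List[str] = []
    for page in pages:
        for table in page.get("TableList", []):
            params = table.get("Parameters") or {}
            # Case-insensitive match on the table_type parameter value.
            if str(params.get("table_type", "")).lower() == "lance":
                names.append(table["Name"])
    return sorted(names)
```
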
+
+### DescribeTable
+
+Retrieves metadata for a Lance table. Only `load_detailed_metadata=false` is supported: the response contains just the table location and `storage_options`, and the other fields (`version`, `table_uri`, `schema`, `stats`) are null.
+
+The implementation:
+
+1. Parse the table identifier to extract catalog, database, and table name
+2. Use [GetTable](https://docs.aws.amazon.com/glue/latest/webapi/API_GetTable.html) with `CatalogId`, `DatabaseName`, and `Name`
+3. Validate that the table is a Lance table (check `Parameters.table_type=lance`)
+4. Return the table location from `StorageDescriptor.Location` and storage_options from `Parameters`
+
+**Error Handling:**
+
+If the table does not exist, return error code `4` (TableNotFound).
+
+If the table is not a Lance table, return error code `13` (InvalidInput).
+
+If access is denied, return error code `16` (Forbidden).
+
+If the Glue service is unavailable, return error code `17` (ServiceUnavailable).
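
Steps 3 and 4 can be sketched as a function over the `Table` dictionary returned by GetTable. Which `Parameters` entries count as storage options is not spelled out here, so this example assumes everything except `table_type` is passed through. The function name and the response keys are likewise illustrative.

```python
from typing import Any, Dict, Mapping


def describe_lance_table(table: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a minimal describe response from a Glue Table dict."""
    params = dict(table.get("Parameters") or {})
    if str(params.get("table_type", "")).lower() != "lance":
        # Corresponds to error code 13 (InvalidInput) in this spec.
        raise ValueError("not a Lance table")
    params.pop("table_type", None)  # assumption: the rest are storage options
    return {
        "location": table["StorageDescriptor"]["Location"],
        "storage_options": params,
        # Fields that are not populated when load_detailed_metadata=false.
        "version": None,
        "table_uri": None,
        "schema": None,
        "stats": None,
    }
```
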
+
+### DeregisterTable
+
+Removes a Lance table registration from AWS Glue without deleting the underlying data.
+
+The implementation:
+
+1. Parse the table identifier to extract catalog, database, and table name
+2. Use [GetTable](https://docs.aws.amazon.com/glue/latest/webapi/API_GetTable.html) to retrieve and validate the table is a Lance table
+3. Use [DeleteTable](https://docs.aws.amazon.com/glue/latest/webapi/API_DeleteTable.html) with `CatalogId`, `DatabaseName`, and `Name`
+4. The underlying Lance table data at `StorageDescriptor.Location` is not deleted
+
+**Error Handling:**
+
+If the table does not exist, return error code `4` (TableNotFound).
+
+If the table is not a Lance table, return error code `13` (InvalidInput).
+
+If access is denied, return error code `16` (Forbidden).

-Updates to Lance tables in AWS Glue should use the `VersionId` for conditional updates through the [UpdateTable](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html) API. If the `VersionId` does not match the expected version, the update fails to prevent concurrent modification conflicts.
+If the Glue service is unavailable, return error code `17` (ServiceUnavailable).
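
The error codes used throughout the operations above can be collected in one place. The enum values come from this spec. The mapping from Glue exception names to codes is an assumption of this sketch (in particular, treating `InternalServiceException` as ServiceUnavailable), and `code_for_glue_error` is a hypothetical helper.

```python
from enum import IntEnum
from typing import Optional


class ErrorCode(IntEnum):
    NAMESPACE_NOT_FOUND = 1
    NAMESPACE_ALREADY_EXISTS = 2
    NAMESPACE_NOT_EMPTY = 3
    TABLE_NOT_FOUND = 4
    TABLE_ALREADY_EXISTS = 5
    INVALID_INPUT = 13
    FORBIDDEN = 16
    SERVICE_UNAVAILABLE = 17


def code_for_glue_error(error_name: str, resource: str) -> Optional[ErrorCode]:
    """Map a Glue exception name to a code; `resource` is 'table' or 'namespace'."""
    is_table = resource == "table"
    if error_name == "EntityNotFoundException":
        return ErrorCode.TABLE_NOT_FOUND if is_table else ErrorCode.NAMESPACE_NOT_FOUND
    if error_name == "AlreadyExistsException":
        return ErrorCode.TABLE_ALREADY_EXISTS if is_table else ErrorCode.NAMESPACE_ALREADY_EXISTS
    if error_name == "AccessDeniedException":
        return ErrorCode.FORBIDDEN
    if error_name == "InternalServiceException":
        return ErrorCode.SERVICE_UNAVAILABLE
    return None  # unmapped errors are left to the caller
```
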