Commit a97c3f4

feat: more fixes (lance-format#5)
1 parent 43ab0f5 commit a97c3f4

57 files changed

Lines changed: 596 additions & 547 deletions

docs/src/glue.md

Lines changed: 5 additions & 4 deletions
@@ -6,18 +6,19 @@ implements the Lance Namespace client spec.
66
## Background
77

88
AWS Glue Data Catalog is a fully managed metadata repository that stores structural and operational metadata for data assets.
9-
It is compatible with the Apache Hive Metastore API and can be used as a central metadata repository for data lakes.
9+
It is based on the Apache Hive Metastore API, but uses JSON RPC instead of Apache Thrift for requests and responses.
10+
It can be used as a central metadata repository for data lakes.
1011
For details on AWS Glue, see the [AWS Glue Data Catalog Documentation](https://docs.aws.amazon.com/glue/latest/dg/manage-catalog.html).
1112

1213
## Namespace Implementation Configuration Properties
1314

1415
The Lance Glue namespace implementation accepts the following configuration properties:
1516

16-
The **catalog_id** property is optional and specifies the Catalog ID of the Glue catalog (defaults to AWS account ID).
17+
The **catalog_id** property is optional and specifies the Catalog ID of the Glue catalog to use as the starting point. When not specified, it is resolved to the caller's AWS account ID.
1718

1819
The **endpoint** property is optional and specifies a custom Glue service endpoint for API compatible metastores.
1920

20-
The **region** property is optional and specifies the AWS region for all Glue operations.
21+
The **region** property is optional and specifies the AWS region for all Glue operations. When not specified, it is resolved to the default AWS region in the caller's environment.
2122

2223
The **access_key_id** property is optional and specifies the AWS access key ID for static credentials.
2324

@@ -51,7 +52,7 @@ The Glue namespace supports multiple authentication methods:
5152
AWS Glue Data Catalog supports a recursive catalog structure through the [GetCatalog](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalog.html) and [GetCatalogs](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalogs.html) APIs.
5253
This allows for multi-level namespace hierarchies.
5354

54-
The **root namespace** is represented by the default AWS Glue Data Catalog, which has a catalog ID of `None` or equal to the caller's AWS account ID.
55+
The **root namespace** is represented by the default AWS Glue Data Catalog. When the `catalog_id` property is not specified or set to `None`, it is resolved to the caller's AWS account ID. Users can specify a different `catalog_id` to use another AWS account's Glue catalog as the starting point.
5556

5657
A **child catalog** within the root catalog forms a child namespace. The [GetCatalogs](https://docs.aws.amazon.com/glue/latest/webapi/API_GetCatalogs.html) API supports a `ParentCatalogId` parameter to traverse the catalog hierarchy.
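
The `catalog_id` fallback described in this hunk can be sketched as below. This is only an illustration: the function name `resolve_catalog_id` and the `caller_account_id` argument are hypothetical, and how the caller's account ID is actually obtained is outside the sketch.

```python
from typing import Mapping, Optional


def resolve_catalog_id(
    properties: Mapping[str, Optional[str]],
    caller_account_id: str,
) -> str:
    """Return the Glue catalog ID used as the root namespace.

    Hypothetical helper: `properties` stands in for the namespace
    configuration, and `caller_account_id` is assumed to be known already.
    """
    catalog_id = properties.get("catalog_id")
    # An unset or None catalog_id resolves to the caller's AWS account ID.
    if catalog_id is None:
        return caller_account_id
    # Otherwise the given catalog (possibly another account's) is the root.
    return catalog_id
```

Passing an explicit `catalog_id` selects that catalog as the starting point; leaving it out selects the caller's own account.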
5758

docs/src/gravitino.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# Lance Gravitino Namespace
1+
# Apache Gravitino
22

33
**Apache Gravitino** is a high-performance, geo-distributed, and federated metadata lake.
44
It manages metadata directly in different sources, types, and regions, providing unified metadata access for data and AI assets.

docs/src/hive2.md

Lines changed: 23 additions & 11 deletions
@@ -1,4 +1,4 @@
1-
# Lance Hive 2.x Namespace Implementation Spec
1+
# Apache Hive 2.X MetaStore Lance Namespace Implementation Spec
22

33
This document describes how the Hive 2.x MetaStore implements the Lance Namespace client spec.
44

@@ -14,8 +14,6 @@ The **client.pool-size** property is optional and specifies the size of the HMS
1414

1515
The **root** property is optional and specifies the storage root location of the lakehouse on Hive catalog. Default value is the current working directory.
1616

17-
The **storage.*** prefix properties are optional and specify additional storage configurations to access tables (e.g., `storage.region=us-west-2`).
18-
1917
## Object Mapping
2018

2119
### Namespace
@@ -56,7 +54,9 @@ The implementation:
5654

5755
**Error Handling:**
5856

59-
If the namespace already exists and mode is CREATE, return error code `2` (NamespaceAlreadyExists). If the HMS connection fails, return error code `17` (ServiceUnavailable).
57+
If the namespace already exists and mode is CREATE, return error code `2` (NamespaceAlreadyExists).
58+
59+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
6060

6161
### ListNamespaces
6262

@@ -83,7 +83,9 @@ The implementation:
8383

8484
**Error Handling:**
8585

86-
If the namespace does not exist, return error code `1` (NamespaceNotFound). If the HMS connection fails, return error code `17` (ServiceUnavailable).
86+
If the namespace does not exist, return error code `1` (NamespaceNotFound).
87+
88+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
8789

8890
### DropNamespace
8991

@@ -113,13 +115,17 @@ The implementation:
113115
1. Parse the table identifier to extract database and table name
114116
2. Verify the parent namespace exists
115117
3. Create an HMS Table object with `tableType=EXTERNAL_TABLE`
116-
4. Set the storage descriptor with the specified or default location
118+
4. Set the storage descriptor with the specified or default location. When location is not specified, it defaults to `{root}/{database}.db/{table}`
117119
5. Add `table_type=lance` to the table parameters
118120
6. Register the table in HMS
119121

120122
**Error Handling:**
121123

122-
If the parent namespace does not exist, return error code `1` (NamespaceNotFound). If the table already exists, return error code `5` (TableAlreadyExists). If the HMS connection fails, return error code `17` (ServiceUnavailable).
124+
If the parent namespace does not exist, return error code `1` (NamespaceNotFound).
125+
126+
If the table already exists, return error code `5` (TableAlreadyExists).
127+
128+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
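
As a rough illustration of the default location added in step 4, the following sketch builds `{root}/{database}.db/{table}` when no location is given. The function names are hypothetical, and stripping a trailing slash from `root` is an assumption rather than documented behavior.

```python
from typing import Optional


def default_table_location(root: str, database: str, table: str) -> str:
    """Build the Hive 2.x default location {root}/{database}.db/{table}."""
    # Assumption: a trailing slash on root is dropped to avoid "//".
    return f"{root.rstrip('/')}/{database}.db/{table}"


def resolve_table_location(
    location: Optional[str], root: str, database: str, table: str
) -> str:
    """Prefer the caller-specified location, else fall back to the default."""
    if location:
        return location
    return default_table_location(root, database, table)
```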
123129

124130
### ListTables
125131

@@ -135,18 +141,20 @@ The implementation:
135141

136142
**Error Handling:**
137143

138-
If the namespace does not exist, return error code `1` (NamespaceNotFound). If the HMS connection fails, return error code `17` (ServiceUnavailable).
144+
If the namespace does not exist, return error code `1` (NamespaceNotFound).
145+
146+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
139147

140148
### DescribeTable
141149

142-
Retrieves metadata for a Lance table. Only `load_detailed_metadata=false` is supported. When `load_detailed_metadata=false`, only the table location and storage_options are returned; other fields (version, table_uri, schema, stats) are null.
150+
Retrieves metadata for a Lance table. Only `load_detailed_metadata=false` is supported. When `load_detailed_metadata=false`, only the table location is returned; other fields (version, table_uri, schema, stats) are null.
143151

144152
The implementation:
145153

146154
1. Parse the table identifier
147155
2. Retrieve the Table object from HMS
148156
3. Validate that it is a Lance table (check `table_type=lance`)
149-
4. Return the table location from `storageDescriptor.location` and storage_options from `parameters`
157+
4. Return the table location from `storageDescriptor.location`
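
Steps 3 and 4 of the updated flow might look like the sketch below. `HmsTable` is a hypothetical stand-in for the HMS Table object (not the real Thrift class), the response is modeled as a plain dict, and raising `ValueError` for a non-Lance table is a placeholder, since the error codes for this operation are not shown in this hunk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class HmsTable:
    """Hypothetical stand-in for the table object retrieved from HMS."""
    location: str  # corresponds to storageDescriptor.location
    parameters: Dict[str, str] = field(default_factory=dict)


def describe_lance_table(table: HmsTable) -> Dict[str, Any]:
    """Validate that the table is a Lance table and return only its location."""
    # Step 3: check the table_type=lance marker in the table parameters.
    if table.parameters.get("table_type") != "lance":
        raise ValueError("not a Lance table")
    # Step 4: only the location is populated; the other fields stay null.
    return {
        "location": table.location,
        "version": None,
        "table_uri": None,
        "schema": None,
        "stats": None,
    }
```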
150158

151159
**Error Handling:**
152160

@@ -168,4 +176,8 @@ The implementation:
168176

169177
**Error Handling:**
170178

171-
If the table does not exist, return error code `4` (TableNotFound). If the table is not a Lance table, return error code `13` (InvalidInput). If the HMS connection fails, return error code `17` (ServiceUnavailable).
179+
If the table does not exist, return error code `4` (TableNotFound).
180+
181+
If the table is not a Lance table, return error code `13` (InvalidInput).
182+
183+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
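
The error codes named across the Hive 2.x hunks can be collected into one enumeration for reference. The class name and member spellings are hypothetical; only the numeric values and their meanings come from the text above.

```python
from enum import IntEnum


class LanceNamespaceErrorCode(IntEnum):
    """Error codes mentioned in the Error Handling sections (names assumed)."""
    NAMESPACE_NOT_FOUND = 1
    NAMESPACE_ALREADY_EXISTS = 2
    TABLE_NOT_FOUND = 4
    TABLE_ALREADY_EXISTS = 5
    INVALID_INPUT = 13
    SERVICE_UNAVAILABLE = 17
```

This list covers only the codes that appear in this diff; the full spec may define others.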

docs/src/hive3.md

Lines changed: 29 additions & 13 deletions
@@ -1,4 +1,4 @@
1-
# Lance Hive 3.x Namespace Implementation Spec
1+
# Apache Hive 3+.X MetaStore Lance Namespace Implementation Spec
22

33
This document describes how the Hive 3.x MetaStore implements the Lance Namespace client spec.
44

@@ -8,14 +8,12 @@ Apache Hive MetaStore (HMS) is a centralized metadata repository for Apache Hive
88

99
## Namespace Implementation Configuration Properties
1010

11-
The Lance Hive 3.x namespace implementation accepts the following configuration properties:
11+
The Lance Hive 3+.x namespace implementation accepts the following configuration properties:
1212

1313
The **client.pool-size** property is optional and specifies the size of the HMS client connection pool. Default value is `3`.
1414

1515
The **root** property is optional and specifies the storage root location of the lakehouse on Hive catalog. Default value is the current working directory.
1616

17-
The **storage.*** prefix properties are optional and specify additional storage configurations to access tables (e.g., `storage.region=us-west-2`).
18-
1917
## Object Mapping
2018

2119
### Namespace
@@ -57,7 +55,11 @@ The implementation:
5755

5856
**Error Handling:**
5957

60-
If the namespace already exists and mode is CREATE, return error code `2` (NamespaceAlreadyExists). If the parent catalog does not exist when creating a database, return error code `1` (NamespaceNotFound). If the HMS connection fails, return error code `17` (ServiceUnavailable).
58+
If the namespace already exists and mode is CREATE, return error code `2` (NamespaceAlreadyExists).
59+
60+
If the parent catalog does not exist when creating a database, return error code `1` (NamespaceNotFound).
61+
62+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
6163

6264
### ListNamespaces
6365

@@ -72,7 +74,9 @@ The implementation:
7274

7375
**Error Handling:**
7476

75-
If the parent namespace does not exist, return error code `1` (NamespaceNotFound). If the HMS connection fails, return error code `17` (ServiceUnavailable).
77+
If the parent namespace does not exist, return error code `1` (NamespaceNotFound).
78+
79+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
7680

7781
### DescribeNamespace
7882

@@ -86,7 +90,9 @@ The implementation:
8690

8791
**Error Handling:**
8892

89-
If the namespace does not exist, return error code `1` (NamespaceNotFound). If the HMS connection fails, return error code `17` (ServiceUnavailable).
93+
If the namespace does not exist, return error code `1` (NamespaceNotFound).
94+
95+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
9096

9197
### DropNamespace
9298

@@ -116,13 +122,17 @@ The implementation:
116122
1. Parse the table identifier to extract catalog, database, and table name
117123
2. Verify the parent namespace exists
118124
3. Create an HMS Table object with `tableType=EXTERNAL_TABLE`
119-
4. Set the storage descriptor with the specified or default location
125+
4. Set the storage descriptor with the specified or default location. When location is not specified, it defaults to `{root}/{database}.db/{table}` for the default `hive` catalog (hive2-compatible), or `{root}/{catalog}/{database}.db/{table}` for other catalogs
120126
5. Add `table_type=lance` to the table parameters
121127
6. Register the table in HMS
122128

123129
**Error Handling:**
124130

125-
If the parent namespace does not exist, return error code `1` (NamespaceNotFound). If the table already exists, return error code `5` (TableAlreadyExists). If the HMS connection fails, return error code `17` (ServiceUnavailable).
131+
If the parent namespace does not exist, return error code `1` (NamespaceNotFound).
132+
133+
If the table already exists, return error code `5` (TableAlreadyExists).
134+
135+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
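
The catalog-aware default location from step 4 could be sketched as follows. The function and constant names are hypothetical, and stripping a trailing slash from `root` is an assumption; the two path layouts themselves are taken from the added line.

```python
DEFAULT_CATALOG = "hive"  # the default catalog name given in step 4


def default_table_location(root: str, catalog: str, database: str, table: str) -> str:
    """Build the Hive 3.x default table location.

    The default `hive` catalog keeps the Hive 2.x layout; other catalogs
    get an extra {catalog} path segment.
    """
    base = root.rstrip("/")  # assumption: avoid a doubled slash
    if catalog == DEFAULT_CATALOG:
        return f"{base}/{database}.db/{table}"
    return f"{base}/{catalog}/{database}.db/{table}"
```

Keeping the `hive` catalog on the two-level layout means a table created through the Hive 2.x implementation resolves to the same default path here.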
126136

127137
### ListTables
128138

@@ -138,18 +148,20 @@ The implementation:
138148

139149
**Error Handling:**
140150

141-
If the namespace does not exist, return error code `1` (NamespaceNotFound). If the HMS connection fails, return error code `17` (ServiceUnavailable).
151+
If the namespace does not exist, return error code `1` (NamespaceNotFound).
152+
153+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
142154

143155
### DescribeTable
144156

145-
Retrieves metadata for a Lance table. Only `load_detailed_metadata=false` is supported. When `load_detailed_metadata=false`, only the table location and storage_options are returned; other fields (version, table_uri, schema, stats) are null.
157+
Retrieves metadata for a Lance table. Only `load_detailed_metadata=false` is supported. When `load_detailed_metadata=false`, only the table location is returned; other fields (version, table_uri, schema, stats) are null.
146158

147159
The implementation:
148160

149161
1. Parse the table identifier
150162
2. Retrieve the Table object from HMS
151163
3. Validate that it is a Lance table (check `table_type=lance`)
152-
4. Return the table location from `storageDescriptor.location` and storage_options from `parameters`
164+
4. Return the table location from `storageDescriptor.location`
153165

154166
**Error Handling:**
155167

@@ -171,4 +183,8 @@ The implementation:
171183

172184
**Error Handling:**
173185

174-
If the table does not exist, return error code `4` (TableNotFound). If the table is not a Lance table, return error code `13` (InvalidInput). If the HMS connection fails, return error code `17` (ServiceUnavailable).
186+
If the table does not exist, return error code `4` (TableNotFound).
187+
188+
If the table is not a Lance table, return error code `13` (InvalidInput).
189+
190+
If the HMS connection fails, return error code `17` (ServiceUnavailable).
