Commit 46ce66c

[docs](multi-catalog)update en docs (#16160)
1 parent b7379da commit 46ce66c

7 files changed: +304 −16 lines changed

docs/en/docs/lakehouse/multi-catalog/dlf.md

Lines changed: 75 additions & 3 deletions

---
{
    "title": "Alibaba Cloud DLF",
    "language": "en"
}
---

# Alibaba Cloud DLF

Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud. It is compatible with the Hive Metastore protocol.

> [What is DLF](https://www.alibabacloud.com/product/datalake-formation)

Doris can access DLF the same way it accesses Hive Metastore.

## Connect to DLF

1. Create `hive-site.xml`

    Create the `hive-site.xml` file and put it in the `fe/conf` directory.

    ```xml
    <?xml version="1.0"?>
    <configuration>
        <!-- Set the metastore type to the DLF client -->
        <property>
            <name>hive.metastore.type</name>
            <value>dlf</value>
        </property>
        <property>
            <name>dlf.catalog.endpoint</name>
            <value>dlf-vpc.cn-beijing.aliyuncs.com</value>
        </property>
        <property>
            <name>dlf.catalog.region</name>
            <value>cn-beijing</value>
        </property>
        <property>
            <name>dlf.catalog.proxyMode</name>
            <value>DLF_ONLY</value>
        </property>
        <property>
            <name>dlf.catalog.uid</name>
            <value>20000000000000000</value>
        </property>
        <property>
            <name>dlf.catalog.accessKeyId</name>
            <value>XXXXXXXXXXXXXXX</value>
        </property>
        <property>
            <name>dlf.catalog.accessKeySecret</name>
            <value>XXXXXXXXXXXXXXXXX</value>
        </property>
    </configuration>
    ```

    * `dlf.catalog.endpoint`: DLF endpoint. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
    * `dlf.catalog.region`: DLF region. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
    * `dlf.catalog.uid`: Alibaba Cloud account ID. You can find the "Account ID" in the upper-right corner of the Alibaba Cloud console.
    * `dlf.catalog.accessKeyId`: AccessKey ID, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).
    * `dlf.catalog.accessKeySecret`: AccessKey Secret, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).

    The other configuration items are fixed and require no modification.
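To make the shape of the file concrete, here is a small Python sketch (illustrative only, not part of Doris) that renders a flat map of the DLF settings above into `hive-site.xml` content. The property names mirror the example; the values are placeholders.

```python
from xml.sax.saxutils import escape


def render_hive_site(props: dict) -> str:
    """Render a flat dict of Hive/DLF settings as hive-site.xml content."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in props.items():
        lines.append("    <property>")
        lines.append(f"        <name>{escape(name)}</name>")
        lines.append(f"        <value>{escape(value)}</value>")
        lines.append("    </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Placeholder values copied from the example above.
dlf_props = {
    "hive.metastore.type": "dlf",
    "dlf.catalog.endpoint": "dlf-vpc.cn-beijing.aliyuncs.com",
    "dlf.catalog.region": "cn-beijing",
    "dlf.catalog.proxyMode": "DLF_ONLY",
    "dlf.catalog.uid": "20000000000000000",
    "dlf.catalog.accessKeyId": "XXXXXXXXXXXXXXX",
    "dlf.catalog.accessKeySecret": "XXXXXXXXXXXXXXXXX",
}

print(render_hive_site(dlf_props))
```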
2. Restart FE, and create a Catalog via the `CREATE CATALOG` statement.

    Doris will read and parse `fe/conf/hive-site.xml`.

    ```sql
    CREATE CATALOG hive_with_dlf PROPERTIES (
        "type"="hms",
        "hive.metastore.uris" = "thrift://127.0.0.1:9083"
    )
    ```

    `type` must be `hms`. `hive.metastore.uris` is not actually used, so it can be set to any value, but it must follow the format of a Hive Metastore Thrift URI.
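Since the value is syntax-checked but never contacted, a quick sanity check of the `thrift://host:port` shape can help. A simplified sketch (illustrative only; real deployments may supply a comma-separated list of URIs):

```python
import re

# Simplified: one URI of the form thrift://host:port.
THRIFT_URI = re.compile(r"^thrift://[^\s:/]+:\d+$")


def is_valid_metastore_uri(uri: str) -> bool:
    """Return True if uri looks like a single Hive Metastore Thrift URI."""
    return bool(THRIFT_URI.match(uri))


print(is_valid_metastore_uri("thrift://127.0.0.1:9083"))  # True
```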
After the above steps, you can access metadata in DLF the same way as you access Hive Metastore.

Doris supports accessing Hive/Iceberg/Hudi metadata in DLF.

docs/en/docs/lakehouse/multi-catalog/hive.md

Lines changed: 146 additions & 1 deletion

# Hive

Once Doris is connected to Hive Metastore, or to a metadata service compatible with Hive Metastore, it can access the databases and tables in Hive and conduct queries.

Besides Hive, many other systems, such as Iceberg and Hudi, use Hive Metastore to keep their metadata, so Doris can also access these systems via a Hive Catalog.

## Usage

When connecting to Hive, Doris:

1. Supports Hive versions 1, 2, and 3;
2. Supports both Managed Tables and External Tables;
3. Can identify Hive, Iceberg, and Hudi metadata stored in Hive Metastore;
4. Supports Hive tables with data stored in JuiceFS, which can be used the same way as normal Hive tables (put `juicefs-hadoop-x.x.x.jar` in `fe/lib/` and `apache_hdfs_broker/lib/`).

## Create Catalog

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004'
);
```

In addition to `type` and `hive.metastore.uris`, which are required, you can specify other parameters regarding the connection.

For example, to specify HDFS HA:
```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

To specify HDFS HA and Kerberos authentication information:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hive.metastore.sasl.enabled' = 'true',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
    'hadoop.security.authentication' = 'kerberos',
    'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
    'hadoop.kerberos.principal' = 'your-principal@YOUR.COM',
    'yarn.resourcemanager.address' = 'your-rm-address:your-rm-port',
    'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
);
```

To provide Hadoop KMS encrypted transmission information:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'dfs.encryption.key.provider.uri' = 'kms://http@kms_host:kms_port/kms'
);
```

Or, to connect to Hive data stored in JuiceFS:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'root',
    'fs.jfs.impl' = 'io.juicefs.JuiceFileSystem',
    'fs.AbstractFileSystem.jfs.impl' = 'io.juicefs.JuiceFS',
    'juicefs.meta' = 'xxx'
);
```
In Doris 1.2.1 and newer, you can create a Resource that contains all these parameters and reuse it when creating new Catalogs. Here is an example:

```sql
# 1. Create the Resource
CREATE RESOURCE hms_resource PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);

# 2. Create a Catalog using the existing Resource. The key-value pairs specified here overwrite the corresponding ones in the Resource.
CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
    'key' = 'value'
);
```

You can also put the `hive-site.xml` file in the `conf` directories of FE and BE. Doris will then automatically read information from it. Overlapping settings are resolved by the following rules:

* Information in the Resource overwrites that in `hive-site.xml`.
* Information in `CREATE CATALOG PROPERTIES` overwrites that in the Resource.
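The precedence rules above amount to a layered merge, from `hive-site.xml` (lowest) through the Resource to `CREATE CATALOG PROPERTIES` (highest). A small Python sketch of that resolution, with illustrative keys:

```python
def effective_properties(hive_site: dict, resource: dict, catalog_props: dict) -> dict:
    """Merge connection settings by the stated precedence:
    hive-site.xml < Resource < CREATE CATALOG PROPERTIES."""
    merged = dict(hive_site)      # lowest priority
    merged.update(resource)       # Resource overwrites hive-site.xml
    merged.update(catalog_props)  # CREATE CATALOG PROPERTIES wins
    return merged


conf = effective_properties(
    {"hadoop.username": "hdfs", "dfs.nameservices": "your-nameservice"},
    {"hadoop.username": "hive"},
    {"hadoop.username": "root"},
)
print(conf["hadoop.username"])  # root
```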
### Hive Versions

Doris can access Hive Metastore in all Hive versions. By default, Doris uses the interface compatible with Hive 2.3 to access Hive Metastore. You can specify a certain Hive version when creating a Catalog, for example:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hive.version' = '1.1.0'
);
```
## Column Type Mapping

This is applicable to Hive, Iceberg, and Hudi.

| HMS Type      | Doris Type    | Comment                                             |
| ------------- | ------------- | --------------------------------------------------- |
| boolean       | boolean       |                                                     |
| tinyint       | tinyint       |                                                     |
| smallint      | smallint      |                                                     |
| int           | int           |                                                     |
| bigint        | bigint        |                                                     |
| date          | date          |                                                     |
| timestamp     | datetime      |                                                     |
| float         | float         |                                                     |
| double        | double        |                                                     |
| char          | char          |                                                     |
| varchar       | varchar       |                                                     |
| decimal       | decimal       |                                                     |
| `array<type>` | `array<type>` | Supports nested arrays, such as `array<array<int>>` |
| other         | unsupported   |                                                     |
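The table above can be read as a lookup with one recursive rule for `array<type>`. A Python sketch of that mapping (illustrative only, not Doris's actual implementation):

```python
def hms_to_doris(hms_type: str) -> str:
    """Map an HMS column type to a Doris type per the table above."""
    direct = {
        "boolean": "boolean", "tinyint": "tinyint", "smallint": "smallint",
        "int": "int", "bigint": "bigint", "date": "date",
        "timestamp": "datetime", "float": "float", "double": "double",
        "char": "char", "varchar": "varchar", "decimal": "decimal",
    }
    if hms_type.startswith("array<") and hms_type.endswith(">"):
        # Nested arrays recurse on the element type.
        inner = hms_to_doris(hms_type[len("array<"):-1])
        return f"array<{inner}>" if inner != "unsupported" else "unsupported"
    return direct.get(hms_type, "unsupported")


print(hms_to_doris("array<array<int>>"))  # array<array<int>>
```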

docs/en/docs/lakehouse/multi-catalog/hudi.md

Lines changed: 25 additions & 1 deletion

# Hudi

## Usage

1. Currently, Doris supports Snapshot Query on Copy-on-Write Hudi tables and Read Optimized Query on Merge-on-Read tables. Snapshot Query on Merge-on-Read tables and Incremental Query will be supported in the future.
2. Doris currently only supports Hive Metastore Catalogs. The usage is basically the same as that of Hive Catalogs. More types of Catalogs will be supported in future versions.

## Create Catalog

Creating a Hudi Catalog is the same as creating a Hive Catalog. A simple example is provided here; see [Hive](./hive) for more information.

```sql
CREATE CATALOG hudi PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

## Column Type Mapping

Same as in Hive Catalogs. See the relevant section in [Hive](./hive).
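The support matrix from the Usage notes above can be summarized as a tiny lookup. A sketch (illustrative only; the planned Snapshot and Incremental Query support for Merge-on-Read tables is not reflected as supported):

```python
# table type -> query types currently supported, per the Usage notes.
SUPPORTED_QUERIES = {
    "copy-on-write": {"snapshot"},
    "merge-on-read": {"read-optimized"},
}


def is_supported(table_type: str, query_type: str) -> bool:
    """Return True if Doris currently supports query_type on table_type."""
    return query_type in SUPPORTED_QUERIES.get(table_type, set())


print(is_supported("copy-on-write", "snapshot"))  # True
```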

docs/en/docs/lakehouse/multi-catalog/iceberg.md

Lines changed: 48 additions & 1 deletion

# Iceberg

## Usage

When connecting to Iceberg, Doris:

1. Supports Iceberg V1/V2 table formats;
2. Supports Position Delete but not Equality Delete for the V2 format;
3. Only supports Hive Metastore Catalogs. The usage is the same as that of Hive Catalogs.

## Create Catalog

Creating an Iceberg Catalog is the same as creating a Hive Catalog. A simple example is provided here; see [Hive](./hive) for more information.

```sql
CREATE CATALOG iceberg PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

## Column Type Mapping

Same as in Hive Catalogs. See the relevant section in [Hive](./hive).

## Time Travel

<version since="dev">

Doris supports reading the specified Snapshot of an Iceberg table.

</version>

Each write operation to an Iceberg table generates a new Snapshot. By default, a read request reads only the latest Snapshot.

You can read data of historical table versions using the `FOR TIME AS OF` or `FOR VERSION AS OF` clauses, based on the Snapshot ID or the time point at which the Snapshot was generated. For example:

`SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";`

`SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;`

You can use the [iceberg_meta](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/iceberg_meta/) table function to view the Snapshot details of a specified table.
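In Iceberg's time-travel semantics, `FOR TIME AS OF` effectively selects the latest Snapshot committed at or before the given time point. A Python sketch of that resolution logic, using the Snapshot ID from the example above plus a made-up second Snapshot:

```python
from datetime import datetime

# Hypothetical snapshot log: (snapshot_id, commit_time), oldest first.
# The first id is taken from the FOR VERSION AS OF example; the second is invented.
snapshots = [
    (868895038966572, datetime(2022, 10, 7, 17, 20, 0)),
    (868895038966573, datetime(2022, 10, 7, 18, 0, 0)),
]


def snapshot_for_time(snapshots, as_of: datetime):
    """Return the id of the latest snapshot committed at or before as_of."""
    candidates = [sid for sid, t in snapshots if t <= as_of]
    if not candidates:
        raise ValueError("no snapshot exists at or before the given time")
    return candidates[-1]


print(snapshot_for_time(snapshots, datetime(2022, 10, 7, 17, 20, 37)))
# -> 868895038966572
```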

docs/en/docs/lakehouse/multi-catalog/multi-catalog.md

Lines changed: 1 addition & 1 deletion

See [Hudi](./hudi)

### Connect to Elasticsearch

See [Elasticsearch](./es)

### Connect to JDBC
