Commit bec5cc8

[doc](build) Add Hive thirdparty startup README
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add documentation for Hive thirdparty docker startup, mode/module behavior, component segmentation, and JuiceFS metadata backend configuration.

### Release note

None

### Check List (For Author)

- Test: No need to test (documentation only)
    - Regression test / Unit Test / Manual test / No need to test (with reason)
- Behavior changed: No
- Does this need documentation: No
1 parent 7b7c75a commit bec5cc8

1 file changed, 169 additions, 0 deletions
  • docker/thirdparties/docker-compose/hive

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the specific
language governing permissions and limitations
under the License.
-->

# Hive Docker Environment

This directory contains the Hive2/Hive3 Docker Compose templates and bootstrap scripts used by the Doris thirdparty startup script (`docker/thirdparties/run-thirdparties-docker.sh`).

## Components

- `hive-server`: HiveServer2 endpoint
- `hive-metastore`: Hive Metastore service
- `hive-metastore-postgresql`: PostgreSQL database backing the metastore
- `namenode` / `datanode`: HDFS services that hold the Hive test data

## Component Segmentation

Hive startup can be understood in three layers:

### 1) Docker Service Layer

- Compute/SQL entry:
  - `hive-server`
- Metadata:
  - `hive-metastore`
  - `hive-metastore-postgresql`
- Storage:
  - `namenode`
  - `datanode`

### 2) Refresh Module Layer (`--hive-modules`)

- `default`: basic external tables in the `default` database
- `multi_catalog`: multi-format and multi-path external table cases
- `partition_type`: partition type coverage cases
- `statistics`: table statistics and empty-table statistics cases
- `tvf`: table-valued function (TVF) data and bootstrap cases
- `regression`: special regression datasets (SerDe, delimiters, etc.)
- `test`: lightweight smoke-test datasets
- `preinstalled_hql`: centralized preinstalled HQL scripts (`create_preinstalled_scripts/*.hql`)
- `view`: view bootstrap (`create_view_scripts/create_view.hql`)

### 3) Bootstrap Group Layer

- `common`: items shared by Hive2 and Hive3
- `hive2_only`: Hive2-only items
- `hive3_only`: Hive3-only items

By default:

- Hive2 uses `common,hive2_only`
- Hive3 uses `common,hive3_only`

This grouping controls which files are selected during the `run.sh`/HQL refresh.
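
To make the group selection concrete, here is a minimal Bash sketch. It assumes each bootstrap item is tagged with exactly one group name and that a Hive version maps to the default group list above. The helpers `default_bootstrap_groups` and `group_selected` are hypothetical and are not taken from the actual startup scripts.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map a Hive version to its default bootstrap groups
# and test whether an item tagged with a given group would be selected.

# Print the default comma-separated group list for "hive2" or "hive3".
default_bootstrap_groups() {
    case "$1" in
        hive2) echo "common,hive2_only" ;;
        hive3) echo "common,hive3_only" ;;
        *) echo "unknown Hive version: $1" >&2; return 1 ;;
    esac
}

# Return 0 if GROUP appears as a whole entry in the comma-separated GROUP_LIST.
group_selected() {
    local group="$1" group_list="$2"
    case ",${group_list}," in
        *",${group},"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: which groups apply to Hive3?
groups="$(default_bootstrap_groups hive3)"
for g in common hive2_only hive3_only; do
    if group_selected "$g" "$groups"; then
        echo "hive3 selects: $g"
    fi
done
```

With the defaults above, the loop prints `hive3 selects: common` and `hive3 selects: hive3_only`. Wrapping the list in commas before matching keeps a name such as `common` from matching a longer group name that merely contains it.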

## Start/Stop

```bash
# Start Hive3
./docker/thirdparties/run-thirdparties-docker.sh -c hive3

# Start Hive2
./docker/thirdparties/run-thirdparties-docker.sh -c hive2

# Stop Hive3
./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --stop
```

## Startup Modes

Use `--hive-mode` to control startup behavior:

- `fast`: reuse existing state as much as possible
- `refresh` (default): refresh only the modules whose content SHA has changed
- `rebuild`: force a reset and rebuild the Hive state

Examples:

```bash
# Default mode (refresh)
./docker/thirdparties/run-thirdparties-docker.sh -c hive3

# Explicit refresh
./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh

# Full rebuild
./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode rebuild
```
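
`refresh` mode is described as refreshing only the modules whose SHA has changed. A minimal sketch of such a check follows. It assumes one directory per module, a hypothetical state file that stores the last applied digest, and GNU coreutils (`sha256sum`, `sort -z`). The helper names are hypothetical, and the real scripts may hash different inputs or store their state elsewhere.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of SHA-based change detection for one refresh module.
# Assumes GNU coreutils and file names without newlines.

# Print one SHA-256 digest covering every file (relative path + content) under DIR.
module_sha() {
    local dir="$1"
    (
        cd "$dir" || exit 1
        find . -type f -print0 \
            | LC_ALL=C sort -z \
            | xargs -0 -r sha256sum \
            | sha256sum \
            | awk '{print $1}'
    )
}

# Return 0 if MODULE_DIR differs from the digest stored in STATE_FILE
# (or if no digest has been stored yet), 1 otherwise.
module_changed() {
    local module_dir="$1" state_file="$2" current
    current="$(module_sha "$module_dir")"
    if [[ ! -f "$state_file" ]]; then
        return 0
    fi
    [[ "$(cat "$state_file")" != "$current" ]]
}

# Store the current digest after a successful refresh of the module.
record_module_sha() {
    module_sha "$1" > "$2"
}

# Usage with hypothetical paths:
#   if module_changed ./some_module /tmp/hive_state/some_module.sha; then
#       # ...re-run the module's bootstrap here, then:
#       record_module_sha ./some_module /tmp/hive_state/some_module.sha
#   fi
```

Hashing the relative paths together with the file contents means that adding or renaming a file also marks the module as changed, not only editing one.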

## Module Refresh

Use `--hive-modules` to limit the refresh scope. It accepts:

- a comma-separated subset of `default,multi_catalog,partition_type,statistics,tvf,regression,test,preinstalled_hql,view`
- `all`, which selects every module

Examples:

```bash
# Refresh only preinstalled HQL scripts
./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh --hive-modules preinstalled_hql

# Refresh selected data modules
./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh --hive-modules default,multi_catalog
```
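
The `--hive-modules` value can be read as a comma-separated list validated against the module names above, with `all` expanding to every module. A minimal sketch under that assumption follows. `expand_hive_modules` is a hypothetical helper, and the real argument parsing in `run-thirdparties-docker.sh` may behave differently, for example in how it reports unknown names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: expand and validate a --hive-modules value.

# Module names as listed in this README.
ALL_HIVE_MODULES=(default multi_catalog partition_type statistics tvf regression test preinstalled_hql view)

# Print the selected modules one per line; return 1 on an unknown name.
expand_hive_modules() {
    local spec="$1" name known found
    local -a requested
    if [[ "$spec" == "all" ]]; then
        printf '%s\n' "${ALL_HIVE_MODULES[@]}"
        return 0
    fi
    IFS=',' read -r -a requested <<< "$spec"
    for name in "${requested[@]}"; do
        [[ -z "$name" ]] && continue   # tolerate empty entries such as "a,,b"
        found=0
        for known in "${ALL_HIVE_MODULES[@]}"; do
            if [[ "$name" == "$known" ]]; then
                found=1
                break
            fi
        done
        if (( found == 0 )); then
            echo "unknown Hive module: $name" >&2
            return 1
        fi
        printf '%s\n' "$name"
    done
}

# Example: the module list from the second command above.
expand_hive_modules "default,multi_catalog"
```

Failing on an unknown name, rather than silently skipping it, makes a typo such as `multicatalog` visible instead of quietly refreshing less than intended.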
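
## Idempotency Rules

To keep `refresh` repeatable:

- `run.sh` scripts should be idempotent, so running them twice leaves the same state as running them once
- HQL should use `DROP ... IF EXISTS` followed by `CREATE ...`
- avoid relying on `CREATE ... IF NOT EXISTS` to recreate tables or views: if the object already exists, the statement is skipped and a changed definition is never applied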
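
Both rules follow the same "replace, don't append" idea. A minimal sketch follows. It assumes a local staging directory (a real `run.sh` may write to HDFS instead). `stage_dataset` and `recreate_table_hql` are hypothetical helpers, and the table and column names are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "drop then create" idempotency pattern.

# Replace DEST with the contents of SRC so repeated runs converge on
# the same state instead of accumulating files from earlier runs.
stage_dataset() {
    local src="$1" dest="$2"
    rm -rf -- "$dest"           # like DROP ... IF EXISTS: no error if absent
    mkdir -p -- "$dest"         # like CREATE: always start from a clean target
    cp -R -- "$src"/. "$dest"/  # copy contents, including hidden files
}

# Emit HQL that recreates a table unconditionally.
# TABLE and COLUMNS are caller-supplied placeholders.
recreate_table_hql() {
    local table="$1" columns="$2"
    printf 'DROP TABLE IF EXISTS %s;\n' "$table"
    printf 'CREATE TABLE %s (%s);\n' "$table" "$columns"
}

# Example: print the HQL for a placeholder table.
recreate_table_hql demo_tbl 'id INT, name STRING'
```

Running `stage_dataset` twice leaves exactly the source files in the target, even if an earlier run left extra files behind. This is the property `refresh` relies on.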

## JuiceFS Metadata Backend

Hive now stores JuiceFS metadata in PostgreSQL (the Hive metastore database) by default, so Hive startup no longer automatically requires MySQL.

- Hive2 default (in `hive-2x_settings.env`):
  - `postgres://postgres@127.0.0.1:${PG_PORT}/juicefs_meta?sslmode=disable`
- Hive3 default (in `hive-3x_settings.env`):
  - `postgres://postgres@127.0.0.1:${PG_PORT}/juicefs_meta?sslmode=disable`

If your environment still needs MySQL for JuiceFS metadata, override `JFS_CLUSTER_META` before startup:

```bash
export JFS_CLUSTER_META="mysql://root:123456@(127.0.0.1:3316)/juicefs_meta"
./docker/thirdparties/run-thirdparties-docker.sh -c hive3
```
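
The override above works only if an exported `JFS_CLUSTER_META` takes precedence over the default. A minimal sketch of that precedence using Bash's `${VAR:-default}` expansion follows. Whether `hive-2x_settings.env` and `hive-3x_settings.env` are written this way is an assumption, as are the helper names `resolve_jfs_meta` and `jfs_meta_backend` and the `5432` fallback used when `PG_PORT` is unset.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolve the JuiceFS metadata URL, letting an exported
# JFS_CLUSTER_META override the PostgreSQL default described above.

resolve_jfs_meta() {
    local pg_port="${PG_PORT:-5432}"   # assumption: fallback port if PG_PORT is unset
    echo "${JFS_CLUSTER_META:-postgres://postgres@127.0.0.1:${pg_port}/juicefs_meta?sslmode=disable}"
}

# Report which metadata backend a URL points to, based on its scheme.
jfs_meta_backend() {
    case "$1" in
        postgres://*) echo "postgresql" ;;
        mysql://*) echo "mysql" ;;
        *) echo "unknown" ;;
    esac
}

meta="$(resolve_jfs_meta)"
echo "JuiceFS meta: ${meta} (backend: $(jfs_meta_backend "$meta"))"
```

With `:-`, an empty `JFS_CLUSTER_META` is treated the same as an unset one, so an accidental `export JFS_CLUSTER_META=` still falls back to the PostgreSQL default.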

## Logs and Debug

- Hive3 startup log: `docker/thirdparties/logs/start_hive3.log`
- Hive2 startup log: `docker/thirdparties/logs/start_hive2.log`

By default, the helper scripts keep shell xtrace (`set -x`) off to reduce log noise.
To enable debug tracing when needed, set `HIVE_DEBUG=1`:

```bash
export HIVE_DEBUG=1
./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh
```
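
A minimal sketch of how a helper script could honor `HIVE_DEBUG`, assuming it simply toggles Bash xtrace. `maybe_enable_xtrace` is a hypothetical name, and the actual helper scripts may check the variable differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: enable shell tracing only when HIVE_DEBUG=1.

maybe_enable_xtrace() {
    if [[ "${HIVE_DEBUG:-0}" == "1" ]]; then
        set -x   # print each command to stderr before it runs
    else
        set +x   # keep logs quiet by default
    fi
}

# A helper script would call this near its top:
maybe_enable_xtrace
echo "running hive bootstrap step"
```

Because `set -x` changes a shell-wide option, calling the function once near the top of a script affects every command that follows in that shell.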

## Common Troubleshooting

- Metastore health check fails:
  - check that `hive-metastore-postgresql` is healthy
  - inspect `start_hive3.log` or `start_hive2.log`
- JuiceFS format/status fails:
  - verify that `JFS_CLUSTER_META` is reachable
  - ensure the target metadata database exists (the startup script creates it automatically for local MySQL/PostgreSQL)
- Refresh is unexpectedly slow:
  - confirm that `--hive-mode refresh` is used
  - use `--hive-modules` to narrow the refresh scope
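
For the "verify that `JFS_CLUSTER_META` is reachable" step, a quick TCP probe of the metadata host and port can separate network problems from credential or database problems. A minimal sketch follows. It assumes one of the two URL shapes shown in this README with an explicit port, Bash's `/dev/tcp` redirection, and the coreutils `timeout` command. `jfs_meta_hostport` and `probe_tcp` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical troubleshooting sketch: check that the JuiceFS metadata
# endpoint in JFS_CLUSTER_META accepts TCP connections.

# Print "host port" for the two URL shapes used in this README:
#   postgres://user@host:port/db?...   and   mysql://user:pass@(host:port)/db
jfs_meta_hostport() {
    local url="$1" rest hostport
    rest="${url#*://}"                        # drop the scheme
    case "$url" in
        mysql://*)
            hostport="${rest##*@}"            # drop "user:pass@"
            hostport="${hostport#[(]}"        # drop the leading "("
            hostport="${hostport%%[)]*}"      # drop ")" and the database path
            ;;
        postgres://*)
            hostport="${rest##*@}"            # drop "user@"
            hostport="${hostport%%/*}"        # drop "/db?..."
            ;;
        *)
            return 1
            ;;
    esac
    echo "${hostport%:*} ${hostport##*:}"
}

# Return 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
probe_tcp() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if read -r host port < <(jfs_meta_hostport "${JFS_CLUSTER_META:-}"); then
    if probe_tcp "$host" "$port"; then
        echo "JuiceFS metadata endpoint ${host}:${port} is reachable"
    else
        echo "cannot connect to ${host}:${port}" >&2
    fi
fi
```

A successful probe only shows that something is listening on that port. Wrong credentials or a missing `juicefs_meta` database can still make JuiceFS fail.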
