Commit 774ec5f

Adds Docker Compose setup for Trino
Adds a Docker Compose configuration for setting up Trino with Iceberg, including support for Hive Metastore and REST catalog types. This allows for easier testing and development with Trino and Iceberg.

closes #2219

add uuid partitions test with trino

Add Trino as alternative tool for test uuid partitions as Java Iceberg 1.9.2 in spark is not yet supported.
<!--related to #2002-->

fix: correct conditions in namespace

fix: add license to trino config file

remove precommit as prek exist

use mark to skip BucketTransform in Spark uuid partition test

Trino: Restructure to make Trino integration optional and modular

Address reviewer feedback from kevinjqliu:

1. Consolidated Trino infrastructure:
   - All Trino config files remain in dev/trino/ directory
   - docker-compose-trino.yml moved to dev/ (alongside integration compose)
   - run-trino.sh moved to dev/ (alongside other run scripts)

2. Removed Trino from main integration docker-compose:
   - Trino service removed from dev/docker-compose-integration.yml
   - Trino can now be spun up separately alongside main integration
   - Keeps Trino testing optional and not part of CI

3. Created dedicated test file:
   - tests/integration/test_trino.py for all Trino-specific tests
   - Moved test_schema_exists_in_trino from test_rest_catalog.py
   - Moved test_uuid_partitioning_with_trino from test_writes.py
   - Better separation of concerns and easier to maintain

4. Simplified pytest marker:
   - Changed from @pytest.mark.integration_trino to @pytest.mark.trino
   - Updated Makefile target: test-integration-trino -> test-trino
   - Updated pyproject.toml and conftest.py references

This makes Trino integration testing opt-in and follows the same pattern as other optional test suites (s3, adls, gcs).

Trino: Mount catalog files individually to preserve built-in catalogs

Address feedback from ebyhr on PR #2220 discussion r2583421945.

Instead of mounting the entire catalog directory, mount individual catalog property files. This allows Trino to preserve its built-in catalogs (memory, TPCH) which are helpful during development, while still providing our custom Iceberg catalogs.

Mounted files:
- warehouse_rest.properties - REST catalog configuration
- warehouse_hive.properties - Hive catalog configuration
- config.properties - Trino server configuration

make lint
1 parent c0e7c6d commit 774ec5f

12 files changed

+554
-6
lines changed

Makefile

Lines changed: 5 additions & 1 deletion
@@ -121,6 +121,10 @@ test-integration-rebuild: ## Rebuild integration Docker services from scratch
 	docker compose -f dev/docker-compose-integration.yml rm -f
 	docker compose -f dev/docker-compose-integration.yml build --no-cache
 
+test-trino: ## Run tests marked with @pytest.mark.trino
+	sh ./dev/run-trino.sh
+	$(TEST_RUNNER) pytest tests/ -m trino $(PYTEST_ARGS)
+
 test-s3: ## Run tests marked with @pytest.mark.s3
 	sh ./dev/run-minio.sh
 	$(TEST_RUNNER) pytest tests/ -m s3 $(PYTEST_ARGS)
@@ -134,7 +138,7 @@ test-gcs: ## Run tests marked with @pytest.mark.gcs
 	$(TEST_RUNNER) pytest tests/ -m gcs $(PYTEST_ARGS)
 
 test-coverage: ## Run all tests with coverage and report
-	$(MAKE) COVERAGE=1 test test-integration test-s3 test-adls test-gcs
+	$(MAKE) COVERAGE=1 test test-integration test-s3 test-adls test-gcs test-trino
 	$(MAKE) coverage-report
 
 coverage-report: ## Combine and report coverage

dev/docker-compose-integration.yml

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ services:
       - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
       - CATALOG_S3_ENDPOINT=http://minio:9000
       - CATALOG_JDBC_STRICT__MODE=true
+
   minio:
     image: minio/minio
     container_name: pyiceberg-minio

dev/docker-compose-trino.yml

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+services:
+  rest:
+    image: apache/iceberg-rest-fixture
+    container_name: pyiceberg-rest
+    networks:
+      iceberg_net:
+    ports:
+      - 8181:8181
+    environment:
+      - AWS_ACCESS_KEY_ID=admin
+      - AWS_SECRET_ACCESS_KEY=password
+      - AWS_REGION=us-east-1
+      - CATALOG_WAREHOUSE=s3://warehouse/
+      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
+      - CATALOG_S3_ENDPOINT=http://minio:9000
+
+  trino:
+    image: trinodb/trino:478
+    container_name: pyiceberg-trino
+    networks:
+      iceberg_net:
+    ports:
+      - 8082:8080
+    environment:
+      - CATALOG_MANAGEMENT=dynamic
+    depends_on:
+      - rest
+      - hive
+    volumes:
+      - ./trino/catalog/warehouse_rest.properties:/etc/trino/catalog/warehouse_rest.properties
+      - ./trino/catalog/warehouse_hive.properties:/etc/trino/catalog/warehouse_hive.properties
+      - ./trino/config.properties:/etc/trino/config.properties
+
+  minio:
+    image: minio/minio
+    container_name: pyiceberg-minio
+    environment:
+      - MINIO_ROOT_USER=admin
+      - MINIO_ROOT_PASSWORD=password
+      - MINIO_DOMAIN=minio
+    networks:
+      iceberg_net:
+        aliases:
+          - warehouse.minio
+    ports:
+      - 9001:9001
+      - 9000:9000
+    command: ["server", "/data", "--console-address", ":9001"]
+  mc:
+    depends_on:
+      - minio
+    image: minio/mc
+    container_name: pyiceberg-mc
+    networks:
+      iceberg_net:
+    environment:
+      - AWS_ACCESS_KEY_ID=admin
+      - AWS_SECRET_ACCESS_KEY=password
+      - AWS_REGION=us-east-1
+    entrypoint: >
+      /bin/sh -c "
+      until (/usr/bin/mc alias set minio http://minio:9000 admin password) do echo '...waiting...' && sleep 1; done;
+      /usr/bin/mc mb minio/warehouse;
+      /usr/bin/mc policy set public minio/warehouse;
+      tail -f /dev/null
+      "
+
+  hive:
+    build: hive/
+    container_name: hive
+    hostname: hive
+    networks:
+      iceberg_net:
+    ports:
+      - 9083:9083
+    environment:
+      SERVICE_NAME: "metastore"
+      SERVICE_OPTS: "-Dmetastore.warehouse.dir=s3a://warehouse/hive/"
+
+networks:
+  iceberg_net:

dev/run-trino.sh

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+set -ex
+
+if [ "$(docker ps -q --filter "name=pyiceberg-trino" --filter "status=running")" ]; then
+    echo "Trino service running"
+else
+    docker compose -f dev/docker-compose-trino.yml kill
+    docker compose -f dev/docker-compose-trino.yml up -d
+    while [ -z "$(docker ps -q --filter "name=pyiceberg-trino" --filter "status=running")" ]
+    do
+        echo "Waiting for Trino"
+        sleep 1
+    done
+fi
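
The loop in run-trino.sh only waits until the `pyiceberg-trino` container is in the running state; the Trino server inside it may still be initializing at that point. A minimal sketch of a stricter readiness check is shown below. It is not part of this commit. It assumes the coordinator's `/v1/info` endpoint returns JSON with a boolean `starting` field and that the host port is 8082, as mapped in dev/docker-compose-trino.yml. The names `fetch_info` and `wait_for_trino` are hypothetical.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen


def fetch_info(url: str) -> dict:
    """Fetch and decode the JSON document served at `url`."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_for_trino(
    info_url: str = "http://localhost:8082/v1/info",  # assumed endpoint and port
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    fetch: Callable[[str], dict] = fetch_info,
) -> bool:
    """Poll until the server reports it is no longer starting, or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if fetch(info_url).get("starting") is False:
                return True
        except (OSError, ValueError):
            # Connection refused or a non-JSON reply: the server is not up yet.
            pass
        time.sleep(interval_s)
    return False
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running container; a caller would typically invoke `wait_for_trino()` once after `docker compose up -d` and fail fast if it returns `False`.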

dev/trino/catalog/warehouse_hive.properties

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+connector.name=iceberg
+iceberg.catalog.type=hive_metastore
+iceberg.expire-snapshots.min-retention=0d
+iceberg.remove-orphan-files.min-retention=0d
+iceberg.register-table-procedure.enabled=true
+hive.metastore.uri=thrift://hive:9083
+iceberg.hive-catalog-name=hive
+fs.native-s3.enabled=true
+s3.region=us-east-1
+s3.aws-access-key=admin
+s3.aws-secret-key=password
+s3.endpoint=http://minio:9000
+s3.path-style-access=false

dev/trino/catalog/warehouse_rest.properties

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+connector.name=iceberg
+iceberg.catalog.type=rest
+iceberg.rest-catalog.uri=http://rest:8181
+iceberg.rest-catalog.warehouse=s3://warehouse/default
+iceberg.rest-catalog.nested-namespace-enabled=true
+iceberg.rest-catalog.case-insensitive-name-matching=true
+iceberg.expire-snapshots.min-retention=0d
+iceberg.remove-orphan-files.min-retention=0d
+iceberg.register-table-procedure.enabled=true
+fs.native-s3.enabled=true
+s3.region=us-east-1
+s3.aws-access-key=admin
+s3.aws-secret-key=password
+s3.endpoint=http://minio:9000
+s3.path-style-access=false

dev/trino/config.properties

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
# Licensed to the Apache Software Foundation (ASF) under one
2+
# or more contributor license agreements. See the NOTICE file
3+
# distributed with this work for additional information
4+
# regarding copyright ownership. The ASF licenses this file
5+
# to you under the Apache License, Version 2.0 (the
6+
# "License"); you may not use this file except in compliance
7+
# with the License. You may obtain a copy of the License at
8+
#
9+
# http://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# Unless required by applicable law or agreed to in writing,
12+
# software distributed under the License is distributed on an
13+
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
14+
# KIND, either express or implied. See the License for the
15+
# specific language governing permissions and limitations
16+
# under the License.
17+
coordinator=true
18+
node-scheduler.include-coordinator=true
19+
http-server.http.port=8080
20+
discovery.uri=http://localhost:8080
21+
http-server.process-forwarded=true
22+
http-server.https.enabled=false
23+
catalog.management=${ENV:CATALOG_MANAGEMENT}

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -119,6 +119,7 @@ dev = [
     "mypy-boto3-dynamodb>=1.28.18",
     "pyarrow-stubs>=20.0.0.20251107", # Remove when pyarrow >= 23.0.0 https://github.com/apache/arrow/pull/47609
     "sqlalchemy>=2.0.18,<3",
+    "trino[sqlalchemy]>=0.336.0",
 ]
 # for mkdocs
 docs = [
@@ -157,6 +158,7 @@ markers = [
     "s3: marks a test as requiring access to s3 compliant storage (use with --aws-access-key-id, --aws-secret-access-key, and --endpoint args)",
     "adls: marks a test as requiring access to adls compliant storage (use with --adls.account-name, --adls.account-key, and --adls.endpoint args)",
     "integration: marks integration tests against Apache Spark",
+    "trino: marks integration tests against Trino",
     "gcs: marks a test as requiring access to gcs compliant storage (use with --gs.token, --gs.project, and --gs.endpoint)",
     "benchmark: collection of tests to validate read/write performance before and after a change",
 ]

tests/conftest.py

Lines changed: 35 additions & 0 deletions
@@ -46,6 +46,7 @@
 from moto import mock_aws
 from pydantic_core import to_json
 from pytest_lazyfixture import lazy_fixture
+from sqlalchemy import Connection
 
 from pyiceberg.catalog import Catalog, load_catalog
 from pyiceberg.catalog.memory import InMemoryCatalog
@@ -146,6 +147,18 @@ def pytest_addoption(parser: pytest.Parser) -> None:
         "--gcs.oauth2.token", action="store", default="anon", help="The GCS authentication method for tests marked gcs"
     )
     parser.addoption("--gcs.project-id", action="store", default="test", help="The GCP project for tests marked gcs")
+    parser.addoption(
+        "--trino.rest.endpoint",
+        action="store",
+        default="trino://test@localhost:8082/warehouse_rest",
+        help="The Trino REST endpoint URL for tests marked as trino",
+    )
+    parser.addoption(
+        "--trino.hive.endpoint",
+        action="store",
+        default="trino://test@localhost:8082/warehouse_hive",
+        help="The Trino Hive endpoint URL for tests marked as trino",
+    )
 
 
 @pytest.fixture(scope="session")
@@ -2583,6 +2596,28 @@ def bound_reference_uuid() -> BoundReference:
     return BoundReference(field=NestedField(1, "field", UUIDType(), required=False), accessor=Accessor(position=0, inner=None))
 
 
+@pytest.fixture(scope="session")
+def trino_hive_conn(request: pytest.FixtureRequest) -> Generator[Connection, None, None]:
+    from sqlalchemy import create_engine
+
+    trino_endpoint = request.config.getoption("--trino.hive.endpoint")
+    engine = create_engine(trino_endpoint)
+    connection = engine.connect()
+    yield connection
+    connection.close()
+
+
+@pytest.fixture(scope="session")
+def trino_rest_conn(request: pytest.FixtureRequest) -> Generator[Connection, None, None]:
+    from sqlalchemy import create_engine
+
+    trino_endpoint = request.config.getoption("--trino.rest.endpoint")
+    engine = create_engine(trino_endpoint)
+    connection = engine.connect()
+    yield connection
+    connection.close()
+
+
 @pytest.fixture(scope="session")
 def session_catalog() -> Catalog:
     return load_catalog(
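
The two endpoint defaults added above are SQLAlchemy-style URLs: the user is `test`, the host port 8082 is the one published for the `trino` service in dev/docker-compose-trino.yml, and the first path segment names the Trino catalog (`warehouse_rest` or `warehouse_hive`, matching the mounted properties files). The sketch below only illustrates how such a URL decomposes, using the standard library; `describe_endpoint` is a hypothetical helper and is not part of the commit or of the Trino client.

```python
from typing import Optional
from urllib.parse import urlsplit


def describe_endpoint(url: str) -> dict:
    """Split a trino:// URL into user, host, port, catalog and optional schema."""
    parts = urlsplit(url)
    # Path segments after the host: the first is the catalog, the second the schema.
    segments = [segment for segment in parts.path.split("/") if segment]
    catalog: Optional[str] = segments[0] if segments else None
    schema: Optional[str] = segments[1] if len(segments) > 1 else None
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "catalog": catalog,
        "schema": schema,
    }


rest = describe_endpoint("trino://test@localhost:8082/warehouse_rest")
hive = describe_endpoint("trino://test@localhost:8082/warehouse_hive")
```

To point the fixtures at a different server, the options can be overridden on the command line, for example `pytest tests/ -m trino --trino.rest.endpoint=trino://test@localhost:8082/warehouse_rest`.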
