Commit ae8cd38

fix: add Docker healthcheck for Dremio in CI (#941)
* fix: add retry logic for transient Dremio connection errors in CI

  Dremio OSS Docker intermittently drops TCP connections under concurrent load from 8 pytest-xdist workers, producing RemoteDisconnected errors during dbt seed and dbt run operations.

  Changes:
  - Add retry logic (3 attempts, 10s delay) to dbt seed in data_seeder.py
  - Add retry logic (3 attempts, 15s delay) to dbt run in env.py init()
  - Add Docker healthcheck to Dremio container in docker-compose-dremio.yml
  - Use 'docker compose up -d --wait' to ensure Dremio is healthy before tests
  - Make dremio-setup depend on Dremio health check passing

  Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

* fix: replace --wait with manual Dremio health check wait

  docker compose --wait fails when one-shot containers (minio-setup) exit, even with exit code 0. Instead, poll the Dremio container's health status directly with docker inspect.

  Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

* fix: catch exceptions in retry loops for seed and init operations

  Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

* fix: preserve exception chain in retry loops for better debugging

  Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

* refactor: remove redundant retry wrappers from env.py and data_seeder.py

  The generic retry logic in elementary PR #2125 (CommandLineDbtRunner) now handles transient error detection and retries for all dbt commands, including seed and run. The CI test harness no longer needs its own retry wrappers since it calls dbt_runner.run() and dbt_runner.seed(), which go through the runner's retry logic.

  This simplifies the PR to just the Docker healthcheck improvements for Dremio.

  Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <haritamar@gmail.com>
1 parent fc5d873 commit ae8cd38

2 files changed

Lines changed: 15 additions & 2 deletions

.github/workflows/test-warehouse.yml

Lines changed: 7 additions & 1 deletion
@@ -91,7 +91,13 @@ jobs:
       - name: Start Dremio
         if: inputs.warehouse-type == 'dremio'
         working-directory: ${{ env.TESTS_DIR }}
-        run: docker compose -f docker-compose-dremio.yml up -d
+        run: |
+          docker compose -f docker-compose-dremio.yml up -d
+          # Wait for Dremio to be healthy (one-shot containers like
+          # minio-setup exit immediately, so --wait would fail).
+          echo "Waiting for Dremio to become healthy..."
+          timeout 180 bash -c 'until [ "$(docker inspect -f {{.State.Health.Status}} dremio 2>/dev/null)" = "healthy" ]; do sleep 5; done'
+          echo "Dremio is healthy."
 
       - name: Setup Python
         uses: actions/setup-python@v6
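
Not part of this commit: for anyone reusing the wait step outside the workflow, the inline `until` loop could be factored into a small function along the lines below. This is only a sketch. `wait_for_status` and its argument order are hypothetical names, and it is written in POSIX sh with `date +%s` rather than the `timeout ... bash -c` form used in the workflow. Unlike the inline loop, it prints the last status it saw when it gives up, which may help when reading CI logs.

```shell
# Hypothetical helper (not in the commit): poll a command until it prints the
# wanted status or the timeout expires.
# Usage: wait_for_status <wanted> <timeout_s> <interval_s> <command...>
wait_for_status() {
  wfs_wanted=$1
  wfs_timeout=$2
  wfs_interval=$3
  shift 3
  wfs_deadline=$(( $(date +%s) + wfs_timeout ))
  while :; do
    # A failing probe (e.g. the container does not exist yet) counts as "not ready".
    wfs_status=$("$@" 2>/dev/null) || wfs_status=""
    if [ "$wfs_status" = "$wfs_wanted" ]; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$wfs_deadline" ]; then
      echo "Timed out after ${wfs_timeout}s; last status: '${wfs_status:-unknown}'" >&2
      return 1
    fi
    sleep "$wfs_interval"
  done
}

# Example call, assuming a container named "dremio" that defines a healthcheck:
#   wait_for_status healthy 180 5 docker inspect -f '{{.State.Health.Status}}' dremio
```

The probe command is passed as trailing arguments so the same function can be exercised with a stub instead of `docker inspect`.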

integration_tests/docker-compose-dremio.yml

Lines changed: 8 additions & 1 deletion
@@ -65,12 +65,19 @@ services:
       - dremio_data:/opt/dremio/data:rw
     # Workaround for permission issues in podman
     user: "0"
+    healthcheck:
+      test: ["CMD-SHELL", "curl -sf http://localhost:9047 || exit 1"]
+      interval: 5s
+      timeout: 5s
+      retries: 30
+      start_period: 15s
 
   dremio-setup:
     image: alpine:latest
     container_name: dremio-setup
     depends_on:
-      - dremio
+      dremio:
+        condition: service_healthy
     volumes:
       - ./docker/dremio/dremio-setup.sh:/dremio-setup.sh
     command: sh /dremio-setup.sh
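
Reviewer note, not part of the commit: a rough way to see how the healthcheck settings relate to the 180-second wait in the workflow is to estimate how long Docker would keep probing before marking the container `unhealthy`. The sketch below is back-of-the-envelope arithmetic only. It assumes that failures during `start_period` do not count toward `retries`, and that each probe either fails almost instantly or hangs for the full `timeout`; real timings will fall somewhere in between.

```shell
# Values copied from the healthcheck block in the diff (seconds, except retries).
interval=5
timeout=5
retries=30
start_period=15

# Assumption: every probe fails almost immediately (e.g. connection refused).
fast_fail=$(( start_period + retries * interval ))
# Assumption: every probe hangs for the full timeout before failing.
slow_fail=$(( start_period + retries * (interval + timeout) ))

echo "approx. seconds until 'unhealthy': ${fast_fail} to ${slow_fail}"
```

Under these assumptions the container would turn `unhealthy` somewhere between roughly 165 and 315 seconds after start, so the workflow's 180-second `timeout` may expire before Docker itself gives up on the container.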
