fix(go-services): mysql healthcheck false-healthy race (kafka-ecommerce CI flake) by slayerjain · Pull Request #14 · keploy/ecommerce_sample_app

slayerjain · 2026-06-15T08:52:00Z

What

Fix the intermittent mysql-users/products/orders "container unhealthy" → "dependency failed to start" flake (seen in keploy/enterprise's kafka-ecommerce CI lane, which clones this repo's go branch and runs go-services under keploy record).

Root cause (traced)

The mysql healthchecks used `mysqladmin ping -h localhost`. MySQL 8.0's entrypoint first runs a temporary, socket-only init server (to apply the seed db.sql), then stops it and starts the real server on :3306. `ping -h localhost` connects over that unix socket, and the healthcheck looks only at the exit code, which mysqladmin sets to 0 even on "Access denied". A per-second trace caught the probe reporting mysqld as alive (exit 0) at t=8s against the temp server, ~3s before the real :3306 listener came up (t=11s).
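A minimal Go sketch of the exit-code trap, assuming only the documented `mysqladmin ping` contract (exit 0 whenever a server is running, even if it refuses the login; exit 1 only when no server answers) and Docker's rule that a CMD healthcheck passes on exit code 0. `PingOutcome`, `pingExitCode` and `exitCodeOnlyHealthy` are hypothetical names for illustration; nothing here talks to MySQL.

```go
package main

import "fmt"

// PingOutcome is a hypothetical classification of what `mysqladmin ping` runs into.
type PingOutcome int

const (
	Alive        PingOutcome = iota // connected and logged in: "mysqld is alive"
	AccessDenied                    // a server answered but refused the login
	NoServer                        // nothing listening at the target (socket or TCP)
)

// pingExitCode models the documented exit status of `mysqladmin ping`:
// 0 whenever some server is running (even on "Access denied"), 1 otherwise.
func pingExitCode(o PingOutcome) int {
	if o == NoServer {
		return 1
	}
	return 0
}

// exitCodeOnlyHealthy is all a Docker CMD healthcheck does with the result:
// exit code 0 counts as a passing probe; the printed output is not inspected.
func exitCodeOnlyHealthy(o PingOutcome) bool {
	return pingExitCode(o) == 0
}

func main() {
	cases := []struct {
		name string
		o    PingOutcome
	}{
		{"temp init server, socket, no password", AccessDenied},
		{"restart gap, nothing listening", NoServer},
		{"real server, TCP :3306, root creds", Alive},
	}
	for _, c := range cases {
		fmt.Printf("%-40s exit=%d passes=%v\n", c.name, pingExitCode(c.o), exitCodeOnlyHealthy(c.o))
	}
}
```

The first row is the false-healthy case: the temp server refuses the password-less login, yet the probe still passes.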

So docker marked the container healthy while only the temp server was running. A dependent service (`condition: service_healthy`) that connects over TCP to mysql-users:3306 then started too early and failed during the temp→real restart gap (~4-6s, wider under load). Docker's next probes hit the now-stopped temp server, driving the failing streak to the 20-retry limit → unhealthy. There was also no `start_period`, so cold-init failures under CI contention burned the retry budget.

Fix

Probe the real TCP listener with root creds: `mysqladmin ping -h 127.0.0.1 -P 3306 -uroot -proot`. Only the fully started real server answers on :3306; the temp server never does. Also add `start_period: 60s` so that slow cold init doesn't count against the retries. Applied to all three mysql services.

Validation

docker compose config validates.
Per-second health trace before: healthy at t=8s with `Access denied … (using password: NO)`, i.e. healthy without a real connection, against the temp server.
After: `FailingStreak` stays 0 through cold init (absorbed by `start_period`); healthy only at t=16s, with `mysqld is alive` over real TCP :3306 and never against the temp server. 5/5 cold-start loops came up healthy.
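A per-second trace like the one above can be reproduced with a small Go poller along these lines, assuming the Docker CLI is on PATH. It reads `docker inspect --format '{{json .State.Health}}' <container>` once a second and prints `Status`, `FailingStreak` and the last probe's output, stopping at the first healthy or unhealthy status. The container name is passed as an argument (compose container names depend on the project name, so none is assumed), and the 90-second cap is arbitrary.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// health holds the subset of `docker inspect` .State.Health used in the trace.
type health struct {
	Status        string
	FailingStreak int
	Log           []struct {
		ExitCode int
		Output   string
	}
}

// parseHealth decodes the JSON printed by `--format '{{json .State.Health}}'`.
func parseHealth(raw []byte) (health, error) {
	var h health
	err := json.Unmarshal(raw, &h)
	return h, err
}

// lastOutput returns the trimmed output of the most recent probe, if any.
func (h health) lastOutput() string {
	if len(h.Log) == 0 {
		return ""
	}
	return strings.TrimSpace(h.Log[len(h.Log)-1].Output)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: healthtrace <container>")
		os.Exit(2)
	}
	name := os.Args[1]
	for t := 0; t <= 90; t++ {
		raw, err := exec.Command("docker", "inspect", "--format", "{{json .State.Health}}", name).Output()
		if err == nil {
			if h, err := parseHealth(raw); err == nil {
				fmt.Printf("t=%ds status=%s streak=%d last=%q\n", t, h.Status, h.FailingStreak, h.lastOutput())
				if h.Status == "healthy" || h.Status == "unhealthy" {
					return
				}
			}
		}
		time.Sleep(time.Second)
	}
}
```

Running it in a loop over fresh `docker compose up` runs is one way to repeat the cold-start check.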

Only the three mysql healthchecks change (+30/−3); no service/app changes.

…it server

The mysql-users/products/orders healthchecks used `mysqladmin ping -h localhost`, which hits MySQL 8.0's socket-only TEMPORARY init server (run to apply the seed db.sql before the real server starts) and passes on exit code only: it returns 0 even on "Access denied". So docker marked the container healthy ~3s before the real :3306 TCP listener was up.

A service depending on it via `condition: service_healthy` then connected over TCP too early and failed during the temp-server -> real-server restart gap; docker's subsequent probes against the now-stopped temp server drove the failing streak to the retry limit -> "container unhealthy" -> "dependency failed to start". This is the intermittent kafka-ecommerce CI flake (it passes whenever timing happens to favour it).

Fix: probe the REAL TCP listener with root creds (`mysqladmin ping -h 127.0.0.1 -P 3306 -uroot -proot`); only the fully started real server answers there, never the temp server. Also add `start_period: 60s` so slow cold init under CI contention doesn't burn the retry budget before :3306 is up. Applied to all three mysql services.

Signed-off-by: Shubham Jain <shubham@keploy.io>

slayerjain requested a review from charankamarapu as a code owner June 15, 2026 08:52

slayerjain wants to merge 1 commit into go from fix/mysql-healthcheck-false-healthy-race