Skip to content

Add smart retry with timeout on startup #21

@iliia-veselov

Description

@iliia-veselov

Summary

When SDMX Proxy is deployed as part of a Helm chart, some dependent components
(e.g. Redis, upstream registries) may be temporarily unavailable during startup.
Currently the proxy fails immediately on connection errors. It should instead
retry with a configurable timeout, similar to how statgpt-backend handles
startup dependencies.

Motivation

  • The proxy is now deployed via Helm alongside other services that start
    concurrently. Transient unavailability at boot time is expected.
  • Immediate failure forces manual restarts or complex init-container
    orchestration that a simple retry loop would eliminate.

Proposed behavior

  1. On startup, if a required dependency (Redis, registry health check, config
    endpoint, etc.) is unreachable, retry the connection instead of failing.
  2. Retries should use exponential back-off with a maximum total timeout.
  3. Both the retry count / interval and the overall timeout should be
    configurable (e.g. via environment variables or application.yaml).
  4. After the timeout expires without a successful connection, fail with a clear
    error message indicating which dependency could not be reached.

Reference

  • statgpt-backend already implements startup retries and timeouts for both of
    its components -- use it as a reference implementation.

Priority

Nice to have (not critical, but important for production stability).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions