Retry Tests for Transient External Dependency / Infrastructure Failures #5398

@Subhanshu20135

Summary

Introduce a Retry Test feature aimed at test failures caused by transient infrastructure or external dependency issues (e.g., network hiccups, temporary service unavailability, throttling), rather than application logic defects. The goal is to reduce flaky CI outcomes while keeping true failures visible and actionable.

Problem / Motivation

Some test cases rely on external systems such as databases, message brokers, HTTP services, or cloud resources. These dependencies can intermittently fail due to infrastructure conditions, causing:

  • False-negative test failures in CI
  • Manual re-runs of entire test suites
  • Reduced confidence in test signal and slower feedback loops

We need a controlled retry mechanism that:

  • Retries only when the failure is likely transient
  • Does not mask genuine logic or assertion failures
  • Clearly reports when a test required retrying to pass

Goals

  • Reduce CI flakiness caused by transient external dependency failures.
  • Preserve correctness by avoiding retries for deterministic logic/assertion failures.
  • Improve observability by recording retry attempts, reason, and final outcome.
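One possible shape for the observability goal is a per-test record that keeps retried passes distinct from clean passes. The sketch below is illustrative only: `RetryRecord`, its fields, the outcome labels, and `summarize` are hypothetical names, not an existing reporting format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RetryRecord:
    """Hypothetical per-test record of attempts, retry reasons, and final result."""
    test_name: str
    attempts: int
    passed: bool
    retry_reasons: List[str] = field(default_factory=list)

    @property
    def outcome(self) -> str:
        # A pass that needed retries is reported separately so it stays visible.
        if not self.passed:
            return "FAILED"
        return "PASSED_AFTER_RETRY" if self.attempts > 1 else "PASSED"


def summarize(records: List[RetryRecord]) -> List[str]:
    """Render one report line per test, including retry reasons when present."""
    lines = []
    for record in records:
        line = f"{record.test_name}: {record.outcome} (attempts={record.attempts})"
        if record.retry_reasons:
            line += " reasons=" + "; ".join(record.retry_reasons)
        lines.append(line)
    return lines
```

Reporting `PASSED_AFTER_RETRY` as its own outcome, rather than folding it into `PASSED`, is what keeps dependency instability visible, in line with the non-goal of hiding flaky tests.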

Non-Goals

  • Blindly retrying all test failures.
  • Hiding flaky tests or dependency instability (retries must remain visible).
  • Replacing proper test isolation, cleanup, and resilience improvements.

Proposed Solution

Implement a retry policy that applies to tests interacting with external dependencies. I would be happy to provide my solution if it aligns with the project's direction.
