[Feat][Router]: Add automatic retry with exponential backoff and jitter by ikaadil · Pull Request #939 · vllm-project/production-stack

ikaadil · 2026-05-01T22:49:01Z

Overview

Implements automatic retry for transient HTTP failures with configurable exponential backoff and jitter. This enhances the router's resilience by automatically retrying failed requests instead of immediately returning errors to clients.

Details: https://medium.com/@avnein4988/mitigating-the-thundering-herd-problem-exponential-backoff-with-jitter-b507cdf90d62

Changes

Core Implementation

RetryConfig dataclass: Configurable retry parameters with exponential backoff calculation
Retry logic: Integrated into request processing flow in route_general_request()
Retryable status detection: Function to identify transient failures
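
The retry flow listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `send_with_retry`, `send`, and `delay_for` are hypothetical names, and treating `max_attempts` as the total number of attempts (first try included) is an assumption. Backends that already returned a retryable status are skipped on later attempts.

```python
import asyncio

# Status codes treated as transient, as listed in this PR description.
RETRYABLE_STATUSES = frozenset({408, 429, 500, 502, 503, 504})


async def send_with_retry(send, urls, max_attempts, delay_for=lambda attempt: 0.0):
    """Try backends in order until one returns a non-retryable status.

    `send` is a hypothetical coroutine taking a URL and returning an HTTP
    status code; `delay_for(attempt)` returns the backoff in seconds.
    Returns (url, status) on success, or (None, last_status) when every
    attempt ended in a retryable status.
    """
    failed = set()
    status = None
    for attempt in range(max_attempts):
        # Skip backends that already produced a retryable error.
        candidates = [u for u in urls if u not in failed]
        if not candidates:
            break
        url = candidates[0]
        status = await send(url)
        if status not in RETRYABLE_STATUSES:
            return url, status
        failed.add(url)
        # Only back off when another attempt can actually be made.
        more_backends = len(failed) < len(set(urls))
        if more_backends and attempt + 1 < max_attempts:
            await asyncio.sleep(delay_for(attempt))
    return None, status
```

A caller would pass its own request coroutine as `send` and a backoff function built from the retry settings as `delay_for`.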

CLI Arguments

Added retry configuration options:

--retry-max-retries (default: 5)
--retry-initial-backoff-ms (default: 50)
--retry-max-backoff-ms (default: 30000)
--retry-backoff-multiplier (default: 1.5)
--retry-jitter-factor (default: 0.2)
--disable-retries: Disable all retries
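
For illustration, the flags above could be declared with `argparse` along these lines. The flag names and defaults are taken from the list above; the `build_parser` and `validate_retry_args` helpers and the specific validation rules are assumptions, and later commits in this PR make retries opt-in, so the final defaults may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the retry flags with the defaults listed in the PR description."""
    parser = argparse.ArgumentParser(prog="vllm-router")
    parser.add_argument("--retry-max-retries", type=int, default=5)
    parser.add_argument("--retry-initial-backoff-ms", type=int, default=50)
    parser.add_argument("--retry-max-backoff-ms", type=int, default=30000)
    parser.add_argument("--retry-backoff-multiplier", type=float, default=1.5)
    parser.add_argument("--retry-jitter-factor", type=float, default=0.2)
    parser.add_argument("--disable-retries", action="store_true")
    return parser


def validate_retry_args(args: argparse.Namespace) -> None:
    """Reject obviously invalid values (these rules are assumed, not from the PR)."""
    if args.retry_max_retries < 1:
        raise ValueError("--retry-max-retries must be >= 1")
    if args.retry_initial_backoff_ms <= 0:
        raise ValueError("--retry-initial-backoff-ms must be > 0")
    if args.retry_max_backoff_ms < args.retry_initial_backoff_ms:
        raise ValueError("--retry-max-backoff-ms must be >= the initial backoff")
    if args.retry_backoff_multiplier < 1.0:
        raise ValueError("--retry-backoff-multiplier must be >= 1.0")
    if not 0.0 <= args.retry_jitter_factor <= 1.0:
        raise ValueError("--retry-jitter-factor must be within [0, 1]")
```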

Key Features

Retryable Status Codes

Automatically retries on transient failures:

408 - Request Timeout
429 - Too Many Requests
500 - Internal Server Error
502 - Bad Gateway
503 - Service Unavailable
504 - Gateway Timeout
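
A check for the codes above might look like the following sketch. The function name `is_retryable_status` appears in this PR's commit messages, but the body shown here is an assumption rather than the actual implementation.

```python
from http import HTTPStatus

# The six transient status codes listed in the PR description.
RETRYABLE_STATUS_CODES = frozenset(
    {
        HTTPStatus.REQUEST_TIMEOUT,        # 408
        HTTPStatus.TOO_MANY_REQUESTS,      # 429
        HTTPStatus.INTERNAL_SERVER_ERROR,  # 500
        HTTPStatus.BAD_GATEWAY,            # 502
        HTTPStatus.SERVICE_UNAVAILABLE,    # 503
        HTTPStatus.GATEWAY_TIMEOUT,        # 504
    }
)


def is_retryable_status(status_code: int) -> bool:
    """Return True when the status code indicates a transient failure."""
    return status_code in RETRYABLE_STATUS_CODES
```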

Exponential Backoff with Jitter

Prevents thundering herd through randomized delays:
Formula: delay = min(initial_backoff_ms × (multiplier ^ attempt), max_backoff_ms)
With jitter: D' = D × (1 + U[-j, +j])
Example with defaults:

Attempt 0: ~50ms
Attempt 1: ~75ms
Attempt 2: ~112ms
Attempt 3: ~168ms
Attempt 4: ~253ms
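
The schedule above can be reproduced with a short sketch of the two formulas. The function and parameter names are illustrative assumptions; only the formulas and default values come from the description. The exact base delays are 50, 75, 112.5, 168.75, and 253.125 ms, which the list above rounds down.

```python
import random


def base_delay_ms(attempt, initial_backoff_ms=50.0, multiplier=1.5,
                  max_backoff_ms=30000.0):
    """delay = min(initial_backoff_ms * multiplier ** attempt, max_backoff_ms)"""
    return min(initial_backoff_ms * multiplier ** attempt, max_backoff_ms)


def jittered_delay_ms(attempt, jitter_factor=0.2, rng=random, **backoff_kwargs):
    """D' = D * (1 + U[-j, +j]), with U drawn uniformly at random."""
    base = base_delay_ms(attempt, **backoff_kwargs)
    return base * (1.0 + rng.uniform(-jitter_factor, jitter_factor))


# Print the un-jittered schedule for the default settings.
for attempt in range(5):
    print(f"Attempt {attempt}: {base_delay_ms(attempt)} ms")
```

With a jitter factor of 0.2, each actual delay falls within 20% of its base value, so attempt 0 waits between 40 and 60 ms.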

Backward Compatibility

Removed existing max_instance_failover_reroute_attempts behavior
Falls back to single attempt when set to 0
No breaking changes to existing functionality

Usage

vllm-router --port 8000 \
    --service-discovery static \
    --static-backends "http://localhost:9001,http://localhost:9002" \
    --static-models "facebook/opt-125m,facebook/opt-125m" \
    --routing-logic roundrobin \
    --retry-max-retries 5 \
    --retry-initial-backoff-ms 100 \
    --retry-max-backoff-ms 60000 \
    --retry-backoff-multiplier 2.0 \
    --retry-jitter-factor 0.1

gemini-code-assist

Code Review

This pull request introduces a retry mechanism with exponential backoff and jitter for transient failures, aligning the router's behavior with the sglang model gateway. The changes include a new RetryConfig dataclass, CLI arguments for configuration, updated documentation, and logic in the request service to handle retryable HTTP status codes (408, 429, 500, 502, 503, 504). Feedback identifies a logic error where retries are effectively disabled by default due to the max_attempts calculation, the inclusion of an unused last_response variable, and a concern that blacklisting URLs for transient errors prevents retrying the same backend in single-node environments.

…te attempts (vllm-project#839)
* Add max instance failover reroute attempts configuration
  - Introduced a new command-line argument to specify the number of reroute attempts for failed requests.
  - Updated the routing logic to utilize this new configuration, allowing for better handling of request failures.
  - Enhanced the request routing service to incorporate the maximum reroute attempts in its logic.
  This change improves the robustness of the routing mechanism by allowing for configurable failover behavior.
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Add command-line argument for LMCache health check interval
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Refactor routing logic to directly set max instance failover reroute attempts
  - Removed the set_max_instance_failover_reroute_attempts method and directly assigned the value to the router's attribute.
  - Simplified the request routing logic by consolidating endpoint filtering and error handling, improving readability and maintainability.
  This change enhances the clarity of the routing logic and streamlines the handling of reroute attempts for failed requests.
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Add unit tests for instance failover routing logic
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Trigger pipeline
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Refactor request routing to improve request tracking
  - Moved the tracking of valid incoming requests to a more appropriate location in the routing logic.
  - Simplified the retrieval of endpoint information by ensuring it is called only once, enhancing code clarity.
  This change improves the maintainability of the request routing service.
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Add space
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Add the comments
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Add empty line
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Add comment
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Addressed the comment
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Fix the log
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Add comment
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* Resolve conflict
  Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
---------
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…llm-project#760) (vllm-project#844) Allow the router to be served under a subpath (e.g. /vllm) by passing root_path through to uvicorn. Also adds Helm chart support via routerSpec.rootPath. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…t to INFO. (vllm-project#846)
* Expose LMCache log level as configurable Helm value and default to INFO.
  Signed-off-by: nargit <NargiT@users.noreply.github.com>
* Fix names
  Signed-off-by: NargiT <NargiT@users.noreply.github.com>
* fix tests and code
  Signed-off-by: NargiT <NargiT@users.noreply.github.com>
* fix test for ray-cluster
  Signed-off-by: NargiT <NargiT@users.noreply.github.com>
* fix typo
  Signed-off-by: NargiT <NargiT@users.noreply.github.com>
* fix yet another typo
  Signed-off-by: NargiT <NargiT@users.noreply.github.com>
* update doc
  Signed-off-by: NargiT <NargiT@users.noreply.github.com>
---------
Signed-off-by: nargit <NargiT@users.noreply.github.com>
Signed-off-by: NargiT <NargiT@users.noreply.github.com>
Co-authored-by: nargit <NargiT@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…ect#849)
* feat(router): add --log-format json option for structured logging
  Add a JsonFormatter that outputs log records as JSON with timestamp, level, logger, message, filename, and lineno fields. The new --log-format flag (choices: text, json) controls the output format for both the router loggers and uvicorn.
  Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
* test: add tests for JsonFormatter and --log-format parser arg
  Add TestJsonFormatter class covering JSON output validation, exception inclusion/exclusion, format switching via set_log_format, and init_logger format respect. Add parser tests verifying --log-format defaults to text and accepts json. Update README logging options documentation.
  Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
* refactor: address review feedback for JsonFormatter
  Instantiate formatter once outside the loop in set_log_format to avoid redundant allocations. Add stack_info support and default=str fallback to JsonFormatter for robustness. Add tests for stack_info inclusion and non-serializable object handling.
  Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
* style: fix black formatting in test files
  Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
---------
Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* [Router][Image Edit]: routing multi-part form requests
  Signed-off-by: Nuno Ramos <nmiguel123@gmail.com>
* [Router][Refactor]: abstraction for proxying multipart form requests
  Signed-off-by: Nuno Ramos <nmiguel123@gmail.com>
---------
Signed-off-by: Nuno Ramos <nmiguel123@gmail.com>
Co-authored-by: Nuno Ramos <nmiguel123@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* feat(helm) add pdb and expose various options in the values. Add tests
  Signed-off-by: enneitex <etienne.divet@gmail.com>
* feat(helm) update README and json schema with new fields
  Signed-off-by: enneitex <etienne.divet@gmail.com>
---------
Signed-off-by: enneitex <etienne.divet@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…#834)
Includes:
- GPU Bare-Metal node orchestration
- Secure Traefik ingress + TLS Endpoints (cert-manager)
- Prometheus + Grafana monitoring
- Built-in vLLM production stack + Vllm inference dashboards
- Terraform + Helm integration
Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…roject#777)
* [Feat][Router] Add disaggregated prefill orchestrated routing
  Implements support for disaggregated prefill as outlined in the 2025 Q1 roadmap. This enables prefill/decode disaggregation with router-orchestrated KV cache transfer.
  Closes vllm-project#26
  Signed-off-by: Yahav <yahavb@amazon.com>
* [CI/Build] Lower Python version requirement to 3.10 for Neuron SDK compatibility
  Signed-off-by: Yahav <yahavb@amazon.com>
* [Feat][Router] Address PR review feedback for disaggregated prefill orchestrated routing
  - Remove dead code (handle_orchestrated_request method in routing_logic.py)
  - Fix prefill request to use max_tokens=1 per proposal spec
  - Use shared aiohttp client instead of creating new session per request
  - Fix streaming to yield chunks immediately (true streaming)
  - Remove redundant isinstance check for DisaggregatedPrefillOrchestratedRouter
  - Use router's _find_endpoints method to avoid code duplication
  Signed-off-by: Yahav <yahavb@amazon.com>
* fix: use kv_transfer_params instead of disagg_prefill_resp
  - Add kv_transfer_params to prefill request to enable disaggregated mode
  - Extract kv_transfer_params from prefill response and forward to decode
  - Set remote_host to prefill endpoint for KV cache retrieval
  Signed-off-by: Yahav <yahavb@amazon.com>
* docs: add example for disaggregated_prefill_orchestrated mode
  - Add README with usage instructions and configuration notes
  - Add sanitized Kubernetes manifests (router, prefill, decode)
  - Include example curl command and expected router logs
  Signed-off-by: Yahav <yahavb@amazon.com>
* style: fix black formatting
  Signed-off-by: Yahav <yahavb@amazon.com>
* style: fix markdownlint errors in README.md
  Signed-off-by: Yahav <yahavb@amazon.com>
* style: fix markdownlint errors in proposal doc
  Signed-off-by: Yahav <yahavb@amazon.com>
* docs: clean up DisaggregatedPrefillOrchestratedRouter docstring
  Signed-off-by: Yahav <yahavb@amazon.com>
* feat: return 503 with distinct codes for prefill/decode unavailability
  - PREFILL_SERVICE_UNAVAILABLE: No prefill endpoints discovered
  - DECODE_SERVICE_UNAVAILABLE: No decode endpoints discovered
  This allows automated tests to distinguish transient startup issues from real bugs.
  Signed-off-by: Yahav <yahavb@amazon.com>
* revert: restore requires-python = 3.12
  Signed-off-by: Yahav <yahavb@amazon.com>
* fix: replace angle bracket placeholders with uppercase format
  Angle brackets like <your-pvc-name> are interpreted as shell redirections by shellcheck, causing CI failures. Use uppercase format instead: YOUR-PVC-NAME, YOUR-MODEL-PATH, etc.
  Signed-off-by: Yahav <yahavb@amazon.com>
* fix: remove trailing whitespace from YAML files
  Signed-off-by: Yahav <yahavb@amazon.com>
---------
Signed-off-by: Yahav <yahavb@amazon.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* bugfix: deprecate disable log request
  Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com>
* Update helm/README.md
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
---------
Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com>
Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…#875)
* feat(helm): add configurable NodePort to router service
  Add optional `routerSpec.nodePort` field that, when set alongside `routerSpec.serviceType: NodePort`, pins the NodePort to a fixed value instead of letting Kubernetes assign a random one on every helm upgrade.
  Closes vllm-project#763
  Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
* fix(helm): move nodePort schema to routerSpec and use truthiness check
  - Move nodePort JSON schema property from servingEngineSpec to routerSpec where it belongs
  - Replace hasKey check with truthiness check in service-router.yaml to correctly handle nodePort: null
  Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
* docs(helm): document nodePort field in Router Configuration table
  Add routerSpec.nodePort entry to the Helm README to document the configurable NodePort introduced for the router service.
  Signed-off-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com>
---------
Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
Signed-off-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com>
Co-authored-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…ect#880)
* fix: Detect the media_type instead of hardcode to text/event-stream
  Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>
* test: Add test for audio/wav and text/event-stream
  Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>
* fix: Move media-type before header
  Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>
---------
Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

- Fix Helm chart version regression (0.1.10 -> 0.1.19)
- Remove dead max_instance_failover_reroute_attempts parameter
- Clarify max_retries semantics in docs (total attempts, not retries)
- Add input validation for retry CLI arguments
- Fix thread safety in RoutingInterface singleton init
- Add super().__init__() to DisaggregatedPrefillOrchestratedRouter
- Add comment explaining HTTPException retry behavior
- Rename test to test_non_retryable_http_exception_not_retried
- Add test for retryable HTTPException (503)
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil · 2026-05-04T09:43:00Z

@ruizhang0101 could you please review the MR? Thanks!

ruizhang0101 · 2026-05-04T16:37:46Z

@aeon-x Could you take a look at this?

- Removed req_id, sorted_endpoints, last_endpoints_id, and last_endpoints_hash from the RoundRobinRouter class initialization.
- Streamlined the constructor to focus on essential attributes.
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

aeon-x · 2026-05-06T17:24:15Z

Hey @ikaadil, I think retrying should be optional, as most users would need fast failover.

Can you make sure that this retry mechanism is turned off unless it is explicitly turned on by a flag?

ikaadil · 2026-05-06T18:10:18Z

Can you make sure that this retry mechanism is turned off unless it is explicitly turned on by a flag?

Done

gemini-code-assist Bot reviewed May 1, 2026

Comment thread src/vllm_router/services/request_service/request.py Outdated

Comment thread src/vllm_router/services/request_service/request.py Outdated

Comment thread src/vllm_router/services/request_service/request.py

ikaadil force-pushed the request-retry branch from bd6ced6 to 2190348 on May 2, 2026 08:03

ikaadil and others added 28 commits May 2, 2026 10:07

Bump Helm chart version to 0.1.10

1b544ca

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Increase timeout values in e2e test workflow (vllm-project#848)

71c5e17

Increase timeout values in e2e test workflow Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

feat(helm) add support for extra manifests and annotation on pvc (vllm-project#847)

091e76b

Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Update roadmap year from 2025 to 2026 (vllm-project#856)

2996d29

Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.11

a454441

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.12

b61976a

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.13

6dddbcb

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.14

becb7d1

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.15

1ef3919

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.16

b353576

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.17

fb58295

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.18

bc543e9

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.19

38ecc59

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

fix(benchmark/multi-round-qa): fix TTFT NoneType crash caused by reasoning models emitting reasoning_content instead of content (vllm-project#873)

ca73ff4

Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

fix: fix cache server start command (vllm-project#872)

0f161ea

Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

feat(helm): refactor monitoring installation (vllm-project#860)

14d09cb

Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

fix(service_discovery): correctly return 503 on missing endpoints (vllm-project#889)

205dae3

Signed-off-by: Nejc Habjan <nejc.habjan@siemens.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil and others added 4 commits May 2, 2026 10:12

Merge branch 'main' into request-retry

fe5d61a

Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>

Remove AGENTS.md file and downgrade Helm chart version from 0.1.19 to 0.1.10.

946eae0

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

[Bugfix] Fix media_type extraction for streaming responses

1c7980e

Move media_type extraction after headers are received to properly capture the Content-Type from backend responses. This fixes the audio content type forwarding test. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil force-pushed the request-retry branch from a5bac11 to 1c7980e on May 2, 2026 09:54

ikaadil added 2 commits May 2, 2026 15:53

Fix code review issues

997d18a

- Add tests for is_retryable_status function
- Fix retry logic to exclude backends with retryable errors
- Update validation message for retry_max_retries
- Fix type annotation in routing_logic.py
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Trigger pipeline

2f6a063

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil added 6 commits May 6, 2026 16:20

Refactor retry logic in route_general_request function

6d126ee

- Removed unused retry_urls set to simplify the retry mechanism.
- Updated logic to filter remaining endpoints based solely on error_urls.
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Refactor RetryConfig to use instance methods for delay calculation

fa09fad

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Remove unused RetryConfig import from request.py to streamline code.

9677341

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Add test for handling retryable HTTP errors

ce25992

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Removed comment

4f970ae

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil added 4 commits May 6, 2026 19:28

Update the last error

9f0f1b9

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Refactor retry configuration handling in routing logic and tests

eb11e23

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Enhance retry configuration to be disabled by default for fast failover

60526b1

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Fix type hint for PrefixAwareRouter constructor to specify return type as int

9164e36

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil and others added 4 commits May 6, 2026 21:45

Merge branch 'main' into request-retry

18d2451

Merge branch 'main' into request-retry

442c698

Merge branch 'main' into request-retry

fc1fc66

Merge branch 'main' into request-retry

fc24834

ikaadil force-pushed the request-retry branch from b947c53 to fc24834 on May 22, 2026 09:11

ikaadil and others added 4 commits May 22, 2026 11:11

Trigger pipeline

01a2330

Signed-off-by: Ifta khairul Alam Adil <ikaadil007@gmail.com>

Trigger pipeline

198e57f

Signed-off-by: Ifta khairul Alam Adil <ikaadil007@gmail.com>

Merge branch 'main' into request-retry

2f8ee6c

Merge branch 'main' into request-retry

f6e4ec5

ikaadil wants to merge 71 commits into vllm-project:main from ikaadil:request-retry

20 participants
