[Feature][High] Add retry logic, request timeout configuration, and client-side rate limiting #7

@numbers-official

Summary

The SDK currently lacks retry logic for transient network failures, configurable request timeouts, and client-side rate limiting. All three matter for production resilience, especially since the SDK communicates with 5 distinct API endpoints across different providers (Numbers API, AWS Lambda, Google Cloud Functions, Pipedream).

Findings

1. No Retry Logic for Transient Failures

Python (python/numbersprotocol_capture/client.py, lines 184-209):

try:
    response = self._client.request(method, url, ...)
except httpx.RequestError as e:
    raise create_api_error(0, f"Network error: {e}", nid) from e

TypeScript (ts/src/client.ts, lines 176-193):

const response = await fetch(url, { method, headers, body: requestBody })

Both SDKs fail immediately on any network error, with no retry. For production use against distributed backends (AWS Lambda cold starts, Google Cloud Functions scaling), retrying on 429 (rate limit), transient server errors (502/503/504), and connection timeouts would significantly improve reliability.

2. Hardcoded Timeout (Python) / No Timeout (TypeScript)

Python (client.py, line 159):

self._client = httpx.Client(timeout=30.0)

The 30-second timeout is reasonable but not configurable by the caller.

TypeScript (client.ts, line 176):

const response = await fetch(url, { ... })

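No timeout is configured at all. fetch has no timeout option of its own, so unless an AbortSignal is passed, a stalled request can hang for as long as the runtime allows, which leaves production callers waiting on requests that never complete.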

3. No Client-Side Rate Limiting

Neither SDK implements rate limiting. If a consumer makes rapid successive calls (e.g., batch registration), they may overwhelm the backend APIs and receive 429 errors with no backoff strategy.

Suggested Implementation

Retry with Exponential Backoff

  • Retry on status codes: 429, 500, 502, 503, 504
  • Retry on network/connection errors
  • Max 3 retries with exponential backoff (1s, 2s, 4s)
  • Configurable via CaptureOptions (e.g., max_retries, retry_delay)
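
The retry bullets above could look roughly like the following in Python. This is a minimal sketch, not the SDK's implementation: `RetryPolicy` and `send_with_retry` are hypothetical names, `send` stands in for whatever callable performs the HTTP request, and the response is assumed to expose a `status_code` attribute (as httpx responses do). The `sleep` parameter is injectable only so the backoff can be exercised without real waiting.

```python
import time
from dataclasses import dataclass

# Status codes this issue proposes treating as retryable.
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical retry settings; field names follow the suggested options."""
    max_retries: int = 3
    retry_delay: float = 1.0  # base delay in seconds, doubled on every retry

    def delay_for(self, retry_index: int) -> float:
        # With the defaults, retry_index 0, 1, 2 gives 1s, 2s, 4s.
        return self.retry_delay * (2 ** retry_index)


def send_with_retry(send, policy=RetryPolicy(),
                    network_errors=(ConnectionError, TimeoutError),
                    sleep=time.sleep):
    """Call send() until it yields a non-retryable status or retries run out.

    On the final attempt a network error is re-raised and a retryable
    status is returned as-is, so the caller can map it to its own error type.
    """
    for attempt in range(policy.max_retries + 1):
        is_last = attempt == policy.max_retries
        try:
            response = send()
        except network_errors:
            if is_last:
                raise
        else:
            if response.status_code not in RETRYABLE_STATUS or is_last:
                return response
        # Only reached when another attempt will follow.
        sleep(policy.delay_for(attempt))
```

In the Python client this might wrap the existing call as `send_with_retry(lambda: self._client.request(method, url), network_errors=(httpx.RequestError,))`, with the existing `create_api_error` handling applied to whatever is finally raised or returned. Whether non-idempotent requests should be retried at all is a design decision this sketch leaves open.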

Configurable Timeout

  • Add timeout parameter to CaptureOptions (Python and TypeScript)
  • TypeScript: Use AbortController with setTimeout for fetch timeout
  • Default: 30 seconds (maintain current Python behavior)
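
On the Python side, the timeout option could be threaded through roughly as below. This is a sketch under assumptions: `CaptureOptionsSketch` is a hypothetical stand-in for the real `CaptureOptions` (whose other fields and actual shape are not shown in this issue), and `client_kwargs` is an illustrative helper, not an existing method.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_TIMEOUT = 30.0  # seconds; matches the value currently hardcoded in client.py


@dataclass(frozen=True)
class CaptureOptionsSketch:
    """Stand-in for CaptureOptions, showing only the proposed timeout field."""
    timeout: Optional[float] = DEFAULT_TIMEOUT  # None means "no timeout"

    def __post_init__(self) -> None:
        # Reject zero and negative values early instead of at request time.
        if self.timeout is not None and self.timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds or None")

    def client_kwargs(self) -> dict:
        # Keyword arguments intended to be passed to httpx.Client(**kwargs).
        return {"timeout": self.timeout}
```

The client constructor would then build its HTTP client with `httpx.Client(**options.client_kwargs())` in place of the hardcoded `httpx.Client(timeout=30.0)`, so callers who pass nothing keep the current behavior.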

Optional Rate Limiter

  • Simple token-bucket or sliding-window limiter
  • Configurable requests-per-second limit
  • Enabled via CaptureOptions (e.g., rate_limit: 10 for 10 req/s)
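
A token-bucket limiter of the kind suggested above can be sketched in a few lines of Python. The class name `TokenBucket`, its parameters, and the injectable `clock` and `sleep` callables are all assumptions made for illustration; the sketch is also not thread-safe, which a real implementation would need to address with a lock.

```python
import time


class TokenBucket:
    """Allow about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity=None, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = float(rate)
        # By default the bucket holds one second's worth of requests.
        self.capacity = float(capacity if capacity is not None else rate)
        self._tokens = self.capacity  # start full so the first burst is not delayed
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self):
        """Block until one token is available, consume it, return seconds waited."""
        waited = 0.0
        self._refill()
        while self._tokens < 1.0:
            # Sleep just long enough for the missing fraction of a token to accrue.
            wait = (1.0 - self._tokens) / self.rate
            self._sleep(wait)
            waited += wait
            self._refill()
        self._tokens -= 1.0
        return waited
```

With `rate_limit: 10` in the options, the client would create `TokenBucket(rate=10)` once and call `acquire()` before each outgoing request. A token bucket tolerates short bursts while holding the long-run average to the configured rate, which suits the batch registration case described in the findings.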

Expected Impact

  • Reliability: Significantly improves resilience against transient failures across 5 different backend services
  • Developer Experience: Reduces boilerplate — consumers won't need to implement their own retry/timeout logic
  • Production Readiness: Essential for any production deployment doing batch operations
