Summary
The SDK currently lacks retry logic for transient network failures, configurable request timeouts, and client-side rate limiting. These are important for production resilience, especially given the SDK communicates with 5 distinct API endpoints across different providers (Numbers API, AWS Lambda, Google Cloud Functions, Pipedream).
Findings
1. No Retry Logic for Transient Failures
Python (python/numbersprotocol_capture/client.py, lines 184-209):
try:
    response = self._client.request(method, url, ...)
except httpx.RequestError as e:
    raise create_api_error(0, f"Network error: {e}", nid) from e
TypeScript (ts/src/client.ts, lines 176-193):
const response = await fetch(url, { method, headers, body: requestBody })
Both SDKs fail on the first network error and never retry. For production usage against distributed backends (AWS Lambda cold starts, GCF scaling), retrying on 429 (rate limit), 502/503/504 (transient server errors), and connection timeouts would let many transient failures recover without any action from the caller.
2. Hardcoded Timeout (Python) / No Timeout (TypeScript)
Python (client.py, line 159):
self._client = httpx.Client(timeout=30.0)
The 30-second timeout is reasonable but not configurable by the caller.
TypeScript (client.ts, line 176):
const response = await fetch(url, { ... })
No timeout is configured at all. fetch has no timeout option of its own, so a stalled request lasts as long as the runtime allows, which can cause hanging requests in production.
3. No Client-Side Rate Limiting
Neither SDK implements rate limiting. If a consumer makes rapid successive calls (e.g., batch registration), they may overwhelm the backend APIs and receive 429 errors with no backoff strategy.
Suggested Implementation
Retry with Exponential Backoff
- Retry on status codes: 429, 500, 502, 503, 504
- Retry on network/connection errors
- Max 3 retries with exponential backoff (1s, 2s, 4s)
- Configurable via CaptureOptions (e.g., max_retries, retry_delay)
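The retry policy above could be sketched roughly as follows. This is a minimal illustration rather than the SDK's implementation: `request_with_retry`, `backoff_delays`, and `TransientNetworkError` are hypothetical names, and `send` stands in for whatever performs a single HTTP request (in the Python SDK that would presumably wrap `self._client.request`). The `sleep` parameter is injectable only so the behavior can be checked without waiting.

```python
import time
from typing import Callable

# Status codes treated as transient, taken from the list above.
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a connection-level error such as httpx.RequestError."""


def backoff_delays(max_retries: int = 3, retry_delay: float = 1.0) -> list[float]:
    """Return the wait before each retry: retry_delay * 2**attempt."""
    return [retry_delay * (2 ** attempt) for attempt in range(max_retries)]


def request_with_retry(
    send: Callable[[], int],
    max_retries: int = 3,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or retries run out.

    send is a hypothetical zero-argument callable that performs one HTTP
    request and returns its status code, or raises TransientNetworkError.
    """
    delays = backoff_delays(max_retries, retry_delay)
    for attempt in range(max_retries + 1):
        try:
            status = send()
        except TransientNetworkError:
            if attempt == max_retries:
                raise  # out of retries: surface the network error
        else:
            if status not in RETRYABLE_STATUS or attempt == max_retries:
                return status
        sleep(delays[attempt])  # only reached when another attempt remains
    raise AssertionError("unreachable")
```

With the defaults, `backoff_delays()` yields the 1s, 2s, 4s schedule proposed above. A production version would likely also honor a `Retry-After` header on 429 responses, and should only retry requests that are safe to repeat.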
Configurable Timeout
- Add timeout parameter to CaptureOptions (Python and TypeScript)
- TypeScript: Use AbortController with setTimeout for fetch timeout
- Default: 30 seconds (maintain current Python behavior)
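On the Python side, the timeout option might look something like the sketch below. The real `CaptureOptions` fields are not shown in this issue, so the class here is a hypothetical subset with only `timeout`, and `client_kwargs` is an illustrative helper, not an existing SDK function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureOptions:
    """Hypothetical subset of CaptureOptions; only `timeout` is modeled here."""

    timeout: float = 30.0  # seconds; default keeps the current Python behavior

    def __post_init__(self) -> None:
        if self.timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")


def client_kwargs(options: CaptureOptions) -> dict[str, float]:
    """Build the keyword arguments the HTTP client constructor would receive.

    In the SDK this would replace the hardcoded value, e.g.
    httpx.Client(**client_kwargs(options)).
    """
    return {"timeout": options.timeout}
```

Callers who never set `timeout` keep the existing 30-second behavior, so the change would be backward compatible.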
Optional Rate Limiter
- Simple token-bucket or sliding-window limiter
- Configurable requests-per-second limit
- Enabled via CaptureOptions (e.g., rate_limit: 10 for 10 req/s)
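A token-bucket limiter of the kind suggested above can be quite small. The following is one possible sketch, not a proposed final design: `TokenBucket` and its parameters are hypothetical names, `rate` corresponds to the `rate_limit` option, and the `clock` and `sleep` hooks exist so the limiter can be exercised without real waiting. It is not thread-safe as written.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Token-bucket limiter: `rate` requests per second, bursts up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = rate
        # Assumption: allow a burst of one second's worth of requests by default.
        self.capacity = capacity if capacity is not None else rate
        self._tokens = self.capacity
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last refill, capped at capacity."""
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of waiting."""
        self._refill()
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block until a token is available, then take it."""
        while not self.try_acquire():
            # Sleep just long enough for the missing fraction of a token to accrue.
            self._sleep((1 - self._tokens) / self.rate)
```

The client would call `acquire()` before each outgoing request when `rate_limit` is set, and skip the limiter entirely when it is not.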
Expected Impact
- Reliability: Significantly improves resilience against transient failures across 5 different backend services
- Developer Experience: Reduces boilerplate — consumers won't need to implement their own retry/timeout logic
- Production Readiness: Essential for any production deployment doing batch operations