Skip to content

add retry logic to stream() and stream_typed() on initial connection#8

Open
ayaan-kapoor wants to merge 2 commits into
gumloop:mainfrom
ayaan-kapoor:feat/stream-retry
Open

add retry logic to stream() and stream_typed() on initial connection#8
ayaan-kapoor wants to merge 2 commits into
gumloop:mainfrom
ayaan-kapoor:feat/stream-retry

Conversation

@ayaan-kapoor

Copy link
Copy Markdown

What

Add retry logic to HttpClient.stream(), HttpClient.stream_typed(), and their async counterparts for transient failures on the initial connection.

Why

A bot review on #7 flagged that stream() and stream_typed() had no retry logic while _request() and post_to_stream_host() did — callers using the streaming path would get an immediate RateLimitError or ServerError where the non-streaming path would transparently retry.

Retry only covers the initial connection handshake. Once the server starts sending events we're committed to the response and can't restart without re-delivering events to the caller, so mid-stream errors are left as-is.

Same rules as _request(): idempotent methods retry on both 429 and 5xx; POST/PATCH only retry on 429. max_retries=0 disables it entirely.

Tests

7 new tests in tests/sdk/test_stream_retry.py covering sync and async paths, 429/500 retry-then-succeed, exhaustion after max retries, no-retry on 4xx, and max_retries=0.

@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds retry logic with exponential back-off and jitter to HttpClient.stream(), HttpClient.stream_typed(), and their async counterparts, mirroring the retry behaviour already present on the non-streaming path. Retry only applies to the initial connection handshake; once events are flowing the connection is committed.

  • _http.py introduces _should_retry, _parse_retry_after, and _retry_delay helpers; idempotent methods (GET, HEAD, PUT, DELETE, OPTIONS) retry on both 429 and 5xx, while POST/PATCH only retry on 429.
  • _client.py threads a max_retries parameter through Gumloop/AsyncGumloop down to the HTTP clients, with DEFAULT_MAX_RETRIES = 2.
  • tests/sdk/test_stream_retry.py adds seven tests, though the "raises_after_max_retries" test uses POST + 500 which is raised immediately without retrying, leaving true retry-exhaustion behaviour untested.

Confidence Score: 4/5

The implementation is correct and safe to merge; the retry loop, back-off math, and _should_retry guard all behave as intended. The only concern is in the test suite.

The production code in _http.py and _client.py is solid — the retry helpers are correct, the exhaustion condition is properly bounded by _max_retries, and the generator continue/return control flow exits the with block cleanly on every path. The gap is in the tests: test_stream_raises_after_max_retries uses POST + 500 which is raised on the first attempt (no retries), so the retry-exhaustion code path — the case where a retryable condition repeats until all attempts are consumed — has no test. A regression there would go undetected.

tests/sdk/test_stream_retry.py — the exhaustion scenario needs a retryable method/status combination (GET + 500 or POST + 429) with enough mocked failures to consume all retries.

Important Files Changed

Filename Overview
src/gumloop/_http.py Adds _should_retry, _parse_retry_after, and _retry_delay helpers, then wraps the stream/stream_typed bodies (sync and async) in a retry loop using exponential back-off with jitter; max_retries is threaded through __init__.
src/gumloop/_client.py Adds max_retries parameter to Gumloop and AsyncGumloop constructors and forwards it to HttpClient/AsyncHttpClient; imports DEFAULT_MAX_RETRIES for the default value.
tests/sdk/test_stream_retry.py Seven tests cover sync/async paths, 429 retry-then-succeed, POST+500 no-retry, 404 no-retry, and max_retries=0; however the "raises_after_max_retries" test uses POST+500 which never retries, so true exhaustion of the retry loop is not covered.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[stream / stream_typed called] --> B[attempt = 0]
    B --> C[Open HTTP stream connection]
    C --> D{status >= 400?}
    D -- No --> E[Yield SSE events]
    E --> F[return]
    D -- Yes --> G[read response body\ncreate exc via to_api_error]
    G --> H{attempt < max_retries\nAND _should_retry?}
    H -- No --> I[raise exc]
    H -- Yes --> J[_retry_delay\nexponential back-off + jitter\nor Retry-After header]
    J --> K[time.sleep / asyncio.sleep]
    K --> L[attempt += 1]
    L --> C
Loading

Reviews (2): Last reviewed commit: "fix review: correct POST+500 test, fix c..." | Re-trigger Greptile

Comment on lines +42 to +55
@respx.mock
def test_stream_retries_on_500_then_yields_events(monkeypatch: pytest.MonkeyPatch, client: Gumloop) -> None:
monkeypatch.setattr(time, "sleep", lambda _: None)

session_id = "sess_abc"
respx.post(f"{_STREAM_BASE}/sessions/{session_id}/stream").mock(
side_effect=[
httpx.Response(500, json={}),
httpx.Response(200, content=_sse('{"type": "done"}'), headers={"content-type": "text/event-stream"}),
]
)

events = list(client._http.stream("POST", f"sessions/{session_id}/stream"))
assert len(events) == 1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 POST + 500 retry test will fail

_should_retry explicitly returns False when the method is not in _IDEMPOTENT_METHODS and the exception is a ServerError. POST is not in that set, so a 500 from a POST stream is raised immediately on the first attempt — it is never retried. The test issues a POST then expects the second (successful) mock to be consumed, but stream() will raise exc after the first 500 and the list(...) call will propagate the ServerError before events is even assigned.

Comment thread src/gumloop/_http.py Outdated
Comment on lines +33 to +35
# Streaming connections are always GET-like: the caller hasn't mutated state,
# so it is safe to retry on any transient server error, not just 429.
_IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "DELETE", "OPTIONS", "PUT"})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Misleading comment contradicts the retry policy

The comment says "Streaming connections are always GET-like: the caller hasn't mutated state, so it is safe to retry on any transient server error, not just 429." This implies every streaming call will retry on 5xx regardless of method. But _should_retry immediately returns False for POST/PATCH + ServerError, so POST streams are not retried on 500 — only on 429. The comment should either be removed or updated to describe the actual rule ("same as _request(): only idempotent methods retry on 5xx; POST/PATCH only on 429").

Comment on lines +99 to +127
@respx.mock
def test_async_stream_retries_on_429_then_yields_events() -> None:
async def run() -> None:
import asyncio as _asyncio

slept: list[float] = []

async def _noop_sleep(s: float) -> None:
slept.append(s)

_asyncio.sleep = _noop_sleep # type: ignore[assignment]

respx.post(f"{_STREAM_BASE}/sessions/sess_abc/stream").mock(
side_effect=[
httpx.Response(429, headers={"retry-after": "0"}, json={}),
httpx.Response(
200,
content=_sse('{"type": "message"}'),
headers={"content-type": "text/event-stream"},
),
]
)

async with AsyncGumloop(access_token="token") as client:
events = [e async for e in client._http.stream("POST", "sessions/sess_abc/stream")]

assert len(events) == 1

asyncio.run(run())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Global asyncio.sleep patch is not test-isolated

_asyncio.sleep = _noop_sleep replaces the module-level function permanently for the duration of the process. If another test in the same worker imports asyncio after this runs, it will get the patched version. Using monkeypatch.setattr(asyncio, "sleep", _noop_sleep) (as done for the sync time.sleep in the other tests) is the correct approach — it is automatically reverted after the test finishes.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant