Flaky tests: eager-flush transcript-mirror tests fail with 2 sleep(0) yields (test_transcript_mirror.py)

## Summary

Two tests in `tests/test_transcript_mirror.py` covering `session_store_flush="eager"` fail reliably on Python 3.11.14 / macOS arm64 against current `main` (HEAD `9aafd84`). The implementation is correct; the tests are timing-fragile — they assume 2 `await asyncio.sleep(0)` yields are enough for the second eager flush to reach `store.append`, but the actual path needs ~4 yields when there's lock contention between consecutive drains.

PR #905 (May 3) merged with all CI green, yet the failures appear locally despite no intervening commits to `transcript_mirror_batcher.py`. This suggests CI got lucky on event-loop scheduling at merge time.

## Affected tests

- `tests/test_transcript_mirror.py::TestBuildMirrorBatcherFlushMode::test_eager_mode_flushes_per_frame` (unit-level)
- `tests/test_transcript_mirror.py::TestReceiveLoopFramePeeling::test_eager_flush_mode_appends_per_frame_before_result` (integration-level)

Both fail with the same symptom: `assert appends_at_assistant == 2` (or `len(store.append_calls) == 2`) sees only 1 append.

## Reproducer

5/5 consecutive failures on my machine:

```
$ for i in 1 2 3 4 5; do uv run pytest tests/test_transcript_mirror.py::TestBuildMirrorBatcherFlushMode::test_eager_mode_flushes_per_frame -q 2>&1 | tail -2 | head -1; done
FAILED tests/test_transcript_mirror.py::TestBuildMirrorBatcherFlushMode::test_eager_mode_flushes_per_frame
FAILED tests/test_transcript_mirror.py::TestBuildMirrorBatcherFlushMode::test_eager_mode_flushes_per_frame
FAILED tests/test_transcript_mirror.py::TestBuildMirrorBatcherFlushMode::test_eager_mode_flushes_per_frame
FAILED tests/test_transcript_mirror.py::TestBuildMirrorBatcherFlushMode::test_eager_mode_flushes_per_frame
FAILED tests/test_transcript_mirror.py::TestBuildMirrorBatcherFlushMode::test_eager_mode_flushes_per_frame
```

Sweeping the yield count with a standalone probe (same `_RecordingStore` and `build_mirror_batcher` setup) shows the boundary clearly:

```
yields=  2: after_first=1, after_second=1   <-- FAILS (test asserts 2)
yields=  4: after_first=1, after_second=2
yields=  6: after_first=1, after_second=2
yields= 10: after_first=1, after_second=2
yields=100: after_first=1, after_second=2
```

The implementation works — it just needs more event-loop turns than the tests provide.
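
For readers who want to see the shape of such a sweep without the SDK installed, here is a minimal, self-contained sketch. `RecordingStore` and `ToyEagerBatcher` are hypothetical stand-ins written for this illustration (one drain task per frame, a shared `asyncio.Lock`, and `asyncio.wait_for` around `append`); they are not the SDK's `_RecordingStore` or `build_mirror_batcher`. The counts it prints depend on the interpreter's asyncio scheduling, so they may not match the table above on every Python version.

```python
import asyncio


class RecordingStore:
    """Hypothetical stand-in for the tests' _RecordingStore."""

    def __init__(self):
        self.append_calls = []

    async def append(self, frame):
        # Records synchronously; never suspends.
        self.append_calls.append(frame)


class ToyEagerBatcher:
    """Hypothetical model of an eager batcher; NOT the SDK implementation."""

    def __init__(self, store):
        self._store = store
        self._lock = asyncio.Lock()
        self._tasks = set()

    def enqueue(self, frame):
        # Eager mode: every frame schedules its own drain task.
        task = asyncio.get_running_loop().create_task(self._drain(frame))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _drain(self, frame):
        # Consecutive drains serialize on the lock.
        async with self._lock:
            await asyncio.wait_for(self._store.append(frame), timeout=1.0)

    async def settle(self):
        # Let outstanding drains finish so none is cancelled at loop teardown.
        await asyncio.gather(*self._tasks)


async def probe(yields):
    """Return (appends after first enqueue, appends after second enqueue)."""
    store = RecordingStore()
    batcher = ToyEagerBatcher(store)

    batcher.enqueue("frame-1")
    for _ in range(yields):
        await asyncio.sleep(0)
    after_first = len(store.append_calls)

    batcher.enqueue("frame-2")
    for _ in range(yields):
        await asyncio.sleep(0)
    after_second = len(store.append_calls)

    await batcher.settle()
    return after_first, after_second


if __name__ == "__main__":
    for n in (2, 4, 6, 10, 100):
        first, second = asyncio.run(probe(n))
        print(f"yields={n:3d}: after_first={first}, after_second={second}")
```

The point of the sweep is only to locate the smallest yield count at which `after_second` reaches 2 on a given interpreter; a large count such as 100 should always show both appends.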

## Root cause

The first enqueue → drain happens cleanly within 2 yields because there's no contention. The second drain has to traverse:

1. First drain's `await asyncio.wait_for(store.append, ...)` returns (1 yield, since `wait_for` wraps in an inner Task)
2. First drain exits `_do_flush`, releases the `async with self._lock` (1 yield)
3. Second drain acquires the lock (1 yield)
4. Second drain's `wait_for(store.append)` schedules its inner task (1 yield)
5. `_RecordingStore.append` records synchronously into `append_calls`

That's ~4 yields; the tests allot 2. In the unit test, the second drain task is cancelled at event-loop teardown before it reaches `store.append`. This is visible in the `asyncio.exceptions.CancelledError` traceback that the failing test prints from `add_done_callback(lambda t: t.exception())` at `src/claude_agent_sdk/_internal/transcript_mirror_batcher.py:91`.
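
Steps 1 and 4 rest on the claim that `asyncio.wait_for` costs extra event-loop turns compared with awaiting the coroutine directly. The sketch below is one way to check that claim in isolation on a given interpreter: it counts how many `await asyncio.sleep(0)` yields pass before a task finishes, once with a direct `await` and once through `wait_for`. The helper names (`_noop`, `yields_until_done`, `compare`) are made up for this example, and the measured numbers are interpreter-dependent, so none are asserted here.

```python
import asyncio


async def _noop():
    """A coroutine that completes without ever suspending."""
    return None


async def yields_until_done(make_awaitable, limit=1000):
    """Count sleep(0) yields until a task awaiting make_awaitable() is done."""

    async def runner():
        await make_awaitable()

    task = asyncio.create_task(runner())
    count = 0
    while not task.done():
        if count >= limit:
            raise AssertionError("task did not finish within the yield limit")
        await asyncio.sleep(0)
        count += 1
    await task  # surface any exception raised inside runner()
    return count


async def compare():
    direct = await yields_until_done(_noop)
    wrapped = await yields_until_done(
        lambda: asyncio.wait_for(_noop(), timeout=1.0)
    )
    return direct, wrapped


if __name__ == "__main__":
    direct, wrapped = asyncio.run(compare())
    print(f"direct await: {direct} yield(s); via wait_for: {wrapped} yield(s)")
```

If `wrapped` comes out larger than `direct`, that supports the accounting above for that interpreter; if the two are equal, the yield budget in the list would be smaller there, which may also explain why the same tests pass in some environments.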

## Severity

Test-suite reliability / contributor experience. Production behavior is unaffected — eager flush works correctly given enough event-loop time. But running the full suite locally fails out of the box, which makes contribution awkward.

## Suggested fixes (in order of preference)

1. **Replace the fixed yield count with a condition-based wait.** Loop `await asyncio.sleep(0)` with a deadline until the expected condition holds:
   ```python
   import asyncio
   import time

   async def _wait_until(predicate, timeout=1.0):
       # Yield to the event loop until predicate() holds; fail after `timeout` seconds.
       deadline = time.monotonic() + timeout
       while not predicate():
           if time.monotonic() > deadline:
               raise AssertionError(f"condition not met within {timeout}s")
           await asyncio.sleep(0)
   ```
2. **Expose a test-only `wait_quiescent()` on the batcher** that awaits `_flush_task` if set, and use it between enqueues in the tests.
3. **Make the test await `batcher.flush()` after each enqueue.** Defeats the original intent of "verify automatic eager flush triggers", so least preferred.
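
To make options (1) and (2) concrete, here is a rough sketch of how a test body might look with each. Everything in it is illustrative: `ToyEagerBatcher` and `RecordingStore` are hypothetical stand-ins rather than the SDK's classes, `wait_quiescent()` does not exist on the real batcher today, and the toy tracks a set of pending tasks instead of the single `_flush_task` mentioned above. The `_wait_until` helper from option (1) is repeated so the block runs on its own.

```python
import asyncio
import time


async def _wait_until(predicate, timeout=1.0):
    # Option (1): poll the condition, yielding to the loop between checks.
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise AssertionError(f"condition not met within {timeout}s")
        await asyncio.sleep(0)


class RecordingStore:
    """Hypothetical stand-in for the tests' _RecordingStore."""

    def __init__(self):
        self.append_calls = []

    async def append(self, frame):
        self.append_calls.append(frame)


class ToyEagerBatcher:
    """Hypothetical model of an eager batcher; NOT the SDK implementation."""

    def __init__(self, store):
        self._store = store
        self._lock = asyncio.Lock()
        self._pending = set()

    def enqueue(self, frame):
        task = asyncio.get_running_loop().create_task(self._drain(frame))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def _drain(self, frame):
        async with self._lock:
            await asyncio.wait_for(self._store.append(frame), timeout=1.0)

    async def wait_quiescent(self):
        # Option (2): block until no drain task is outstanding.
        while self._pending:
            await asyncio.gather(*self._pending)


async def scenario_with_wait_until():
    store = RecordingStore()
    batcher = ToyEagerBatcher(store)
    batcher.enqueue("frame-1")
    await _wait_until(lambda: len(store.append_calls) == 1)
    batcher.enqueue("frame-2")
    await _wait_until(lambda: len(store.append_calls) == 2)
    await batcher.wait_quiescent()  # avoid cancelling drains at teardown
    return store.append_calls


async def scenario_with_wait_quiescent():
    store = RecordingStore()
    batcher = ToyEagerBatcher(store)
    batcher.enqueue("frame-1")
    await batcher.wait_quiescent()
    after_first = len(store.append_calls)
    batcher.enqueue("frame-2")
    await batcher.wait_quiescent()
    return after_first, len(store.append_calls)
```

Neither scenario calls an explicit flush, so both still exercise the automatic eager path, and neither depends on how many loop turns a drain happens to need.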

Happy to send a PR for option (1) if it's the preferred direction.

## Environment

- claude-agent-sdk-python @ `9aafd84` (current `main`)
- Python 3.11.14 (uv-managed venv)
- pytest 9.0.3, pytest-asyncio 1.3.0, anyio 4.13.0
- macOS, Darwin 25.3.0 (arm64)
