fix(anam): make AnamAvatarPublisher.start() idempotent#550
Conversation
`Agent._apply("start")` calls subsystem starts sequentially. Anam's
`start()` blocks for ~3-5 seconds (REST session creation + WebRTC
handshake + first video frame), which sits in the critical path of
`agent.join()` and makes the avatar appear ~5-7 seconds after the
user dials in.
Callers can move that latency out of the join path by kicking off
`avatar.start()` as a background task before `agent.join()` — but only
if `start()` is safe to call twice (once eagerly, once from the
framework's `_apply`).
Guard `start()` with an `asyncio.Lock` and a `_started` flag: the
first call runs `_connect()`, concurrent callers wait on the lock,
and a successful start short-circuits subsequent calls. A failed
connect leaves `_started=False` so retries still work.
📝 WalkthroughWalkthrough
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 373c5ffd-daca-4e03-b6c0-2a4031cce4e5
📒 Files selected for processing (2)
plugins/anam/tests/test_anam_plugin.pyplugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py
| async def fake_connect(): | ||
| nonlocal call_count | ||
| call_count += 1 | ||
|
|
||
| pub._connect = fake_connect # type: ignore[method-assign] | ||
|
|
||
| await pub.start() | ||
| await pub.start() | ||
| await pub.start() | ||
|
|
||
| assert call_count == 1 | ||
| assert pub._started is True |
There was a problem hiding this comment.
Replace call-path spying with behavior/state assertions.
These tests validate _connect() invocation count by overriding a private method and asserting call_count, which is call-path testing and effectively mocking. Rework them to assert externally observable behavior/state transitions of start() without spying on private method invocations.
As per coding guidelines: "Never mock in tests" and "ALWAYS test behavior, not calling a path. Assert on outputs and state, not method calls."
Also applies to: 119-133, 142-157
| nonlocal call_count | ||
| call_count += 1 | ||
|
|
||
| pub._connect = fake_connect # type: ignore[method-assign] |
There was a problem hiding this comment.
Remove # type: ignore[method-assign] from tests.
Please avoid # type: ignore here; use a typed test double approach (fixture/class) so the assignment is type-safe without suppressions.
As per coding guidelines: "Avoid # type: ignore comments."
Also applies to: 124-124, 148-148
Why
Agent._apply(\"start\")runs subsystem starts sequentially.AnamAvatarPublisher.start()blocks for ~3-5 seconds (Anam REST session creation + WebRTC handshake + first frame), so the avatar appears ~5-7 seconds after the user dials in — and that wait sits in the critical path ofagent.join().Callers want to kick
avatar.start()off as a background task beforeagent.join()so the Anam connection runs in parallel with SFU join. The framework then callsstart()again from_apply— that second call needs to be a no-op.The current
_connect()is not safe to call twice. It clears_connected/_session_readyevents infinally, so a second call deadlocks waiting on a cleared event. Concurrent callers also race into the exit-stackenter_async_context, double-opening the session.Changes
start()in anasyncio.Lockand a_startedflag. First successful call runs_connect(); subsequent calls short-circuit. Concurrent callers serialize on the lock._connectonce, failed connect leaves_started=Falseso retries still work._connect()itself is unchanged — the lock and flag live one level up instart(), which is the only public entrypoint.